There is a Difference between Torchvsion Shear and PIL Shear #5204

Closed
SamuelGabriel opened this issue Jan 17, 2022 · 3 comments · Fixed by #5285
SamuelGabriel commented Jan 17, 2022

🐛 Describe the bug

As @agaldran pointed out in the TrivialAugment repo (automl/trivialaugment#6), there seems to be a difference between the behavior of a shear implemented with PIL and one implemented with the TorchVision affine transform.

This is an issue for the autoaugment algorithms, as it might lead to different results compared to other implementations of these algorithms. It might be an issue for other applications as well. Both the AutoAugment (https://github.com/tensorflow/models/blob/fd34f711f319d8c6fe85110d9df6e1784cc5a6ca/research/autoaugment) and the TrivialAugment (https://github.com/automl/trivialaugment) reference implementations use PIL, while RandAugment has no reference implementation.

from PIL import Image
import math
import torchvision # >= 0.11

from torchvision.transforms import functional as F
interpolation = torchvision.transforms.InterpolationMode.NEAREST
fill = None

img = Image.new('RGB', (32,32), (255,255,0))
magnitude = .7

# shear_x as seen in torchvision https://github.com/pytorch/vision/blob/b5aa0915fe16e82ee4c24919032b4e7afae3ae1b/torchvision/transforms/autoaugment.py#L17
im_torch = F.affine(img, angle=0.0, translate=[0, 0], scale=1.0, shear=[math.degrees(magnitude), 0.0],
                    interpolation=interpolation, fill=fill)

# shear_x as seen in https://github.com/automl/trivialaugment/blob/3bfd06552336244b23b357b2c973859500328fbb/aug_lib.py#L156 and https://github.com/tensorflow/models/blob/fd34f711f319d8c6fe85110d9df6e1784cc5a6ca/research/autoaugment/augmentation_transforms.py#L290
im_pil = img.transform(img.size, Image.AFFINE, (1, magnitude, 0, 0, 1, 0))

im_torch.show()
im_pil.show()

This script will yield the following images:

[images: the im_torch and im_pil outputs]

Which should be the same.

It looks like torchvision shears about a fixed center, while PIL keeps the top row fixed. I have not dug into the code much yet, though. Could someone who implemented affine explain the reasoning behind the different shear behavior?
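One way to see the pivot difference is to look at which source position each output pixel samples from. The sketch below is pure Python and assumes the coefficient convention PIL documents for `Image.AFFINE`, where the 6-tuple `(a, b, c, d, e, f)` makes output pixel `(x, y)` sample from input `(a*x + b*y + c, d*x + e*y + f)`. The helper names and the `(16, 16)` pivot are illustrative assumptions, not torchvision's code.

```python
def pil_shear_x_source(x, y, magnitude):
    """Input position sampled by output pixel (x, y) for the
    coefficients (1, magnitude, 0, 0, 1, 0) used in the script above."""
    a, b, c, d, e, f = (1, magnitude, 0, 0, 1, 0)
    return (a * x + b * y + c, d * x + e * y + f)


def centered_shear_x_source(x, y, magnitude, center):
    """Same shear, but pivoting about `center`:
    shift to the pivot, shear, then shift back."""
    cx, cy = center
    xs, ys = pil_shear_x_source(x - cx, y - cy, magnitude)
    return (xs + cx, ys + cy)


magnitude = 0.7
center = (16.0, 16.0)  # assumed pivot for a 32x32 image

# Horizontal offset of the sampled position for the top, middle and bottom rows.
for y in (0, 16, 31):
    top_pivot_x, _ = pil_shear_x_source(0, y, magnitude)
    center_pivot_x, _ = centered_shear_x_source(0, y, magnitude, center)
    print(f"row {y:2d}: top-pivot x = {top_pivot_x:6.2f}, center-pivot x = {center_pivot_x:6.2f}")
```

With the plain coefficients the row `y = 0` is left in place, while with the centered variant the row `y = 16` is left in place and the top row is displaced.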

Versions

Collecting environment information...
PyTorch version: 1.10.1+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: 10.0.0-4ubuntu1
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.9.7 (default, Sep 16 2021, 13:09:58) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.0-94-generic-x86_64-with-glibc2.31
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.22.0
[pip3] torch==1.10.1
[pip3] torchvision==0.11.2
[conda] mypy-extensions 0.4.3 pypi_0 pypi
[conda] numpy 1.22.0 pypi_0 pypi
[conda] torch 1.10.1 pypi_0 pypi
[conda] torchvision 0.11.2 pypi_0 pypi

cc @vfdev-5 @datumbox

@datumbox

@SamuelGabriel Thanks for reporting. I agree it's important to investigate.

@vfdev-5 I think you might be the original author of affine (#2444). Do you have the bandwidth to have a look?

@vfdev-5

vfdev-5 commented Jan 17, 2022

@SamuelGabriel thanks for asking. I'd say the answer is that it was simply a convention we chose. I think I was inspired by Keras when implementing and submitting PR #2444.
IMO, the rotational part of affine only makes sense if the pivot point is the center, so everything else follows the same convention.
Maybe we can treat this issue as a feature request to introduce a center arg, as is done for F.rotate.

Torchvision vs PIL affine parametrization?

Here is how the affine matrix is defined in torchvision:

# As it is explained in PIL.Image.rotate
# We need compute INVERSE of affine transformation matrix: M = T * C * RSS * C^-1
# where T is translation matrix: [1, 0, tx | 0, 1, ty | 0, 0, 1]
# C is translation matrix to keep center: [1, 0, cx | 0, 1, cy | 0, 0, 1]
# RSS is rotation with scale and shear matrix
# RSS(a, s, (sx, sy)) =
#     = R(a) * S(s) * SHy(sy) * SHx(sx)
#     = [ s*cos(a - sy)/cos(sy), s*(-cos(a - sy)*tan(sx)/cos(sy) - sin(a)), 0 ]
#       [ s*sin(a - sy)/cos(sy), s*(-sin(a - sy)*tan(sx)/cos(sy) + cos(a)), 0 ]
#       [ 0                    , 0                                        , 1 ]
#
# where R is a rotation matrix, S is a scaling matrix, and SHx and SHy are the shears:
# SHx(s) = [1, -tan(s)]    and    SHy(s) = [1      , 0]
#          [0, 1      ]                    [-tan(s), 1]
#
# Thus, the inverse is M^-1 = C * RSS^-1 * C^-1 * T^-1
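The closed form for RSS can be checked numerically against the product `R(a) * S(s) * SHy(sy) * SHx(sx)`. The following is a minimal pure-Python sketch, assuming all angles are in radians and using the shear matrices exactly as defined in the comment; the function names are illustrative. The closed form here is the one obtained by expanding the product by hand, with `tan(sx)`, `cos(sy)` and `sin(a - sy)` in the lower-left entry.

```python
import math


def matmul2(A, B):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def rss_product(a, s, sx, sy):
    """R(a) * S(s) * SHy(sy) * SHx(sx), built factor by factor."""
    R = [[math.cos(a), -math.sin(a)], [math.sin(a), math.cos(a)]]
    S = [[s, 0.0], [0.0, s]]
    SHy = [[1.0, 0.0], [-math.tan(sy), 1.0]]
    SHx = [[1.0, -math.tan(sx)], [0.0, 1.0]]
    return matmul2(matmul2(matmul2(R, S), SHy), SHx)


def rss_closed_form(a, s, sx, sy):
    """Upper-left 2x2 block of RSS, expanded by hand."""
    return [
        [s * math.cos(a - sy) / math.cos(sy),
         s * (-math.cos(a - sy) * math.tan(sx) / math.cos(sy) - math.sin(a))],
        [s * math.sin(a - sy) / math.cos(sy),
         s * (-math.sin(a - sy) * math.tan(sx) / math.cos(sy) + math.cos(a))],
    ]


def max_abs_diff(A, B):
    """Largest entry-wise difference between two 2x2 matrices."""
    return max(abs(A[i][j] - B[i][j]) for i in range(2) for j in range(2))


params = (0.3, 1.2, 0.7, 0.25)  # arbitrary angle, scale, shear_x, shear_y
print(max_abs_diff(rss_product(*params), rss_closed_form(*params)))
```

The printed difference should be at the level of floating-point rounding if the closed form matches the product.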

We may have some differences in how shear is parametrized.

EDIT:
If we use the top-left corner as the pivot point, the result will be the following:
[image]

We can see that the shear angle is not exactly the same, due to the different parametrizations.
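The angle mismatch can be checked numerically. In the reproduction script the magnitude `0.7` is passed to torchvision as an angle (`math.degrees(magnitude)`), and the matrix definition quoted above uses `tan` of the shear angle as the matrix entry, whereas the PIL call uses `0.7` directly as the coefficient. A minimal sketch under those two assumptions, ignoring sign and pivot; the variable names are illustrative:

```python
import math

magnitude = 0.7

# PIL: the magnitude is the shear coefficient itself.
pil_coefficient = magnitude

# torchvision (as called in the script): the magnitude is treated as an
# angle in radians, converted to degrees, and the matrix uses its tangent.
torch_angle_deg = math.degrees(magnitude)
torch_coefficient = math.tan(math.radians(torch_angle_deg))

# Angle whose tangent equals the PIL coefficient.
matching_angle_deg = math.degrees(math.atan(magnitude))

print(f"PIL coefficient:          {pil_coefficient:.4f}")
print(f"tan-based coefficient:    {torch_coefficient:.4f}")
print(f"angle passed in script:   {torch_angle_deg:.2f} deg")
print(f"angle matching PIL shear: {matching_angle_deg:.2f} deg")
```

Under these assumptions, passing `math.degrees(math.atan(magnitude))` as the shear angle would give the same coefficient magnitude as the PIL call.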

EDIT2:
Maybe, it makes sense to have the following matrix definition:

M = T^-1 C^-1 * RS * C * SHy(sy) * SHx(sx) 

where RS is now only rotation and scale

@vadimkantorov

Related: #5194

@vfdev-5 vfdev-5 self-assigned this Jan 18, 2022
vfdev-5 added a commit to vfdev-5/vision that referenced this issue Jan 26, 2022
datumbox added a commit that referenced this issue Feb 2, 2022
facebook-github-bot pushed a commit that referenced this issue Feb 11, 2022