
Reduced number of graphs for compiled resize #8108

Merged: 8 commits, Nov 22, 2023
27 changes: 18 additions & 9 deletions torchvision/transforms/v2/functional/_geometry.py
@@ -188,6 +188,21 @@ def resize(
     return kernel(inpt, size=size, interpolation=interpolation, max_size=max_size, antialias=antialias)


+# This is an internal helper method for resize_image. We should put it here instead of keeping it
+# inside resize_image due to torchscript.
+# uint8 dtype support for bilinear and bicubic is limited to cpu and
+# according to our benchmarks, non-AVX CPUs should still prefer u8->f32->interpolate->u8 path for bilinear
+# For torch.compile we use uint8 input and let decomposition work
Member:
I think we should be clear that the reason we always use uint8 for dynamo is simply that it doesn't support get_cpu_capability(), so with the suggested comment below, this comment is probably unnecessary

Suggested change (delete this line):
- # For torch.compile we use uint8 input and let decomposition work
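To make the point above concrete, here is a minimal illustrative sketch (not code from this PR): branching on an untraceable call like get_cpu_capability() inside a compiled function is what forces dynamo to split the graph, which is why the helper in this diff short-circuits when compiling. The exact fallback behaviour depends on the PyTorch version.

import torch

# Illustrative only: dynamo cannot trace get_cpu_capability(), so using its
# result in a Python-level branch inside a compiled function triggers a
# graph break (or an error, depending on the PyTorch version).
@torch.compile
def branch_on_cpu_capability(x: torch.Tensor) -> torch.Tensor:
    if "AVX2" in torch.backends.cpu.get_cpu_capability():  # untraceable call
        return x.float()
    return x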

Collaborator Author:

A decomposition can also lack uint8 support, so the reason we return True instead of False is that we believe the decomposition does work with the uint8 dtype.
Even if dynamo "supported" get_cpu_capability(), the heuristic of performing u8->f32->interpolate->u8 on non-AVX systems could be wrong for the compiled version.
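(For context, a minimal sketch of the u8->f32->interpolate->u8 path discussed here, assuming an NCHW uint8 input; resize_u8_via_f32 is a hypothetical name, not a torchvision function.)

import torch
import torch.nn.functional as F

# Illustrative only: the eager-mode fallback for bilinear uint8 resize on
# non-AVX CPUs; upcast to float32, resize, then round/clamp back to uint8.
def resize_u8_via_f32(image: torch.Tensor, size: tuple) -> torch.Tensor:
    out = F.interpolate(image.to(torch.float32), size=size, mode="bilinear", antialias=True)
    return out.round_().clamp_(0, 255).to(torch.uint8)

img = torch.randint(0, 256, (1, 3, 64, 64), dtype=torch.uint8)
small = resize_u8_via_f32(img, (32, 32))  # uint8 output, shape (1, 3, 32, 32)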

Member:

OK, that makes sense. I added a suggestion above to clarify that the benchmarks were only relevant for eager.

We can merge now and iterate a bit later, but do you think our conditions could be simplified a bit? I think we should be able to do something like

def _do_native_uint8_resize_on_cpu(interpolation: InterpolationMode) -> bool:
    if torch._dynamo.is_compiling():
        return True  # both bilinear and bicubic are OK, right?
    # then conditions as before

And IDK if that's true, but perhaps torch.compile works for bilinear and bicubic on GPU as well, in which case we can probably write that condition much earlier?

Collaborator Author (@vfdev-5, Nov 14, 2023):

    if torch._dynamo.is_compiling():
        return True  # both bilinear and bicubic are OK, right?

Well, right now it may be safer to return False, since pytorch/pytorch#104182 is not merged yet.

+def _do_native_uint8_resize_on_cpu(interpolation: Union[InterpolationMode, int]) -> bool:
+    if interpolation == InterpolationMode.BILINEAR:
+        if torch._dynamo.is_compiling():
+            return True
+        else:
+            return "AVX2" in torch.backends.cpu.get_cpu_capability()
+
+    return interpolation == InterpolationMode.BICUBIC
+
+
 @_register_kernel_internal(resize, torch.Tensor)
 @_register_kernel_internal(resize, tv_tensors.Image)
 def resize_image(
@@ -215,21 +230,15 @@ def resize_image(
     if (new_height, new_width) == (old_height, old_width):
         return image
     elif numel > 0:
-        image = image.reshape(-1, num_channels, old_height, old_width)
-
         dtype = image.dtype
         acceptable_dtypes = [torch.float32, torch.float64]
         if interpolation == InterpolationMode.NEAREST or interpolation == InterpolationMode.NEAREST_EXACT:
             # uint8 dtype can be included for cpu and cuda input if nearest mode
             acceptable_dtypes.append(torch.uint8)
-        elif image.device.type == "cpu":
-            # uint8 dtype support for bilinear and bicubic is limited to cpu and
-            # according to our benchmarks, non-AVX CPUs should still prefer u8->f32->interpolate->u8 path for bilinear
-            if (interpolation == InterpolationMode.BILINEAR and "AVX2" in torch.backends.cpu.get_cpu_capability()) or (
-                interpolation == InterpolationMode.BICUBIC
-            ):
-                acceptable_dtypes.append(torch.uint8)
+        elif image.device.type == "cpu" and _do_native_uint8_resize_on_cpu(interpolation):
+            acceptable_dtypes.append(torch.uint8)

+        image = image.reshape(-1, num_channels, old_height, old_width)
         strides = image.stride()
         if image.is_contiguous(memory_format=torch.channels_last) and image.shape[0] == 1 and numel != strides[0]:
             # There is a weird behaviour in torch core where the output tensor of `interpolate()` can be allocated as
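As a closing illustration of the PR title, here is a hedged sketch (not part of this PR) of how one might count the graphs dynamo produces for a compiled uint8 resize. torch._dynamo.explain reports graph and graph-break counts for a traced call; field names such as graph_count follow the PyTorch 2.1-era ExplainOutput and may differ in other releases.

import torch
import torch._dynamo
from torchvision.transforms.v2 import functional as F

# Illustrative only: trace a uint8 resize and report how many graphs (and
# graph breaks) dynamo produced; fewer graphs is the goal of this PR.
img = torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8)
explanation = torch._dynamo.explain(F.resize)(img, size=[128, 128], antialias=True)
print(explanation.graph_count, explanation.graph_break_count)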