Cleaner SanitizeBoundingBoxes transform #8294

MarcSzafraniec · 2024-03-04T15:22:43Z

🚀 The feature

A functional version of SanitizeBoundingBoxes
A version of SanitizeBoundingBoxes that is aligned with other detection transforms: it should either take a BoundingBoxes TVTensor, or other transforms should be able to take in a dictionary that contains boxes, among others.
A version of SanitizeBoundingBoxes that can deal with more keys from that dictionary that just "boxes" and "labels"

Motivation, pitch

SanitizeBoundingBoxes is a transform for bounding boxes that is a bit different from the others, as it doesn't take a bounding box object but a dictionary containing a bounding box object and maybe labels.

So we cannot use it in a transforms.Compose object as we can with other transforms, and have to apply it separately.
Plus, it has no functional version, which forces us to instantiate a transforms object and apply it to said dictionary.

Finally, an important drawback of SanitizeBoundingBoxes is that it subsets the boxes and the labels together, but not other detection features such as area or iscrowd. This is error prone, and this transform should at least be able to accommodate the standard COCO format.

Alternatives

No response

Additional context

No response

The text was updated successfully, but these errors were encountered:

NicolasHug · 2024-03-04T16:01:12Z

Thanks a lot for the detailed feature request @MarcSzafraniec . This sounds reasonable overall, I shared more detailed thoughts below. LMK what you think!

A functional version of SanitizeBoundingBoxes

Sounds good. Not sure whether we should allow labels and other keys here, or just bouding boxes, we'll see. Maybe we can get away by just returning the valid mask and let the user subsample they input.

A version of SanitizeBoundingBoxes that is aligned with other detection transforms: it should either take a BoundingBoxes TVTensor, or other transforms should be able to take in a dictionary that contains boxes, among others.

All of this is already supported. I.e. SanitizeBoundingBoxes() supports a single BoundingBoxes() object as input:

from torchvision.transforms import v2
from torchvision import tv_tensors
import torch

bbox = tv_tensors.BoundingBoxes(torch.tensor([[0, 0, 10, 10], [10, 10, 20, 20]]), canvas_size=(100, 100), format="XYXY")
v2.SanitizeBoundingBoxes(labels_getter=None)(bbox)

# BoundingBoxes([[ 0,  0, 10, 10],
#                [10, 10, 20, 20]], format=BoundingBoxFormat.XYXY, canvas_size=(100, 100))

and the other transforms do support dictionary and arbitrarily nested inputs, just like SanitizeBoundingBoxes: https://pytorch.org/vision/main/auto_examples/transforms/plot_transforms_getting_started.html#what-do-i-pass-as-input

You can definitely use SanitizeBoundingBoxes() within Compose() for example: https://pytorch.org/vision/main/auto_examples/transforms/plot_transforms_e2e.html#transforms

It seems like something wasn't clear in the docs here - I'd love for you to let us know what could be improved on that front.

Finally, an important drawback of SanitizeBoundingBoxes is that it subsets the boxes and the labels together, but not other detection features such as area or iscrowd. This is error prone, and this transform should at least be able to accommodate the standard COCO format.

In general it's going to be untracktable to subset arbitrary inputs (and potentially inconvenient, since it would force those inputs to be in a specific expected format). But perhaps it would be enough let the functional return the subsampling maks (as suggested above), and let the user do the indexing themselves on their desired ipnuts (area, iscrowd, etc.) ?

MarcSzafraniec · 2024-03-04T16:09:20Z

Thanks a lot for the detailed feature request @MarcSzafraniec . This sounds reasonable overall, I shared more detailed thoughts below. LMK what you think!

A functional version of SanitizeBoundingBoxes

Sounds good. Not sure whether we should allow labels and other keys here, or just bouding boxes, we'll see. Maybe we can get away by just returning the valid mask and let the user subsample they input.

A version of SanitizeBoundingBoxes that is aligned with other detection transforms: it should either take a BoundingBoxes TVTensor, or other transforms should be able to take in a dictionary that contains boxes, among others.

All of this is already supported. I.e. SanitizeBoundingBoxes() supports a single BoundingBoxes() object as input:
from torchvision.transforms import v2
from torchvision import tv_tensors
import torch

bbox = tv_tensors.BoundingBoxes(torch.tensor([[0, 0, 10, 10], [10, 10, 20, 20]]), canvas_size=(100, 100), format="XYXY")
v2.SanitizeBoundingBoxes(labels_getter=None)(bbox)

# BoundingBoxes([[ 0,  0, 10, 10],
#                [10, 10, 20, 20]], format=BoundingBoxFormat.XYXY, canvas_size=(100, 100))
and the other transforms do support dictionary and arbitrarily nested inputs, just like SanitizeBoundingBoxes: https://pytorch.org/vision/main/auto_examples/transforms/plot_transforms_getting_started.html#what-do-i-pass-as-input

You can definitely use SanitizeBoundingBoxes() within Compose() for example: https://pytorch.org/vision/main/auto_examples/transforms/plot_transforms_e2e.html#transforms

It seems like something wasn't clear in the docs here - I'd love for you to let us know what could be improved on that front.

Finally, an important drawback of SanitizeBoundingBoxes is that it subsets the boxes and the labels together, but not other detection features such as area or iscrowd. This is error prone, and this transform should at least be able to accommodate the standard COCO format.

In general it's going to be untracktable to subset arbitrary inputs (and potentially inconvenient, since it would force those inputs to be in a specific expected format). But perhaps it would be enough let the functional return the subsampling maks (as suggested above), and let the user do the indexing themselves on their desired ipnuts (area, iscrowd, etc.) ?

Thanks for answering so quickly ! Well, indeed we can pass either a BoundingBoxes object to SanitizeBoundingBoxes, or a dict to other transforms, but then the mask information is lost.

Maybe a solution would be to have a "labels_getter" that could be in fact a list of dictionary keys for which to apply the mask ? It would be simple to implement and probably generic enough.

NicolasHug · 2024-03-05T14:16:40Z

Summarizing an offline chat with @MarcSzafraniec :

A. Allow users to subsample arbitrary stuff

We need to allow users to subsample arbitrary stuff, not just the labels (e.g. "iscrowd" info, etc). This can be done in different ways, with the constraint that we don't want to enforce a specific input structure as this would go against a fundamental design principle of v2 (e.g. we don't want to enforce specific keys or even a dict structure).

One suggested solution would be to let labels_getter return a tuple of tensors. Each tensor in that tuple gets subsampled, just like the label tensor. The tuple entries technically don't need to be tensors, they just have to be anything that supports [bool_mask] indexing.

B. Having a functional

We've left out the functional implementation of SanitizeBoundingBoxes, but it can still be useful. This functional should be able to subsample the BoundingBoxes objects but also the labels and arbitrary tensors. I see 2 main ways of doing this for now (there's probably others), both of which deviate from what we typically call a functional (then again, this SanitizeBoundingBoxes is an outlier itself)

Option 1

def sanitize_bounding_boxes(boxes, *args):
    # subsample boxes and everything in *args

# User code:
boxes, labels, iscrowd = sanitize_bounding_boxes(boxes, labels, is_crowd)

Option 2

def get_sanitize_bouding_boxes_mask(boxes):
    # just return the boolean subsampling mask
	return is_valid

# User code:
valid_mask = get_sanitize_bouding_boxes_mask(boxes)
boxes, labels, iscrowd = boxes[valid_mask], labels[valid_mask], iscrowd[valid_mask]

Note: a way to get the valid_mask with option 1 is to pass boxes, valid_indices = sanitize_bounding_boxes(boxes, torch.arange(boxes.shape[0]))

EDIT: actually Option2 is not as pleasant as I thought because it forces users to call boxes = tv_tensors.wrap(...) to preserve the BoundingBoxes type.

CC @pmeier @vfdev-5 for your input on both design discussions here (A and B).

pmeier · 2024-03-08T10:42:04Z

A:

👍

B:

I like option 1 best, but I don't think JIT supports *args. So this is another way this would deviate from our current implementation.

Plus, when you say "functional" here, do you mean an actual functional, i.e. TVTensor input or a kernel? In the former case how would you handle rewrapping?

Without breaking any of our current paradigms, let me propose a third option:

def sanitize_bounding_boxes(boxes):
    # compute subsampling mask and apply to boxes
    return boxes, is_valid

That way people can use the mask if needed without extra computation, and don't have to subsample their boxes themselves. If we treat this function as kernel rather than as "functional", we also don't have an issue with the tuple return. For boxes we already have kernels that return a tuple, with the first item always being the boxes and the remaining parts being metadata.

MarcSzafraniec · 2024-03-11T13:01:52Z

I have an additional comment: why this check if min_size < 1: raise ValueError(f"min_size must be >= 1, got {min_size}.") ? Boxes could be relative, or even just floating point, in that case its possible to have a width or height < 1 ...

NicolasHug · 2024-03-12T11:45:10Z

@MarcSzafraniec that's because in torchvision the expected format for the bounding boxes coordinates is absolute, not relative. The rest of the transforms also expect e.g. the XYXY format to be in absolute pixel coordinates.

NicolasHug mentioned this issue Mar 12, 2024

Add sanitize_bounding_boxes kernel/functional #8308

Merged

NicolasHug mentioned this issue Mar 15, 2024

Allow SanitizeBoundingBoxes to sanitize more labels #8319

Merged

NicolasHug closed this as completed in #8319 Mar 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cleaner SanitizeBoundingBoxes transform #8294

Cleaner SanitizeBoundingBoxes transform #8294

MarcSzafraniec commented Mar 4, 2024

NicolasHug commented Mar 4, 2024 •

edited

Loading

MarcSzafraniec commented Mar 4, 2024

NicolasHug commented Mar 5, 2024 •

edited

Loading

pmeier commented Mar 8, 2024

MarcSzafraniec commented Mar 11, 2024

NicolasHug commented Mar 12, 2024

Cleaner SanitizeBoundingBoxes transform #8294

Cleaner SanitizeBoundingBoxes transform #8294

Comments

MarcSzafraniec commented Mar 4, 2024

🚀 The feature

Motivation, pitch

Alternatives

Additional context

NicolasHug commented Mar 4, 2024 • edited Loading

MarcSzafraniec commented Mar 4, 2024

NicolasHug commented Mar 5, 2024 • edited Loading

A. Allow users to subsample arbitrary stuff

B. Having a functional

pmeier commented Mar 8, 2024

MarcSzafraniec commented Mar 11, 2024

NicolasHug commented Mar 12, 2024

NicolasHug commented Mar 4, 2024 •

edited

Loading

NicolasHug commented Mar 5, 2024 •

edited

Loading