-
Notifications
You must be signed in to change notification settings - Fork 7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleaner SanitizeBoundingBoxes transform #8294
Comments
Thanks a lot for the detailed feature request @MarcSzafraniec . This sounds reasonable overall, I shared more detailed thoughts below. LMK what you think!
Sounds good. Not sure whether we should allow
All of this is already supported. I.e. from torchvision.transforms import v2
from torchvision import tv_tensors
import torch
bbox = tv_tensors.BoundingBoxes(torch.tensor([[0, 0, 10, 10], [10, 10, 20, 20]]), canvas_size=(100, 100), format="XYXY")
v2.SanitizeBoundingBoxes(labels_getter=None)(bbox)
# BoundingBoxes([[ 0, 0, 10, 10],
# [10, 10, 20, 20]], format=BoundingBoxFormat.XYXY, canvas_size=(100, 100)) and the other transforms do support dictionary and arbitrarily nested inputs, just like You can definitely use It seems like something wasn't clear in the docs here - I'd love for you to let us know what could be improved on that front.
In general it's going to be untracktable to subset arbitrary inputs (and potentially inconvenient, since it would force those inputs to be in a specific expected format). But perhaps it would be enough let the functional return the subsampling maks (as suggested above), and let the user do the indexing themselves on their desired ipnuts (area, iscrowd, etc.) ? |
Thanks for answering so quickly ! Well, indeed we can pass either a Maybe a solution would be to have a "labels_getter" that could be in fact a list of dictionary keys for which to apply the mask ? It would be simple to implement and probably generic enough. |
Summarizing an offline chat with @MarcSzafraniec : A. Allow users to subsample arbitrary stuffWe need to allow users to subsample arbitrary stuff, not just the labels (e.g. "iscrowd" info, etc). This can be done in different ways, with the constraint that we don't want to enforce a specific input structure as this would go against a fundamental design principle of v2 (e.g. we don't want to enforce specific keys or even a dict structure). One suggested solution would be to let B. Having a functionalWe've left out the functional implementation of Option 1 def sanitize_bounding_boxes(boxes, *args):
# subsample boxes and everything in *args
# User code:
boxes, labels, iscrowd = sanitize_bounding_boxes(boxes, labels, is_crowd) Option 2 def get_sanitize_bouding_boxes_mask(boxes):
# just return the boolean subsampling mask
return is_valid
# User code:
valid_mask = get_sanitize_bouding_boxes_mask(boxes)
boxes, labels, iscrowd = boxes[valid_mask], labels[valid_mask], iscrowd[valid_mask] Note: a way to get the EDIT: actually Option2 is not as pleasant as I thought because it forces users to call CC @pmeier @vfdev-5 for your input on both design discussions here (A and B). |
A: 👍 B: I like option 1 best, but I don't think JIT supports Plus, when you say "functional" here, do you mean an actual functional, i.e. Without breaking any of our current paradigms, let me propose a third option: def sanitize_bounding_boxes(boxes):
# compute subsampling mask and apply to boxes
return boxes, is_valid That way people can use the mask if needed without extra computation, and don't have to subsample their boxes themselves. If we treat this function as kernel rather than as "functional", we also don't have an issue with the tuple return. For boxes we already have kernels that return a tuple, with the first item always being the boxes and the remaining parts being metadata. |
I have an additional comment: why this check |
@MarcSzafraniec that's because in torchvision the expected format for the bounding boxes coordinates is absolute, not relative. The rest of the transforms also expect e.g. the XYXY format to be in absolute pixel coordinates. |
🚀 The feature
Motivation, pitch
SanitizeBoundingBoxes is a transform for bounding boxes that is a bit different from the others, as it doesn't take a bounding box object but a dictionary containing a bounding box object and maybe labels.
So we cannot use it in a
transforms.Compose
object as we can with other transforms, and have to apply it separately.Plus, it has no functional version, which forces us to instantiate a transforms object and apply it to said dictionary.
Finally, an important drawback of SanitizeBoundingBoxes is that it subsets the boxes and the labels together, but not other detection features such as
area
oriscrowd
. This is error prone, and this transform should at least be able to accommodate the standard COCO format.Alternatives
No response
Additional context
No response
The text was updated successfully, but these errors were encountered: