Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New model parsers. #5

Merged
merged 14 commits into from
Jun 28, 2024
Merged

New model parsers. #5

merged 14 commits into from
Jun 28, 2024

Conversation

kkeroo
Copy link
Collaborator

@kkeroo kkeroo commented Jun 21, 2024

New host nodes

  1. Xfeat model: to get the matching points, 2 images are needed, so on first iteration the results are stored and then next frame is matched with the initial frame. Not sure how we want to define reference and target frames in the entire pipeline? So, the parser assumes that first frame that comes to the NN is the reference frame. In practice, the first frame is probably not the best idea and we should set the reference frame over some set function?
  2. SuperAnimal landmark model
  3. M-LSD line detector: some more linking is used. parser node accepts two inputs: one is the output from the NN and the other is the passthrough from the NN. The latter is needed for postprocessing the results.
  4. Mediapipe Face mesh model

Messages and utils

  1. XFeat - MatchedPoints custom message and function for creating the message added. Util functions for decoding Xfeat results are added in separate file.
  2. SuperAnimal - creating message with generic Keypoints message class. Also, util functions for decoding the results.
  3. FaceMesh - using the same create message function as SuperAnimal (making it generic)
  4. M-LSD - Lines and Line custom messages and function for creating the message added. Also, util functions for decoding the results.

Additionally, some parts are adapted to the #4 from @jkbmrz.

@kkeroo kkeroo requested review from jkbmrz and tersekmatija and removed request for jkbmrz and tersekmatija June 21, 2024 13:38
(0, 1, 3, 2, 4)).reshape(B, 1, H * 8, W * 8)
return heatmap

def _nms(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use NMS many times in this repo. Should we write a single general NMS function that would be used throughout the library?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be great. Maybe address this in a new PR?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. We can make some parsing PR where we fix such things.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think the above bilinear grid sample is generic enough to be in some common utils rather than specific to xfeat. We should try and make them as generic as possible, so they can be easily re-used by other nodes where it makes sense. Let's address those in a separate PR.

@@ -113,3 +115,51 @@ def create_detection_message(
detections_msg = img_detections()
detections_msg.detections = detections
return detections_msg

def create_line_detection_message(lines: np.ndarray, scores: np.ndarray):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we merge this with the more general create_detection_message function?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I dont think we should. First, meaning of the two functions is not the same, e.g. create_detection_function is used for creating messages with bboxes and create_line_detection_message is used for creating messages with lines. Second, functions use different message types, bbox detections have xmin, xmax values but Line message does not really have min and max coordinates as line can be pointed in any direction. Third, validation will be more complex.

Overall, I think we would only complicate things to much.

@jkbmrz thoughts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I see. At first thought it made sense to merge because create_detection_message already includes various types of "detections" and not all of them are obligatory (e.g. keypoints). But as there is indeed no situation where lines would be detected together with bboxes so let's keep them separated. I'd maybe move the create_line_detection_message into a new file named as lines.py to avoid the ambiguity (it has nothing to do with object detection). Moreover, to avoid further ambiguity, we can think of renaming detection.py (e.g. to object_detection.py or objects.py) and rename it's message to create_object_detection_message (but we can do this in another PR).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this to lines.py I agree. We can leave detection.py as is given it's typically associated with object detection.

@tersekmatija
Copy link

Let's first rebase based on #4 , then we can go ahead and review as it seems there are duplicate files.

@kkeroo kkeroo force-pushed the feature/add_models branch from 36701a7 to 5e54ee4 Compare June 24, 2024 12:25
@jkbmrz
Copy link
Collaborator

jkbmrz commented Jun 24, 2024

Let's first rebase based on #4 , then we can go ahead and review as it seems there are duplicate files.

#4 is merged.

@kkeroo kkeroo force-pushed the feature/add_models branch from 5e54ee4 to 6e7a2c4 Compare June 24, 2024 13:23
Copy link

@tersekmatija tersekmatija left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decent base, requested some minor modifications. We can merge some utils functions in later PRs.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could instead do something like: https://docs.luxonis.com/software/depthai-components/nodes/feature_tracker.

We would likely be matching features across the time, so perhaps using MatchedFeatures from above would make the most sense?

Alternative option is to provide 2 possible decoders, one for feature matching and one for feature tracking.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, meaning that I add TrackedFeature in the list of reference_points and target_points inside MatchedPoints or full support with TrackedFeatures msg consisting of TrackedFeature? If the latter, how can I "connect" (match) two features? By setting the same id and increasing age?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you would need to set the same ID to the new point and increment the age. I would do a full support of TrackedFeatures so that DAI can use this directly or see how it can be used downstream.

Let's clarify with DAI team internally if we have some further questions on this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in e50e75a.


from ..messages.creators import create_keypoints_message

class MPFaceLandmarkerParser(dai.node.ThreadedHostNode):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be more generic? Something like a KeypointParser or something along those lines indicating that it's a Parser for keypoints output on a single image?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it can be generic enough so we can for example join it with SuperAnimal Landmarker?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General keypoint parser added in 2eefa6e.

Note: face_landmark blob returns [1,1,1,1404] so simple calcuation is performed to get the num. of coords. (2 or 3).

I think SuperAnimal cannot be merged because it is pruned network and requires some additional postprocessing.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: face_landmark blob returns [1,1,1,1404] so simple calcuation is performed to get the num. of coords. (2 or 3).

I would assume that at times we could have outputs of [B, N_keypoints, D_keypoints] where N is the number and D dimension. Potentially, we have support for this and option for reshape like in case of face_landmark output?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We only assume that the batch size is 1.

(0, 1, 3, 2, 4)).reshape(B, 1, H * 8, W * 8)
return heatmap

def _nms(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think the above bilinear grid sample is generic enough to be in some common utils rather than specific to xfeat. We should try and make them as generic as possible, so they can be easily re-used by other nodes where it makes sense. Let's address those in a separate PR.

@kkeroo kkeroo self-assigned this Jun 24, 2024
@kkeroo kkeroo requested a review from tersekmatija June 26, 2024 14:02
@kkeroo kkeroo requested a review from jkbmrz June 26, 2024 16:06
Copy link

@tersekmatija tersekmatija left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor changes request but fine to merge afterwards.

Keypoints: Message containing 2D or 3D coordinates of the detected keypoints.
"""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note is that his might require reshaping depending on the output as mentioned in one of the comments.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reshape is handled in the parser.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking if this could be somehow fused with likes of YuNet or Keypoints instead.

Effectively, this will output keypoints + two scores for the image. We can explore this later as well potentially, but I think worth bringing up. Thoughts?

Copy link
Collaborator Author

@kkeroo kkeroo Jun 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yea probably with Keypoints as its closer in meaning. Maybe we can extend Keypoints so that it expects more than one output tensor. We would assume that the first tensor will represent keypoints, and other additional tensors will represent some scores? Other keypoint models will maybe also have score that the object is present in the picture or something.

However not sure if this is the general case with all keypoint models (to have additional scalar outputs).

@kkeroo kkeroo merged commit 0c0edd6 into main Jun 28, 2024
@kkeroo kkeroo deleted the feature/add_models branch June 28, 2024 12:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants