
Mp Hands nodes & segmentation msg creation #2

Merged: 30 commits merged on Jun 6, 2024
Conversation

@kkeroo (Collaborator) commented May 28, 2024

Segmentation msg creation

The segmentation message is created from a dai.ImgFrame with the RAW8 type because we only have a few classes. @jkbmrz, can you add this message for the depth model using the RAW16 type?
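For illustration, here is a minimal sketch of how such a message could be built from a single-channel class map; the helper name create_segmentation_message matches the import used later in this PR, but the exact checks and naming may differ:

import depthai as dai
import numpy as np

def create_segmentation_message(class_map: np.ndarray) -> dai.ImgFrame:
    # Wrap an (H, W, 1) uint8 class map into a dai.ImgFrame with RAW8 type.
    # Sketch only; the PR's actual helper may differ.
    if class_map.dtype != np.uint8:
        raise ValueError(f"Expected uint8 class map, got {class_map.dtype}.")
    frame = dai.ImgFrame()
    frame.setFrame(class_map)                # raw mask bytes
    frame.setWidth(class_map.shape[1])
    frame.setHeight(class_map.shape[0])
    frame.setType(dai.ImgFrame.Type.RAW8)    # few classes, so 8 bits per pixel suffice
    return frame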

MediaPipe Hands nodes

Two nodes for two models are added: Palm Detection and Hand Landmark. mediapipe_utils.py is added to generate anchors and decode detections. The palm detector outputs a classical detection message, while the landmark model outputs a custom HandLandmark message consisting of confidence, handedness, and landmarks. Handedness is just info about whether the hand is left or right, and the landmarks are a list of dai.Point3f. In the future, when new pose detection models are added, we may introduce a more general message containing only confidence and landmarks and then extend that class for hands, etc.

@kkeroo requested a review from jkbmrz, May 28, 2024 08:00
@jkbmrz (Collaborator) commented May 28, 2024

> Can you add this message for the depth model using the RAW16 type?

Will add later today!

dai.Buffer.__init__(self)
self.confidence: float = 0.0
self.handdedness: float = 0.0
self.landmarks: List[dai.Point3f] = []


We might want to use descriptors to validate that the landmarks are of the correct class?

Collaborator:

@kkeroo I've just added a PR (#3) with descriptors, if that comes in handy!

Collaborator Author:

Added @property and @property.setter to match PR #3.
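As a rough illustration of what property-based validation could look like for the constructor shown above (a sketch only; the actual code follows the descriptors from PR #3, and the class name here is illustrative, with the attribute later renamed to keypoints):

import depthai as dai
from typing import List

class HandLandmarks(dai.Buffer):    # illustrative name, see the rename discussion below
    def __init__(self):
        dai.Buffer.__init__(self)
        self._landmarks: List[dai.Point3f] = []

    @property
    def landmarks(self) -> List[dai.Point3f]:
        return self._landmarks

    @landmarks.setter
    def landmarks(self, value: List[dai.Point3f]):
        # Reject anything that is not a list of dai.Point3f.
        if not isinstance(value, list) or not all(isinstance(p, dai.Point3f) for p in value):
            raise TypeError("landmarks must be a list of dai.Point3f.")
        self._landmarks = value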

self,
score_threshold=0.5,
handdedness_threshold=0.5,
input_size=(224, 224)


It doesn't seem this is being used anywhere?

Also, a note here: we might want to return normalized 0-1 values. DepthAI does the same for bounding boxes, and there is some beauty in it: you don't need to know the shape in advance to unnormalize.

We can then have a "visualize" method on the Parser or the Message itself, or a static method on the parser, that visualizes the message onto a specific frame passed in together with it. That way we can nicely avoid this.

Collaborator Author:

Do we want to do this for all nodes? Right now, the face detector parser node (scrfd) returns bboxes w.r.t. the input size. Should I also normalize there?
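For illustration, a minimal sketch of the normalize/denormalize idea (hypothetical helper names, not from the PR):

import numpy as np

def normalize_bboxes(bboxes_xyxy: np.ndarray, input_size: tuple) -> np.ndarray:
    # Map pixel-space (xmin, ymin, xmax, ymax) boxes to the 0-1 range.
    w, h = input_size
    return bboxes_xyxy / np.array([w, h, w, h], dtype=np.float32)

def denormalize_bboxes(bboxes_norm: np.ndarray, frame_shape: tuple) -> np.ndarray:
    # Map 0-1 boxes back onto an arbitrary frame, e.g. for visualization.
    h, w = frame_shape[:2]
    return (bboxes_norm * np.array([w, h, w, h], dtype=np.float32)).astype(int)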



def run(self):
"""
Postprocessing logic for SCRFD model.


This must be improved.

Collaborator Author:

Added a better description here.

print(f"Layer names = {output.getAllLayerNames()}")

tensorInfo = output.getTensorInfo("Identity")
landmarks = output.getTensor(f"Identity").reshape(21, 3).astype(np.float32)


It might be worth checking whether dequantization can be performed by DepthAI itself. Something like
output.getTensor("name", dequantize=True), where True is the default value, could be beneficial. Please open a new thread on Slack to request support.

Collaborator Author (@kkeroo, May 30, 2024):

Opened a thread.
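Until such support exists, dequantization can be done manually on the host. A rough sketch assuming the usual affine quantization scheme (the scale and zero-point values must come from the model's export metadata; nothing below is a confirmed DepthAI API):

import numpy as np

def dequantize(quantized: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    # Standard affine dequantization: real = scale * (quantized - zero_point).
    return scale * (quantized.astype(np.float32) - zero_point)

# Usage with illustrative values:
# landmarks = dequantize(raw_tensor, scale=0.0039, zero_point=0).reshape(21, 3)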


mediapipe_selfie_segmentation

Collaborator Author:

Done

@@ -1,24 +1,20 @@
import depthai as dai
import numpy as np
import cv2
from .utils.message_creation.depth_segmentation import create_depth_segmentation_msg

class SeflieSegParser(dai.node.ThreadedHostNode):
def __init__(
self,
threshold=0.5,
input_size=(256, 144),


I'd eliminate this as well if we can.

Collaborator Author:

So, no general function for creating segmentation messages? This step will repeat in other seg. parsers as well, and it is also very similar to the depth message.

Should I then add message creation back to the parser?

Collaborator Author:

Okay, I see that below is the proposal for creating a specific msg.


Yeah, we can either do a specific message if that works well for DAI, or define such models for all types of outputs so we don't increase message complexity where it's not needed?

Collaborator Author:

Implemented a function to create messages here.


mediapipe.py


Is this file taken from somewhere? If yes, we should cite the source and check whether the license is permissive or restrictive.

It might make sense to keep only the methods that are useful for us, and if there aren't many, maybe we should rewrite them if the license is restrictive.

Are there some methods that are useful in other detection models?


One option would be to define this as a message itself? Not sure how visualizers or DAI would handle it, but if we extend dai.ImgFrame in DepthMessage without any additional fields it might be fine, and we wouldn't need this. We would just initialize self better in the constructor.

In any case, I believe this would be better suited for the message utils. But in that case we would want a create method for all types of messages, not just some, so maybe the above solution avoids the need for that. Not sure whether such methods would be useful for more complex messages, though.

Collaborator Author:

Implemented a function to create messages here.

def normalize_radians(angle):
return angle - 2 * math.pi * math.floor((angle + math.pi) / (2 * math.pi))

def non_maxima_suppression(bboxes, iou_threshold):


Is this needed if we use cv2 NMS elsewhere?

Collaborator Author:

Not needed.
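For reference, the cv2 call that replaces a custom NMS looks roughly like this (boxes in (x, y, w, h) format; the threshold values are placeholders):

import cv2
import numpy as np

# bboxes: list of [x, y, w, h]; scores: list of floats
indices = cv2.dnn.NMSBoxes(bboxes, scores, 0.5, 0.4)  # args: boxes, scores, score threshold, IoU threshold
kept = np.array(bboxes)[np.array(indices).flatten()]  # boxes that survive NMS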

dai.Buffer.__init__(self)
self.confidence: float = 0.0
self.handdedness: float = 0.0
self.landmarks: List[dai.Point3f] = []

Collaborator:

Landmarks are the same as keypoints, right? Let's use uniform naming (I'd prefer keypoints, as this is also used in luxonis-train).

Collaborator Author:

Okay, makes sense. Renamed it to keypoints.


def run(self):
"""
Postprocessing logic for SCRFD model.

Collaborator:

Is the model named mp_hand_landmark or SCRFD? If the latter, the naming of this file is incorrect.

Collaborator Author:

Added the right description to all nodes.

break # Pipeline was stopped

print('MP Hand landmark node')
print(f"Layer names = {output.getAllLayerNames()}")

Collaborator:

I guess these two print statements can be omitted.

Collaborator Author:

Removed all print statements.


import cv2
from .utils.message_creation import create_segmentation_message

class MPSeflieSegParser(dai.node.ThreadedHostNode):


Can we make this a general segmentation parser? The same comment applies to other classes.
In other words, is there anything super specific that requires this to be MPSelfieSeg, or can it just be SegmentationParser?

Collaborator Author:

We can define it as a general segmentation parser for two classes (foreground, background). Actually, it is already like that. Or should we extend it to multiclass? Maybe it's better to keep a multiclass seg. parser separate?


I don't think it differs much, so I'd make it general and multiclass.

@kkeroo requested a review from tersekmatija, May 31, 2024 10:42
bboxes = np.array(bboxes)[indices]
scores = np.array(scores)[indices]

detections = []


Shouldn't this be in a create function?

Collaborator Author:

Function added here.
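A minimal sketch of what such a create function could look like, building a standard dai.ImgDetections message from decoded boxes and scores (the helper name and the assumption of normalized xyxy boxes are illustrative, not necessarily the PR's exact code):

import depthai as dai
import numpy as np

def create_detection_message(bboxes: np.ndarray, scores: np.ndarray) -> dai.ImgDetections:
    # bboxes: (N, 4) normalized (xmin, ymin, xmax, ymax); scores: (N,)
    dets = []
    for bbox, score in zip(bboxes, scores):
        det = dai.ImgDetection()
        det.xmin, det.ymin, det.xmax, det.ymax = map(float, bbox)
        det.confidence = float(score)
        dets.append(det)
    msg = dai.ImgDetections()
    msg.detections = dets
    return msg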

# normalize landmarks
landmarks /= self.scale_factor

hand_landmarks_msg = HandKeypoints()


create_message function?

Collaborator Author:

Function added here.
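Similarly for the landmarks, a sketch of a create function turning the (21, 3) landmark array into the custom message (the helper name is illustrative; HandKeypoints is assumed importable from this PR's message definitions):

import depthai as dai
import numpy as np

def create_hand_keypoints_message(landmarks: np.ndarray, confidence: float, handdedness: float) -> HandKeypoints:
    # landmarks: (21, 3) array of x, y, z values, already scaled by the parser
    msg = HandKeypoints()
    msg.confidence = confidence
    msg.handdedness = handdedness
    msg.keypoints = [dai.Point3f(float(x), float(y), float(z)) for x, y, z in landmarks]
    return msg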

@@ -0,0 +1,11 @@
from .medipipe import generate_handtracker_anchors, decode_bboxes, detections_to_rect, rect_transformation

def generate_anchors_and_decode(bboxes, scores, threshold=0.5, scale=192):


Given this only contains mediapipe functions, I don't think it belongs here.

When we adapt mediapipe utils and make it more generic, we can move certain parts here.

Collaborator Author:

Fixed here.

segmentation_mask = segmentation_mask[0] # num_classes x H x W
overlay_image = np.zeros((segmentation_mask.shape[1], segmentation_mask.shape[2], 1), dtype=np.uint8)

for class_id in range(self.num_classes-1):


Shouldn't overlay_image just be an argmax over the segmentation mask?

Collaborator Author:

Yes, argmax added here. One 'layer' of all zeros is added in the first dimension so the first class has index 1.
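A compact numpy version of that, with an all-zero 'background' layer prepended so class indices start at 1 (shapes taken from the snippet above):

import numpy as np

# segmentation_mask: (num_classes, H, W) scores from the model
background = np.zeros((1,) + segmentation_mask.shape[1:], dtype=segmentation_mask.dtype)
stacked = np.concatenate([background, segmentation_mask], axis=0)              # (num_classes + 1, H, W)
overlay_image = np.argmax(stacked, axis=0)[..., np.newaxis].astype(np.uint8)   # (H, W, 1); 0 = background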

@kkeroo requested a review from tersekmatija, June 3, 2024 17:11

if not isinstance(x, np.ndarray):
raise ValueError(f"Expected numpy array, got {type(x)}.")
if len(x.shape) != 3:


Should we validate there is only one channel as well?

Collaborator Author:

Yes, added in f8a48f1.
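The added check is essentially one more guard on the channel dimension, something along these lines (a sketch of the idea; the exact wording in f8a48f1 may differ):

if x.shape[2] != 1:
    raise ValueError(f"Expected 1 channel, got {x.shape[2]} channels.")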

raise ValueError(f"Expected numpy array, got {type(x)}.")
if len(x.shape) != 3:
raise ValueError(f"Expected 3D input, got {len(x.shape)}D input.")


Should we validate that there is only one channel as well?

Collaborator Author:

Yes, added in f8a48f1.
