-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mp Hands nodes & segmentation msg creation #2
Conversation
Will add later today! |
ml/messages/landmarks.py
Outdated
dai.Buffer.__init__(self) | ||
self.confidence: float = 0.0 | ||
self.handdedness: float = 0.0 | ||
self.landmarks: List[dai.Point3f] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to use descriptors to validate that landmarks are of correct class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added @property
and @property.setter
to match PR #3.
self, | ||
score_threshold=0.5, | ||
handdedness_threshold=0.5, | ||
input_size=(224, 224) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem this is being used anywhere?
Also, a note here is that we might want to return normalized 0-1 values. DepthAI does the same for the bounding boxes. And there is some beauty in it, where you don't need to know the shape in advance to unnormalize.
We can then have a "visualize" method on the Parser or Message itself, or a static method in the parser, that can visualize the message onto a specific frame that is passed together. That way we can nicely avoid this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to do this on for all nodes? Right now, face detector parser node (scrfd) returns bboxes w.r.t. input size. Should I also normalize there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
def run(self): | ||
""" | ||
Postprocessing logic for SCRFD model. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This must be improved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added better description here
print(f"Layer names = {output.getAllLayerNames()}") | ||
|
||
tensorInfo = output.getTensorInfo("Identity") | ||
landmarks = output.getTensor(f"Identity").reshape(21, 3).astype(np.float32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth checking if dequantization can be performed by DepthAI itself. Something like
output.getTensor("name", dequantize=True)
where True
is the default value could be beneficial. Please open a new thread on Slack to request support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened a thread.
ml/postprocessing/selfie_seg.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mediapipe_selfie_segmentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
ml/postprocessing/selfie_seg.py
Outdated
@@ -1,24 +1,20 @@ | |||
import depthai as dai | |||
import numpy as np | |||
import cv2 | |||
from .utils.message_creation.depth_segmentation import create_depth_segmentation_msg | |||
|
|||
class SeflieSegParser(dai.node.ThreadedHostNode): | |||
def __init__( | |||
self, | |||
threshold=0.5, | |||
input_size=(256, 144), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd eliminate this as well if we can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, no general function for creating segmentation messages? This step will repeat in other seg. parsers as well. It is also very similar to depth message.
Then I should add message creation back to the parser?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I see that bellow is the proposal for creating specific msg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we can either do a specific message if that works well for DAI, or we define such models for all types of outputs to not increase message complexities where it's not needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented function to create messages here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mediapipe.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this file taken from somewhere? If yes, we should cite the source and check whether license is permissive or restrictive?
It might make sense we only keep the methods that are useful for us, and if there aren't many, maybe we should rewrite if license is restrictive.
Are there some methods that are useful in other detection models?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One option would be we define this as a message itself? Not sure how visualizers or DAI would work, but if we extend dai.ImgFrame in DepthMessage
without any additional fields it might be fine and we wouldn't need this. We would just better initialize self
in the constructor.
In any case, I believe this would be better suited for messages utils. But in such case we would want to have create method for all types of messages, not just some, so maybe above solution avoid the need for that. Not sure if such methods would be useful for more complex messages though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented function to create messages here
def normalize_radians(angle): | ||
return angle - 2 * math.pi * math.floor((angle + math.pi) / (2 * math.pi)) | ||
|
||
def non_maxima_suppression(bboxes, iou_threshold): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed if we use cv2 NMS elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed.
ml/messages/landmarks.py
Outdated
dai.Buffer.__init__(self) | ||
self.confidence: float = 0.0 | ||
self.handdedness: float = 0.0 | ||
self.landmarks: List[dai.Point3f] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Landmarks are the same as keypoints, right? Let's use an uniform naming (I'd prefer keypoints as this is also used in luxonis-train)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, makes sense. Renamed it to keypoints.
|
||
def run(self): | ||
""" | ||
Postprocessing logic for SCRFD model. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the model named mp_hand_landmark or SCFRD? If second, then naming of this file is incorrect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the right description to all nodes.
break # Pipeline was stopped | ||
|
||
print('MP Hand landmark node') | ||
print(f"Layer names = {output.getAllLayerNames()}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess these 2 print statements can be omitted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed all print statements.
ml/messages/landmarks.py
Outdated
dai.Buffer.__init__(self) | ||
self.confidence: float = 0.0 | ||
self.handdedness: float = 0.0 | ||
self.landmarks: List[dai.Point3f] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import cv2 | ||
from .utils.message_creation import create_segmentation_message | ||
|
||
class MPSeflieSegParser(dai.node.ThreadedHostNode): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this a general segmentation parser? Same comment would apply for other classes.
In other words, is there anything super specific that this must be MPSelfieSeg or can it just be SegmentationParser?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can define it as general segmentation parser for two classes (front, background). Actually it is already like that. Or should we extend it to multiclass? Maybe better if we have multiclass seg. parser separate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it differs much? So I'd make it general and multiclass.
bboxes = np.array(bboxes)[indices] | ||
scores = np.array(scores)[indices] | ||
|
||
detections = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be in a create function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function added here.
# normalize landmarks | ||
landmarks /= self.scale_factor | ||
|
||
hand_landmarks_msg = HandKeypoints() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create_message function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function added here.
ml/postprocessing/utils/detection.py
Outdated
@@ -0,0 +1,11 @@ | |||
from .medipipe import generate_handtracker_anchors, decode_bboxes, detections_to_rect, rect_transformation | |||
|
|||
def generate_anchors_and_decode(bboxes, scores, threshold=0.5, scale=192): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this only contains mediapipe functions, I don't think it belongs here.
When we adapt mediapipe utils and make it more generic, we can move certain parts here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed here
ml/postprocessing/segmentation.py
Outdated
segmentation_mask = segmentation_mask[0] # num_clases x H x W | ||
overlay_image = np.zeros((segmentation_mask.shape[1], segmentation_mask.shape[2], 1), dtype=np.uint8) | ||
|
||
for class_id in range(self.num_classes-1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldnt overlay_image be just an argmax over segmentation mask?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, argmax added here. One 'layer' is added in the first dimension with all zeros so first class has index 1.
|
||
if not isinstance(x, np.ndarray): | ||
raise ValueError(f"Expected numpy array, got {type(x)}.") | ||
if len(x.shape) != 3: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we validate there is only one channel as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, added in f8a48f1.
raise ValueError(f"Expected numpy array, got {type(x)}.") | ||
if len(x.shape) != 3: | ||
raise ValueError(f"Expected 3D input, got {len(x.shape)}D input.") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we validate that there is only one channel as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, added in f8a48f1.
Segmentation msg creation
Segmentation message is created from
dai.ImgFrame
withRAW8
type because we only have a few classes. @jkbmrz Can you add this message for depth model using RAW16 type?MediaPipe Hands nodes
Two nodes for two models are added: Palm detection and Hand Landmark. To generate anchors and decode detections
mediapipe_utils.py
is added. Palm detector outputs classical detection message, Landmark model outputs customHandLandmark
message, consisting ofconfidence
,handdedness
andlandmarks
. Handdedness is just info about left or right hand. In the future when new pose detection models are added, we will maybe have more general msg containing only confidence and landmarks and then extend this class for the hands,... And the landmarks is just list ofdai.Point3f