Simple Question about object detection code #11

AlexCo1d · 2025-01-05T17:19:04Z

Lines 56 to 71 in cbc9fa7

    
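    def forward(self, video_o):
        # video_o: (bsize, numc, numf, numr, fdim)
        bsize, numc, numf, numr, fdim = video_o.shape
        # merge the numc and numf axes: (bsize, numc*numf, numr, fdim)
        video_o = video_o.view(bsize, numc*numf, numr, fdim)
        # split each region vector into appearance features and bbox features
        roi_feat = video_o[:, :, :, :self.dim_feat]
        roi_bbox = video_o[:, :, :, self.dim_feat:(self.dim_feat+self.dim_bbox)]
        # move the bbox dims to axis 1 (channels-first) for bbox_conv, then move them back to the last axis
        bbox_pos = self.bbox_conv(roi_bbox.permute(0, 3, 1, 2)).permute(0, 2, 3, 1)
        # concatenate appearance and positional features, then project with tohid
        bbox_features = torch.cat([roi_feat, bbox_pos], dim=-1)
        bbox_feat = self.tohid(bbox_features)
        return bbox_feat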

Hi, I am reading your object detection code and found the snippet above in your EncoderVid.py.
Do you still remember why you chose 5 dimensions (dim_bbox) for the positional embedding? What is the source of this approach (Faster R-CNN or Detectron)?

Thanks in advance for your response, and thanks for your great work!

doc-doc · 2025-02-23T09:23:20Z

Hi, the fifth dimension denotes the relative bbox size, i.e. bbox_size / image_size, where size is the area (w*h). It is basically based on my previous relation grounding work: https://github.com/doc-doc/vRGV.
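A minimal sketch of how such a 5-D box feature might be built. Only the fifth component (bbox area divided by image area w*h) comes from the reply above; using corner coordinates normalized by image width and height (x1/w, y1/h, x2/w, y2/h) for the first four components is an assumption, and `bbox_feature` is a hypothetical helper, not a function from EncoderVid.py or vRGV.

```python
from typing import List, Tuple


def bbox_feature(box: Tuple[float, float, float, float],
                 img_w: float, img_h: float) -> List[float]:
    """Encode a pixel box (x1, y1, x2, y2) as a 5-D positional vector.

    Hypothetical helper. Assumption: entries 1-4 are corner coordinates
    normalized by image width/height; entry 5 is the relative bbox size
    (bbox area / image area w*h), as described in the reply.
    """
    x1, y1, x2, y2 = box
    box_area = (x2 - x1) * (y2 - y1)
    return [
        x1 / img_w,
        y1 / img_h,
        x2 / img_w,
        y2 / img_h,
        box_area / (img_w * img_h),  # relative bbox size
    ]


# Usage: a 160x120 box in a 640x480 frame covers 1/16 of the image.
feat = bbox_feature((0.0, 0.0, 160.0, 120.0), 640.0, 480.0)
print(feat)  # [0.0, 0.0, 0.25, 0.25, 0.0625]
```

Under this assumption, vectors like this would fill the `dim_bbox` = 5 slice that the snippet extracts with `self.dim_feat:(self.dim_feat+self.dim_bbox)`.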
