Object Detection in Video with Spatiotemporal Sampling Networks

Architecture

Summary

The method uses deformable convolutions across space and time, instead of optical flow, to exploit temporal information for object detection in video. Deformable convolutions sample relevant features from nearby frames (27 frames in total), and temporal aggregation (a per-pixel weighted summation) combines the sampled features into the final feature map passed to the detection network (R-FCN).
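The temporal aggregation step can be illustrated with a minimal NumPy sketch. It assumes the features have already been sampled from each frame, so the deformable convolution itself is not shown. The function name `aggregate_temporal`, the array shapes, and the way the weights are computed (cosine similarity to the reference frame followed by a softmax over time) are assumptions made for illustration; the summary above only states that the aggregation is a per-pixel weighted summation.

```python
import numpy as np


def aggregate_temporal(sampled, reference, eps=1e-8):
    """Per-pixel weighted sum of sampled feature maps (illustrative sketch).

    sampled:   (T, C, H, W) features sampled from T frames
    reference: (C, H, W) features of the reference frame
    returns:   aggregated features (C, H, W) and weights (T, H, W)
    """
    # Assumed weighting: cosine similarity between each frame's features
    # and the reference features at every pixel -> shape (T, H, W).
    dot = (sampled * reference[None]).sum(axis=1)
    norms = np.linalg.norm(sampled, axis=1) * np.linalg.norm(reference, axis=0)[None]
    sim = dot / (norms + eps)

    # Softmax over the time axis so the weights at each pixel sum to 1.
    sim = sim - sim.max(axis=0, keepdims=True)
    weights = np.exp(sim)
    weights /= weights.sum(axis=0, keepdims=True)

    # Per-pixel weighted summation across frames.
    aggregated = (weights[:, None] * sampled).sum(axis=0)
    return aggregated, weights


# Usage: 27 frames, 4 channels, a 3x5 feature map (toy sizes).
rng = np.random.default_rng(0)
reference = rng.normal(size=(4, 3, 5))
sampled = rng.normal(size=(27, 4, 3, 5))
aggregated, weights = aggregate_temporal(sampled, reference)
```

Because the weights are normalized per pixel, a frame whose sampled features resemble the reference at a given location contributes more at that location, while dissimilar frames are down-weighted rather than discarded.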