Object Detection in Video with Spatiotemporal Sampling Networks

Architecture

Summary

STSN uses deformable convolutions across space and time, instead of optical flow, to leverage temporal information for object detection in video. Deformable convolutions sample relevant features from nearby frames (27 frames in total), and temporal aggregation (a per-pixel weighted summation) combines the sampled features into the final feature map that is passed to the detection network (R-FCN).
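The temporal aggregation step can be illustrated with a minimal NumPy sketch. It assumes the features from each nearby frame have already been sampled (aligned) to the reference frame by the deformable convolutions, and it assumes the per-pixel weights come from a softmax over the cosine similarity between each sampled feature map and the reference feature map. The function name `aggregate_temporal`, the array shapes, and the similarity choice are illustrative assumptions, not details taken from this summary.

```python
import numpy as np


def aggregate_temporal(sampled, reference, eps=1e-8):
    """Per-pixel weighted summation of features sampled from nearby frames.

    sampled:   (T, C, H, W) features from T frames, already aligned to the
               reference frame (hypothetical layout).
    reference: (C, H, W) features of the reference frame.
    Returns the aggregated (C, H, W) feature map and the (T, H, W) weights.
    """
    # Cosine similarity between each sampled frame and the reference,
    # computed independently at every pixel (assumed weighting scheme).
    dot = np.einsum("tchw,chw->thw", sampled, reference)
    norms = np.linalg.norm(sampled, axis=1) * np.linalg.norm(reference, axis=0)
    similarity = dot / (norms + eps)

    # Softmax over the time axis so the weights at each pixel sum to 1.
    weights = np.exp(similarity - similarity.max(axis=0, keepdims=True))
    weights /= weights.sum(axis=0, keepdims=True)

    # Weighted sum over frames; weights broadcast across the channel axis.
    aggregated = (weights[:, None, :, :] * sampled).sum(axis=0)
    return aggregated, weights


# Example with 27 frames, 8 channels, and a 4x5 feature map.
rng = np.random.default_rng(0)
sampled = rng.normal(size=(27, 8, 4, 5))
reference = sampled[13]  # treat the middle frame as the reference
aggregated, weights = aggregate_temporal(sampled, reference)
```

In this sketch, frames whose sampled features resemble the reference at a given pixel contribute more to the aggregated feature at that pixel, and the aggregated map has the same shape as a single-frame feature map, so it could be passed to a detector unchanged.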
