We propose to train a recognizer that can classify both images and videos. The recognizer is jointly trained on image and video datasets. Compared with pre-training on the same image dataset, this joint training significantly improves video recognition performance.
frame sampling strategy | scheduler | resolution | gpus | backbone | joint-training | top1 acc | top5 acc | testing protocol | FLOPs | params | config | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
8x8x1 | Linear+Cosine | 224x224 | 8 | ResNet50 | ImageNet | 77.30 | 93.23 | 10 clips x 3 crop | 54.75G | 32.45M | config | ckpt | log |
- The **gpus** indicates the number of gpus we used to get the checkpoint. If you want to use a different number of gpus or videos per gpu, the best way is to set `--auto-scale-lr` when calling `tools/train.py`; this parameter will auto-scale the learning rate according to the actual batch size and the original batch size (see the example command after this list).
- The validation set of Kinetics400 we used consists of 19796 videos. These videos are available at Kinetics400-Validation. The corresponding data list (each line is of the format `video_id, num_frames, label_index`) and the label map are also available.
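For instance, a run with a different GPU count might look like the sketch below; it assumes the standard `tools/dist_train.sh` launcher and 4 GPUs, neither of which is prescribed by this config.

```shell
# Sketch: train with 4 GPUs instead of the 8 used for the released checkpoint.
# --auto-scale-lr rescales the learning rate to match the actual batch size.
# (tools/dist_train.sh and the GPU count are assumptions, not requirements.)
bash tools/dist_train.sh \
    configs/recognition/omnisource/slowonly_r50_8xb16-8x8x1-256e_imagenet-kinetics400-rgb.py \
    4 --auto-scale-lr
```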
For more details on data preparation, you can refer to Kinetics400.
You can use the following command to train a model.
```shell
python tools/train.py ${CONFIG_FILE} [optional arguments]
```
Example: train the SlowOnly model on the Kinetics-400 dataset with deterministic behavior and periodic validation.
```shell
python tools/train.py configs/recognition/omnisource/slowonly_r50_8xb16-8x8x1-256e_imagenet-kinetics400-rgb.py \
    --seed=0 --deterministic
```
We found that the training of this Omnisource model could crash for unknown reasons. If this happens, you can resume training by adding `--cfg-options resume=True` to the training command.
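Concretely, resuming the deterministic run from above could look like this:

```shell
# Resume a crashed run; resume=True picks up the latest checkpoint in the work dir.
python tools/train.py configs/recognition/omnisource/slowonly_r50_8xb16-8x8x1-256e_imagenet-kinetics400-rgb.py \
    --seed=0 --deterministic --cfg-options resume=True
```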
For more details, you can refer to the Training part in the Training and Test Tutorial.
You can use the following command to test a model.
```shell
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments]
```
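For multi-GPU testing, a distributed variant is sketched below; it assumes the `tools/dist_test.sh` launcher is available in your checkout.

```shell
# Sketch: distributed testing (tools/dist_test.sh and ${NUM_GPUS} are assumptions)
bash tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${NUM_GPUS} [optional arguments]
```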
Example: test the SlowOnly model on the Kinetics-400 dataset and dump the result to a pkl file.
```shell
python tools/test.py configs/recognition/omnisource/slowonly_r50_8xb16-8x8x1-256e_imagenet-kinetics400-rgb.py \
    checkpoints/SOME_CHECKPOINT.pth --dump result.pkl
```
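The dumped predictions can then be inspected offline. A minimal sanity check, assuming `result.pkl` is a standard pickle file with one entry per test sample:

```shell
# Assumption: result.pkl is a plain pickle file; print how many samples it holds.
python -c "import pickle; print(len(pickle.load(open('result.pkl', 'rb'))))"
```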
For more details, you can refer to the Test part in the Training and Test Tutorial.
```BibTeX
@inproceedings{feichtenhofer2019slowfast,
  title={Slowfast networks for video recognition},
  author={Feichtenhofer, Christoph and Fan, Haoqi and Malik, Jitendra and He, Kaiming},
  booktitle={Proceedings of the IEEE international conference on computer vision},
  pages={6202--6211},
  year={2019}
}
```

```BibTeX
@article{duan2020omni,
  title={Omni-sourced Webly-supervised Learning for Video Recognition},
  author={Duan, Haodong and Zhao, Yue and Xiong, Yuanjun and Liu, Wentao and Lin, Dahua},
  journal={arXiv preprint arXiv:2003.13042},
  year={2020}
}
```