[CVPRW 2024] Multi-View Spatial-Temporal Learning for Understanding Unusual Behaviors in Untrimmed Naturalistic Driving Videos
This repository contains the source code for AI City Challenge 2024 Track 3 (Naturalistic Driving Action Recognition).
- Team Name: SKKU-AutoLab.
- Team ID: 05.
# Option 1: create the environment from environment.yml
conda env create --name track3 --file=environment.yml
conda activate track3
pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
# Option 2: create the environment manually
conda create --name track3 python=3.10.13
conda activate track3
pip install -r requirements.txt
pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
pip install detectron2-0.6-cp310-cp310-linux_x86_64.whl
sudo docker load < docker_aic24_track3_final.tar
sudo docker run --ipc=host --gpus all -v <LOCAL_INPUT_DATA>:/usr/src/aic24-track_3/B/ \
-v <LOCAL_OUTPUT_FOLDER>:/usr/src/aic24-track_3/output_submission/ \
-it <IMAGE_ID>
bash run_infer_all.sh
Example: sudo docker run --ipc=host --gpus all -v /home/vsw/Downloads/B/:/usr/src/aic24-track_3/B/ \
-v /home/vsw/Downloads/output_submission/:/usr/src/aic24-track_3/output_submission/ \
-it 96f8bfc76877
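The Docker command above can be wrapped in a small helper that checks the folders before the container starts. This is a minimal sketch, not part of the repository: `abs_dir`, `run_inference_container`, and the `DOCKER` override variable are hypothetical names, while the flags and container paths are copied from the command above. It resolves both folders to absolute paths because Docker's `-v` option treats a non-absolute host path as a named volume rather than a folder.

```shell
# Hypothetical helpers (not part of the repository).

# Print the absolute path of a directory; fail if it does not exist.
abs_dir() {
    (cd "$1" 2>/dev/null && pwd)
}

# Usage: run_inference_container <input_folder> <output_folder> <image_id>
run_inference_container() {
    input_dir=$(abs_dir "$1") || { echo "input folder not found: $1" >&2; return 1; }
    mkdir -p "$2" || return 1          # create the output folder if needed
    output_dir=$(abs_dir "$2") || return 1
    image_id=$3
    # Same flags and container paths as the command above.
    # DOCKER can be overridden, e.g. DOCKER="sudo docker".
    ${DOCKER:-docker} run --ipc=host --gpus all \
        -v "${input_dir}/:/usr/src/aic24-track_3/B/" \
        -v "${output_dir}/:/usr/src/aic24-track_3/output_submission/" \
        -it "$image_id"
}
```

For example, `DOCKER="sudo docker" run_inference_container ~/Downloads/B ~/Downloads/output_submission <IMAGE_ID>` starts the container, after which `bash run_infer_all.sh` is run inside it as before.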
To get the cut videos for training X3D, UniformerV2_1, and VideoMAE, please download them from this link. After downloading, extract the file and place its contents in three folders: X3D_train/data, VideoMAE_train/data/A1_clip (put only the subfolders in the A1_clip folder), and UniformerV2_1_train/data.
To get the custom cut videos for training UniformerV2_2, please download them from this link. After downloading, extract the file and place its contents in the folder UniformerV2_2_train/data.
To get the pretrained weights for UniformerV2_1 and UniformerV2_2, please download them from this link and this link. After downloading, extract the files and place them in the two folders UniformerV2_1_train and UniformerV2_2_train.
To get the pretrained weights for VideoMAE, please download them from this link. After downloading, extract the file and place its contents in the folder VideoMAE_train.
To get the Docker image file for running inference on a custom dataset, please download it from this link.
To get the X3D weights, please download them from this link. After downloading, extract the file and place its contents in the folder X3D_train.
To get the UniformerV2_1 weights, please download them from this link. After downloading, extract the file and place its contents in the folder UniformerV2_1_train.
To get the UniformerV2_2 weights, please download them from this link. After downloading, extract the file and place its contents in the folder UniformerV2_2_train.
To get the VideoMAE weights, please download them from this link. After downloading, extract the file and place its contents in the folder VideoMAE_train.
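Placing the cut videos for X3D, UniformerV2_1, and VideoMAE (the first download above) can be scripted. The sketch below rests on assumptions that the text does not confirm: that the archive extracts to a folder containing an `A1_clip/` directory, and that "only sub folders" means VideoMAE receives the class folders of `A1_clip` without any loose files. `place_clips` is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper (not part of the repository).
# Usage: place_clips <extracted_folder> <repo_root>
# Assumes <extracted_folder> contains an A1_clip/ directory.
place_clips() {
    src=$1
    repo=$2
    [ -d "$src/A1_clip" ] || { echo "A1_clip not found in $src" >&2; return 1; }

    # X3D and UniformerV2_1 receive the full extracted contents.
    for dest in "$repo/X3D_train/data" "$repo/UniformerV2_1_train/data"; do
        mkdir -p "$dest" || return 1
        cp -r "$src"/. "$dest/" || return 1
    done

    # VideoMAE receives only the class subfolders of A1_clip.
    mkdir -p "$repo/VideoMAE_train/data/A1_clip" || return 1
    for class_dir in "$src"/A1_clip/*/; do
        [ -d "$class_dir" ] || continue
        # Strip the trailing slash so cp copies the folder itself.
        cp -r "${class_dir%/}" "$repo/VideoMAE_train/data/A1_clip/" || return 1
    done
}
```

Running `place_clips /path/to/extracted .` from the repository root would fill the three folders in one step. If the archive is laid out differently, the paths need adjusting.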
For X3D model, the dataset is organized with the following structure:
|_ data
| |_ A1_clip
| | |_ 0
| | | |_ *.mp4
| | |_ 1
| | | |_ *.mp4
| | |_ ...
| | | |_ *.mp4
| | |_ 15
| | | |_ *.mp4
| |_ *.csv
|_ pickle_x3d
| |_ A2
| | |_ *.pkl
|_ checkpoint_x3d
| |_ *.pyth
For UniformerV2_1 model, the dataset is organized with the following structure:
|_ A2
| |_ user_id_12670
| | |_ *.mp4
| |_ user_id_13148
| | |_ *.mp4
| |_ ...
| | |_ *.mp4
| |_ user_id_96715
| | |_ *.mp4
|_ data
| |_ A1_clip
| | |_ 0
| | | |_ *.mp4
| | |_ 1
| | | |_ *.mp4
| | |_ ...
| | | |_ *.mp4
| | |_ 15
| | | |_ *.mp4
|_ pickle_uniformerv2_full
| |_ *.pkl
|_ checkpoint_uniformerv2_full
| |_ *.pyth
|_ k710_uniformerv2_l14_8x336.pyth
|_ vit_saved
| |_ vit_b16.pth
| |_ vit_l14.pth
| |_ vit_l14_336.pth
For UniformerV2_2 model, the dataset is organized with the following structure:
|_ A2
| |_ user_id_12670
| | |_ *.mp4
| |_ user_id_13148
| | |_ *.mp4
| |_ ...
| | |_ *.mp4
| |_ user_id_96715
| | |_ *.mp4
|_ data
| |_ A1_clip_custom
| | |_ 0
| | | |_ *.mp4
| | |_ 1
| | | |_ *.mp4
| | |_ 2
| | | |_ *.mp4
| | |_ 3
| | | |_ *.mp4
|_ pickle_uniformerv2_4lcs
| |_ *.pkl
|_ checkpoint_uniformerv2_4cls
| |_ *.pyth
|_ k710_uniformerv2_l14_8x336.pyth
|_ vit_saved
| |_ vit_b16.pth
| |_ vit_l14.pth
| |_ vit_l14_336.pth
For VideoMAE model, the dataset is organized with the following structure:
|_ data
| |_ A1_clip
| | |_ 0
| | | |_ *.mp4
| | |_ 1
| | | |_ *.mp4
| | |_ ...
| | | |_ *.mp4
| | |_ 15
| | | |_ *.mp4
| | |_ *.csv
|_ pretrained_models
| |_ vit_l_hybrid_pt_800e_k700_ft.pth
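Before training, it can help to confirm that a clip folder matches the layouts above. The following is a minimal sketch, assuming class folders are numbered from 0 to N-1 (0 to 15 for `A1_clip`, 0 to 3 for `A1_clip_custom`) and each holds at least one `.mp4` file. `check_clip_layout` is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper (not part of the repository).
# Usage: check_clip_layout <clip_folder> <num_classes>
# Returns 0 if every class folder 0..num_classes-1 contains an .mp4 file.
check_clip_layout() {
    clip_dir=$1
    num_classes=$2
    missing=0
    cls=0
    while [ "$cls" -lt "$num_classes" ]; do
        # An unmatched glob stays literal, so the -e test below fails for it.
        set -- "$clip_dir/$cls"/*.mp4
        if [ ! -e "$1" ]; then
            echo "missing or empty: $clip_dir/$cls" >&2
            missing=1
        fi
        cls=$((cls + 1))
    done
    return "$missing"
}
```

For instance, `check_clip_layout X3D_train/data/A1_clip 16` and `check_clip_layout UniformerV2_2_train/data/A1_clip_custom 4` report any class folder that is absent or has no clips.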
To train X3D, run the commands below:
cd X3D_train
# Step 1: Train X3D
bash train.sh
# Step 2: Rename and move checkpoints
python move_ckpt.py
cd ..
To train UniformerV2_1, run the commands below:
cd UniformerV2_1_train
# Step 1: Train UniformerV2_1
bash train.sh
# Step 2: Rename and move checkpoints
python move_ckpt.py
cd ..
To train UniformerV2_2, run the commands below:
cd UniformerV2_2_train
# Step 1: Train UniformerV2_2
bash train.sh
# Step 2: Rename and move checkpoints
python move_ckpt.py
cd ..
To train VideoMAE, run the commands below:
cd VideoMAE_train
# Step 1: Train VideoMAE
bash scripts/cls/train_fold0.sh
bash scripts/cls/train_fold1.sh
bash scripts/cls/train_fold2.sh
bash scripts/cls/train_fold3.sh
bash scripts/cls/train_fold4.sh
# Step 2: Rename and move checkpoints
python move_ckpt.py
cd ..
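The five VideoMAE fold scripts above can also be run from a loop that stops at the first failure, so a broken fold is not followed by a checkpoint move. This is a sketch that assumes each `train_fold*.sh` script exits with a non-zero status when training fails. `run_folds` is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper (not part of the repository).
# Usage: run_folds [script_folder]   (defaults to scripts/cls)
# Runs train_fold0.sh .. train_fold4.sh in order and stops at the first failure.
run_folds() {
    script_dir=${1:-scripts/cls}
    for fold in 0 1 2 3 4; do
        echo "training fold ${fold}"
        if ! bash "${script_dir}/train_fold${fold}.sh"; then
            echo "fold ${fold} failed" >&2
            return 1
        fi
    done
}
```

From inside `VideoMAE_train`, `run_folds && python move_ckpt.py` would then move the checkpoints only after all five folds finish.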
To ensemble the four models, run the following scripts:
bash scripts/run_infer_x3d.sh
bash scripts/run_infer_uniformerv2_1.sh
bash scripts/run_infer_uniformerv2_2.sh
bash scripts/run_infer_videomae.sh
bash scripts/run_infer_all.sh # copy all checkpoints to the infer folder and create the submission file
If you find our work useful, please cite the following:
title={Multi-view spatial-temporal learning for understanding unusual behaviors in untrimmed naturalistic driving videos},
author={Nguyen, Huy-Hung and Tran, Chi Dai and Pham, Long Hoang and Tran, Duong Nguyen-Ngoc and Tran, Tai Huu-Phuong and Vu, Duong Khac and Ho, Quoc Pham-Nam and Huynh, Ngoc Doan-Minh and Jeon, Hyung-Min and Jeon, Hyung-Joon and others},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
If you have any questions, feel free to contact Huy H. Nguyen (huyhung411991@gmail.com), Chi D. Tran (ctran743@gmail.com), or Automation Lab.
Our framework is built on multiple open-source projects. We thank their authors for their great contributions.