# TRAIN.md

## Visdom

Training/evaluation progress can be monitored via the command line as well as Visdom. For the latter, a Visdom server must be running at `VISDOM_PORT=8090` and `VISDOM_SERVER=http://localhost`. To deactivate Visdom logging, set `VISDOM_ON=False`.
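For reference, a Visdom server can be started with the standard `visdom` module before launching training (a minimal sketch, assuming Visdom is installed via pip):

```shell
# Install Visdom and launch a server on the port expected by the config
pip install visdom
python -m visdom.server -port 8090
```

The server then listens at `http://localhost:8090`, matching the `VISDOM_SERVER` and `VISDOM_PORT` defaults above.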

## Train

We provide configuration files under `configs/deformable_mask_head/` and `configs/devis/` to train the mask head and DeVIS, respectively. To launch a training run, simply specify the number of GPUs with `--nproc_per_node` and the corresponding config file after `--config-file`. For instance, the command for training the YT-VIS 2019 model with 4 GPUs is as follows:

```shell
torchrun --nproc_per_node=4 main.py --config-file configs/devis/YT-19/devis_R_50_YT-19.yaml
```

Users can also override config-file parameters by passing new `KEY VALUE` pairs. For instance, to double the default learning rate:

```shell
torchrun --nproc_per_node=4 main.py --config-file configs/devis/YT-19/devis_R_50_YT-19.yaml SOLVER.BASE_LR 0.0002
```
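Multiple `KEY VALUE` pairs can be appended to the same command. As a sketch, combining the learning-rate override with other keys mentioned in this document (the output directory value here is hypothetical; key names should be checked against the config files):

```shell
# Sketch: adjust the LR, disable Visdom logging, and redirect outputs.
# "runs/devis_yt19_lr2x" is a made-up example path.
torchrun --nproc_per_node=4 main.py \
    --config-file configs/devis/YT-19/devis_R_50_YT-19.yaml \
    SOLVER.BASE_LR 0.0002 VISDOM_ON False OUTPUT_DIR runs/devis_yt19_lr2x
```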

## Model zoo

| Dataset | Backbone | AP | Total batch size | Training GPU hours* | Max GPU memory | URL |
|---|---|---|---|---|---|---|
| COCO | R50 | 38.0 | 14 | 345 | 27GB | config, model |
| COCO | R101 | 39.9 | 14 | 260 | 32GB | config, model |
| COCO | SwinL | 45.2 | 7 | 470 | 26GB | config, model |
| YouTube-VIS 19 | R50 | 44.4 | 4 | 120 | 18GB | config, log, model |
| YouTube-VIS 19 | SwinL | 57.1 | 4 | 220 | 37GB | config, log, model |
| YouTube-VIS 21 | R50 | 43.1 | 4 | 200 | 24GB | config, log, model |
| YouTube-VIS 21** | SwinL | 54.4 | 4 | 305 | 40GB | config, log, model |
| OVIS | R50 | 23.7 | 4 | 145 | 24GB | config, log, model |
| OVIS | SwinL | 35.5 | 4 | 204 | 38GB | config, log, model |

** In order to fit GPU memory, we used a reduced train set that removes the 2 most crowded training videos.

## Ablations

We also provide configuration files to run all the ablation studies presented in Table 1:

| Method | Clip size | K_temp | Feature scales | AP | Training GPU hours* | Max GPU memory | URL |
|---|---|---|---|---|---|---|---|
| Deformable VisTR | 36 | 4 | 1 | 34.2 | 190 | 10GB | config |
| Deformable VisTR | 36 | 0 | 1 | 35.3 | 150 | 7GB | config |
| Deformable VisTR | 6 | 0 | 1 | 32.4 | 40 | 2GB | config |
| DeVIS | 6 | 4 | 1 | 34.0 | 46 | 3GB | config |
| + increased spatial inputs | 6 | 4 | 4 | 35.9 | 104 | 15GB | config |
| + instance-aware obj. queries | 6 | 4 | 4 | 37.0 | 115 | 15GB | config |
| + multi-scale mask head | 6 | 4 | 4 | 40.2 | 128 | 16GB | config |
| + multi-cue clip tracking | 6 | 4 | 4 | 41.9 | --- | -- | config |
| + aux. loss weighting | 6 | 4 | 4 | 44.0 | 128 | 16GB | config |

*Training GPU hours measured on an RTX A6000 GPU.

## Validation during training

We support evaluation during training for VIS datasets even though ground-truth annotations are not available. Results are saved into the `TEST.SAVE_PATH` folder, created inside `OUTPUT_DIR`. Users can set `EVAL_PERIOD` to select the interval between validations (0 disables it). Additionally, `START_EVAL_EPOCH` selects the epoch from which `EVAL_PERIOD` starts being applied, which is useful for skipping the first epochs.
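As a sketch, these validation-related keys can be set from the command line like any other `KEY VALUE` override (the specific values below are illustrative; the exact key paths should be checked against the provided config files):

```shell
# Sketch: validate every 2 epochs, but only from epoch 6 onward
# (epoch values are made-up examples)
torchrun --nproc_per_node=4 main.py \
    --config-file configs/devis/YT-19/devis_R_50_YT-19.yaml \
    EVAL_PERIOD 2 START_EVAL_EPOCH 6
```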