Training/evaluation progress can be monitored via the command line as well as Visdom. For the latter, a Visdom server must be running at `VISDOM_PORT=8090` and `VISDOM_SERVER=http://localhost`. To deactivate Visdom logging, set `VISDOM_ON=False`.
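As a sketch, a Visdom server matching the defaults above can be started before launching training (this assumes the `visdom` Python package is installed; the invocation is Visdom's standard server entry point, not something specific to this repo):

```shell
# Start a Visdom server on the port expected by the default config
python -m visdom.server -port 8090
# The dashboard is then reachable at http://localhost:8090
```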
We provide configuration files under `configs/deformable_mask_head/` and `configs/devis/` to train the mask head and DeVIS, respectively. To launch a training run, simply specify the number of GPUs with `--nproc_per_node` and the corresponding config file after `--config-file`.
For instance, the command for training the YT-VIS 2019 model with 4 GPUs is as follows:

```shell
torchrun --nproc_per_node=4 main.py --config-file configs/devis/YT-19/devis_R_50_YT-19.yaml
```
Users can also override config file parameters by passing a new KEY VALUE pair. For instance, to double the default learning rate:

```shell
torchrun --nproc_per_node=4 main.py --config-file configs/devis/YT-19/devis_R_50_YT-19.yaml SOLVER.BASE_LR 0.0002
```
Dataset | Backbone | AP | Total batch size | Training GPU hours* | Max GPU memory | URL |
---|---|---|---|---|---|---|
COCO | R50 | 38.0 | 14 | 345 | 27GB | config model |
COCO | R101 | 39.9 | 14 | 260 | 32GB | config model |
COCO | SwinL | 45.2 | 7 | 470 | 26GB | config model |
YouTube-VIS 19 | R50 | 44.4 | 4 | 120 | 18GB | config log model |
YouTube-VIS 19 | SwinL | 57.1 | 4 | 220 | 37GB | config log model |
YouTube-VIS 21 | R50 | 43.1 | 4 | 200 | 24GB | config log model |
YouTube-VIS 21** | SwinL | 54.4 | 4 | 305 | 40GB | config log model |
OVIS | R50 | 23.7 | 4 | 145 | 24GB | config log model |
OVIS | SwinL | 35.5 | 4 | 204 | 38GB | config log model |
** We used the following train set, which removes the 2 most crowded train videos, in order to fit GPU memory.
We also provide configuration files to run all the ablation studies presented in Table 1:
Method | Clip size | K_temp | Feature scales | AP | Training GPU hours* | Max GPU memory | URL |
---|---|---|---|---|---|---|---|
Deformable VisTR | 36 | 4 | 1 | 34.2 | 190 | 10GB | config |
Deformable VisTR | 36 | 0 | 1 | 35.3 | 150 | 7GB | config |
Deformable VisTR | 6 | 0 | 1 | 32.4 | 40 | 2GB | config |
DeVIS | 6 | 4 | 1 | 34.0 | 46 | 3GB | config |
+increase spatial inputs | 6 | 4 | 4 | 35.9 | 104 | 15GB | config |
+instance aware obj. queries | 6 | 4 | 4 | 37.0 | 115 | 15GB | config |
+multi-scale mask head | 6 | 4 | 4 | 40.2 | 128 | 16GB | config |
+multi-cue clip tracking | 6 | 4 | 4 | 41.9 | --- | --- | config |
+aux. loss weighting | 6 | 4 | 4 | 44.0 | 128 | 16GB | config |
* Training GPU hours measured on an RTX A6000 GPU.
We support evaluation during training for VIS datasets, even though ground-truth annotations are not available.
Results will be saved to the `TEST.SAVE_PATH` folder, created inside `OUTPUT_DIR`. Users can set `EVAL_PERIOD` to select the interval between validations (0 disables it). Additionally, `START_EVAL_EPOCH` allows selecting the epoch from which `EVAL_PERIOD` starts being considered, which is useful for skipping the first epochs.
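As a sketch, the evaluation settings above can be overridden on the command line like any other KEY VALUE pair. Note that the exact key paths (e.g. whether `EVAL_PERIOD` is a top-level key or nested under another section) are an assumption here and should be checked against the provided YAML config files:

```shell
# Validate every 2 epochs, but only from epoch 6 onward
# (key names assumed from the options described above)
torchrun --nproc_per_node=4 main.py \
  --config-file configs/devis/YT-19/devis_R_50_YT-19.yaml \
  EVAL_PERIOD 2 START_EVAL_EPOCH 6
```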