Prepare the data following MMDetection. Obtain the json files for OV-COCO from GoogleDrive and put them under `data/coco/wusize`.
The data structure looks like:

```text
checkpoints/
├── clip_vitb32.pth
├── res50_fpn_soco_star_400.pth
data/
├── coco
│   ├── annotations
│   │   ├── instances_{train,val}2017.json
│   ├── wusize
│   │   ├── instances_train2017_base.json
│   │   ├── instances_val2017_base.json
│   │   ├── instances_val2017_novel.json
│   │   ├── captions_train2017_tags_allcaps.json
│   ├── train2017
│   ├── val2017
│   ├── test2017
```
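Before training, it can be worth checking that the expected files are actually in place. The snippet below is only an illustrative sanity check that mirrors the layout listed above; it is not part of the repo's tooling.

```python
from pathlib import Path

# Files expected by the OV-COCO configs, mirroring the layout above.
expected = [
    "checkpoints/clip_vitb32.pth",
    "checkpoints/res50_fpn_soco_star_400.pth",
    "data/coco/annotations/instances_train2017.json",
    "data/coco/annotations/instances_val2017.json",
    "data/coco/wusize/instances_train2017_base.json",
    "data/coco/wusize/instances_val2017_base.json",
    "data/coco/wusize/instances_val2017_novel.json",
    "data/coco/wusize/captions_train2017_tags_allcaps.json",
]

missing = [p for p in expected if not Path(p).exists()]
if missing:
    print("Missing files:\n" + "\n".join(missing))
else:
    print("All expected files are in place.")
```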
Otherwise, generate the json files using the following scripts:

```bash
python tools/pre_processors/keep_coco_base.py \
    --json_path data/coco/annotations/instances_train2017.json \
    --out_path data/coco/wusize/instances_train2017_base.json

python tools/pre_processors/keep_coco_base.py \
    --json_path data/coco/annotations/instances_val2017.json \
    --out_path data/coco/wusize/instances_val2017_base.json

python tools/pre_processors/keep_coco_novel.py \
    --json_path data/coco/annotations/instances_val2017.json \
    --out_path data/coco/wusize/instances_val2017_novel.json
```
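For reference, these pre-processing scripts essentially filter the standard COCO annotation files down to a category subset (OV-COCO keeps 48 base and 17 novel categories). The sketch below is not the repo's implementation; it only illustrates the filtering, assuming you supply the list of category names to keep.

```python
import json

def keep_categories(json_path, out_path, keep_names):
    """Keep only the categories in `keep_names` and the annotations that use them."""
    with open(json_path) as f:
        coco = json.load(f)

    keep_cats = [c for c in coco["categories"] if c["name"] in keep_names]
    keep_ids = {c["id"] for c in keep_cats}

    coco["categories"] = keep_cats
    coco["annotations"] = [a for a in coco["annotations"]
                           if a["category_id"] in keep_ids]

    with open(out_path, "w") as f:
        json.dump(coco, f)

# Hypothetical usage; BASE_CATEGORY_NAMES would be the OV-COCO base split:
# keep_categories("data/coco/annotations/instances_val2017.json",
#                 "data/coco/wusize/instances_val2017_base.json",
#                 BASE_CATEGORY_NAMES)
```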
The json file for caption supervision, `captions_train2017_tags_allcaps.json`, is obtained following Detic. Put it under `data/coco/wusize`.
As training on COCO tends to overfit to the base categories, we use the output of the last attention layer for classification. Generate the class embeddings with:

```bash
python tools/hand_craft_prompt.py --model_version ViT-B/32 --ann data/coco/annotations/instances_val2017.json \
    --out_path data/metadata/coco_clip_hand_craft.npy --dataset coco
```

The generated file `data/metadata/coco_clip_hand_craft_attn12.npy` is used for training and testing.
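For intuition, hand-crafted class embeddings are typically built by filling each category name into a set of prompt templates, encoding the prompts with CLIP's text encoder, and averaging the normalized features. The sketch below is a minimal illustration using the openai `clip` package with a toy template list and a made-up output filename; the repo's `tools/hand_craft_prompt.py` handles the full template set and the per-attention-layer (`attn12`) outputs.

```python
import clip          # https://github.com/openai/CLIP
import numpy as np
import torch

# Toy template list for illustration; the actual script uses a larger set.
templates = ["a photo of a {}.", "a photo of the {}."]
class_names = ["person", "bicycle", "car"]  # in practice, read from the annotation file

model, _ = clip.load("ViT-B/32", device="cpu")
model.eval()

embeddings = []
with torch.no_grad():
    for name in class_names:
        tokens = clip.tokenize([t.format(name) for t in templates])
        text_feat = model.encode_text(tokens)                     # [num_templates, embed_dim]
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
        mean_feat = text_feat.mean(dim=0)
        embeddings.append(mean_feat / mean_feat.norm())

# Hypothetical output path, to avoid confusion with the files the real script produces.
np.save("data/metadata/coco_clip_hand_craft_example.npy",
        torch.stack(embeddings).numpy())
```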
The implementation based on MMDet 3.x achieves better results than those reported in the paper.
|           | Backbone | Method | Supervision  | Novel AP50 | Config | Download     |
|-----------|----------|--------|--------------|------------|--------|--------------|
| Paper     | R-50-FPN | BARON  | CLIP         | 34.0       | -      | -            |
| This Repo | R-50-FPN | BARON  | CLIP         | 34.6       | config | model \| log |
| Paper     | R-50-C4  | BARON  | COCO Caption | 33.1       | -      | -            |
| This Repo | R-50-C4  | BARON  | COCO Caption | 35.1       | config | model \| log |
| This Repo | R-50-C4  | BARON  | CLIP         | 34.0       | config | model \| log |
To test the models, run:

```bash
GPUS=8 GPUS_PER_NODE=8 CPUS_PER_TASK=12 bash tools/slurm_test.sh PARTITION test \
    path/to/the/cfg/file path/to/the/checkpoint
```
Train the detector based on FasterRCNN+ResNet50+FPN with SyncBN, initialized from the SOCO pre-trained model. Obtain the SOCO pre-trained model from GoogleDrive and put it under `checkpoints`.

```bash
GPUS=16 GPUS_PER_NODE=8 CPUS_PER_TASK=12 bash tools/slurm_train.sh PARTITION train \
    configs/baron/ov_coco/baron_kd_faster_rcnn_r50_fpn_syncbn_90kx2.py \
    path/to/save/logs/and/checkpoints
```
We can also train a detector based on FasterRCNN+ResNet50C4:

```bash
GPUS=8 GPUS_PER_NODE=8 CPUS_PER_TASK=12 bash tools/slurm_train.sh PARTITION train \
    configs/baron/ov_coco/baron_kd_faster_rcnn_r50_c4_90k.py \
    path/to/save/logs/and/checkpoints
```
To train the FasterRCNN+ResNet50C4 detector with COCO caption supervision, run:

```bash
GPUS=8 GPUS_PER_NODE=8 CPUS_PER_TASK=12 bash tools/slurm_train.sh PARTITION train \
    configs/baron/ov_coco/baron_caption_faster_rcnn_r50_caffe_c4_90k.py \
    path/to/save/logs/and/checkpoints
```