DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection

Official code implementation for the paper "DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection" (AAAI 2021).

The code is developed based on the architecture of zylo117/Yet-Another-EfficientDet-Pytorch. We also follow some of the data pre-processing and model evaluation methods in BigRedT/no_frills_hoi_det and vt-vl-lab/iCAN. We sincerely thank the authors for their excellent work.

Checklist

  • Training and Test for V-COCO dataset
  • Training and Test for HICO-DET dataset
  • Demonstration on images
  • Demonstration on videos
  • More efficient voting strategy for inference using GPU

Prerequisites

The code was tested with python 3.6, pytorch 1.5.1, torchvision 0.6.1, CUDA 10.2, and Ubuntu 18.04.
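If you want to confirm that your environment matches the tested versions before running anything, a quick check like the one below (not part of this repository) can help:

```python
# Sanity check of the tested environment; purely optional helper, not repo code.
import sys
import torch
import torchvision

print(f"python      : {sys.version.split()[0]}")      # tested with 3.6
print(f"torch       : {torch.__version__}")           # tested with 1.5.1
print(f"torchvision : {torchvision.__version__}")     # tested with 0.6.1
print(f"CUDA ok     : {torch.cuda.is_available()}")   # tested with CUDA 10.2
```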

Installation

  1. Clone this repository:

    git clone https://github.com/MVIG-SJTU/DIRV.git
    
  2. Install pytorch and torchvision:

    pip install torch==1.5.1 torchvision==0.6.1
    
  3. Install other necessary packages:

    pip install pycocotools numpy opencv-python tqdm tensorboard tensorboardX pyyaml webcolors
    

Data Preparation

V-COCO Dataset:

Download the V-COCO dataset following the official instructions.

You can find the file new_prior_mask.pkl here. Each element in it is the prior probability that a verb (e.g., eat) is associated with an object category (e.g., apple). You should also download instances_trainval2014.json, the annotations for the combined training and validation sets, here, and put it in datasets/vcoco/coco/annotations.
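For reference, here is a minimal sketch of inspecting the prior file. The verb-by-object indexing shown in the comments is an assumption (the repo's loaders read the pickle directly and its exact layout is not documented here), so adapt it to what you actually find:

```python
# Minimal inspection sketch for new_prior_mask.pkl.
# NOTE: the verb-by-object layout described below is an assumption, not a documented format.
import pickle

with open("datasets/vcoco/new_prior_mask.pkl", "rb") as f:
    prior_mask = pickle.load(f)

print(type(prior_mask))
# If it is array-like, an entry [verb_id][object_category_id] would be the prior
# probability that the verb co-occurs with that object category, e.g.:
# p_eat_apple = prior_mask[verb_id_eat][object_category_id_apple]
```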

HICO-DET Dataset:

Download the HICO-DET dataset from the official website.

We transform the annotations of the HICO-DET dataset to JSON format following BigRedT/no_frills_hoi_det. You can directly download the processed annotations from here.

We count the number of training samples for each category in hico_processed/hico-det_verb_count.json. These counts serve as per-category weights when calculating the loss.
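As an illustration only, one common way to turn such counts into per-class loss weights is inverse-frequency weighting. The sketch below assumes that scheme and a simple counts file layout; it is not necessarily the exact weighting used in train.py:

```python
# Illustrative inverse-frequency weighting from hico_processed/hico-det_verb_count.json.
# The actual weighting used by the repo may differ; see train.py for the real scheme.
import json
import torch

with open("datasets/hico_20160224_det/hico_processed/hico-det_verb_count.json") as f:
    verb_count = json.load(f)  # assumed: a dict or list of per-verb sample counts

raw = verb_count.values() if isinstance(verb_count, dict) else verb_count
counts = torch.tensor([float(c) for c in raw])

# Rare verbs get larger weights; clamp avoids division by zero for empty classes.
weights = counts.sum() / (counts.clamp(min=1.0) * len(counts))

# The weights could then be plugged into a weighted loss, for example:
# criterion = torch.nn.BCEWithLogitsLoss(pos_weight=weights)
```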

Dataset Structure:

Make sure to put the files in the following structure:

|-- datasets
|   |-- vcoco
|   |   |-- data
|   |   |   |-- splits
|   |   |   |-- vcoco
|   |   |-- coco
|   |   |   |-- images
|   |   |   |-- annotations
|   |   |-- new_prior_mask.pkl
|   |-- hico_20160224_det
|   |   |-- images
|   |   |-- hico_processed
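To catch layout mistakes early, a small check like the one below (a helper sketch, not part of the repo) can verify that the expected paths exist before you start training:

```python
# Verify the expected dataset layout described above (optional helper, not repo code).
from pathlib import Path

expected = [
    "datasets/vcoco/data/splits",
    "datasets/vcoco/data/vcoco",
    "datasets/vcoco/coco/images",
    "datasets/vcoco/coco/annotations",
    "datasets/vcoco/new_prior_mask.pkl",
    "datasets/hico_20160224_det/images",
    "datasets/hico_20160224_det/hico_processed",
]

for p in expected:
    status = "ok" if Path(p).exists() else "MISSING"
    print(f"{status:7s} {p}")
```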

Demonstration

Demonstration on Images

CUDA_VISIBLE_DEVICES=0 python demo.py --image_path /path/to/a/single/image

Demonstration on Videos

Coming soon.

Pre-trained Weights

You can download the pre-trained weights for the V-COCO dataset (vcoco_best.pth) and the HICO-DET dataset (hico-det_best.pth) here.

Training

Download the pre-trained weights of our backbone (efficientdet-d3_vcoco.pth and efficientdet-d3_hico-det.pth) here, and save them in the weights/ directory.

Training on V-COCO Dataset

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py -p vcoco --batch_size 32 --load_weights weights/efficientdet-d3_vcoco.pth

Training on HICO-DET Dataset

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python train.py -p hico-det --batch_size 48 --load_weights weights/efficientdet-d3_hico-det.pth

You may also adjust the saving directory and the number of GPUs in projects/vcoco.yaml and projects/hico-det.yaml, or create your own project files in projects/.

Test

Test on V-COCO Dataset

CUDA_VISIBLE_DEVICES=0 python test_vcoco.py -w /path/to/checkpoint

Test on HICO-DET Dataset

CUDA_VISIBLE_DEVICES=0 python test_hico-det.py -w /path/to/checkpoint

Then follow the same procedure as vt-vl-lab/iCAN to evaluate the results on the HICO-DET dataset.

Citation

If you find our paper or code useful for your research, please cite the following paper:

@inproceedings{fang2020dirv,
  title     = {DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection},
  author    = {Fang, Hao-Shu and Xie, Yichen and Shao, Dian and Lu, Cewu},
  booktitle = {The AAAI Conference on Artificial Intelligence (AAAI)},
  year      = {2021}
}
