DRIFT: Reward Finetuning for Faster and More Accurate Unsupervised Object Discovery

This is the official code release for [NeurIPS 2023] Reward Finetuning for Faster and More Accurate Unsupervised Object Discovery.

by Katie Z Luo*, Zhenzhen Liu*, Xiangyu Chen*, Yurong You, Sagie Benaim, Cheng Perng Phoo, Mark Campbell, Wen Sun, Bharath Hariharan, and Kilian Q. Weinberger

Paper | Video

Interested in 3D object discovery? Also see MODEST.


Abstract

Recent advances in machine learning have shown that Reinforcement Learning from Human Feedback (RLHF) can improve machine learning models and align them with human preferences. Although very successful for Large Language Models (LLMs), these advancements have not had a comparable impact in research for autonomous vehicles -- where alignment with human expectations can be imperative. In this paper, we propose to adapt similar RL-based methods to unsupervised object discovery, i.e., learning to detect objects from LiDAR points without any training labels. Instead of labels, we use simple heuristics to mimic human feedback. More explicitly, we combine multiple heuristics into a simple reward function that positively correlates its score with bounding box accuracy, i.e., boxes containing objects are scored higher than those without. We start from the detector's own predictions to explore the space and reinforce boxes with high rewards through gradient updates. Empirically, we demonstrate that our approach is not only more accurate, but also orders of magnitude faster to train compared to prior works on object discovery.
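
To make the reward idea concrete, here is a minimal, purely illustrative Python sketch of one plausible heuristic (scoring a box by the density of LiDAR points it encloses). The function name and the heuristic itself are our own assumptions for illustration; the actual reward terms used by DRIFT live in downstream/OpenPCDet/pcdet/models/model_utils/rewards.py and differ from this toy example.

import numpy as np

def toy_point_density_reward(points, box):
    """Toy heuristic reward -- illustration only, NOT the paper's reward.
    points: (N, 3) LiDAR points in the same frame as the box.
    box: (cx, cy, cz, dx, dy, dz) axis-aligned bounding box.
    Boxes that tightly enclose many points score higher than empty boxes."""
    center, dims = np.asarray(box[:3]), np.asarray(box[3:])
    inside = np.all(np.abs(points - center) <= dims / 2.0, axis=1)
    volume = max(float(np.prod(dims)), 1e-6)
    return inside.sum() / volume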

Main Results

We provide the necessary checkpoints for our experiments, including the baseline model trained on MODEST seed labels (before applying DRIFT) and models finetuned with DRIFT on Lyft and Ithaca365.

Lyft Experiment Models

Below we provide results, configs, and checkpoints on the Lyft dataset.

Method           mAP    Model   Config
Baseline         23.9   ckpt    cfg
DRIFT (60ep)     26.7   ckpt    --
DRIFT (120ep)    29.6   ckpt    cfg

Ithaca365 Experiment Models

Below we provide results, configs, and checkpoints on the Ithaca365 dataset.

Method           mAP    Model   Config
Baseline         7.7    ckpt    cfg
DRIFT (15ep)     28.0   ckpt    --
DRIFT (30ep)     35.1   ckpt    cfg

Installation

Set up the Anaconda environment:

conda create --name drift python=3.8
conda activate drift
conda install pytorch=1.9.0 torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia
pip install opencv-python matplotlib ray wandb scipy tqdm easydict scikit-learn pillow==8.3.2
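
As an optional sanity check (our suggestion, not part of the original instructions), verify that PyTorch and CUDA are visible inside the activated environment:

# optional sanity check, run inside the `drift` environment on a GPU node
import torch
print(torch.__version__)          # expected: 1.9.0
print(torch.cuda.is_available())  # should print True if CUDA 11.1 is set up correctly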

Install the remaining dependencies and build the project. DRIFT is built on top of the OpenPCDet framework:

# install openpcdet
cd ../../../downstream/OpenPCDet
pip install -r requirements.txt
python setup.py develop

# for managing experiments
pip install hydra-core --upgrade
pip install hydra_colorlog --upgrade
pip install rich

Install Minkowski Engine:

# ME
git clone https://github.com/NVIDIA/MinkowskiEngine.git
cd MinkowskiEngine
git checkout c854f0c # 0.5.4
# NOTE: need to run this on a node with GPUs
python setup.py install

Install a custom Spatially Sparse Convolution Library build:

# install customized spconv
cd third_party/spconv
python setup.py bdist_wheel
cd ./dist
pip install spconv-1.2.1-cp38-cp38-linux_x86_64.whl
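
Optionally (again our suggestion, not from the original README), confirm on a GPU node that both CUDA extensions import cleanly:

# optional: both imports should succeed without errors
import MinkowskiEngine as ME
import spconv
print("MinkowskiEngine", ME.__version__)  # expect 0.5.4
print("spconv imported OK")               # custom 1.2.1 build from third_party/spconv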

Data Setup

Preprocessing data

Please refer to data_preprocessing/lyft/LYFT_PREPROCESSING.md and data_preprocessing/ithaca365/ITHACA365_PREPROCESSING.md.

Precomputing Seed Labels

This project builds upon the MODEST codebase. Follow their setup to precompute the P2 scores and to train the baseline detector on seed labels. For simplicity, we also provide the baseline detector checkpoints trained on seed labels in the Main Results section above.

To generate P2 scores for the Lyft dataset:

cd $PROJECT_ROOT_DIR
# generate P2 scores
python generate_cluster_mask/pre_compute_p2_score.py

To generate P2 scores for the Ithaca365 dataset:

cd $PROJECT_ROOT_DIR
# generate P2 scores
python generate_cluster_mask/pre_compute_p2_score.py dataset="ithaca365" data_paths="ithaca365.yaml"

DRIFT Training and Evaluation

DRIFT File Changes

All DRIFT training-step changes are incorporated into the forward() call of PointRCNN, located at downstream/OpenPCDet/pcdet/models/detectors/point_rcnn.py. All DRIFT rewards and reward helper functions are located in downstream/OpenPCDet/pcdet/models/model_utils/rewards.py. Exploration and additional utility functions can be found in downstream/OpenPCDet/pcdet/models/model_utils/unsupervised_regression_utils.py.
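
As a rough mental model only (a simplified sketch under our own assumptions, not the code in point_rcnn.py), the idea from the paper is to perturb the detector's own predictions, score the candidates with the heuristic reward, and regress toward the highest-reward boxes:

import torch

def drift_style_step_sketch(pred_boxes, perturb, reward_fn, loss_fn):
    """Illustrative sketch of a reward-reinforced update -- NOT the actual implementation.
    pred_boxes: (N, 7) boxes predicted by the detector's own forward pass."""
    with torch.no_grad():
        candidates = perturb(pred_boxes)         # (N, K, 7): explore around each prediction
        rewards = reward_fn(candidates)          # (N, K): heuristic score per candidate
        best = rewards.argmax(dim=1)             # highest-reward candidate per box
        targets = candidates[torch.arange(candidates.shape[0]), best]  # (N, 7)
    # reinforce: the gradient update pulls predictions toward high-reward boxes
    return loss_fn(pred_boxes, targets)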

Launch Training

Results reported in the paper are trained using 4 GPUs. To launch training on 4 GPUs, activate the conda environment and run the following self-training scripts from downstream/OpenPCDet/tools:

cd downstream/OpenPCDet/tools

# Lyft
bash scripts/dist_train.sh 4 --cfg_file cfgs/lyft_models/pointrcnn_dynamic_drift.yaml --merge_all_iters_to_one_epoch --fix_random_seed --pretrained_model <LYFT_BASELINE_CKPT>

# Ithaca365
bash scripts/dist_train.sh 4 --cfg_file cfgs/ithaca365_models/pointrcnn_dynamic_drift.yaml --merge_all_iters_to_one_epoch --fix_random_seed --pretrained_model <ITHACA_BASELINE_CKPT>

Evaluate Checkpoints

Evaluation on multiple GPUs can be done on each checkpoint with scripts/dist_test.sh. To evaluate on 4 GPUs, activate the conda environment and run the following eval scripts:

cd downstream/OpenPCDet/tools
bash scripts/dist_test.sh 4 --cfg_file <cfg> --ckpt <ckpt_path>

Citation

If this work is helpful for your research, please consider citing us!

@inproceedings{luo2023reward,
  title={Reward Finetuning for Faster and More Accurate Unsupervised Object Discovery},
  author={Luo, Katie Z and Liu, Zhenzhen and Chen, Xiangyu and You, Yurong and Benaim, Sagie and Phoo, Cheng Perng and Campbell, Mark and Sun, Wen and Hariharan, Bharath and Weinberger, Kilian Q},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023}
}
