This is the official code release for [NeurIPS 2023] Reward Finetuning for Faster and More Accurate Unsupervised Object Discovery.
By Katie Z Luo*, Zhenzhen Liu*, Xiangyu Chen*, Yurong You, Sagie Benaim, Cheng Perng Phoo, Mark Campbell, Wen Sun, Bharath Hariharan, and Kilian Q. Weinberger
Interested in more work on perception and 3D object discovery? Also see MODEST.
Recent advances in machine learning have shown that Reinforcement Learning from Human Feedback (RLHF) can improve machine learning models and align them with human preferences. Although very successful for Large Language Models (LLMs), these advancements have not had a comparable impact on research for autonomous vehicles, where alignment with human expectations can be imperative. In this paper, we propose to adapt similar RL-based methods to unsupervised object discovery, i.e., learning to detect objects from LiDAR points without any training labels. Instead of labels, we use simple heuristics to mimic human feedback. More explicitly, we combine multiple heuristics into a simple reward function that positively correlates its score with bounding box accuracy, i.e., boxes containing objects are scored higher than those without. We start from the detector's own predictions to explore the space and reinforce boxes with high rewards through gradient updates. Empirically, we demonstrate that our approach is not only more accurate, but also orders of magnitude faster to train compared to prior works on object discovery.
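To make the reward idea concrete, below is a minimal sketch of what a heuristic box reward could look like. The two heuristics and the weighting are illustrative assumptions, not the paper's exact reward (see `rewards.py` further down for the real implementation):

```python
import numpy as np

def toy_box_reward(points_in_box: np.ndarray, pp_scores: np.ndarray) -> float:
    """Hypothetical reward combining two heuristics: boxes that enclose many
    LiDAR points, most of which look ephemeral (low persistence), score high.
    Illustrative sketch only, not the paper's exact reward."""
    if len(points_in_box) == 0:
        return 0.0  # never reinforce empty boxes
    density = np.tanh(len(points_in_box) / 50.0)  # saturating point-count term
    ephemeral = 1.0 - float(pp_scores.mean())     # low P2 score ~ likely a mobile object
    return density * ephemeral                    # in [0, 1]; higher = better box
```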
We provide the necessary checkpoints for our experiments, including the baseline model trained on MODEST seed labels (before applying DRIFT) and models finetuned with DRIFT on Lyft and Ithaca365.
Below we provide results, configs, and checkpoints on the Lyft dataset.
| Model | mAP | Checkpoint | Config |
|---|---|---|---|
| Baseline | 23.9 | ckpt | cfg |
| DRIFT (60ep) | 26.7 | ckpt | -- |
| DRIFT (120ep) | 29.6 | ckpt | cfg |
Below we provide results, configs, and checkpoints on the Ithaca365 dataset.
| Model | mAP | Checkpoint | Config |
|---|---|---|---|
| Baseline | 7.7 | ckpt | cfg |
| DRIFT (15ep) | 28.0 | ckpt | -- |
| DRIFT (30ep) | 35.1 | ckpt | cfg |
Set up the Anaconda environment:

```bash
conda create --name drift python=3.8
conda activate drift
conda install pytorch=1.9.0 torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia
pip install opencv-python matplotlib ray wandb scipy tqdm easydict scikit-learn pillow==8.3.2
```
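As an optional sanity check that the pinned PyTorch/CUDA combination installed correctly:

```python
# run inside the activated "drift" environment on a GPU node
import torch

print(torch.__version__)          # expect 1.9.0
print(torch.version.cuda)         # expect 11.1
print(torch.cuda.is_available())  # expect True
```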
Install the remaining dependencies and build the project. DRIFT is built off of the OpenPCDet framework:

```bash
# install OpenPCDet
cd ../../../downstream/OpenPCDet
pip install -r requirements.txt
python setup.py develop

# for managing experiments
pip install hydra-core --upgrade
pip install hydra_colorlog --upgrade
pip install rich
```
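If the build succeeded, `pcdet` should now be importable (a quick check; the exact version string depends on the checkout):

```python
# verify the develop-mode OpenPCDet install
import pcdet

print(pcdet.__version__)
```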
Install Minkowski Engine:

```bash
# MinkowskiEngine
git clone https://github.com/NVIDIA/MinkowskiEngine.git
cd MinkowskiEngine
git checkout c854f0c # 0.5.4
# NOTE: need to run this on a node with GPUs
python setup.py install
```
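After the build finishes, MinkowskiEngine should import cleanly:

```python
# verify the CUDA build of MinkowskiEngine
import MinkowskiEngine as ME

print(ME.__version__)  # expect 0.5.4 for the pinned commit
```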
Install a custom Spatially Sparse Convolution Library build:

```bash
# install customized spconv
cd third_party/spconv
python setup.py bdist_wheel
cd ./dist
pip install spconv-1.2.1-cp38-cp38-linux_x86_64.whl
```
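To confirm that the customized wheel (and not a stock pip spconv) is the one active in the environment:

```python
# verify the custom spconv 1.2.1 wheel imports without error
import spconv

print(spconv.__file__)  # path should point into this environment's site-packages
```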
Please refer to `data_preprocessing/lyft/LYFT_PREPROCESSING.md` and `data_preprocessing/ithaca365/ITHACA365_PREPROCESSING.md`.
This project builds upon the MODEST codebase. Following their setup, precompute the P2 scores and train the baseline model on seed labels. For simplicity, we also provide the baseline checkpoints of the detectors trained on seed labels in the Model Checkpoints section above.
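For intuition, the P2 (persistence prior) score rates how consistently a LiDAR point is re-observed across repeated traversals of the same location: persistently re-observed points are likely background, while rarely re-observed points are likely mobile objects. The snippet below is a conceptual sketch under that definition, not the repository's implementation; `neighbor_counts` is a hypothetical precomputed quantity:

```python
import numpy as np

def p2_score_sketch(neighbor_counts: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Conceptual sketch of a persistence-style score, not the repo's exact code.
    neighbor_counts: (N, T) array; entry [i, t] counts the points from traversal t
    that fall within a small radius of point i."""
    # distribution of each point's neighbors over the T traversals
    p = neighbor_counts / (neighbor_counts.sum(axis=1, keepdims=True) + eps)
    T = neighbor_counts.shape[1]
    # normalized entropy: ~1 if re-observed evenly across traversals (background),
    # ~0 if seen in only one traversal (likely an ephemeral, mobile object)
    return -(p * np.log(p + eps)).sum(axis=1) / np.log(T)
```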
To generate P2 scores for the Lyft dataset:

```bash
cd $PROJECT_ROOT_DIR
# generate P2 score
python generate_cluster_mask/pre_compute_p2_score.py
```
To generate P2 scores for the Ithaca365 dataset:

```bash
cd $PROJECT_ROOT_DIR
# generate P2 score
python generate_cluster_mask/pre_compute_p2_score.py dataset="ithaca365" data_paths="ithaca365.yaml"
```
All DRIFT training-step changes are incorporated into the `forward()` call of PointRCNN, located in `downstream/OpenPCDet/pcdet/models/detectors/point_rcnn.py`. All DRIFT rewards and reward helper functions are located in `downstream/OpenPCDet/pcdet/models/model_utils/rewards.py`. Exploration and additional utility functions can be found in `downstream/OpenPCDet/pcdet/models/model_utils/unsupervised_regression_utils.py`.
Results reported in the paper are trained using 4 GPUs. To launch training on 4 GPUs, activate the conda environment and run the following self-training scripts:

```bash
# Lyft
bash scripts/dist_train.sh 4 --cfg_file cfgs/lyft_models/pointrcnn_dynamic_drift.yaml --merge_all_iters_to_one_epoch --fix_random_seed --pretrained_model <LYFT_BASELINE_CKPT>

# Ithaca365
bash scripts/dist_train.sh 4 --cfg_file cfgs/ithaca365_models/pointrcnn_dynamic_drift.yaml --merge_all_iters_to_one_epoch --fix_random_seed --pretrained_model <ITHACA_BASELINE_CKPT>
```
Evaluation on multiple GPUs can be run on each checkpoint with `scripts/dist_test.sh`. To evaluate on 4 GPUs, activate the conda environment and run the following eval script:

```bash
cd downstream/OpenPCDet/tools
bash scripts/dist_test.sh 4 --cfg_file <cfg> --ckpt <ckpt_path>
```
If this work is helpful for your research, please consider citing us!

```bibtex
@inproceedings{luo2023reward,
  title={Reward Finetuning for Faster and More Accurate Unsupervised Object Discovery},
  author={Luo, Katie Z and Liu, Zhenzhen and Chen, Xiangyu and You, Yurong and Benaim, Sagie and Phoo, Cheng Perng and Campbell, Mark and Sun, Wen and Hariharan, Bharath and Weinberger, Kilian Q},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023}
}
```