This repository contains the code for our work "Self-Evolving Depth-Supervised 3D Gaussian Splatting from Rendered Stereo Pairs" (BMVC 2024), by Sadra Safadoust, Fabio Tosi, Fatma Güney, and Matteo Poggi.
Create a conda environment and install the requirements:
conda create -n StereoGS python=3.11
conda activate StereoGS
conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
conda install plyfile tqdm
Clone the repository and its submodules, and install the submodules (Note that we use a rasterizer different from the original 3DGS):
git clone https://github.com/sadrasafa/StereoGS.git --recursive
pip install submodules/depth-diff-gaussian-rasterization
pip install submodules/simple-knn
Clone RAFT-Stereo into utils/, install its requirements, download its checkpoints, and compile the CUDA implementation of the correlation sampler:
cd utils
git clone https://github.com/princeton-vl/RAFT-Stereo.git
pip install matplotlib tensorboard scipy opencv-python opt_einsum imageio scikit-image
cd RAFT-Stereo/sampler && python setup.py install && cd ../
bash download_models.sh
cd ../..
We experiment on three datasets: ETH3D, ScanNet++, and BlendedMVS. After processing each dataset (as described below), store it in the following structure:
[DATASET-NAME]
└── [SCENE-NAME]
    ├── val_cams.json
    ├── images
    ├── depths
    └── sparse
        └── 0
            ├── cameras.txt
            ├── images.txt
            └── points3D.txt
You can ignore val_cams.json if you don't want to use the same train/val split as ours. You can download it for each scene from here.
ETH3D: Download the High-res multi-view dataset from here. We only need the undistorted JPG images and the undistorted depths. Note that the ground-truth depths are provided only for the distorted images, so they need to be undistorted using the camera parameters.
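For reference, below is a minimal sketch of this undistortion step (it is not part of this repository). It assumes a pinhole camera with radial-tangential distortion coefficients read from cameras.txt; ETH3D's actual camera model and file paths should be taken from the dataset itself. Nearest-neighbor remapping is used so that depth values are never blended across depth discontinuities:

# Hedged sketch: undistort a ground-truth depth map given intrinsics
# K and distortion coefficients from cameras.txt. All values below
# are placeholders, not repo or dataset conventions.
import numpy as np
import cv2

def undistort_depth(depth, K, dist_coeffs):
    h, w = depth.shape
    # Keep the original intrinsics as the new camera matrix,
    # matching the undistorted color images.
    map_x, map_y = cv2.initUndistortRectifyMap(
        K, dist_coeffs, None, K, (w, h), cv2.CV_32FC1)
    # INTER_NEAREST avoids interpolating depth across object edges.
    return cv2.remap(depth, map_x, map_y, cv2.INTER_NEAREST)

# Placeholder example; real values come from the scene's cameras.txt.
K = np.array([[1000.0, 0.0, 320.0],
              [0.0, 1000.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.array([0.1, -0.05, 0.0, 0.0])  # k1, k2, p1, p2 (assumed model)
depth = np.random.rand(480, 640).astype(np.float32)  # stand-in depth map
undistorted = undistort_depth(depth, K, dist)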
ScanNet++: Download the dataset from here. We use the DSLR data for the 8b5caf3398 and b20a261fdf scenes. Follow the instructions at the Official ScanNet++ Toolkit to render the depths, then undistort the images and depths. Note that the provided undistortion script only undistorts the images, but it can easily be extended to undistort the depths as well (e.g., check this). Also, the script saves the camera intrinsics of the undistorted pinhole camera in nerfstudio's JSON format, so make sure to update the camera parameters in the COLMAP format (cameras.txt) accordingly.
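As a rough illustration of that last step, the sketch below copies the undistorted pinhole intrinsics from a nerfstudio-style JSON file into COLMAP's cameras.txt. The JSON keys (fl_x, fl_y, cx, cy, w, h) follow nerfstudio's convention, while the file names are placeholders for your scene layout, not names used by this repository:

# Hedged sketch: rewrite cameras.txt from nerfstudio-style intrinsics.
import json

with open("transforms_undistorted.json") as f:  # assumed file name
    meta = json.load(f)

# COLMAP cameras.txt format: CAMERA_ID MODEL WIDTH HEIGHT PARAMS[]
# For the PINHOLE model, PARAMS = fx fy cx cy.
line = "1 PINHOLE {w} {h} {fx} {fy} {cx} {cy}\n".format(
    w=int(meta["w"]), h=int(meta["h"]),
    fx=meta["fl_x"], fy=meta["fl_y"], cx=meta["cx"], cy=meta["cy"])

with open("sparse/0/cameras.txt", "w") as f:
    f.write("# Camera list with one line of data per camera\n")
    f.write(line)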
BlendedMVS: Download the low-res BlendedMVS dataset from here. We use the following scenes:
5b6e716d67b396324c2d77cb
5b6eff8b67b396324c5b2672
5bf18642c50e6f7f8bdbd492
5bff3c5cfe0ea555e6bcbf3a
We remove the *_masked.jpg files.
The dataset provides the camera parameters but not the COLMAP SfM points. You can follow the instructions here to obtain the sparse COLMAP model from the given poses.
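As a rough guide, COLMAP's standard workflow for triangulating a sparse model from known poses looks like the sketch below. It assumes you have already written the BlendedMVS cameras and poses into sparse_known/cameras.txt and sparse_known/images.txt (with an empty points3D.txt); all paths are placeholders:

# Hedged sketch of COLMAP's known-poses triangulation workflow.
import subprocess

scene = "path/to/scene"  # placeholder
run = lambda *args: subprocess.run(["colmap", *args], check=True)

# Extract and match features across the images.
run("feature_extractor",
    "--database_path", f"{scene}/database.db",
    "--image_path", f"{scene}/images")
run("exhaustive_matcher",
    "--database_path", f"{scene}/database.db")
# Triangulate 3D points against the fixed, known poses.
run("point_triangulator",
    "--database_path", f"{scene}/database.db",
    "--image_path", f"{scene}/images",
    "--input_path", f"{scene}/sparse_known",
    "--output_path", f"{scene}/sparse/0")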
You can train and evaluate the method on each dataset using the provided scripts. Set the appropriate dataset path and connection port in each script, then run:
bash scripts/run_ETH3D.sh
bash scripts/run_scannetpp.sh
bash scripts/run_blendedMVS.sh
If you find our work useful in your research, please consider citing:
@inproceedings{safadoust2024BMVC,
  title={Self-Evolving Depth-Supervised 3D Gaussian Splatting from Rendered Stereo Pairs},
  author={Safadoust, Sadra and Tosi, Fabio and G{\"u}ney, Fatma and Poggi, Matteo},
  booktitle={British Machine Vision Conference (BMVC)},
  year={2024}
}