Please install the included environments in the root of this repo:
conda env create -f environment.yaml
conda env create -f environment-colmap.yaml
Training requires Torchvision with video_reader support, which means the library must be built from source.
To do this, first activate the training conda environment photoconsistent-nvs, then clone the Torchvision repo somewhere on your system (https://github.com/pytorch/vision/tree/release/0.11).
Check out the release/0.11 branch of Torchvision, and run:
python setup.py install
This should detect the ffmpeg installation in the environment and install Torchvision with video_reader enabled.
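One way to confirm that the build picked up video_reader is to query the installed package from Python. The sketch below is not part of this repo. It relies on torchvision.io._HAS_VIDEO_OPT, a private flag that may change between Torchvision releases, so treat that name as an assumption; the function name video_reader_available is made up for illustration.

```python
import importlib


def video_reader_available():
    """Return True/False if Torchvision reports video_reader support,
    or None if Torchvision cannot be imported in this environment."""
    try:
        tv_io = importlib.import_module("torchvision.io")
    except (ImportError, OSError):
        return None
    # _HAS_VIDEO_OPT is a private Torchvision flag (assumption: still present
    # in the installed release); default to False if it is missing.
    return bool(getattr(tv_io, "_HAS_VIDEO_OPT", False))


if __name__ == "__main__":
    status = video_reader_available()
    if status is None:
        print("Torchvision is not importable in this environment.")
    elif status:
        print("Torchvision was built with video_reader support.")
    else:
        print("Torchvision is installed, but video_reader is not enabled.")
```

Run it inside the activated photoconsistent-nvs environment after the build finishes.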
├── environment.yaml
├── dataset-data
│ ├── data
│ │ ├── test
│ │ │ ├── videos // videos for this split
│ │ │ └── poses.npy // converted camera poses
│ │ ├── train
│ │ │ ├── videos // videos for this split
│ │ │ └── poses.npy // converted camera poses
│ │ └── RealEstate10K-original // original data from RealEstate10K dataset
│ │ │ ├── test // txt files for test camera poses
│ │ │ └── train // txt files for train camera poses
│ └── extract-poses.py // Camera pose conversion script
├── instance-data // contains data from training and sampling
│ ├── checkpoints // Model checkpoints
│ ├── logs // Tensorboard logs
│ └── taming-32-4-realestate-256.ckpt // First stage VQGAN weights
└── src
  ├── datasets // Data input pipeline
  ├── launch-scripts // shell scripts for launching slurm jobs
  ├── models
  ├── scripts // python scripts for training and sampling
  └── utils
RealEstate10K is a dataset consisting of real estate videos scraped from YouTube. Camera poses are recovered using SLAM.
Videos in the dataset are provided as YouTube URLs, and need to be downloaded manually using tools such as yt-dlp.
The included data pipeline directly reads frames from the videos downloaded at 360p.
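As a rough illustration of the download step, the sketch below builds a yt-dlp command line for one video. It is not part of this repo and rests on several assumptions: that the first line of each RealEstate10K txt file holds the video URL, that the format selector best[height<=360] is an acceptable way to request 360p, and that videos may be named after a video_id of your choosing. The file naming expected by the data pipeline is not stated here, so check src/datasets before relying on it.

```python
from pathlib import Path


def read_video_url(pose_txt):
    """Return the first line of a pose txt file (assumed to be the video URL)."""
    with open(pose_txt, "r", encoding="utf-8") as f:
        return f.readline().strip()


def build_download_command(url, out_dir, video_id):
    """Build a yt-dlp argument list; video_id is a hypothetical output name."""
    out_template = str(Path(out_dir) / f"{video_id}.%(ext)s")
    return [
        "yt-dlp",
        "-f", "best[height<=360]",  # best single-file format up to 360p
        "-o", out_template,         # output filename template
        url,
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True) once yt-dlp is installed.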
The dataset provides camera poses as camera extrinsics. We preprocess them into world transformations of a canonical camera, using the same camera and coordinate system conventions as Blender.
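The conversion described above can be sketched as follows. This is not the code in extract-poses.py, which may differ. It assumes the extrinsic is a 3x4 world-to-camera matrix [R|t] with the camera's x axis pointing right, y down, and z forward, and converts it to a 4x4 camera-to-world matrix for a Blender-style camera (x right, y up, looking down -z). Plain lists are used so the example runs without extra dependencies.

```python
def extrinsic_to_blender_pose(w2c):
    """Convert a 3x4 world-to-camera matrix (rows of [R|t]) into a 4x4
    camera-to-world matrix in Blender's camera convention.

    Assumes the input camera axes are x right, y down, z forward.
    """
    rot = [row[:3] for row in w2c]
    trans = [row[3] for row in w2c]

    # Invert the rigid transform: rotation R^T, camera center -R^T t.
    rot_t = [[rot[j][i] for j in range(3)] for i in range(3)]
    center = [-sum(rot_t[i][j] * trans[j] for j in range(3)) for i in range(3)]

    # Flip the camera's y and z axes (columns 1 and 2 of the rotation)
    # to go from y-down/z-forward to Blender's y-up/z-backward.
    flip = (1.0, -1.0, -1.0)
    pose = [
        [rot_t[i][j] * flip[j] for j in range(3)] + [center[i]]
        for i in range(3)
    ]
    pose.append([0.0, 0.0, 0.0, 1.0])
    return pose
```

For example, an extrinsic with identity rotation and translation (0, 0, -2) yields a camera located at (0, 0, 2) in world space.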
Navigate to the dataset-data directory and place the downloaded RealEstate10K files under dataset-data/data/RealEstate10K-original.
Please also populate the dataset-data/data/test/videos and dataset-data/data/train/videos directories with the downloaded videos.
To convert the poses run:
python extract-poses.py test
python extract-poses.py train
Please train with the photoconsistent-nvs environment.
Training uses PyTorch DDP. An example slurm script is provided under src/launch-scripts/train-deploy.sh.
RealEstate10K VQGAN weights: Google Drive
RealEstate10K diffusion model weights: Google Drive
Please place the first stage VQGAN weights under instance-data/taming-32-4-realestate-256.ckpt and the diffusion model weights under instance-data/checkpoints/2000-00000000.pth.
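Before sampling, it can help to confirm that both files are where the scripts expect them. The following is a small sketch, not part of this repo; the two paths come from the instructions above, while the function name missing_weights and the repo_root argument are made up for illustration.

```python
from pathlib import Path

# Locations named in the instructions above, relative to the repo root.
EXPECTED_WEIGHTS = (
    "instance-data/taming-32-4-realestate-256.ckpt",
    "instance-data/checkpoints/2000-00000000.pth",
)


def missing_weights(repo_root="."):
    """Return the expected weight files that are not present under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_WEIGHTS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing weight files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected weight files are in place.")
```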
Please use the photoconsistent-nvs environment.
Sampling requires a specific directory structure per sequence to specify the desired camera pose and the given source image.
The directory will also contain the generated samples, and any intermediate files generated for evaluation.
An example is provided under instance-data/samples:
└── instance-data
  └── samples
    └── 584f2fc6d686aebc // directory for one sample
      ├── init-ims // contains given source image
      │ └── 0000.png
      ├── samples // contains sampled images, not created by sampling script
      └── sampling-spec.json // specifies trajectory of poses
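The layout above can be set up for a new sequence with a few lines of Python. This is a minimal sketch, not a script shipped with this repo; the function name and its arguments are hypothetical. It creates the init-ims and samples directories and copies a source image to init-ims/0000.png. It does not write sampling-spec.json, whose contents should follow the provided example.

```python
import shutil
from pathlib import Path


def prepare_sequence_dir(samples_root, sequence_id, source_image):
    """Create the per-sequence directory skeleton and copy in the source image.

    sampling-spec.json is intentionally left for the user to provide.
    """
    seq_dir = Path(samples_root) / sequence_id
    (seq_dir / "init-ims").mkdir(parents=True, exist_ok=True)
    (seq_dir / "samples").mkdir(exist_ok=True)
    # The source image is expected to already be a PNG; it is copied as-is.
    shutil.copyfile(source_image, seq_dir / "init-ims" / "0000.png")
    return seq_dir
```

For example, prepare_sequence_dir("instance-data/samples", "my-sequence", "source.png") creates instance-data/samples/my-sequence with the two subdirectories.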
We also provide examples of our custom trajectories under custom-trajectories. The focal length should be adjusted for each sequence.
Sampling is performed by navigating to the src directory and running:
python scripts/sample-trajectory.py -o samples/584f2fc6d686aebc
Please use the photoconsistent-nvs-eval environment.
TSED of a sampled trajectory is evaluated by computing SIFT matches, computing the SED for each matched keypoint, and then computing the final TSED over neighboring pairs:
python colmap-recover-matches.py -o samples/584f2fc6d686aebc
python compute_sed.py -o samples/584f2fc6d686aebc
python eval_consistency.py -o samples/584f2fc6d686aebc
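To make the last two steps concrete, here is a small sketch of the SED and TSED computations. It is not the code in compute_sed.py or eval_consistency.py, and several details are assumptions: SED is taken as the mean of the two point-to-epipolar-line distances, a pair of neighboring views counts as consistent when it has at least t_matches matches and its median SED is below t_error, and TSED is the fraction of consistent pairs. The fundamental matrix F is assumed to map points in the first image to epipolar lines in the second.

```python
import math
from statistics import median


def _epipolar_line(F, point):
    """Multiply a 3x3 matrix with the homogeneous point (x, y, 1)."""
    x, y = point
    return tuple(row[0] * x + row[1] * y + row[2] for row in F)


def _point_line_distance(line, point):
    """Distance from a 2D point to the line a*x + b*y + c = 0.
    Raises ZeroDivisionError for a degenerate line with a = b = 0."""
    a, b, c = line
    x, y = point
    return abs(a * x + b * y + c) / math.hypot(a, b)


def symmetric_epipolar_distance(F, p, p_prime):
    """Mean distance of each point to the epipolar line of its match."""
    F_t = [list(col) for col in zip(*F)]
    d_second = _point_line_distance(_epipolar_line(F, p), p_prime)
    d_first = _point_line_distance(_epipolar_line(F_t, p_prime), p)
    return 0.5 * (d_second + d_first)


def tsed(pair_seds, *, t_error, t_matches):
    """Fraction of view pairs that pass both thresholds.

    pair_seds holds one list of per-match SED values per neighboring pair.
    """
    if not pair_seds:
        return 0.0
    consistent = sum(
        1
        for seds in pair_seds
        if len(seds) >= t_matches and median(seds) < t_error
    )
    return consistent / len(pair_seds)
```

The thresholds are passed explicitly because suitable values depend on the image resolution and the evaluation protocol.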