
Consistent123: One Image to Highly Consistent 3D Asset Using Case-Aware Diffusion Priors

SIGS, Tsinghua University
*equal contribution +corresponding author

Pipeline


News:

  • [2024/09/18] Test cases are available under the data folder.
  • [2024/09/16] Code is available. Try it!
  • [2024/07/16] Consistent123 is accepted to ACMMM 2024.

Preliminary

# Clone the repo
git clone https://github.com/lyk412/Consistent123.git
cd Consistent123
# Tested environment: Ubuntu 22 with Python 3.8, torch 1.12.0, and CUDA 11.6 on an A100.

Create a Python virtual environment

To avoid Python package conflicts, we recommend using a virtual environment.

choice 1: venv

python -m venv venv_consistent123
source venv_consistent123/bin/activate 

choice 2: conda

conda create -n consistent123 python=3.8
conda activate consistent123

Install packages with pip

pip install -r requirements.txt
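
After installation, you can verify that the environment roughly matches the tested setup above. A minimal check (not part of the repo):

import sys
import torch
print(f"python: {sys.version.split()[0]}")    # expect 3.8.x
print(f"torch: {torch.__version__}")          # expect 1.12.0
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")  # expect 11.6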

Download pre-trained models

To use image-conditioned 3D generation, you need to download some pretrained checkpoints manually:

  • Zero-1-to-3 for the diffusion backend. We use zero123-xl.ckpt by default; its path is hard-coded in guidance/zero123_utils.py.
    cd pretrained/zero123
    wget https://zero123.cs.columbia.edu/assets/zero123-xl.ckpt
  • Omnidata for depth and normal prediction. These checkpoint paths are hard-coded in preprocess_image.py.
    mkdir pretrained/omnidata
    cd pretrained/omnidata
    # assume gdown is installed
    gdown '1Jrh-bRnJEjyMCS7f-WsaFlccfPjJPPHI&confirm=t' # omnidata_dpt_depth_v2.ckpt
    gdown '1wNxVO4vVbDEMEpnAi_jwQObf2MFodcBR&confirm=t' # omnidata_dpt_normal_v2.ckpt
  • Stable Diffusion 2.1 for Score Distillation Sampling. Download it and replace the path at line 38 in guidance/sd_utils.py; one way to fetch it is sketched after this list.
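
For the Stable Diffusion 2.1 checkpoint, one convenient way to fetch it locally is via huggingface_hub. A minimal sketch (the repo id and target directory below are our assumptions, not pinned by the repo):

from huggingface_hub import snapshot_download
# assumed SD 2.1 variant and a hypothetical target directory
local_path = snapshot_download(
    repo_id="stabilityai/stable-diffusion-2-1-base",
    local_dir="pretrained/sd21",
)
print(f"downloaded to: {local_path}")  # point line 38 of guidance/sd_utils.py here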

To use DeepFloyd-IF, you need to accept the usage conditions on Hugging Face and log in with huggingface-cli login on the command line.
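
If you prefer to stay in Python, huggingface_hub exposes the same login flow. This is an equivalent alternative to the CLI, not an extra requirement:

from huggingface_hub import login
login()  # prompts for a Hugging Face token; or pass token="hf_..."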

For DMTet, we port the pre-generated 32/64/128-resolution tetrahedron grids under tets. The 256-resolution grid can be found here.
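
If you want to inspect one of these grids, the sketch below assumes the stable-dreamfusion .npz layout (arrays named vertices and indices); the file name encodes the resolution:

import numpy as np
tets = np.load("tets/128_tets.npz")
vertices = tets["vertices"]  # (V, 3) positions of the tet grid vertices
indices = tets["indices"]    # (T, 4) vertex indices per tetrahedron
print(f"{vertices.shape[0]} vertices, {indices.shape[0]} tetrahedra")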

Build extension (optional)

By default, we use torch's cpp_extension.load to build the extensions at runtime. We also provide setup.py scripts to build each extension ahead of time:

cd Consistent123

# install all extension modules
bash scripts/install_ext.sh

# if you want to install manually, here is an example:
pip install ./raymarching # install to python path (you still need the raymarching/ folder, since this only installs the built extension.)

Taichi backend (optional)

Use the Taichi backend for Instant-NGP. It achieves performance comparable to the CUDA implementation while requiring no CUDA build. Install Taichi with pip:

pip install -i https://pypi.taichi.graphics/simple/ taichi-nightly
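
A quick smoke test that the Taichi backend is usable (a sketch, not a repo script):

import taichi as ti
ti.init(arch=ti.gpu)  # falls back to CPU with a warning if no GPU backend is found
print("taichi initialized")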

Troubleshooting:

  • We assume you are working with the latest version of all dependencies. If you meet problems with a specific dependency, please try to upgrade it first (e.g., pip install -U diffusers). If the problem persists, reporting a bug issue will be appreciated!
  • [F glutil.cpp:338] eglInitialize() failed Aborted (core dumped): this usually indicates a problem with the OpenGL installation. Try re-installing the Nvidia driver, or use nvidia-docker as suggested in ashawkey/stable-dreamfusion#131 if you are on a headless server.
  • TypeError: xxx_forward(): incompatible function arguments: this happens when the CUDA source has been updated but you installed the extensions earlier with setup.py. Try re-installing the corresponding extension (e.g., pip install ./gridencoder).

Usage

Note: There are various parameter settings in main.py. Please refer to Usage in Stable-Dreamfusion for more information.

The first run will take some time to compile the CUDA extensions.

## preprocess input image
# note: the results of image-to-3D are dependent on zero-1-to-3's capability. For best performance, the input image should contain a single front-facing object, have a square aspect ratio, and be under 1024 pixels in resolution. Check the examples under ./data.
# this exports `<image>_rgba.png`, `<image>_depth.png`, and `<image>_normal.png` to the directory containing the input image.
python preprocess_image.py <image>.png
python preprocess_image.py <image>.png --border_ratio 0.4 # increase border_ratio if the center object appears too large and results are unsatisfying.
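
After preprocessing, a quick sanity check on the exported files can catch problems early. A minimal sketch (not a repo script; the bird example from ./data is used as the stem):

from PIL import Image
stem = "data/realfusion15/bird"
rgba = Image.open(f"{stem}_rgba.png")
print(rgba.mode, rgba.size)  # expect RGBA; square and below 1024px works best
for suffix in ("depth", "normal"):
    print(suffix, Image.open(f"{stem}_{suffix}.png").size)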

## obtain text prompt
# options: 1. given by the user 2. BLIP2 3. GPT-4 (see the BLIP-2 sketch below)
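
For option 2, captioning the input image with BLIP-2 through the transformers library could look like the sketch below; the checkpoint name and CUDA device are our assumptions:

import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")
image = Image.open("data/realfusion15/bird_rgba.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt").to("cuda", torch.float16)
caption = processor.decode(model.generate(**inputs)[0], skip_special_tokens=True)
print(caption.strip())  # pass the result to main.py via --text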

## providing both --text and --image enables stable-diffusion backend

## Consistent123 Template
# first stage: 3D initialization
python main.py --weight_method <dynamic prior method (exponential/log/linear)> -O --image <path of rgba> --text <text prompt> --workspace <stage1 savedir> --iters <stage1 iterations> --dmtet_iters <stage2 iterations> --least_3Donly <CLIP detection start> --most_3Donly <CLIP detection end> --render_interval <CLIP detection interval> --threshold <CLIP detection threshold> --last_N <sliding window size>

# second stage: dynamic prior
python main.py --weight_method <dynamic prior method (exponential/log/linear)> -O --image <path of rgba> --text <text prompt> --workspace <stage2 savedir> --dmtet --iters <stage2 iterations> --nerf_iters <stage1 iterations> --init_with <stage1 savedir>/checkpoints/df.pth --convergence_path <stage1 savedir>/convergence.npy

# test / visualize
python main.py -O --workspace <stage1 savedir> --test --save_mesh --write_video
python main.py -O --workspace <stage2 savedir> --dmtet --test --save_mesh --write_video

## Consistent123 Usage
# first stage: 3D initialization
python main.py --weight_method exponential -O --image data/realfusion15/bird_rgba.png --text "a small blue-brown bird with a pointed mouth" --workspace trial_imagetext_rf15_bird_clip  --iters 5000 --dmtet_iters 5000 --least_3Donly 2000 --most_3Donly 3500 --render_interval 21 --threshold 0.00025 --last_N 5

# second stage: dynamic prior
python main.py --weight_method exponential -O --image data/realfusion15/bird_rgba.png --text "a small blue-brown bird with a pointed mouth" --workspace trial2_imagetext_rf15_bird_clip --dmtet --iters 5000 --nerf_iters 5000 --init_with trial_imagetext_rf15_bird_clip/checkpoints/df.pth --convergence_path trial_imagetext_rf15_bird_clip/convergence.npy

## test / visualize
CUDA_VISIBLE_DEVICES=1 python main.py -O --workspace trial_imagetext_rf15_bird_clip  --test --save_mesh --write_video
CUDA_VISIBLE_DEVICES=1 python main.py -O --workspace trial2_imagetext_rf15_bird_clip --dmtet --test --save_mesh --write_video

For more example commands, check scripts.

Acknowledgement

This work is heavily based on Stable-Dreamfusion, and we appreciate ashawkey for building this open-source project. We are also very grateful to the researchers working on 3D generation for sharing their code and models.

Citation

If you find this work useful, a citation would be appreciated:

@article{lin2023consistent123,
  title={Consistent123: One image to highly consistent 3d asset using case-aware diffusion priors},
  author={Lin, Yukang and Han, Haonan and Gong, Chaoqun and Xu, Zunnan and Zhang, Yachao and Li, Xiu},
  journal={arXiv preprint arXiv:2309.17261},
  year={2023}
}
