
Commit 0fcf9ad

update image dream

1 parent c1297e5

33 files changed: +2434 −1414 lines

README.md (+20 −50)
@@ -1,15 +1,12 @@
-# MVDream - threestudio
-Yichun Shi, Peng Wang, Jianglong Ye, Long Mai, Kejie Li, Xiao Yang
+# ImageDream - threestudio
+Peng Wang, Yichun Shi
 
-| [Project Page](https://mv-dream.github.io/) | [Paper](https://arxiv.org/abs/2308.16512) | [Gallery](
-https://mv-dream.github.io/gallery_0.html) | [Comparison](https://mv-dream.github.io/test_0.html) |
+| [Project Page](https://image-dream.github.io/) | [Paper](https://arxiv.org/abs/2308.16512) | [Gallery](https://mv-dream.github.io/gallery_0.html)
 
 
+- **For diffusion model and 2D image generation**, check ```./extern/ImageDream```
 
-- **This code is forked from [threestudio](https://github.com/threestudio-project/threestudio) for SDS and 3D Generation using MVDream.**
-- **For diffusion model and 2D image generation, check original [MVDream](https://github.com/bytedance/MVDream) repo.**
-
-![mvdream-threestudio-teaser](https://github.com/bytedance/MVDream-threestudio/assets/21265012/b2fef804-7f3f-4b3a-a1a9-8b51596deb54)
+![imagedream-threestudio-teaser](https://github.com/bytedance/imagedream-threestudio/assets/21265012/b2fef804-7f3f-4b3a-a1a9-8b51596deb54)
 
 ## Installation
 
@@ -53,66 +50,39 @@ pip install ninja
 pip install -r requirements.txt
 ```
 
-### Install MVDream
-MVDream multi-view diffusion model is provided in a different codebase. Install it by:
+### Install ImageDream
+The ImageDream multi-view diffusion model is provided in a different codebase. Install it by:
 
 ```sh
-git clone https://github.com/bytedance/MVDream extern/MVDream
-pip install -e extern/MVDream
+git clone https://github.com/bytedance/imagedream extern/imagedream
+pip install -e extern/imagedream
 ```
 
 
 ## Quickstart
 
-We currently provide two configurations for MVDream, one without soft-shading and one with it. The one without shading is more effecient in both memory and time. You can run it by
-
-```sh
-# MVDream without shading (memory efficient)
-python launch.py --config configs/mvdream-sd21.yaml --train --gpu 0 system.prompt_processor.prompt="an astronaut riding a horse"
-```
-
 In the paper, we use the configuration with soft-shading. It would need an A100 GPU in most cases to compute normal:
 ```sh
-# MVDream with shading (used in paper)
-python launch.py --config configs/mvdream-sd21-shading.yaml --train --gpu 0 system.prompt_processor.prompt="an astronaut riding a horse"
-```
-
-### Resume from checkpoints
-
-If you want to resume from a checkpoint, do:
-
-```sh
-# resume training from the last checkpoint, you may replace last.ckpt with any other checkpoints
-python launch.py --config path/to/trial/dir/configs/parsed.yaml --train --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt
-# if the training has completed, you can still continue training for a longer time by setting trainer.max_steps
-python launch.py --config path/to/trial/dir/configs/parsed.yaml --train --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt trainer.max_steps=20000
-# you can also perform testing using resumed checkpoints
-python launch.py --config path/to/trial/dir/configs/parsed.yaml --test --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt
-# note that the above commands use parsed configuration files from previous trials
-# which will continue using the same trial directory
-# if you want to save to a new trial directory, replace parsed.yaml with raw.yaml in the command
-
-# only load weights from saved checkpoint but dont resume training (i.e. dont load optimizer state):
-python launch.py --config path/to/trial/dir/configs/parsed.yaml --train --gpu 0 system.weights=path/to/trial/dir/ckpts/last.ckpt
+# ImageDream with shading (used in paper)
+python launch.py --config configs/imagedream-sd21-shading.yaml --train --gpu 0 system.prompt_processor.prompt="an astronaut riding a horse"
 ```
 
 ## Tips
-- **Preview**. Generating 3D content with SDS would a take a lot of time. So we suggest to use the 2D multi-view image generation [MVDream](https://github.com/bytedance/MVDream) to test if the model can really understand the text before using it for 3D generation.
-- **Rescale Factor**. We introducte rescale adjustment from [Shanchuan et al.](https://arxiv.org/abs/2305.08891) to alleviate the texture over-saturation from large CFG guidance. However, in some cases, we find it to cause floating noises in the generated scene and consequently OOM issue. Therefore we reduce the rescale factor from 0.7 in original paper to 0.5. However, if you still encounter such a problem, please try to further reduce `system.guidance.recon_std_rescale=0.3`.
+- Use an object image with 0 elevation to obtain the best result, and place the object at the center of the image.
+- **Preview**. You may use the 2D multi-view generation in [ImageDream](https://github.com/bytedance/imagedream) to test whether the model really understands the image and text before using them for 3D generation.
+- **Other**. Refer to [imagedream-threestudio]() for more tips on the optimization configuration.
 
-## Credits
 
+## Credits
 This code is built on the [threestudio-project](https://github.com/threestudio-project/threestudio). Thanks to the maintainers for their contribution to the community!
 
-## Citing
-
-If you find MVDream helpful, please consider citing:
 
+## Cite
 ```
-@article{shi2023MVDream,
-  author = {Shi, Yichun and Wang, Peng and Ye, Jianglong and Mai, Long and Li, Kejie and Yang, Xiao},
-  title = {MVDream: Multi-view Diffusion for 3D Generation},
-  journal = {arXiv:2308.16512},
+@article{pengwang2023ImageDream,
+  author = {Wang, Peng and Shi, Yichun},
+  title = {ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation},
+  journal = {arXiv: to-be-update},
   year = {2023},
 }
 ```
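The first tip added above asks for an object image at roughly zero elevation, re-centered in the frame. A minimal preprocessing sketch of that step (not part of this commit; it assumes Pillow/NumPy and a hypothetical output path) might look like:

```python
import numpy as np
from PIL import Image

def center_object_rgba(path_in, path_out, size=512):
    """Crop an RGBA object by its alpha mask and paste it centered on a square canvas."""
    img = Image.open(path_in).convert("RGBA")
    alpha = np.array(img)[:, :, 3]
    ys, xs = np.nonzero(alpha > 0)  # pixels actually covered by the object
    if xs.size == 0:
        raise ValueError("image has an empty alpha channel")
    crop = img.crop((xs.min(), ys.min(), xs.max() + 1, ys.max() + 1))
    scale = 0.8 * size / max(crop.size)  # leave a small margin around the object
    crop = crop.resize((max(1, int(crop.width * scale)), max(1, int(crop.height * scale))))
    canvas = Image.new("RGBA", (size, size), (0, 0, 0, 0))
    canvas.paste(crop, ((size - crop.width) // 2, (size - crop.height) // 2), crop)
    canvas.save(path_out)

center_object_rgba("./assets/astronaut.png", "./assets/astronaut_centered.png")
```

The centered image can then be used as the input image prompt for ImageDream.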

configs/mvdream-sd21-shading.yaml (+2 −2)
@@ -1,4 +1,4 @@
-name: "mvdream-sd21-rescale0.5-shading"
+name: "imagedream-sd21-rescale0.5-shading"
 tag: "${rmspace:${system.prompt_processor.prompt},_}"
 exp_root_dir: "outputs"
 seed: 0
@@ -21,7 +21,7 @@ data:
   eval_camera_distance: 3.0
   eval_fovy_deg: 40.
 
-system_type: "mvdream-system"
+system_type: "imagedream-system"
 system:
   geometry_type: "implicit-volume"
   geometry:

configs/mvdream-sd21.yaml (+2 −2)
@@ -1,4 +1,4 @@
-name: "mvdream-sd21-rescale0.5"
+name: "imagedream-sd21-rescale0.5"
 tag: "${rmspace:${system.prompt_processor.prompt},_}"
 exp_root_dir: "outputs"
 seed: 0
@@ -21,7 +21,7 @@ data:
   eval_camera_distance: 3.0
   eval_fovy_deg: 40.
 
-system_type: "mvdream-system"
+system_type: "imagedream-system"
 system:
   geometry_type: "implicit-volume"
   geometry:

extern/ImageDream/README.md (+18 −65)
@@ -1,97 +1,50 @@
-# MVDream
-Yichun Shi, Peng Wang, Jianglong Ye, Long Mai, Kejie Li, Xiao Yang
+# ImageDream Diffusion
+Peng Wang, Yichun Shi
 
-| [Project Page](https://mv-dream.github.io/) | [3D Generation](https://github.com/bytedance/MVDream-threestudio) | [Paper](https://arxiv.org/abs/2308.16512) | [HuggingFace Demo (Coming)]() |
+| [Project Page](https://image-dream.github.io/) | [Paper](https://arxiv.org/abs/2308.16512) | [HuggingFace Demo]() |
 
-![multiview diffusion](https://github.com/bytedance/MVDream/assets/21265012/849cb798-1d97-42fd-9f02-c23b0dc507d5)
-
-## 3D Generation
-
-- **This repository only includes the diffusion model and 2D image generation code of [MVDream](https://mv-dream.github.io/index.html) paper.**
-- **For 3D Generation, please check [MVDream-threestudio](https://github.com/bytedance/MVDream-threestudio).**
+##
+- **This repo inherits content from the [LDM](), [MVDream]() repos and some adapter modules from [IP-Adapter]()**
+- **It only includes the diffusion model and 2D image generation.**
+- **For 3D Generation, please check [ImageDream](https://github.com/bytedance/ImageDream).**
 
 
 ## Installation
-You can use the same environment as in [Stable-Diffusion](https://github.com/Stability-AI/stablediffusion) for this repo. Or you can set up the environment by installing the given requirements
+Set up the environment as in [Stable-Diffusion](https://github.com/Stability-AI/stablediffusion) for this repo, or install the given requirements
 ``` bash
 pip install -r requirements.txt
 ```
 
-To use MVDream as a python module, you can install it by `pip install -e .` or:
-``` python
-pip install git+https://github.com/bytedance/MVDream
-```
-
 ## Model Card
-Our models are provided on the [Huggingface Model Page](https://huggingface.co/MVDream/MVDream/) with the OpenRAIL license.
-| Model | Base Model | Resolution |
-| ----------- | ----------- | ----------- |
-| sd-v2.1-base-4view | [Stable Diffusion 2.1 Base](https://huggingface.co/stabilityai/stable-diffusion-2-1-base) | 4x256x256 |
-| sd-v1.5-4view | [Stable Diffusion 1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5) | 4x256x256 |
-
-By default, we use the SD-2.1-base model in our experiments.
+Our models are provided on the [Huggingface Model Page](https://huggingface.co/Peng-Wang/ImageDream/) with the OpenRAIL license.
 
+We use the SD-2.1-base model in our experiments.
 Note that you don't have to manually download the checkpoints for the following scripts.
 
 
-## Text-to-Image
+## Image-to-Multi-View
+Note that we re-place the object at the center of the RGBA image. A short description of the image is necessary to obtain good results, since we train the model with joint modalities. For the image-only case, one may run a simple caption model such as [Llava]() or [BLIP2](), which should give similar results.
 
 You can simply generate multi-view images by running the following command:
 
 ``` bash
-python scripts/t2i.py --text "an astronaut riding a horse"
+python scripts/imagedream.py --image "./assets/astronaut.png" --text "an astronaut riding a horse"
 ```
-We also provide a gradio script to try out with GUI:
 
+We also provide a gradio script to try out with GUI:
 ``` bash
 python scripts/gradio_app.py
 ```
 
-## Usage
-#### Load the Model
-We provide two ways to load the models of MVDream:
-- **Automatic**: load the model config with model name and weights from huggingface.
-``` python
-from mvdream.model_zoo import build_model
-model = build_model("sd-v2.1-base-4view")
-```
-- **Manual**: load the model with a config file and a checkpoint file.
-``` python
-from omegaconf import OmegaConf
-from mvdream.ldm.util import instantiate_from_config
-config = OmegaConf.load("mvdream/configs/sd-v2-base.yaml")
-model = instantiate_from_config(config.model)
-model.load_state_dict(torch.load("path/to/sd-v2.1-base-4view.th", map_location='cpu'))
-```
-
-#### Inference
-Here is a simple example for model inference:
-``` python
-import torch
-from mvdream.camera_utils import get_camera
-model.eval()
-model.cuda()
-with torch.no_grad():
-    noise = torch.randn(4,4,32,32, device="cuda") # batch of 4x for 4 views, latent size 32=256/8
-    t = torch.tensor([999]*4, dtype=torch.long, device="cuda") # same timestep for 4 views
-    cond = {
-        "context": model.get_learned_conditioning([""]*4).cuda(), # text embeddings
-        "camera": get_camera(4).cuda(),
-        "num_frames": 4,
-    }
-    eps = model.apply_model(noise, t, cond=cond)
-```
-
-
 ## Acknowledgement
 This repository is heavily based on [Stable Diffusion](https://huggingface.co/stabilityai/stable-diffusion-2-1-base). We would like to thank the authors of these work for publicly releasing their code.
 
 ## Citation
 ``` bibtex
-@article{shi2023MVDream,
-  author = {Shi, Yichun and Wang, Peng and Ye, Jianglong and Mai, Long and Li, Kejie and Yang, Xiao},
-  title = {MVDream: Multi-view Diffusion for 3D Generation},
-  journal = {arXiv:2308.16512},
+@article{pengwang2023ImageDream,
+  author = {Wang, Peng and Shi, Yichun},
+  title = {ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation},
+  journal = {arXiv-to-be-update},
   year = {2023},
 }
 ```
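For the image-only case mentioned in the Image-to-Multi-View section above, a caption can be produced with an off-the-shelf model. A minimal sketch using a BLIP-2 checkpoint via Hugging Face `transformers` (the library, checkpoint name, and image path below are assumptions, not part of this commit):

```python
from PIL import Image
from transformers import Blip2ForConditionalGeneration, Blip2Processor

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")

image = Image.open("./assets/astronaut.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
caption = processor.batch_decode(out, skip_special_tokens=True)[0].strip()

# The caption can then be passed as the --text argument of scripts/imagedream.py.
print(caption)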
(image asset, 808 KB)
+1-1
@@ -1 +1 @@
-from .model_zoo import build_model
+from .model_zoo import build_model

extern/ImageDream/imagedream/camera_utils.py (+33 −16)
@@ -9,12 +9,12 @@ def create_camera_to_world_matrix(elevation, azimuth):
     x = np.cos(elevation) * np.sin(azimuth)
     y = np.sin(elevation)
     z = np.cos(elevation) * np.cos(azimuth)
-
+
     # Calculate camera position, target, and up vectors
     camera_pos = np.array([x, y, z])
     target = np.array([0, 0, 0])
     up = np.array([0, 1, 0])
-
+
     # Construct view matrix
     forward = target - camera_pos
     forward /= np.linalg.norm(forward)
@@ -35,34 +35,51 @@ def convert_opengl_to_blender(camera_matrix):
         camera_matrix_blender = np.dot(flip_yz, camera_matrix)
     else:
         # Construct transformation matrix to convert from OpenGL space to Blender space
-        flip_yz = torch.tensor([[1, 0, 0, 0], [0, 0, -1, 0], [0, 1, 0, 0], [0, 0, 0, 1]])
+        flip_yz = torch.tensor(
+            [[1, 0, 0, 0], [0, 0, -1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
+        )
         if camera_matrix.ndim == 3:
             flip_yz = flip_yz.unsqueeze(0)
         camera_matrix_blender = torch.matmul(flip_yz.to(camera_matrix), camera_matrix)
     return camera_matrix_blender
 
 
 def normalize_camera(camera_matrix):
-    ''' normalize the camera location onto a unit-sphere'''
+    """normalize the camera location onto a unit-sphere"""
     if isinstance(camera_matrix, np.ndarray):
-        camera_matrix = camera_matrix.reshape(-1,4,4)
-        translation = camera_matrix[:,:3,3]
-        translation = translation / (np.linalg.norm(translation, axis=1, keepdims=True) + 1e-8)
-        camera_matrix[:,:3,3] = translation
+        camera_matrix = camera_matrix.reshape(-1, 4, 4)
+        translation = camera_matrix[:, :3, 3]
+        translation = translation / (
+            np.linalg.norm(translation, axis=1, keepdims=True) + 1e-8
+        )
+        camera_matrix[:, :3, 3] = translation
     else:
-        camera_matrix = camera_matrix.reshape(-1,4,4)
-        translation = camera_matrix[:,:3,3]
-        translation = translation / (torch.norm(translation, dim=1, keepdim=True) + 1e-8)
-        camera_matrix[:,:3,3] = translation
-    return camera_matrix.reshape(-1,16)
+        camera_matrix = camera_matrix.reshape(-1, 4, 4)
+        translation = camera_matrix[:, :3, 3]
+        translation = translation / (
+            torch.norm(translation, dim=1, keepdim=True) + 1e-8
+        )
+        camera_matrix[:, :3, 3] = translation
+    return camera_matrix.reshape(-1, 16)
 
 
-def get_camera(num_frames, elevation=15, azimuth_start=0, azimuth_span=360, blender_coord=True):
+def get_camera(
+    num_frames,
+    elevation=15,
+    azimuth_start=0,
+    azimuth_span=360,
+    blender_coord=True,
+    extra_view=False,
+):
     angle_gap = azimuth_span / num_frames
     cameras = []
-    for azimuth in np.arange(azimuth_start, azimuth_span+azimuth_start, angle_gap):
+    for azimuth in np.arange(azimuth_start, azimuth_span + azimuth_start, angle_gap):
         camera_matrix = create_camera_to_world_matrix(elevation, azimuth)
         if blender_coord:
             camera_matrix = convert_opengl_to_blender(camera_matrix)
         cameras.append(camera_matrix.flatten())
-    return torch.tensor(np.stack(cameras, 0)).float()
+
+    if extra_view:
+        dim = len(cameras[0])
+        cameras.append(np.zeros(dim))
+    return torch.tensor(np.stack(cameras, 0)).float()
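Based only on the diff above, a small usage sketch of the new `extra_view` flag in `get_camera` (interpreting the appended all-zero row as a placeholder view is an assumption):

```python
import torch
from imagedream.camera_utils import get_camera, normalize_camera

# Four evenly spaced azimuths over the default 360-degree span, plus one
# all-zero row appended because extra_view=True (see the diff above).
cameras = get_camera(4, elevation=15, azimuth_start=0, extra_view=True)
print(cameras.shape)  # torch.Size([5, 16]); each row is a flattened 4x4 camera-to-world matrix

# normalize_camera projects the camera centers onto the unit sphere and
# returns flattened matrices again.
normalized = normalize_camera(cameras[:4])
print(normalized.shape)  # torch.Size([4, 16])
```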

extern/ImageDream/imagedream/configs/sd-v1.yaml (−52)
This file was deleted.

0 commit comments
