
Commit 0fcf9ad

update image dream

1 parent c1297e5

33 files changed: +2434 −1414 lines

README.md (+20 −50)
@@ -1,15 +1,12 @@
-# MVDream - threestudio
-Yichun Shi, Peng Wang, Jianglong Ye, Long Mai, Kejie Li, Xiao Yang
+# ImageDream - threestudio
+Peng Wang, Yichun Shi
 
-| [Project Page](https://mv-dream.github.io/) | [Paper](https://arxiv.org/abs/2308.16512) | [Gallery](
-https://mv-dream.github.io/gallery_0.html) | [Comparison](https://mv-dream.github.io/test_0.html) |
+| [Project Page](https://image-dream.github.io/) | [Paper](https://arxiv.org/abs/2308.16512) | [Gallery](https://mv-dream.github.io/gallery_0.html)
 
 
+- **For diffusion model and 2D image generation**, check ```./extern/ImageDream```
 
-- **This code is forked from [threestudio](https://github.com/threestudio-project/threestudio) for SDS and 3D Generation using MVDream.**
-- **For diffusion model and 2D image generation, check original [MVDream](https://github.com/bytedance/MVDream) repo.**
-
-![mvdream-threestudio-teaser](https://github.com/bytedance/MVDream-threestudio/assets/21265012/b2fef804-7f3f-4b3a-a1a9-8b51596deb54)
+![imagedream-threestudio-teaser](https://github.com/bytedance/imagedream-threestudio/assets/21265012/b2fef804-7f3f-4b3a-a1a9-8b51596deb54)
 
 ## Installation
 
@@ -53,66 +50,39 @@ pip install ninja
 pip install -r requirements.txt
 ```
 
-### Install MVDream
-MVDream multi-view diffusion model is provided in a different codebase. Install it by:
+### Install ImageDream
+The ImageDream multi-view diffusion model is provided in a different codebase. Install it by:
 
 ```sh
-git clone https://github.com/bytedance/MVDream extern/MVDream
-pip install -e extern/MVDream
+git clone https://github.com/bytedance/imagedream extern/imagedream
+pip install -e extern/imagedream
 ```
 
 
 ## Quickstart
 
-We currently provide two configurations for MVDream, one without soft-shading and one with it. The one without shading is more effecient in both memory and time. You can run it by
-
-```sh
-# MVDream without shading (memory efficient)
-python launch.py --config configs/mvdream-sd21.yaml --train --gpu 0 system.prompt_processor.prompt="an astronaut riding a horse"
-```
-
 In the paper, we use the configuration with soft-shading. It would need an A100 GPU in most cases to compute normal:
 ```sh
-# MVDream with shading (used in paper)
-python launch.py --config configs/mvdream-sd21-shading.yaml --train --gpu 0 system.prompt_processor.prompt="an astronaut riding a horse"
-```
-
-### Resume from checkpoints
-
-If you want to resume from a checkpoint, do:
-
-```sh
-# resume training from the last checkpoint, you may replace last.ckpt with any other checkpoints
-python launch.py --config path/to/trial/dir/configs/parsed.yaml --train --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt
-# if the training has completed, you can still continue training for a longer time by setting trainer.max_steps
-python launch.py --config path/to/trial/dir/configs/parsed.yaml --train --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt trainer.max_steps=20000
-# you can also perform testing using resumed checkpoints
-python launch.py --config path/to/trial/dir/configs/parsed.yaml --test --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt
-# note that the above commands use parsed configuration files from previous trials
-# which will continue using the same trial directory
-# if you want to save to a new trial directory, replace parsed.yaml with raw.yaml in the command
-
-# only load weights from saved checkpoint but dont resume training (i.e. dont load optimizer state):
-python launch.py --config path/to/trial/dir/configs/parsed.yaml --train --gpu 0 system.weights=path/to/trial/dir/ckpts/last.ckpt
+# ImageDream with shading (used in paper)
+python launch.py --config configs/imagedream-sd21-shading.yaml --train --gpu 0 system.prompt_processor.prompt="an astronaut riding a horse"
 ```
 
 ## Tips
-- **Preview**. Generating 3D content with SDS would a take a lot of time. So we suggest to use the 2D multi-view image generation [MVDream](https://github.com/bytedance/MVDream) to test if the model can really understand the text before using it for 3D generation.
-- **Rescale Factor**. We introducte rescale adjustment from [Shanchuan et al.](https://arxiv.org/abs/2305.08891) to alleviate the texture over-saturation from large CFG guidance. However, in some cases, we find it to cause floating noises in the generated scene and consequently OOM issue. Therefore we reduce the rescale factor from 0.7 in original paper to 0.5. However, if you still encounter such a problem, please try to further reduce `system.guidance.recon_std_rescale=0.3`.
+- Use an object image with 0 elevation to obtain the best result, and place the object at the center of the image.
+- **Preview**. You may use the 2D multi-view generation in [ImageDream](https://github.com/bytedance/imagedream) to test whether the model really understands the image and text before using them for 3D generation.
+- **Other**. Refer to [imagedream-threestudio]() for more tips on the optimization configuration.
 
-## Credits
 
+## Credits
 This code is built on the [threestudio-project](https://github.com/threestudio-project/threestudio). Thanks to the maintainers for their contribution to the community!
 
-## Citing
-
-If you find MVDream helpful, please consider citing:
 
+## Cite
 ```
-@article{shi2023MVDream,
-  author = {Shi, Yichun and Wang, Peng and Ye, Jianglong and Mai, Long and Li, Kejie and Yang, Xiao},
-  title = {MVDream: Multi-view Diffusion for 3D Generation},
-  journal = {arXiv:2308.16512},
+@article{pengwang2023ImageDream,
+  author = {Wang, Peng and Shi, Yichun},
+  title = {ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation},
+  journal = {arXiv: to-be-update},
   year = {2023},
 }
 ```
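The first tip added above asks for an object image at roughly zero elevation, re-centered in the frame. A minimal preprocessing sketch of that step (not part of this commit; it assumes Pillow/NumPy and a hypothetical output path) might look like:

```python
import numpy as np
from PIL import Image

def center_object_rgba(path_in, path_out, size=512):
    """Crop an RGBA object by its alpha mask and paste it centered on a square canvas."""
    img = Image.open(path_in).convert("RGBA")
    alpha = np.array(img)[:, :, 3]
    ys, xs = np.nonzero(alpha > 0)  # pixels actually covered by the object
    if xs.size == 0:
        raise ValueError("image has an empty alpha channel")
    crop = img.crop((xs.min(), ys.min(), xs.max() + 1, ys.max() + 1))
    scale = 0.8 * size / max(crop.size)  # leave a small margin around the object
    crop = crop.resize((max(1, int(crop.width * scale)), max(1, int(crop.height * scale))))
    canvas = Image.new("RGBA", (size, size), (0, 0, 0, 0))
    canvas.paste(crop, ((size - crop.width) // 2, (size - crop.height) // 2), crop)
    canvas.save(path_out)

center_object_rgba("./assets/astronaut.png", "./assets/astronaut_centered.png")
```

The centered image can then be used as the input image prompt for ImageDream.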

configs/mvdream-sd21-shading.yaml (+2 −2)
@@ -1,4 +1,4 @@
-name: "mvdream-sd21-rescale0.5-shading"
+name: "imagedream-sd21-rescale0.5-shading"
 tag: "${rmspace:${system.prompt_processor.prompt},_}"
 exp_root_dir: "outputs"
 seed: 0
@@ -21,7 +21,7 @@ data:
   eval_camera_distance: 3.0
   eval_fovy_deg: 40.
 
-system_type: "mvdream-system"
+system_type: "imagedream-system"
 system:
   geometry_type: "implicit-volume"
   geometry:

configs/mvdream-sd21.yaml (+2 −2)
@@ -1,4 +1,4 @@
-name: "mvdream-sd21-rescale0.5"
+name: "imagedream-sd21-rescale0.5"
 tag: "${rmspace:${system.prompt_processor.prompt},_}"
 exp_root_dir: "outputs"
 seed: 0
@@ -21,7 +21,7 @@ data:
   eval_camera_distance: 3.0
   eval_fovy_deg: 40.
 
-system_type: "mvdream-system"
+system_type: "imagedream-system"
 system:
   geometry_type: "implicit-volume"
   geometry:

extern/ImageDream/README.md (+18 −65)
@@ -1,97 +1,50 @@
-# MVDream
-Yichun Shi, Peng Wang, Jianglong Ye, Long Mai, Kejie Li, Xiao Yang
+# ImageDream Diffusion
+Peng Wang, Yichun Shi
 
-| [Project Page](https://mv-dream.github.io/) | [3D Generation](https://github.com/bytedance/MVDream-threestudio) | [Paper](https://arxiv.org/abs/2308.16512) | [HuggingFace Demo (Coming)]() |
+| [Project Page](https://image-dream.github.io/) | [Paper](https://arxiv.org/abs/2308.16512) | [HuggingFace Demo]() |
 
-![multiview diffusion](https://github.com/bytedance/MVDream/assets/21265012/849cb798-1d97-42fd-9f02-c23b0dc507d5)
-
-## 3D Generation
-
-- **This repository only includes the diffusion model and 2D image generation code of [MVDream](https://mv-dream.github.io/index.html) paper.**
-- **For 3D Generation, please check [MVDream-threestudio](https://github.com/bytedance/MVDream-threestudio).**
+##
+- **This repo inherits content from the [LDM](), [MVDream]() repos and some adapter modules from [IP-Adapter]()**
+- **It only includes the diffusion model and 2D image generation.**
+- **For 3D Generation, please check [ImageDream](https://github.com/bytedance/ImageDream).**
 
 
 ## Installation
-You can use the same environment as in [Stable-Diffusion](https://github.com/Stability-AI/stablediffusion) for this repo. Or you can set up the environment by installing the given requirements
+Set up the environment as in [Stable-Diffusion](https://github.com/Stability-AI/stablediffusion) for this repo, or install the given requirements
 ``` bash
 pip install -r requirements.txt
 ```
 
-To use MVDream as a python module, you can install it by `pip install -e .` or:
-``` python
-pip install git+https://github.com/bytedance/MVDream
-```
-
 ## Model Card
-Our models are provided on the [Huggingface Model Page](https://huggingface.co/MVDream/MVDream/) with the OpenRAIL license.
-| Model | Base Model | Resolution |
-| ----------- | ----------- | ----------- |
-| sd-v2.1-base-4view | [Stable Diffusion 2.1 Base](https://huggingface.co/stabilityai/stable-diffusion-2-1-base) | 4x256x256 |
-| sd-v1.5-4view | [Stable Diffusion 1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5) | 4x256x256 |
-
-By default, we use the SD-2.1-base model in our experiments.
+Our models are provided on the [Huggingface Model Page](https://huggingface.co/Peng-Wang/ImageDream/) with the OpenRAIL license.
 
+We use the SD-2.1-base model in our experiments.
 Note that you don't have to manually download the checkpoints for the following scripts.
 
 
-## Text-to-Image
+## Image-to-Multi-View
+Note that we re-place the object at the center of the RGBA image. A short description of the image is necessary to obtain good results, since we train the model with joint modalities. For the image-only case, one may run a simple caption model such as [Llava]() or [BLIP2](), which should give similar results.
 
 You can simply generate multi-view images by running the following command:
 
 ``` bash
-python scripts/t2i.py --text "an astronaut riding a horse"
+python scripts/imagedream.py --image "./assets/astronaut.png" --text "an astronaut riding a horse"
 ```
-We also provide a gradio script to try out with GUI:
 
+We also provide a gradio script to try out with GUI:
 ``` bash
 python scripts/gradio_app.py
 ```
 
-## Usage
-#### Load the Model
-We provide two ways to load the models of MVDream:
-- **Automatic**: load the model config with model name and weights from huggingface.
-``` python
-from mvdream.model_zoo import build_model
-model = build_model("sd-v2.1-base-4view")
-```
-- **Manual**: load the model with a config file and a checkpoint file.
-``` python
-from omegaconf import OmegaConf
-from mvdream.ldm.util import instantiate_from_config
-config = OmegaConf.load("mvdream/configs/sd-v2-base.yaml")
-model = instantiate_from_config(config.model)
-model.load_state_dict(torch.load("path/to/sd-v2.1-base-4view.th", map_location='cpu'))
-```
-
-#### Inference
-Here is a simple example for model inference:
-``` python
-import torch
-from mvdream.camera_utils import get_camera
-model.eval()
-model.cuda()
-with torch.no_grad():
-    noise = torch.randn(4,4,32,32, device="cuda") # batch of 4x for 4 views, latent size 32=256/8
-    t = torch.tensor([999]*4, dtype=torch.long, device="cuda") # same timestep for 4 views
-    cond = {
-        "context": model.get_learned_conditioning([""]*4).cuda(), # text embeddings
-        "camera": get_camera(4).cuda(),
-        "num_frames": 4,
-    }
-    eps = model.apply_model(noise, t, cond=cond)
-```
-
-
 ## Acknowledgement
 This repository is heavily based on [Stable Diffusion](https://huggingface.co/stabilityai/stable-diffusion-2-1-base). We would like to thank the authors of these work for publicly releasing their code.
 
 ## Citation
 ``` bibtex
-@article{shi2023MVDream,
-  author = {Shi, Yichun and Wang, Peng and Ye, Jianglong and Mai, Long and Li, Kejie and Yang, Xiao},
-  title = {MVDream: Multi-view Diffusion for 3D Generation},
-  journal = {arXiv:2308.16512},
+@article{pengwang2023ImageDream,
+  author = {Wang, Peng and Shi, Yichun},
+  title = {ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation},
+  journal = {arXiv-to-be-update},
   year = {2023},
 }
 ```
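For the image-only case mentioned in the Image-to-Multi-View section above, a caption can be produced with an off-the-shelf model. A minimal sketch using a BLIP-2 checkpoint via Hugging Face `transformers` (the library, checkpoint name, and image path below are assumptions, not part of this commit):

```python
from PIL import Image
from transformers import Blip2ForConditionalGeneration, Blip2Processor

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")

image = Image.open("./assets/astronaut.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
caption = processor.batch_decode(out, skip_special_tokens=True)[0].strip()

# The caption can then be passed as the --text argument of scripts/imagedream.py.
print(caption)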
(image asset, 808 KB)
+1-1
@@ -1 +1 @@
-from .model_zoo import build_model
+from .model_zoo import build_model

extern/ImageDream/imagedream/camera_utils.py (+33 −16)
@@ -9,12 +9,12 @@ def create_camera_to_world_matrix(elevation, azimuth):
     x = np.cos(elevation) * np.sin(azimuth)
     y = np.sin(elevation)
     z = np.cos(elevation) * np.cos(azimuth)
-
+
     # Calculate camera position, target, and up vectors
     camera_pos = np.array([x, y, z])
     target = np.array([0, 0, 0])
     up = np.array([0, 1, 0])
-
+
     # Construct view matrix
     forward = target - camera_pos
     forward /= np.linalg.norm(forward)
@@ -35,34 +35,51 @@ def convert_opengl_to_blender(camera_matrix):
         camera_matrix_blender = np.dot(flip_yz, camera_matrix)
     else:
         # Construct transformation matrix to convert from OpenGL space to Blender space
-        flip_yz = torch.tensor([[1, 0, 0, 0], [0, 0, -1, 0], [0, 1, 0, 0], [0, 0, 0, 1]])
+        flip_yz = torch.tensor(
+            [[1, 0, 0, 0], [0, 0, -1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
+        )
         if camera_matrix.ndim == 3:
             flip_yz = flip_yz.unsqueeze(0)
         camera_matrix_blender = torch.matmul(flip_yz.to(camera_matrix), camera_matrix)
     return camera_matrix_blender
 
 
 def normalize_camera(camera_matrix):
-    ''' normalize the camera location onto a unit-sphere'''
+    """normalize the camera location onto a unit-sphere"""
     if isinstance(camera_matrix, np.ndarray):
-        camera_matrix = camera_matrix.reshape(-1,4,4)
-        translation = camera_matrix[:,:3,3]
-        translation = translation / (np.linalg.norm(translation, axis=1, keepdims=True) + 1e-8)
-        camera_matrix[:,:3,3] = translation
+        camera_matrix = camera_matrix.reshape(-1, 4, 4)
+        translation = camera_matrix[:, :3, 3]
+        translation = translation / (
+            np.linalg.norm(translation, axis=1, keepdims=True) + 1e-8
+        )
+        camera_matrix[:, :3, 3] = translation
     else:
-        camera_matrix = camera_matrix.reshape(-1,4,4)
-        translation = camera_matrix[:,:3,3]
-        translation = translation / (torch.norm(translation, dim=1, keepdim=True) + 1e-8)
-        camera_matrix[:,:3,3] = translation
-    return camera_matrix.reshape(-1,16)
+        camera_matrix = camera_matrix.reshape(-1, 4, 4)
+        translation = camera_matrix[:, :3, 3]
+        translation = translation / (
+            torch.norm(translation, dim=1, keepdim=True) + 1e-8
+        )
+        camera_matrix[:, :3, 3] = translation
+    return camera_matrix.reshape(-1, 16)
 
 
-def get_camera(num_frames, elevation=15, azimuth_start=0, azimuth_span=360, blender_coord=True):
+def get_camera(
+    num_frames,
+    elevation=15,
+    azimuth_start=0,
+    azimuth_span=360,
+    blender_coord=True,
+    extra_view=False,
+):
     angle_gap = azimuth_span / num_frames
     cameras = []
-    for azimuth in np.arange(azimuth_start, azimuth_span+azimuth_start, angle_gap):
+    for azimuth in np.arange(azimuth_start, azimuth_span + azimuth_start, angle_gap):
         camera_matrix = create_camera_to_world_matrix(elevation, azimuth)
         if blender_coord:
             camera_matrix = convert_opengl_to_blender(camera_matrix)
         cameras.append(camera_matrix.flatten())
-    return torch.tensor(np.stack(cameras, 0)).float()
+
+    if extra_view:
+        dim = len(cameras[0])
+        cameras.append(np.zeros(dim))
+    return torch.tensor(np.stack(cameras, 0)).float()
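Based only on the diff above, a small usage sketch of the new `extra_view` flag in `get_camera` (interpreting the appended all-zero row as a placeholder view is an assumption):

```python
import torch
from imagedream.camera_utils import get_camera, normalize_camera

# Four evenly spaced azimuths over the default 360-degree span, plus one
# all-zero row appended because extra_view=True (see the diff above).
cameras = get_camera(4, elevation=15, azimuth_start=0, extra_view=True)
print(cameras.shape)  # torch.Size([5, 16]); each row is a flattened 4x4 camera-to-world matrix

# normalize_camera projects the camera centers onto the unit sphere and
# returns flattened matrices again.
normalized = normalize_camera(cameras[:4])
print(normalized.shape)  # torch.Size([4, 16])
```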

extern/ImageDream/imagedream/configs/sd-v1.yaml (−52)
This file was deleted.

0 commit comments
