
Commit 8cbf8a8

voletiv (Vikram Voleti) and Vikram Voleti authored
Progressive_view, batch_sizes can also change with resolution_milestones (#165)
* Makes progressive increase of elevation, azimuth
* Zero123 phase 2 config and script
* Update DOCUMENTATION.md

Co-authored-by: Vikram Voleti <vikram@ip-26-0-153-234.us-west-2.compute.internal>
1 parent 9d7976e commit 8cbf8a8

6 files changed, +263 -45 lines changed

DOCUMENTATION.md

+2-1
@@ -28,10 +28,10 @@
 | ---------------------- | --------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | height | Union[int, List[int]] | Height of the rendered image in training, should be an integer or a list of integers. If a list of integers, the training height will change according to `resolution_milestones`. Default: 64 |
 | width | Union[int, List[int]] | Width of the rendered image in training, should be an integer or a list of integers. If a list of integers, the training width will change according to `resolution_milestones`. Default: 64 |
+| batch_size | Union[int, List[int]] | Number of images per batch in training. If a list of integers, the batch_size will change according to `resolution_milestones`. Default: 1 |
 | resolution_milestones | List[int] | The steps where the training resolution will change, must be in ascending order and in the length of `len(height) - 1`. Default: [] |
 | eval_height | int | Height of the rendered image in validation/testing. Default: 512 |
 | eval_width | int | Width of the rendered image in validation/testing. Default: 512 |
-| batch_size | int | Number of images per batch in training. Default: 1 |
 | eval_batch_size | int | Number of images per batch in validation/testing. DO NOT change this. Default: 1 |
 | elevation_range | Tuple[float,float] | Camera elevation angle range to sample from in training, in degrees. Default: (-10,90) |
 | azimuth_range | Tuple[float,float] | Camera azimuth angle range to sample from in training, in degrees. Default: (-180,180) |
@@ -47,6 +47,7 @@
 | eval_fovy_deg | float | Camera field of view (FoV) along the y direction (vertical direction) in validation/testing, in degrees. Default: 70 |
 | light_sample_strategy | str | Strategy to sample point light positions in training, in ["dreamfusion", "magic3d"]. "dreamfusion" uses strategy described in the DreamFusion paper; "magic3d" uses strategy decribed in the Magic3D paper. Default: "dreamfusion" |
 | batch_uniform_azimuth | bool | Whether to ensure the uniformity of sampled azimuth angles in training as described in the Fantasia3D paper. If True, the `azimuth_range` is equally divided into `batch_size` bins and the azimuth angles are sampled from every bins. Default: True |
+| progressive_until | int | Number of iterations until which to progressively (linearly) increase elevation_range and azimuth_range from [`eval_elevation_deg`, `eval_elevation_deg`] and `[0.0, 0.0]`, to those values specified in `elevation_range` and `azimuth_range`. 0 means the range does not linearly increase. Default: 0 |
 
 ## Systems
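
The `progressive_until` option documented above amounts to a linear ramp of the sampling ranges. A minimal sketch of that behaviour (illustrative only, not the repository's exact implementation; the helper `progressive_range` and the concrete numbers are assumptions):

```python
# Hedged sketch: linearly widen a sampling range until `progressive_until`.
# Field names mirror the documentation above; `progressive_range` itself is
# an illustrative helper, not a function from the repository.
def progressive_range(step, progressive_until, start, full):
    if progressive_until <= 0 or step >= progressive_until:
        return full  # 0 disables the ramp, so the full range is used
    r = step / progressive_until  # linear progress in [0, 1)
    return (
        start[0] + r * (full[0] - start[0]),
        start[1] + r * (full[1] - start[1]),
    )

# Example: elevation starts pinned at eval_elevation_deg, azimuth at [0, 0].
eval_elevation_deg = 0.0
print(progressive_range(500, 2000, (eval_elevation_deg, eval_elevation_deg), (-10.0, 90.0)))
# (-2.5, 22.5)
print(progressive_range(500, 2000, (0.0, 0.0), (-180.0, 180.0)))
# (-45.0, 45.0)
```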

@@ -0,0 +1,166 @@
+name: "imagecondition"
+tag: "${rmspace:${system.prompt_processor.prompt},_}"
+exp_root_dir: "outputs"
+seed: 0
+
+data_type: "single-image-datamodule"
+data:
+  image_path: ./load/images/hamburger_rgba.png
+  height: 256
+  width: 256
+  default_elevation_deg: 0.0
+  default_azimuth_deg: 0.0
+  default_camera_distance: 3.8
+  default_fovy_deg: 20.0
+  random_camera:
+    batch_size: 4
+    height: 256
+    width: 256
+    eval_height: 512
+    eval_width: 512
+    eval_batch_size: 1
+    elevation_range: [-10, 80]
+    azimuth_range: [-180, 180]
+    camera_distance_range: [3.8, 3.8]
+    fovy_range: [20.0, 20.0] # Zero123 has fixed fovy
+    progressive_until: 0
+    camera_perturb: 0.0
+    center_perturb: 0.0
+    up_perturb: 0.0
+    light_position_perturb: 1.0
+    light_distance_range: [7.5, 10.0]
+    eval_elevation_deg: ${data.default_elevation_deg}
+    eval_camera_distance: ${data.default_camera_distance}
+    eval_fovy_deg: ${data.default_fovy_deg}
+    light_sample_strategy: "dreamfusion"
+    batch_uniform_azimuth: False
+    n_val_views: 30
+    n_test_views: 120
+
+system_type: "image-condition-dreamfusion-system"
+system:
+  geometry_type: "implicit-volume"
+  geometry:
+    radius: 2.0
+    normal_type: "analytic"
+
+    # the density initialization proposed in the DreamFusion paper
+    # does not work very well
+    # density_bias: "blob_dreamfusion"
+    # density_activation: exp
+    # density_blob_scale: 5.
+    # density_blob_std: 0.2
+
+    # use Magic3D density initialization instead
+    density_bias: "blob_magic3d"
+    density_activation: softplus
+    density_blob_scale: 10.
+    density_blob_std: 0.5
+
+    # coarse to fine hash grid encoding
+    # to ensure smooth analytic normals
+    pos_encoding_config:
+      otype: HashGrid
+      n_levels: 16
+      n_features_per_level: 2
+      log2_hashmap_size: 19
+      base_resolution: 16
+      per_level_scale: 1.447269237440378 # max resolution 4096
+    mlp_network_config:
+      otype: "VanillaMLP"
+      activation: "ReLU"
+      output_activation: "none"
+      n_neurons: 64
+      n_hidden_layers: 2
+
+  material_type: "diffuse-with-point-light-material"
+  material:
+    ambient_only_steps: 100000
+    textureless_prob: 0.05
+    albedo_activation: sigmoid
+
+  background_type: "neural-environment-map-background"
+  background:
+    color_activation: sigmoid
+
+  renderer_type: "nerf-volume-renderer"
+  renderer:
+    radius: ${system.geometry.radius}
+    num_samples_per_ray: 512
+    return_comp_normal: ${gt0:${system.loss.lambda_normal_smooth}}
+    return_normal_perturb: ${gt0:${system.loss.lambda_3d_normal_smooth}}
+
+  prompt_processor_type: "stable-diffusion-prompt-processor"
+  prompt_processor:
+    pretrained_model_name_or_path: "runwayml/stable-diffusion-v1-5"
+    prompt: "a DSLR photo of a delicious hamburger"
+
+  guidance_type: "stable-diffusion-guidance"
+  guidance:
+    pretrained_model_name_or_path: "runwayml/stable-diffusion-v1-5"
+    guidance_scale: 7.5
+    min_step_percent: 0.2
+    # min_step_percent: [0, 0.66, 0.33, 2000] # (start_iter, start_val, end_val, end_iter)
+    max_step_percent: 0.6
+    # max_step_percent: [0, 0.98, 0.66, 2000]
+
+  # prompt_processor_type: "deep-floyd-prompt-processor"
+  # prompt_processor:
+  #   pretrained_model_name_or_path: "DeepFloyd/IF-I-XL-v1.0"
+  #   prompt: "a DSLR photo of a delicious hamburger"
+
+  # guidance_type: "deep-floyd-guidance"
+  # guidance:
+  #   pretrained_model_name_or_path: "DeepFloyd/IF-I-XL-v1.0"
+  #   guidance_scale: 7.5
+  #   min_step_percent: 0.2
+  #   # min_step_percent: [0, 0.66, 0.33, 2000] # (start_iter, start_val, end_val, end_iter)
+  #   max_step_percent: 0.6
+  #   # max_step_percent: [0, 0.98, 0.66, 2000]
+
+  freq:
+    ref_only_steps: 0
+    guidance_eval: 13
+
+  loggers:
+    wandb:
+      enable: false
+      project: 'threestudio'
+      name: None
+
+  loss:
+    lambda_sds: 0.1
+    lambda_rgb: 400.0
+    lambda_mask: 50.0
+    lambda_depth: 0.05
+    lambda_normal_smooth: 2.0
+    lambda_3d_normal_smooth: 5.0
+    lambda_orient: 0.01
+    lambda_sparsity: 0.01
+    lambda_opaque: 0.05
+
+  optimizer:
+    name: Adan
+    args:
+      lr: 0.005
+      max_grad_norm: 5.0
+      eps: 1.e-8
+      weight_decay: 1e-5
+    params:
+      geometry:
+        lr: ${system.optimizer.args.lr}
+      background:
+        lr: 0.0
+
+trainer:
+  max_steps: 2000
+  log_every_n_steps: 1
+  num_sanity_val_steps: 0
+  val_check_interval: 20
+  enable_progress_bar: true
+  precision: 16-mixed
+
+checkpoint:
+  save_last: true # save at each validation time
+  save_top_k: -1
+  every_n_train_steps: 20 # ${trainer.max_steps}
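
The config relies on custom OmegaConf resolvers such as `${rmspace:...}` and `${gt0:...}`. A minimal sketch of how resolvers with these names could be registered and what the two expressions evaluate to, assuming OmegaConf ≥ 2.1 (the lambda bodies are assumptions showing the intended effect, not necessarily threestudio's exact definitions):

```python
from omegaconf import OmegaConf

# Hedged sketch: resolvers with the names used in the config above.
OmegaConf.register_new_resolver("rmspace", lambda s, sub: s.replace(" ", sub))
OmegaConf.register_new_resolver("gt0", lambda x: x > 0)

cfg = OmegaConf.create(
    {
        "prompt": "a DSLR photo of a delicious hamburger",
        "tag": "${rmspace:${prompt},_}",
        "lambda_normal_smooth": 2.0,
        "return_comp_normal": "${gt0:${lambda_normal_smooth}}",
    }
)
print(cfg.tag)                 # a_DSLR_photo_of_a_delicious_hamburger
print(cfg.return_comp_normal)  # True
```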

configs/zero123.yaml

+14-12
@@ -1,5 +1,5 @@
 name: "zero123"
-tag: "${data.random_camera.height}_${rmspace:${basename:${data.image_path}},_}"
+tag: "${data.random_camera.height}_${rmspace:${basename:${data.image_path}},_}_prog${data.random_camera.progressive_until}"
 exp_root_dir: "outputs"
 seed: 0
 
@@ -13,16 +13,18 @@ data: # threestudio/data/image.py -> SingleImageDataModuleConfig
   default_camera_distance: 3.8
   default_fovy_deg: 20.0
   random_camera: # threestudio/data/uncond.py -> RandomCameraDataModuleConfig
-    height: 64
-    width: 64
+    height: [64, 128]
+    width: [64, 128]
+    batch_size: [12, 6]
+    resolution_milestones: [200]
     eval_height: 256
     eval_width: 256
-    batch_size: 12
     eval_batch_size: 1
     elevation_range: [-10, 80]
     azimuth_range: [-180, 180]
     camera_distance_range: [3.8, 3.8]
-    fovy_range: [20.0, 20.0]
+    fovy_range: [20.0, 20.0] # Zero123 has fixed fovy
+    progressive_until: 0
     camera_perturb: 0.0
     center_perturb: 0.0
     up_perturb: 0.0
@@ -70,7 +72,7 @@ system:
       activation: "ReLU"
       output_activation: "none"
       n_neurons: 64
-      n_hidden_layers: 1
+      n_hidden_layers: 2
 
   material_type: "diffuse-with-point-light-material"
   material:
@@ -122,14 +124,14 @@ system:
       name: None
 
   loss:
-    lambda_sds: 0.03
+    lambda_sds: 0.05
     lambda_rgb: 500.
     lambda_mask: 50.
     lambda_depth: 0.05
    lambda_normal_smooth: 5.0
-    lambda_3d_normal_smooth: 2.0
+    lambda_3d_normal_smooth: 5.0
     lambda_orient: 1.0
-    lambda_sparsity: 0.1 # should be tweaked for every model
+    lambda_sparsity: 0.2 # should be tweaked for every model
     lambda_opaque: 0.05
 
   optimizer:
@@ -143,13 +145,13 @@ system:
      geometry:
        lr: ${system.optimizer.args.lr}
      background:
-        lr: ${system.optimizer.args.lr}
+        lr: 0.0
 
 trainer:
-  max_steps: 1999
+  max_steps: 300
   log_every_n_steps: 1
   num_sanity_val_steps: 0
-  val_check_interval: 100
+  val_check_interval: 50
   enable_progress_bar: true
   precision: 16-mixed
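
In this config `height`, `width`, and `batch_size` are now lists paired with `resolution_milestones: [200]`: 64×64 renders in batches of 12 until step 200, then 128×128 in batches of 6. A minimal sketch of the milestone lookup (assuming a bisect over `resolution_milestones` against the global step; function and variable names are illustrative, not the datamodule's exact code):

```python
import bisect

# Hedged sketch: pick the active (height, width, batch_size) for a training step.
# As documented, `resolution_milestones` has length len(height) - 1.
def active_settings(global_step, heights, widths, batch_sizes, milestones):
    idx = bisect.bisect_right(milestones, global_step)
    return heights[idx], widths[idx], batch_sizes[idx]

# Values from configs/zero123.yaml above.
print(active_settings(100, [64, 128], [64, 128], [12, 6], [200]))  # (64, 64, 12)
print(active_settings(250, [64, 128], [64, 128], [12, 6], [200]))  # (128, 128, 6)
```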

threestudio/data/image.py

+5-1
@@ -15,6 +15,7 @@
     RandomCameraDataset,
     RandomCameraIterableDataset,
 )
+from threestudio.utils.base import Updateable
 from threestudio.utils.config import parse_structured
 from threestudio.utils.misc import get_rank
 from threestudio.utils.ops import (
@@ -154,7 +155,7 @@ def get_all_images(self):
         return self.rgb
 
 
-class SingleImageIterableDataset(IterableDataset, SingleImageDataBase):
+class SingleImageIterableDataset(IterableDataset, SingleImageDataBase, Updateable):
     def __init__(self, cfg: Any, split: str) -> None:
         super().__init__()
         self.setup(cfg, split)
@@ -178,6 +179,9 @@ def collate(self, batch) -> Dict[str, Any]:
 
         return batch
 
+    def update_step(self, epoch: int, global_step: int, on_load_weights: bool = False):
+        self.random_pose_generator.update_step(epoch, global_step, on_load_weights)
+
     def __iter__(self):
         while True:
             yield {}
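
Making `SingleImageIterableDataset` an `Updateable` and forwarding `update_step` is what lets the wrapped `random_pose_generator` react to `resolution_milestones` and `progressive_until` during single-image training. A minimal standalone sketch of this delegation pattern, assuming the training loop calls `update_step(epoch, global_step)` each iteration (class names below are illustrative, not threestudio's):

```python
# Hedged sketch of the delegation pattern only; these are not threestudio classes.
class PoseGenerator:
    def update_step(self, epoch: int, global_step: int, on_load_weights: bool = False):
        # In the real code this is where resolution/batch_size milestones and the
        # progressive elevation/azimuth ranges would be updated.
        print(f"pose generator updated at step {global_step}")


class WrapperDataset:
    def __init__(self) -> None:
        self.random_pose_generator = PoseGenerator()

    def update_step(self, epoch: int, global_step: int, on_load_weights: bool = False):
        # Without this forwarding the inner generator would never see the step.
        self.random_pose_generator.update_step(epoch, global_step, on_load_weights)


WrapperDataset().update_step(epoch=0, global_step=200)
# pose generator updated at step 200
```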
