Commit e66308c

Author: ablattmann
Commit message: add code
1 parent 182dd36 commit e66308c

87 files changed, +12794 / -3 lines

README.md

+103 / -3 lines
# Latent Diffusion Models

## Requirements
A suitable [conda](https://conda.io/) environment named `ldm` can be created
and activated with:

```
conda env create -f environment.yaml
conda activate ldm
```

# Model Zoo

## Pretrained Autoencoding Models
![rec2](assets/reconstruction2.png)

| Model                   | FID vs val | PSNR           | PSIM          | Link                                                          | Comments     |
|-------------------------|------------|----------------|---------------|---------------------------------------------------------------|--------------|
| f=4, VQ (Z=8192, d=3)   | 0.58       | 27.43 +/- 4.26 | 0.53 +/- 0.21 | https://ommer-lab.com/files/latent-diffusion/vq-f4.zip        |              |
| f=4, VQ (Z=8192, d=3)   | 1.06       | 25.21 +/- 4.17 | 0.72 +/- 0.26 | https://heibox.uni-heidelberg.de/f/9c6681f64bb94338a069/?dl=1 | no attention |
| f=8, VQ (Z=16384, d=4)  | 1.14       | 23.07 +/- 3.99 | 1.17 +/- 0.36 | https://ommer-lab.com/files/latent-diffusion/vq-f8.zip        |              |
| f=8, VQ (Z=256, d=4)    | 1.49       | 22.35 +/- 3.81 | 1.26 +/- 0.37 | https://ommer-lab.com/files/latent-diffusion/vq-f8-n256.zip   |              |
| f=16, VQ (Z=16384, d=8) | 5.15       | 20.83 +/- 3.61 | 1.73 +/- 0.43 | https://heibox.uni-heidelberg.de/f/0e42b04e2e904890a9b6/?dl=1 |              |
| f=4, KL                 | 0.27       | 27.53 +/- 4.54 | 0.55 +/- 0.24 | https://ommer-lab.com/files/latent-diffusion/kl-f4.zip        |              |
| f=8, KL                 | 0.90       | 24.19 +/- 4.19 | 1.02 +/- 0.35 | https://ommer-lab.com/files/latent-diffusion/kl-f8.zip        |              |
| f=16, KL (d=16)         | 0.87       | 24.08 +/- 4.22 | 1.07 +/- 0.36 | https://ommer-lab.com/files/latent-diffusion/kl-f16.zip       |              |
| f=32, KL (d=64)         | 2.04       | 22.27 +/- 3.93 | 1.41 +/- 0.40 | https://ommer-lab.com/files/latent-diffusion/kl-f32.zip       |              |

### Get the models

Running the following script downloads and extracts all available pretrained autoencoding models.

```shell script
bash scripts/download_first_stages.sh
```

The first stage models can then be found in `models/first_stage_models/<model_spec>`.
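As a quick sanity check after downloading, one of these autoencoders can be restored in Python. The snippet below is a minimal sketch, not an official interface: it assumes each extracted `<model_spec>` folder contains a `config.yaml` next to `model.ckpt`, that `ldm.util.instantiate_from_config` is importable from this repository, and that the `kl-f8` folder name is illustrative.

```python
# Minimal sketch: restore a pretrained first-stage autoencoder and
# round-trip a dummy batch. Paths and folder layout are assumptions.
import torch
from omegaconf import OmegaConf
from ldm.util import instantiate_from_config

spec = "kl-f8"  # illustrative <model_spec>; use the folder you downloaded
base = f"models/first_stage_models/{spec}"

config = OmegaConf.load(f"{base}/config.yaml")
model = instantiate_from_config(config.model)
ckpt = torch.load(f"{base}/model.ckpt", map_location="cpu")
model.load_state_dict(ckpt["state_dict"], strict=False)
model.eval()

x = torch.randn(1, 3, 256, 256)   # dummy image batch
with torch.no_grad():
    z = model.encode(x).sample()  # KL models return a Gaussian posterior
    x_rec = model.decode(z)
print(z.shape, x_rec.shape)
```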
## Pretrained LDMs

| Dataset | Task | Model | FID | IS | Prec | Recall | Link | Comments |
|---------|------|-------|-----|----|------|--------|------|----------|
| CelebA-HQ | Unconditional Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=0) | 5.11 (5.11) | 3.29 | 0.72 | 0.49 | https://ommer-lab.com/files/latent-diffusion/celeba.zip | |
| FFHQ | Unconditional Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=1) | 4.98 (4.98) | 4.50 (4.50) | 0.73 | 0.50 | https://ommer-lab.com/files/latent-diffusion/ffhq.zip | |
| LSUN-Churches | Unconditional Image Synthesis | LDM-KL-8 (400 DDIM steps, eta=0) | 4.02 (4.02) | 2.72 | 0.64 | 0.52 | https://ommer-lab.com/files/latent-diffusion/lsun_churches.zip | |
| LSUN-Bedrooms | Unconditional Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=1) | 2.95 (3.0) | 2.22 (2.23) | 0.66 | 0.48 | https://ommer-lab.com/files/latent-diffusion/lsun_bedrooms.zip | |
| ImageNet | Class-conditional Image Synthesis | LDM-VQ-8 (200 DDIM steps, eta=1) | 7.77 (7.76)* / 15.82** | 201.56 (209.52)* / 78.82** | 0.84* / 0.65** | 0.35* / 0.63** | https://ommer-lab.com/files/latent-diffusion/cin.zip | *: w/ guiding, classifier_scale 10; **: w/o guiding; scores in brackets calculated with the script provided by [ADM](https://github.com/openai/guided-diffusion) |
| Conceptual Captions | Text-conditional Image Synthesis | LDM-VQ-f4 (100 DDIM steps, eta=0) | 16.79 | 13.89 | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/text2img.zip | finetuned from LAION |
| OpenImages | Super-resolution | N/A | N/A | N/A | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/sr_bsr.zip | BSR image degradation |
| OpenImages | Layout-to-Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=0) | 32.02 | 15.92 | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/layout2img_model.zip | |
| Landscapes (finetuned 512) | Semantic Image Synthesis | LDM-VQ-4 (100 DDIM steps, eta=1) | N/A | N/A | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/semantic_synthesis.zip | |

### Get the models

The LDMs listed above can be jointly downloaded and extracted via

```shell script
bash scripts/download_models.sh
```

The models can then be found in `models/ldm/<model_spec>`.

### Sampling with unconditional models

We provide a first script for sampling from our unconditional models. Start it via

```shell script
CUDA_VISIBLE_DEVICES=<GPU_ID> python scripts/sample_diffusion.py -r models/ldm/<model_spec>/model.ckpt -l <logdir> -n <#samples> --batch_size <batch_size> -c <#ddim_steps> -e <eta>
```
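For reference, a hypothetical invocation with the placeholders filled in, assuming the CelebA-HQ checkpoint unpacks to `models/ldm/celeba256`: `CUDA_VISIBLE_DEVICES=0 python scripts/sample_diffusion.py -r models/ldm/celeba256/model.ckpt -l logs/samples -n 50 --batch_size 10 -c 200 -e 0`. The 200 DDIM steps and eta=0 match the settings reported for that model in the table above.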
# Inpainting
![inpainting](assets/inpainting.png)

Download the pre-trained weights
```
wget XXX
```

and sample with
```
python scripts/inpaint.py --indir data/inpainting_examples/ --outdir outputs/inpainting_results
```
`indir` should contain images `*.png` and masks `<image_fname>_mask.png` like
the examples provided in `data/inpainting_examples`.
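To run the script on your own images, only the naming convention above matters. Below is a minimal Python sketch of a hypothetical helper that writes such a mask; it assumes, following the shipped examples, that white (255) marks the region to be inpainted (worth confirming against `data/inpainting_examples`), and the path and box coordinates are placeholders.

```python
# Hypothetical helper: write <image_fname>_mask.png next to an image,
# masking a rectangular region. Assumption: white = inpaint, black = keep.
from pathlib import Path
from PIL import Image, ImageDraw

def write_box_mask(image_path: str, box: tuple) -> Path:
    img = Image.open(image_path)
    mask = Image.new("L", img.size, 0)             # start all-black (keep)
    ImageDraw.Draw(mask).rectangle(box, fill=255)  # white box to inpaint
    out = Path(image_path).with_name(Path(image_path).stem + "_mask.png")
    mask.save(out)
    return out

write_box_mask("data/my_examples/portrait.png", (64, 64, 192, 192))
```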
## Coming Soon...

* Code for training LDMs and the corresponding compression models.
* Inference scripts for conditional LDMs for various conditioning modalities.
* In the meantime, you can play with our colab notebook https://colab.research.google.com/drive/1xqzUi2iXQXDqXBHQGP9Mqt2YrYW6cx-J?usp=sharing
* We will also release some further pretrained models.

## Comments

- Our codebase for the diffusion models builds heavily on [OpenAI's codebase](https://github.com/openai/guided-diffusion)
and [https://github.com/lucidrains/denoising-diffusion-pytorch](https://github.com/lucidrains/denoising-diffusion-pytorch).
Thanks for open-sourcing!

- The implementation of the transformer encoder is from [x-transformers](https://github.com/lucidrains/x-transformers) by [lucidrains](https://github.com/lucidrains?tab=repositories).

assets/inpainting.png

312 KB

assets/reconstruction1.png

788 KB

assets/reconstruction2.png

958 KB
New file: KL autoencoder training config (embed_dim 16)

@@ -0,0 +1,54 @@

```
model:
  base_learning_rate: 4.5e-6
  target: ldm.models.autoencoder.AutoencoderKL
  params:
    monitor: "val/rec_loss"
    embed_dim: 16
    lossconfig:
      target: ldm.modules.losses.LPIPSWithDiscriminator
      params:
        disc_start: 50001
        kl_weight: 0.000001
        disc_weight: 0.5

    ddconfig:
      double_z: True
      z_channels: 16
      resolution: 256
      in_channels: 3
      out_ch: 3
      ch: 128
      ch_mult: [ 1,1,2,2,4 ]  # num_down = len(ch_mult)-1
      num_res_blocks: 2
      attn_resolutions: [ 16 ]
      dropout: 0.0

data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 12
    wrap: True
    train:
      target: ldm.data.imagenet.ImageNetSRTrain
      params:
        size: 256
        degradation: pil_nearest
    validation:
      target: ldm.data.imagenet.ImageNetSRValidation
      params:
        size: 256
        degradation: pil_nearest

lightning:
  callbacks:
    image_logger:
      target: main.ImageLogger
      params:
        batch_frequency: 1000
        max_images: 8
        increase_log_steps: True

  trainer:
    benchmark: True
    accumulate_grad_batches: 2
```
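As the inline comment notes, the number of downsampling steps is `len(ch_mult) - 1`, so this config downsamples 256x256 inputs by f = 2^4 = 16 into 16x16 latents with 16 channels, matching the `f=16, KL (d=16)` row of the autoencoder table. The same arithmetic identifies the three configs that follow; here is a small illustrative helper (not part of the codebase) that makes the mapping explicit:

```python
# Illustrative helper (not part of the codebase): latent shape implied
# by an autoencoder config's ddconfig section.
def latent_shape(resolution, ch_mult, z_channels):
    num_down = len(ch_mult) - 1   # per the config comment
    f = 2 ** num_down             # total spatial downsampling factor
    return (z_channels, resolution // f, resolution // f)

print(latent_shape(256, [1, 1, 2, 2, 4], 16))     # (16, 16, 16) -> f=16 KL
print(latent_shape(256, [1, 2, 4, 4], 4))         # (4, 32, 32)  -> f=8  KL
print(latent_shape(256, [1, 2, 4], 3))            # (3, 64, 64)  -> f=4  KL
print(latent_shape(256, [1, 1, 2, 2, 4, 4], 64))  # (64, 8, 8)   -> f=32 KL
```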
New file: KL autoencoder training config (embed_dim 4)

@@ -0,0 +1,53 @@

```
model:
  base_learning_rate: 4.5e-6
  target: ldm.models.autoencoder.AutoencoderKL
  params:
    monitor: "val/rec_loss"
    embed_dim: 4
    lossconfig:
      target: ldm.modules.losses.LPIPSWithDiscriminator
      params:
        disc_start: 50001
        kl_weight: 0.000001
        disc_weight: 0.5

    ddconfig:
      double_z: True
      z_channels: 4
      resolution: 256
      in_channels: 3
      out_ch: 3
      ch: 128
      ch_mult: [ 1,2,4,4 ]  # num_down = len(ch_mult)-1
      num_res_blocks: 2
      attn_resolutions: [ ]
      dropout: 0.0

data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 12
    wrap: True
    train:
      target: ldm.data.imagenet.ImageNetSRTrain
      params:
        size: 256
        degradation: pil_nearest
    validation:
      target: ldm.data.imagenet.ImageNetSRValidation
      params:
        size: 256
        degradation: pil_nearest

lightning:
  callbacks:
    image_logger:
      target: main.ImageLogger
      params:
        batch_frequency: 1000
        max_images: 8
        increase_log_steps: True

  trainer:
    benchmark: True
    accumulate_grad_batches: 2
```
New file: KL autoencoder training config (embed_dim 3)

@@ -0,0 +1,54 @@

```
model:
  base_learning_rate: 4.5e-6
  target: ldm.models.autoencoder.AutoencoderKL
  params:
    monitor: "val/rec_loss"
    embed_dim: 3
    lossconfig:
      target: ldm.modules.losses.LPIPSWithDiscriminator
      params:
        disc_start: 50001
        kl_weight: 0.000001
        disc_weight: 0.5

    ddconfig:
      double_z: True
      z_channels: 3
      resolution: 256
      in_channels: 3
      out_ch: 3
      ch: 128
      ch_mult: [ 1,2,4 ]  # num_down = len(ch_mult)-1
      num_res_blocks: 2
      attn_resolutions: [ ]
      dropout: 0.0

data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 12
    wrap: True
    train:
      target: ldm.data.imagenet.ImageNetSRTrain
      params:
        size: 256
        degradation: pil_nearest
    validation:
      target: ldm.data.imagenet.ImageNetSRValidation
      params:
        size: 256
        degradation: pil_nearest

lightning:
  callbacks:
    image_logger:
      target: main.ImageLogger
      params:
        batch_frequency: 1000
        max_images: 8
        increase_log_steps: True

  trainer:
    benchmark: True
    accumulate_grad_batches: 2
```
New file: KL autoencoder training config (embed_dim 64)

@@ -0,0 +1,53 @@

```
model:
  base_learning_rate: 4.5e-6
  target: ldm.models.autoencoder.AutoencoderKL
  params:
    monitor: "val/rec_loss"
    embed_dim: 64
    lossconfig:
      target: ldm.modules.losses.LPIPSWithDiscriminator
      params:
        disc_start: 50001
        kl_weight: 0.000001
        disc_weight: 0.5

    ddconfig:
      double_z: True
      z_channels: 64
      resolution: 256
      in_channels: 3
      out_ch: 3
      ch: 128
      ch_mult: [ 1,1,2,2,4,4 ]  # num_down = len(ch_mult)-1
      num_res_blocks: 2
      attn_resolutions: [ 16,8 ]
      dropout: 0.0

data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 12
    wrap: True
    train:
      target: ldm.data.imagenet.ImageNetSRTrain
      params:
        size: 256
        degradation: pil_nearest
    validation:
      target: ldm.data.imagenet.ImageNetSRValidation
      params:
        size: 256
        degradation: pil_nearest

lightning:
  callbacks:
    image_logger:
      target: main.ImageLogger
      params:
        batch_frequency: 1000
        max_images: 8
        increase_log_steps: True

  trainer:
    benchmark: True
    accumulate_grad_batches: 2
```
