# Latent Diffusion Models
High-Resolution Image Synthesis with Latent Diffusion Models

## Requirements
A suitable [conda](https://conda.io/) environment named `ldm` can be created
and activated with:

```
conda env create -f environment.yaml
conda activate ldm
```
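
To check that the environment works, a quick sanity check like the following can be useful (this assumes `environment.yaml` installs PyTorch, which the sampling scripts below require):

```
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```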

# Model Zoo

## Pretrained Autoencoding Models

| Model                   | FID vs val | PSNR           | PSIM          | Link                                                          | Comments     |
|-------------------------|------------|----------------|---------------|---------------------------------------------------------------|--------------|
| f=4, VQ (Z=8192, d=3)   | 0.58       | 27.43 +/- 4.26 | 0.53 +/- 0.21 | https://ommer-lab.com/files/latent-diffusion/vq-f4.zip        |              |
| f=4, VQ (Z=8192, d=3)   | 1.06       | 25.21 +/- 4.17 | 0.72 +/- 0.26 | https://heibox.uni-heidelberg.de/f/9c6681f64bb94338a069/?dl=1 | no attention |
| f=8, VQ (Z=16384, d=4)  | 1.14       | 23.07 +/- 3.99 | 1.17 +/- 0.36 | https://ommer-lab.com/files/latent-diffusion/vq-f8.zip        |              |
| f=8, VQ (Z=256, d=4)    | 1.49       | 22.35 +/- 3.81 | 1.26 +/- 0.37 | https://ommer-lab.com/files/latent-diffusion/vq-f8-n256.zip   |              |
| f=16, VQ (Z=16384, d=8) | 5.15       | 20.83 +/- 3.61 | 1.73 +/- 0.43 | https://heibox.uni-heidelberg.de/f/0e42b04e2e904890a9b6/?dl=1 |              |
|                         |            |                |               |                                                               |              |
| f=4, KL                 | 0.27       | 27.53 +/- 4.54 | 0.55 +/- 0.24 | https://ommer-lab.com/files/latent-diffusion/kl-f4.zip        |              |
| f=8, KL                 | 0.90       | 24.19 +/- 4.19 | 1.02 +/- 0.35 | https://ommer-lab.com/files/latent-diffusion/kl-f8.zip        |              |
| f=16, KL (d=16)         | 0.87       | 24.08 +/- 4.22 | 1.07 +/- 0.36 | https://ommer-lab.com/files/latent-diffusion/kl-f16.zip       |              |
| f=32, KL (d=64)         | 2.04       | 22.27 +/- 3.93 | 1.41 +/- 0.40 | https://ommer-lab.com/files/latent-diffusion/kl-f32.zip       |              |

### Get the models

Running the following script downloads and extracts all available pretrained autoencoding models.

```shell script
bash scripts/download_first_stages.sh
```

The first stage models can then be found in `models/first_stage_models/<model_spec>`.
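
For example, assuming each archive from the table above extracts to a folder of the same name (an assumption; check the actual zip contents), the layout would look roughly like:

```shell script
$ ls models/first_stage_models
kl-f4  kl-f8  kl-f16  kl-f32  vq-f4  vq-f8  vq-f8-n256
```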

## Pretrained LDMs

| Dataset | Task | Model | FID | IS | Prec | Recall | Link | Comments |
|---|---|---|---|---|---|---|---|---|
| CelebA-HQ | Unconditional Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=0) | 5.11 (5.11) | 3.29 | 0.72 | 0.49 | https://ommer-lab.com/files/latent-diffusion/celeba.zip | |
| FFHQ | Unconditional Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=1) | 4.98 (4.98) | 4.50 (4.50) | 0.73 | 0.50 | https://ommer-lab.com/files/latent-diffusion/ffhq.zip | |
| LSUN-Churches | Unconditional Image Synthesis | LDM-KL-8 (400 DDIM steps, eta=0) | 4.02 (4.02) | 2.72 | 0.64 | 0.52 | https://ommer-lab.com/files/latent-diffusion/lsun_churches.zip | |
| LSUN-Bedrooms | Unconditional Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=1) | 2.95 (3.0) | 2.22 (2.23) | 0.66 | 0.48 | https://ommer-lab.com/files/latent-diffusion/lsun_bedrooms.zip | |
| ImageNet | Class-conditional Image Synthesis | LDM-VQ-8 (200 DDIM steps, eta=1) | 7.77 (7.76)* / 15.82** | 201.56 (209.52)* / 78.82** | 0.84* / 0.65** | 0.35* / 0.63** | https://ommer-lab.com/files/latent-diffusion/cin.zip | *: w/ guiding, classifier_scale 10; **: w/o guiding; scores in brackets calculated with the script provided by [ADM](https://github.com/openai/guided-diffusion) |
| Conceptual Captions | Text-conditional Image Synthesis | LDM-VQ-f4 (100 DDIM steps, eta=0) | 16.79 | 13.89 | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/text2img.zip | finetuned from LAION |
| OpenImages | Super-resolution | N/A | N/A | N/A | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/sr_bsr.zip | BSR image degradation |
| OpenImages | Layout-to-Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=0) | 32.02 | 15.92 | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/layout2img_model.zip | |
| Landscapes (finetuned 512) | Semantic Image Synthesis | LDM-VQ-4 (100 DDIM steps, eta=1) | N/A | N/A | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/semantic_synthesis.zip | |

### Get the models

The LDMs listed above can be downloaded and extracted jointly via

```shell script
bash scripts/download_models.sh
```

The models can then be found in `models/ldm/<model_spec>`.
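
Here, `<model_spec>` presumably mirrors the archive names above, and each directory should contain the `model.ckpt` that the sampling script below expects. A quick check, with the directory name assumed from `celeba.zip`:

```shell script
$ ls models/ldm/celeba/model.ckpt
models/ldm/celeba/model.ckpt
```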

### Sampling with unconditional models

We provide a first script for sampling from our unconditional models. Start it via

```shell script
CUDA_VISIBLE_DEVICES=<GPU_ID> python scripts/sample_diffusion.py -r models/ldm/<model_spec>/model.ckpt -l <logdir> -n <#samples> --batch_size <batch_size> -c <#ddim_steps> -e <eta>
```
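
For instance, to draw 50 samples from the CelebA-HQ model with the settings reported in the table above (200 DDIM steps, eta=0). The checkpoint path is an assumption based on the archive name:

```shell script
CUDA_VISIBLE_DEVICES=0 python scripts/sample_diffusion.py \
  -r models/ldm/celeba/model.ckpt \
  -l logs/celeba_samples \
  -n 50 --batch_size 10 -c 200 -e 0
```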

# Inpainting

Download the pre-trained weights
```
wget XXX
```

and sample with
```
python scripts/inpaint.py --indir data/inpainting_examples/ --outdir outputs/inpainting_results
```
`indir` should contain images `*.png` and masks `<image_fname>_mask.png` like
the examples provided in `data/inpainting_examples`.
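
For instance, a valid `indir` could look like this (filenames are purely illustrative; only the `<image_fname>_mask.png` pairing matters):

```
$ ls data/inpainting_examples/
bench.png  bench_mask.png  portrait.png  portrait_mask.png
```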

## Coming Soon...

* Code for training LDMs and the corresponding compression models.
* Inference scripts for conditional LDMs for various conditioning modalities.
* In the meantime, you can play with our [colab notebook](https://colab.research.google.com/drive/1xqzUi2iXQXDqXBHQGP9Mqt2YrYW6cx-J?usp=sharing).
* We will also release some further pretrained models.

## Comments

- Our codebase for the diffusion models builds heavily on [OpenAI's codebase](https://github.com/openai/guided-diffusion)
and [denoising-diffusion-pytorch](https://github.com/lucidrains/denoising-diffusion-pytorch).
Thanks for open-sourcing!

- The implementation of the transformer encoder is from [x-transformers](https://github.com/lucidrains/x-transformers) by [lucidrains](https://github.com/lucidrains?tab=repositories).