Edufusion: Stable Diffusion 1.5 for Educational Purposes

Edufusion is a DIY implementation of Stable Diffusion 1.5 with minimal dependencies, designed for educational purposes.

This repository provides only the core components of Stable Diffusion 1.5 (VAE, CLIP Tokenizer, CLIP Text Encoder, and UNet) along with their corresponding weights. The implementation of the sampling loop is left as an exercise for learners.

The goals of this project are:

  • Foster a deep understanding of diffusion models by challenging users to implement sampling methods on top of the provided pre-trained latent diffusion model (LDM) components.
  • Provide a minimal LDM implementation, enabling learners to grasp the functionality of each component without navigating complex dependency trees.
  • Enable easy hands-on experimentation with diffusion models, deepening understanding and potentially leading to new contributions to the field.

Sample usage

The package can be installed as follows:

git clone https://github.com/AndranikSargsyan/Edufusion
cd Edufusion
pip install -e .

Then download the model weights by executing the following commands:

mkdir -p models
wget -ncv --show-progress -O models/text-encoder-sd-v1-5-fp16.pt https://huggingface.co/andraniksargsyan/stable-diffusion-v1-5/resolve/main/text-encoder-sd-v1-5-fp16.pt?download=true
wget -ncv --show-progress -O models/vae-sd-v1-5-fp16.pt https://huggingface.co/andraniksargsyan/stable-diffusion-v1-5/resolve/main/vae-sd-v1-5-fp16.pt?download=true
wget -ncv --show-progress -O models/unet-sd-v1-5-fp16.pt https://huggingface.co/andraniksargsyan/stable-diffusion-v1-5/resolve/main/unet-sd-v1-5-fp16.pt?download=true

After installation, the components can be used as demonstrated in the following sections.

Variational AutoEncoder (VAE)

In Stable Diffusion, the VAE encodes high-dimensional images into a lower-dimensional latent space (a 512×512 RGB image becomes a 4×64×64 latent), greatly reducing the computational cost of UNet processing.

import torch
import torchvision.transforms.functional as tvF
from PIL import Image

from edufusion import AutoencoderKL

vae = AutoencoderKL().to(device="cuda").half()
vae.load_state_dict(torch.load("path/to/vae-sd-v1-5-fp16.pt", weights_only=True))

# Load an image and normalize it from [0, 255] to [-1, 1]
img = Image.open("path/to/image.jpg").resize((512, 512))
img_tensor = 2.0 * tvF.pil_to_tensor(img).unsqueeze(0) / 255 - 1.0
img_tensor = img_tensor.to("cuda").half()

with torch.no_grad():
    z0 = vae.encode(img_tensor).mean  # mean of the encoded latent distribution
    x = vae.decode(z0)

# Map the reconstruction from [-1, 1] back to [0, 1] before converting to PIL
reconstructed_img = tvF.to_pil_image((x[0] / 2 + 0.5).clip(0, 1).float().cpu())
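
One detail worth knowing for the DIY tasks: in the original Stable Diffusion pipeline, VAE latents are multiplied by a fixed scaling factor of 0.18215 before being fed to the UNet, and divided by it again before decoding. A minimal sketch, reusing z0 from above:

scale_factor = 0.18215  # latent scaling constant used by Stable Diffusion 1.5
z = scale_factor * z0  # latent in the range the UNet expects
x = vae.decode(z / scale_factor)  # undo the scaling before decoding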

CLIP Text Encoder

The CLIP Text Encoder in Stable Diffusion converts text prompts into sequences of token embeddings (for SD 1.5, a 77×768 tensor per prompt), which condition the UNet during image generation.

import torch

from edufusion import FrozenCLIPEmbedder

text_encoder = FrozenCLIPEmbedder().to(device="cuda").half()
text_encoder.load_state_dict(torch.load("path/to/text-encoder-sd-v1-5-fp16.pt", weights_only=True))

encoded_text = text_encoder.encode(["The quick brown fox jumps over the lazy dog"])
print(encoded_text.shape)  # 77 token positions x 768 dims: torch.Size([1, 77, 768])
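
For classifier-free guidance (needed in several of the DIY tasks below), an unconditional embedding is typically obtained by encoding the empty string:

# Unconditional (empty-prompt) embedding for classifier-free guidance
uncond_text = text_encoder.encode([""])
print(uncond_text.shape)  # expected to match the conditional embedding: (1, 77, 768)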

UNet

The UNet is the denoising network from DDPM theory: given a noisy latent, a diffusion timestep, and the text conditioning, it predicts the noise that was added to the latent.

import torch

from edufusion import UNetModel

unet = UNetModel().to(device="cuda").half()
unet.load_state_dict(torch.load("./models/unet-sd-v1-5-fp16.pt", weights_only=True))
print("Total number of UNet parameters:", sum(p.numel() for p in unet.parameters()))

Tasks to DIY

  • Verify the reconstruction ability of the provided VAE
  • Implement the DDPM reverse process with classifier-free guidance (the noise-schedule sketch after this list may serve as a starting point)
  • Implement the DDIM reverse process with classifier-free guidance
  • Implement the deterministic DDIM forward process and perform DDIM inversion to verify the reconstruction quality
  • Perform SDEdit-like image editing
  • Implement the Blended Latent Diffusion inpainting method
  • ...
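
All of the sampling tasks need the diffusion noise schedule. Below is a minimal sketch, assuming the standard SD 1.5 "scaled linear" beta schedule (0.00085 to 0.012 over 1000 steps, linear in sqrt space); it also shows the closed-form DDPM forward process, which is handy for the SDEdit task and for checking inversion quality.

import torch

# Assumed: the standard SD 1.5 "scaled linear" beta schedule; verify against
# the schedule the provided weights were trained with.
T = 1000
betas = torch.linspace(0.00085 ** 0.5, 0.012 ** 0.5, T) ** 2
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def q_sample(z0: torch.Tensor, t: int, noise: torch.Tensor) -> torch.Tensor:
    """Closed-form DDPM forward process: z_t = sqrt(a_t) * z_0 + sqrt(1 - a_t) * eps."""
    a_t = alphas_cumprod[t].item()
    return a_t ** 0.5 * z0 + (1 - a_t) ** 0.5 * noise

# Example: noise a (scaled) VAE latent to timestep 500
# z_t = q_sample(0.18215 * z0, 500, torch.randn_like(z0))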

Acknowledgements

The code in this repository is based on stablediffusion, CLIP and 🤗 Transformers.
