Edufusion is a DIY implementation of Stable Diffusion 1.5 with minimal dependencies, designed for educational purposes.
This repository provides only the core components of Stable Diffusion 1.5 (VAE, CLIP Tokenizer, CLIP Text Encoder and UNet) along with their corresponding weights. The implementation of the sampling loop is left as an exercise for learners.
The goals of this project are:
- Foster a deep understanding of diffusion models by challenging users to implement sampling methods on top of the provided pre-trained latent diffusion model (LDM) components.
- Provide a minimal LDM implementation, enabling learners to grasp the functionality of each component without navigating complex dependency trees.
- Enable users to easily conduct hands-on experimentation with diffusion models to deepen their understanding and to potentially contribute new innovations to the field.
The package can be installed like this:
git clone https://github.com/AndranikSargsyan/Edufusion
cd Edufusion
pip install -e .
Then download the model weights by executing the following commands:
mkdir -p models
wget -ncv --show-progress -O models/text-encoder-sd-v1-5-fp16.pt https://huggingface.co/andraniksargsyan/stable-diffusion-v1-5/resolve/main/text-encoder-sd-v1-5-fp16.pt?download=true
wget -ncv --show-progress -O models/vae-sd-v1-5-fp16.pt https://huggingface.co/andraniksargsyan/stable-diffusion-v1-5/resolve/main/vae-sd-v1-5-fp16.pt?download=true
wget -ncv --show-progress -O models/unet-sd-v1-5-fp16.pt https://huggingface.co/andraniksargsyan/stable-diffusion-v1-5/resolve/main/unet-sd-v1-5-fp16.pt?download=true
After installation, the components can be used as demonstrated in the following sections.
In Stable Diffusion, the VAE encodes high-resolution images into a lower-dimensional latent space, so the UNet can operate on compact latents instead of full-resolution pixels.
import torch
import torchvision.transforms.functional as tvF
from PIL import Image

from edufusion import AutoencoderKL

vae = AutoencoderKL().to(device="cuda").half()
vae.load_state_dict(torch.load("path/to/vae-sd-v1-5-fp16.pt", weights_only=True))

# Load an image and normalize it to the [-1, 1] range expected by the VAE.
img = Image.open("path/to/image.jpg").convert("RGB").resize((512, 512))
img_tensor = 2.0 * tvF.pil_to_tensor(img).unsqueeze(0) / 255 - 1.0
img_tensor = img_tensor.to("cuda").half()

with torch.no_grad():
    z0 = vae.encode(img_tensor).mean  # mean of the latent posterior
    x = vae.decode(z0)

# Map the decoded tensor back to [0, 1] and convert it to a PIL image.
reconstructed_img = tvF.to_pil_image((x[0] / 2 + 0.5).clamp(0, 1).float().cpu())
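Note that Stable Diffusion 1.5 multiplies the VAE latents by a scale factor of 0.18215 before they enter the UNet, and divides by it again before decoding. A minimal sketch, reusing vae and img_tensor from above (the constant is the standard SD 1.5 value, hard-coded here as an assumption rather than read from this repository's API):

scale_factor = 0.18215  # standard SD 1.5 latent scale factor (assumed, not exposed by the model)
with torch.no_grad():
    z = scale_factor * vae.encode(img_tensor).mean  # latent in the scale the UNet expects
    x_rec = vae.decode(z / scale_factor)            # undo the scaling before decoding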
The CLIP text encoder in Stable Diffusion converts text prompts into sequences of token embeddings, which guide the image generation process through the UNet's cross-attention layers.
import torch

from edufusion import FrozenCLIPEmbedder

text_encoder = FrozenCLIPEmbedder()
text_encoder = text_encoder.to(device="cuda").half()
text_encoder.load_state_dict(torch.load("path/to/text-encoder-sd-v1-5-fp16.pt", weights_only=True))

encoded_text = text_encoder.encode(['The quick brown fox jumps over the lazy dog'])
print(encoded_text.shape)
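The printed shape should be torch.Size([1, 77, 768]): 77 token positions, each embedded into 768 dimensions. For classifier-free guidance you will also need the embedding of the empty prompt; a minimal sketch, reusing text_encoder and encoded_text from above:

with torch.no_grad():
    uncond = text_encoder.encode([''])  # unconditional ("null") embedding for classifier-free guidance
cond = encoded_text  # the conditional embedding computed above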
The UNet is the denoising network of DDPM theory: given a noisy latent, a timestep, and the text conditioning, it predicts the noise that was added to the latent.
import torch

from edufusion import UNetModel

unet = UNetModel().to(device="cuda").half()
unet.load_state_dict(torch.load("./models/unet-sd-v1-5-fp16.pt", weights_only=True))
print("Total number of UNet parameters:", sum(p.numel() for p in unet.parameters()))
Here are some exercises you can try with the provided components:
- Verify the reconstruction ability of the provided VAE,
- Implement the DDPM reverse process with classifier-free guidance (a minimal sketch of the guidance combination follows this list),
- Implement the DDIM reverse process with classifier-free guidance,
- Implement the deterministic DDIM forward process and perform DDIM inversion to verify reconstruction quality,
- Perform SDEdit-like image editing,
- Implement the Blended Latent Diffusion inpainting method,
- ...
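For the classifier-free guidance exercises above, the core combination step is sketched below, reusing z_t, t, cond, and uncond from the earlier snippets. The guidance scale of 7.5 is a commonly used value rather than one prescribed by this repository, and the surrounding DDPM/DDIM update is intentionally omitted, since implementing it is the point of the exercise:

guidance_scale = 7.5  # typical value; an assumption, tune as needed
with torch.no_grad():
    eps_uncond = unet(z_t, timesteps=t, context=uncond)  # prediction without text conditioning
    eps_cond = unet(z_t, timesteps=t, context=cond)      # prediction with text conditioning
eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)  # guided noise estimate
# eps then feeds into your DDPM/DDIM update to obtain the next latent.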
The code in this repository is based on stablediffusion, CLIP and 🤗 Transformers.