This repository contains the official implementation of the text-to-image part of ContextDiff. For simplicity, we only provide sample code for the COCO dataset; you can switch to any dataset to apply our method.
Environment Setup
git clone https://github.com/YangLing0818/ContextDiff.git
cd ContextDiff
conda create -n ContextDiff python==3.8
conda activate ContextDiff
pip install -r requirements.txt
cd ContextDiff_image
pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/huggingface/diffusers
Download Model Weights
Here we choose Stable Diffusion as our diffusion backbone. You can download the model weights using our download.py in the 'ckpt/' folder:
cd ckpt
python download.py
wget "https://openaipublic.azureedge.net/clip/models/8fa8567bab74a42d41c5915025a8e4538c3bdbe8804a470a72f30b0d94fab599/RN101.pt"
wget "https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt"
wget "https://openaipublic.azureedge.net/clip/models/5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/ViT-B-16.pt"
cd ..
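To sanity-check the downloaded CLIP checkpoints, you can load one with the CLIP package installed above. This is a minimal sketch; clip.load accepts a path to a checkpoint file, and the 512-dimensional output is specific to ViT-B/16.

import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"

# clip.load accepts either a model name or a path to a downloaded checkpoint
model, preprocess = clip.load("ckpt/ViT-B-16.pt", device=device)

# Encode a dummy prompt to confirm the text encoder works
tokens = clip.tokenize(["a photo of a dog"]).to(device)
with torch.no_grad():
    text_features = model.encode_text(tokens)
print(text_features.shape)  # torch.Size([1, 512]) for ViT-B/16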
Download Datasets
cd dataset
wget http://images.cocodataset.org/zips/train2017.zip
unzip train2017.zip
cd ..
python process_img.py --src=./dataset/train2017 --size=512 --dest=./dataset/train2017
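The command above resizes the COCO images to 512x512 (note that --src and --dest point to the same folder, so the images are processed in place). If you apply our method to your own dataset, a minimal preprocessing script in the same spirit could look like the following sketch. It is illustrative only, not the repo's process_img.py, and assumes a flat folder of .jpg files processed with a shorter-side resize followed by a center crop.

import argparse
from pathlib import Path
from PIL import Image

parser = argparse.ArgumentParser()
parser.add_argument("--src", required=True)
parser.add_argument("--dest", required=True)
parser.add_argument("--size", type=int, default=512)
args = parser.parse_args()

dest = Path(args.dest)
dest.mkdir(parents=True, exist_ok=True)

for path in sorted(Path(args.src).glob("*.jpg")):
    img = Image.open(path).convert("RGB")
    # Resize the shorter side to `size`, then center-crop to size x size
    scale = args.size / min(img.size)
    img = img.resize((round(img.width * scale), round(img.height * scale)), Image.BICUBIC)
    left = (img.width - args.size) // 2
    top = (img.height - args.size) // 2
    img = img.crop((left, top, left + args.size, top + args.size))
    img.save(dest / path.name)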
Train Context-Aware Adapter
CUDA_VISIBLE_DEVICES=0 python train_adapter.py --train_data_dir './dataset/train2017' --mixed_precision 'fp16' --output_dir 'output/' --train_batch_size 64 --num_train_epochs 20 --checkpointing_steps 10000 --t5_model 'path to text encoders'
You can check the code for details, and choose hyper-parameters based on your device.
Finetune Diffusion Model with Context-Aware Adapter
CUDA_VISIBLE_DEVICES=0 python finetune_diffusion.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-2-1-base" \
  --train_data_dir=./dataset/train2017 \
  --use_ema --resolution=512 --center_crop --random_flip \
  --train_batch_size=32 --gradient_accumulation_steps=1 --gradient_checkpointing \
  --max_train_steps=50000 --checkpointing_steps=10000 \
  --learning_rate=2e-05 --max_grad_norm=1 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --output_dir="./output"
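After fine-tuning, assuming finetune_diffusion.py saves the weights in the standard diffusers pipeline layout under --output_dir (the convention followed by the diffusers training scripts; check the code if your output looks different), you can sample from the result in the usual way:

import torch
from diffusers import StableDiffusionPipeline

# "./output" is the --output_dir used above; assumes diffusers pipeline format
pipe = StableDiffusionPipeline.from_pretrained("./output", torch_dtype=torch.float16).to("cuda")

image = pipe("a photograph of an astronaut riding a horse", num_inference_steps=50).images[0]
image.save("sample.png")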
The values for '--mean_path' and '--std_path' in the code are generated from the dataset embeddings. You can use a clustering method such as GMM to obtain the mean and std of your dataset. This helps accelerate training convergence by moving the denoising starting point from a pure Gaussian distribution toward the image distribution of the training dataset. Alternatively, you can directly use means and variances that correspond to an isotropic Gaussian distribution.
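As an illustration, one way to produce such statistics is to fit a diagonal-covariance GMM over precomputed image embeddings with scikit-learn and save the resulting mean and std. This is only a sketch: the embedding source and the exact file format expected by '--mean_path'/'--std_path' should be checked against the code, and the embedding file name below is hypothetical.

import numpy as np
from sklearn.mixture import GaussianMixture

# (N, D) array of image embeddings computed over the training set
# (hypothetical file; how the embeddings are produced depends on your setup)
embeddings = np.load("dataset/train2017_embeddings.npy")

# A single diagonal Gaussian component; increase n_components for a richer GMM
gmm = GaussianMixture(n_components=1, covariance_type="diag").fit(embeddings)
mean = gmm.means_[0]                  # shape (D,)
std = np.sqrt(gmm.covariances_[0])    # per-dimension std from the diagonal covariance

np.save("ckpt/mean.npy", mean)  # candidate values for --mean_path / --std_path
np.save("ckpt/std.npy", std)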
Citation
@inproceedings{yang2024crossmodal,
  title={Cross-Modal Contextualized Diffusion Models for Text-Guided Visual Generation and Editing},
  author={Ling Yang and Zhilong Zhang and Zhaochen Yu and Jingwei Liu and Minkai Xu and Stefano Ermon and Bin CUI},
  booktitle={International Conference on Learning Representations},
  year={2024}
}