TL;DR: A finetuning-based approach for style-specific text-to-image generation that achieves robust style consistency while preserving semantic alignment with the text prompt.
```bash
conda create -n styleblend python=3.10
conda activate styleblend
pip install -r requirements.txt
```
- Edit `configs/training_config_sd21.yaml` to adjust training parameters as needed (a sketch for inspecting the config is shown below).
- Edit `configs/inference_config_sd21.yaml` to adjust inference parameters.
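If you prefer to inspect a config programmatically before editing it, here is a minimal sketch; it assumes PyYAML is available in the environment and makes no assumption about the actual field names in the file:

```python
import yaml

# Load and print the training config to see which fields are available to
# adjust; the same works for configs/inference_config_sd21.yaml.
with open("configs/training_config_sd21.yaml") as f:
    train_cfg = yaml.safe_load(f)

print(yaml.safe_dump(train_cfg, sort_keys=False))
```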
The `parameters.json` file provides additional parameters for the example styles used during inference. These parameters are configured in the `styleblend_sd.ipynb` inference notebook.
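A minimal sketch of reading such a per-style parameter file; the key `"YOUR_STYLE_NAME"` is an illustrative assumption, not the actual schema of `parameters.json` (see `styleblend_sd.ipynb` for how the real fields are used):

```python
import json

# Load the per-style inference parameters shipped with the repo.
with open("parameters.json") as f:
    style_params = json.load(f)

# Hypothetical lookup: structure and keys are illustrative only.
params = style_params.get("YOUR_STYLE_NAME", {})
print(params)
```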
Run `style_learning.ipynb` step by step to capture composition and texture styles.
- Organize your style images:
  - Name your style and create a folder `./data/[YOUR_STYLE_NAME]` to store the images.
  - Rename your style images. Use one or a few words to describe the content of each image, and name each image `[DESCRIPTION].png`. If the description has multiple words, replace spaces with underscores, e.g. `[DESC1_DESC2].png` (see the filename-to-prompt sketch after this list).
- Run `style_learning.ipynb` step by step.
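As referenced above, a minimal sketch of the filename convention in action: it scans a style folder and recovers the short description encoded in each filename (the folder name is a placeholder you would replace with your own style name):

```python
from pathlib import Path

# Walk a style folder and recover the text description encoded in each filename,
# following the [DESC1_DESC2].png convention described above.
style_dir = Path("./data/YOUR_STYLE_NAME")  # placeholder: your own style folder
for image_path in sorted(style_dir.glob("*.png")):
    description = image_path.stem.replace("_", " ")  # "red_apple.png" -> "red apple"
    print(f"{image_path.name}: {description}")
```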
Run `styleblend_sd.ipynb` step by step for inference.
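For a quick environment check before opening the notebook, here is a hedged sketch using plain Stable Diffusion 2.1 via `diffusers`; this is not the StyleBlend blending procedure (that lives in `styleblend_sd.ipynb`), only a way to confirm the GPU and dependencies are working:

```python
import torch
from diffusers import StableDiffusionPipeline

# Plain SD 2.1 generation as a sanity check; StyleBlend's own inference is
# performed in styleblend_sd.ipynb.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

image = pipe("a cozy cabin in the woods", num_inference_steps=30).images[0]
image.save("sanity_check.png")
```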
```bibtex
@misc{chen2025styleblend,
      title={StyleBlend: Enhancing Style-Specific Content Creation in Text-to-Image Diffusion Models},
      author={Zichong Chen and Shijin Wang and Yang Zhou},
      year={2025},
      eprint={2502.09064},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.09064},
}
```