This is the official PyTorch implementation of the paper *ControlText: Unlocking Controllable Fonts in Multilingual Text Rendering without Font Annotations*.
Visual text rendering is a challenging task, especially when precise font control is desired. This work demonstrates that diffusion models can achieve font-controllable multilingual text rendering using just raw images without font label annotations.
- **Font controls require no font label annotations:** A text segmentation model can capture nuanced font information in pixel space without requiring font label annotations in the dataset, enabling zero-shot generation on unseen languages and fonts, as well as scalable training on web-scale image datasets as long as they contain text.
- **Evaluating ambiguous fonts in the open world:** Fuzzy font accuracy can be measured in the embedding space of a pretrained font classification model, utilizing our proposed metrics `l2@k` and `cos@k` (see the first sketch after this list).
- **Supporting user-driven design flexibility:** Random perturbations can be applied to segmented glyphs (see the second sketch after this list). While this does not affect the rendered text quality, it accounts for users not precisely aligning text to the best locations and prevents the model from rigidly replicating the pixel locations in glyphs.
- **Working with foundation models:** With limited computational resources, we can still copilot foundational image generation models to perform localized text and font editing.
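The exact top-k protocol for `l2@k` and `cos@k` lives in the paper and in `eval/eval_font.sh`; the snippet below is only a minimal sketch of one plausible reading: embed each generated glyph and its paired ground-truth glyph with a pretrained font classifier, then check whether the paired ground truth falls among the `k` nearest ground-truth embeddings under l2 or cosine distance. The function name and retrieval setup here are illustrative assumptions, not the repository's API.

```python
import numpy as np

def l2_cos_at_k(gen_emb, gt_emb, k=5):
    """Sketch of l2@k / cos@k: fraction of generated glyphs whose paired
    ground-truth embedding (row i <-> row i) is within the k nearest
    ground-truth embeddings of the generated one."""
    # Pairwise l2 distances, shape (N, N).
    d_l2 = np.linalg.norm(gen_emb[:, None, :] - gt_emb[None, :, :], axis=-1)
    # Pairwise cosine distances on unit-normalized embeddings.
    g = gen_emb / np.linalg.norm(gen_emb, axis=1, keepdims=True)
    t = gt_emb / np.linalg.norm(gt_emb, axis=1, keepdims=True)
    d_cos = 1.0 - g @ t.T
    n = len(gen_emb)
    l2_hits = sum(i in np.argsort(d_l2[i])[:k] for i in range(n))
    cos_hits = sum(i in np.argsort(d_cos[i])[:k] for i in range(n))
    return l2_hits / n, cos_hits / n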
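```

Likewise, the glyph perturbation can be pictured as a small random translation of the segmented glyph mask. The perturbation scheme and magnitude actually used in training are not pinned down in this README, so `max_shift` and the function name below are illustrative only.

```python
import numpy as np

def perturb_glyph(glyph, max_shift=8, rng=None):
    """Randomly translate a segmented glyph mask (H, W) by up to
    max_shift pixels in each direction, padding with background."""
    rng = rng or np.random.default_rng()
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.zeros_like(glyph)
    h, w = glyph.shape[:2]
    # Copy only the region that stays inside the canvas after shifting.
    ys, xs = slice(max(dy, 0), h + min(dy, 0)), slice(max(dx, 0), w + min(dx, 0))
    yt, xt = slice(max(-dy, 0), h + min(-dy, 0)), slice(max(-dx, 0), w + min(-dx, 0))
    out[ys, xs] = glyph[yt, xt]
    return out
```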
If our work inspires you, please consider citing it. Thank you!
```bibtex
@article{jiang2025controltext,
  title={ControlText: Unlocking Controllable Fonts in Multilingual Text Rendering without Font Annotations},
  author={Jiang, Bowen and Yuan, Yuan and Bai, Xinyi and Hao, Zhuoqun and Yin, Alyson and Hu, Yaojie and Liao, Wenyu and Ungar, Lyle and Taylor, Camillo J},
  journal={arXiv preprint arXiv:2502.10999},
  year={2025}
}
```
Our repository is based on the code of AnyText. We build upon and extend it to enable user-controllable fonts in a zero-shot manner. Below is a brief walkthrough:
- **Prerequisites:** We use a conda environment to manage all required packages.

  ```bash
  conda env create -f environment.yml
  conda activate controltext
  ```
- **Preprocess Glyphs:** Run text segmentation over your training images to obtain the grayscale glyph conditions (an illustrative sketch follows below).
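  The actual preprocessing pipeline ships with the repository; as a rough illustration only, the loop below applies a text segmentation step to a folder of images and saves grayscale glyph masks. The folder layout, file naming, and the trivial threshold-based `segment_text` stand-in are all assumptions; in practice a pretrained text segmentation model is used.

  ```python
  import os
  import numpy as np
  from PIL import Image

  def segment_text(rgb):
      """Stand-in for a pretrained text segmentation model: naive
      dark-pixel thresholding. Replace with a real segmenter."""
      gray = np.asarray(Image.fromarray(rgb).convert("L"))
      return np.where(gray < 128, 255, 0).astype(np.uint8)

  def preprocess_glyphs(image_dir, glyph_dir):
      """Save a grayscale glyph mask for every image in image_dir."""
      os.makedirs(glyph_dir, exist_ok=True)
      for name in os.listdir(image_dir):
          rgb = np.asarray(Image.open(os.path.join(image_dir, name)).convert("RGB"))
          mask = segment_text(rgb)  # (H, W) uint8, text pixels bright
          Image.fromarray(mask).save(os.path.join(glyph_dir, name))
  ```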
- **Configuration:**
  - Adjust hyperparameters such as `batch_size`, `grad_accum`, `learning_rate`, `logger_freq`, and `max_epochs` in the training script `train.py`. Please keep `mask_ratio = 1`. An illustrative settings block is sketched below.
  - Set paths for GPUs, checkpoints, the model configuration file, image datasets, and preprocessed glyphs accordingly.
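  For orientation, here is what the settings block in `train.py` might look like; the variable names come from the list above, but the values shown are placeholders, not the repository's defaults.

  ```python
  # Illustrative hyperparameter block for train.py -- values are placeholders.
  batch_size = 4        # per-GPU batch size
  grad_accum = 2        # gradient-accumulation steps
  learning_rate = 1e-5  # optimizer learning rate
  logger_freq = 1000    # log samples every N training steps
  max_epochs = 10       # total training epochs
  mask_ratio = 1        # keep fixed at 1, per the note above
  ```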
- **Training Command:** Run the training script:

  ```bash
  python train.py
  ```
The front-end code for user-friendly text and font editing is coming soon! Stay tuned for updates as we continue to enhance the project.
- **Our Generated Data**
  - `laion_controltext` (Google Drive), `laion_controltext_gly_lines` (cropped regions for each line of text from the entire image; Google Drive), `laion_controltext_gly_lines_grayscale` (`laion_controltext_gly_lines` after text segmentation; Google Drive), `laion_gly_lines_gt` (cropped regions from input glyphs after text segmentation; Google Drive)
  - `wukong_controltext` (Google Drive), `wukong_controltext_gly_line` (Google Drive), `wukong_controltext_glylines_grayscale` (Google Drive), `wukong_gly_lines_gt` (Google Drive)
- **Our Model Checkpoint**
- **Script for evaluating text accuracy:**

  Run the following script to calculate the SenACC and NED scores for text accuracy, which will evaluate `laion_controltext_gly_lines` and `wukong_controltext_gly_line`:

  ```bash
  bash eval/eval_dgocr.sh
  ```

  Run the following script to calculate the FID score for overall image quality, which will evaluate `laion_controltext` and `wukong_controltext`:

  ```bash
  bash eval/eval_fid.sh
  ```
- **Script for evaluating font accuracy in the open world:**

  Run the following script to calculate the font accuracy:

  ```bash
  bash eval/eval_font.sh --generated_folder path/to/your/generated_folder --gt_folder path/to/your/gt_folder
  ```

  In the arguments, `path/to/your/generated_folder` should point to the directory containing your generated images, for example, `laion_controltext_gly_lines_grayscale` or `wukong_controltext_glylines_grayscale`. Similarly, `path/to/your/gt_folder` should refer to the directory containing the ground-truth glyph images, i.e., the segmented glyphs used as input conditions; we use `laion_gly_lines_gt` or `wukong_gly_lines_gt`.