V²Flow: Unifying Visual Tokenization and Large Language Model Vocabularies for Autoregressive Image Generation

📖 Introduction

V²Flow is a vector-quantized image tokenizer designed to integrate visual tokenization seamlessly with existing large language model (LLM) vocabularies. By aligning both the structural representation and the latent distribution of image tokens with those of textual tokens, V²Flow enables autoregressive image generation directly on top of pre-trained LLMs.
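
The snippet below is a minimal, self-contained sketch of this core idea: continuous image features are quantized against a codebook indexed by the LLM vocabulary, so that image tokens and text tokens share one index space and can be concatenated into a single autoregressive sequence. The dimensions, random codebook initialization, and names (`quantize`, `codebook`) are illustrative assumptions, not V²Flow's actual implementation.

```python
import torch

# Illustrative sizes only; V2Flow's real configuration may differ.
vocab_size, dim = 32000, 768              # e.g. a LLaMA-style vocabulary size
codebook = torch.randn(vocab_size, dim)   # in V2Flow this codebook is aligned with the LLM's token embeddings

def quantize(features: torch.Tensor) -> torch.Tensor:
    """Map continuous image features (N, dim) to nearest-codebook indices (N,)."""
    distances = torch.cdist(features, codebook)  # pairwise L2 distances to every codebook entry
    return distances.argmin(dim=-1)              # discrete token ids in the shared vocabulary

image_features = torch.randn(256, dim)       # e.g. a 16x16 grid of patch features from the encoder
image_token_ids = quantize(image_features)   # "visual words" living in the LLM's index space

# Autoregressive generation then amounts to the LLM predicting these ids after a text prompt,
# and the tokenizer's decoder mapping them back to pixels.
text_token_ids = torch.tensor([1, 2, 3])     # placeholder prompt ids
sequence = torch.cat([text_token_ids, image_token_ids])
print(sequence.shape)
```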

✨ Highlights

1. Structural and Latent Distribution Alignment with the LLM's Vocabulary

2. Masked Autoregressive Reconstruction from a Flow-Matching Perspective (see the sketch after this list)

3. Autoregressive Visual Generation on Top of Existing LLMs
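
Highlight 2 refers to reconstructing masked visual tokens with a flow-matching objective. The sketch below shows a generic conditional flow-matching training step (rectified flow with linear interpolation between noise and target); the `VelocityNet` module and its conditioning are hypothetical placeholders, not the V²Flow architecture.

```python
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Tiny MLP predicting the velocity field v(x_t, t | condition). Placeholder architecture."""
    def __init__(self, dim: int, cond_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + cond_dim + 1, 256), nn.SiLU(), nn.Linear(256, dim)
        )

    def forward(self, x_t, t, cond):
        return self.net(torch.cat([x_t, cond, t], dim=-1))

dim, cond_dim, batch = 16, 32, 8
model = VelocityNet(dim, cond_dim)

x1 = torch.randn(batch, dim)         # target: latent of a masked token to reconstruct
cond = torch.randn(batch, cond_dim)  # conditioning, e.g. context from visible tokens
x0 = torch.randn(batch, dim)         # noise sample
t = torch.rand(batch, 1)             # random time in [0, 1]

x_t = (1 - t) * x0 + t * x1          # linear interpolation between noise and target
target_velocity = x1 - x0            # constant velocity of the straight path
loss = ((model(x_t, t, cond) - target_velocity) ** 2).mean()
loss.backward()
print(float(loss))
```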

🧩 Project Updates

  • 2025-03-30: Release of the complete training and inference codebase for V²Flow. Pretrained models (1024x1024 and 512x512 resolutions) will be available shortly.
  • 2025-03-10: V²Flow is released on arXiv.

🚀 Training & Inference

V²Flow Tokenizer

The complete data preparation, training, and inference instructions for the V²Flow tokenizer can be found here.

🚀 Open-source Plan

  • V²Flow tokenizer
    • Training and inference codes
    • Checkpoints
    • Gradio Demo
  • V²Flow+LLaMA for Autoregressive Visual Generation
    • Training and inference codes
    • Checkpoints
    • Gradio Demo

Acknowledgement

We thank the authors of MAR, LLaVA, and VideoLLaMA for their great work.
