Yuhao Dong*,1  Zuyan Liu*,2,3  Hai-Long Sun2,4  Jingkang Yang1 
Winston Hu2  Yongming Rao2,3,✉  Ziwei Liu1,✉ 
1S-Lab, NTU  2Tencent  3Tsinghua University  4Nanjing University 
* Equal Contribution  ✉ Corresponding Author
- [04/2025] Insight-V is selected as a Highlight paper at CVPR 2025!
- [02/2025] Insight-V is accepted to CVPR 2025!
- [11/2024] 🔧🔨 Training & inference scripts released! Try Insight-V on your own!
- [11/2024] 🔥🚀 Introducing Insight-V! An early attempt to explore long-chain visual reasoning with MLLMs.
- [Paper]: A detailed introduction to Insight-V, including the structured, long-chain data generation pipeline and the effective multi-agent system design!
- [Checkpoints]: We release model checkpoints built on LLaVA-NeXT-LLaMA3 and on our base model.
Insight-V is an early effort to explore long-chain visual reasoning with MLLMs.
Insight-V offers 1) a scalable data generation pipeline for long-chain, high-quality reasoning data, 2) a multi-agent system that decomposes visual reasoning tasks into reasoning and summarization, and 3) a two-stage training pipeline to enhance visual reasoning capabilities. Together, these contributions address key challenges in visual reasoning, providing a solid foundation for future research in MLLM reasoning.
The reasoning processes are generated progressively by a reasoning generator and then fed into a multi-granularity assessment system to ensure high-quality reasoning.
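The generate-then-assess loop above can be sketched as follows. This is a minimal, runnable illustration of the control flow only, with the generator and both assessment granularities stubbed out; all function names (`generate_chain`, `answer_level_ok`, `step_level_ok`) are hypothetical placeholders, not the actual Insight-V code.

```python
# Sketch of the data pipeline: a generator proposes reasoning chains
# progressively, and a multi-granularity assessor filters them
# (coarse: answer-level check; fine: step-level check).
# All names here are illustrative stand-ins, not the real implementation.

def generate_chain(question: str, max_steps: int = 3) -> list[str]:
    """Stand-in for the reasoning generator: extends the chain step by step."""
    return [f"step {i + 1} for: {question}" for i in range(max_steps)]

def answer_level_ok(chain: list[str], gold_answer: str) -> bool:
    """Coarse filter: does the chain lead to the reference answer? (stubbed)"""
    return bool(chain) and gold_answer != ""

def step_level_ok(chain: list[str]) -> bool:
    """Fine filter: each individual step should be non-empty and well-formed."""
    return all(step.strip() for step in chain)

def build_dataset(samples: list[tuple[str, str]]) -> list[dict]:
    """Keep only chains that pass both assessment granularities."""
    kept = []
    for question, gold in samples:
        chain = generate_chain(question)
        if answer_level_ok(chain, gold) and step_level_ok(chain):
            kept.append({"question": question, "reasoning": chain, "answer": gold})
    return kept

data = build_dataset([("How many cats are in the image?", "2")])
print(len(data))  # → 1
```

In practice the stubs would be replaced by MLLM calls and real correctness checks; the point is that filtering happens at both the whole-answer and per-step level before a chain enters the training set.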
We derive a multi-agent system from a single model. By decomposing the task into reasoning and summarization, the two agents collaborate to enhance the overall reasoning capability.
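The decomposition above can be sketched as two differently prompted calls to the same base model. This is a hedged, self-contained illustration of the idea, not the Insight-V API: `run_model`, the prompt strings, and the dict layout are all assumptions made for the example, and the model call is stubbed so the snippet runs stand-alone.

```python
# Sketch of the two-agent decomposition: both "agents" are the same base
# model under different system prompts. The model call is stubbed here;
# replace run_model with a real MLLM call (e.g. to a LLaVA-NeXT checkpoint).
# All names are illustrative, not the actual Insight-V interface.

REASONING_PROMPT = (
    "Think step by step about the image and question. "
    "Emit numbered reasoning steps."
)
SUMMARY_PROMPT = (
    "Given the question and the reasoning steps, judge the steps, "
    "select the relevant ones, and produce a concise final answer."
)

def run_model(system_prompt: str, user_content: str) -> str:
    """Stand-in for an MLLM call; returns canned text so the sketch runs."""
    if "numbered reasoning steps" in system_prompt:
        return "1. Locate the object.\n2. Read its label.\n3. Match it to the question."
    return "Final answer based on the selected reasoning steps."

def answer(question: str, image_path: str) -> dict:
    # Agent 1: long-chain reasoning over the visual input.
    reasoning = run_model(REASONING_PROMPT, f"{image_path}\n{question}")
    # Agent 2: summarization — assess the chain and answer the question.
    summary = run_model(SUMMARY_PROMPT, f"{question}\n{reasoning}")
    return {"reasoning": reasoning, "answer": summary}

result = answer("What is the label on the object?", "example.jpg")
print(result["answer"])
```

Keeping reasoning and summarization as separate calls lets the summary agent discard flawed steps instead of forcing the answer to follow the whole chain verbatim.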
- Release the paper on arXiv.
- Release Insight-V models.
- Release demo code for generation.
- Release all training and inference code.
- Release evaluation code for visual reasoning benchmarks.
- Release Insight-V SFT data.
- Release Insight-V with stronger MLLMs.
If you find Insight-V useful for your research and applications, please cite our paper using the following BibTeX:
@article{dong2024insight,
title={Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models},
author={Dong, Yuhao and Liu, Zuyan and Sun, Hai-Long and Yang, Jingkang and Hu, Winston and Rao, Yongming and Liu, Ziwei},
journal={arXiv preprint arXiv:2411.14432},
year={2024}
}