This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension"


MAC-AutoML/QuoTA


QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension

Arxiv

😮 Highlights

  • We design a versatile plug-and-play pipeline for existing LVLMs: QuoTA provides a training-free solution applicable to diverse LVLMs, enhancing long video understanding by assigning visual tokens based on text-instruction (query) relevance. This offers a more elegant and direct methodology than conventional attention-based analysis techniques.
  • We propose CoT-driven query decoupling for query-oriented frame scoring: QuoTA employs Chain-of-Thought reasoning to decouple the query into a specifically designed question, enabling high-quality scoring of video frames.
  • QuoTA sets a new state-of-the-art: integrating QuoTA with LLaVA-Video-7B yields a 3.2% average performance improvement across six benchmarks, achieving the best results on five of them, including Video-MME and MLVU, among 7B LVLMs.
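As a rough illustration of the first highlight (not the paper's actual implementation), query-oriented token assignment can be sketched as distributing a fixed visual-token budget across frames in proportion to hypothetical query-relevance scores:

```python
def assign_tokens(frame_scores, total_budget):
    """Allocate a fixed visual-token budget across frames in proportion
    to their query-relevance scores (a simplified sketch, not QuoTA's
    exact allocation scheme)."""
    total = sum(frame_scores)
    if total == 0:
        # No frame is relevant: fall back to a uniform split.
        return [total_budget // len(frame_scores)] * len(frame_scores)
    # Proportional allocation, floored to integers.
    alloc = [int(total_budget * s / total) for s in frame_scores]
    # Hand leftover tokens (lost to flooring) to the highest-scoring frames.
    leftover = total_budget - sum(alloc)
    order = sorted(range(len(frame_scores)),
                   key=lambda i: frame_scores[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc
```

Frames judged more relevant to the query receive more of the token budget, while irrelevant frames are compressed aggressively.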

🔨 Usage

This repo is built upon LLaVA-NeXT:

  • Step 1: Clone LLaVA-NeXT and build its conda environment, then install the following packages in the llava env:
git clone https://github.com/LLaVA-VL/LLaVA-NeXT
cd LLaVA-NeXT
conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip  # Enable PEP 660 support.
pip install -e ".[train]"
# install the qwen toolkit
pip install qwen-vl-utils
  • Step 2: Replace LLaVA-NeXT/llava/model/llava_arch.py with our core/llava_arch.py.

  • Step 3: Copy core/merge.py to LLaVA-NeXT/llava/model/.

  • Step 4: Move our code (tools/ and quota_pipeline.py) to the root directory of LLaVA-NeXT.

  • Step 5: You can now run our pipeline built upon LLaVA-Video-7B by:

python quota_pipeline.py
  • Note that you can also use our pipeline for other LVLMs.
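For concreteness, Steps 2–4 can be done from the shell. The layout below (this repo and LLaVA-NeXT cloned as sibling directories, commands run from inside the QuoTA checkout) is an assumption; adjust the paths to your setup:

```shell
# Assumed layout: QuoTA/ and LLaVA-NeXT/ are sibling directories;
# run these commands from inside the QuoTA checkout.
cp core/llava_arch.py ../LLaVA-NeXT/llava/model/llava_arch.py  # Step 2: replace
cp core/merge.py ../LLaVA-NeXT/llava/model/merge.py            # Step 3: copy
cp -r tools quota_pipeline.py ../LLaVA-NeXT/                   # Step 4: move code
```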

✏️ Citation

If you find our paper and code useful in your research, please consider giving a star ⭐ and citation 📝:

@article{luo2025quota,
  title={QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension},
  author={Luo, Yongdong and Chen, Wang and Zheng, Xiawu and Huang, Weizhong and Yin, Shukang and Lin, Haojia and Fu, Chaoyou and Huang, Jinfa and Ji, Jiayi and Luo, Jiebo and others},
  journal={arXiv preprint arXiv:2503.08689},
  year={2025}
}
