
AutoBench-V: Can Large Vision-Language Models Benchmark Themselves?

📖 Framework Overview

(Figure: AutoBench-V framework overview)

📚 File Structure

.
├── README.md
├── config
├── document
├── figure
├── process
├── requirements.txt
├── run_script.sh
└── tools

config: Prompts to use, parameters to set, etc.

document: Final model performance, examiner priority, and position bias results.

figure: Figures used in the paper.

process: Core code of AutoBench-V.

tools: Common utilities, such as image base64 conversion and data visualization.

run_script.sh: Script for calling the API.
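As an illustration of the base64 conversion mentioned for tools, here is a minimal sketch; the actual helper in tools/ may differ in name and signature:

```python
import base64

def image_to_base64(path: str) -> str:
    """Read an image file and return it as a base64 string,
    e.g. for embedding in a vision-model API request."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")
```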

📕 Usage

pip install -r requirements.txt
./run_script.sh
python pipeline.py

Remember to set the user_input and generate_type parameters when running pipeline.py.

There are five options for user_input:

  • basic_understanding
  • spatial_understanding
  • semantic_understanding
  • reasoning_capacity
  • atmospheric_understanding

For a complete pipeline, run the 7 values of generate_type in order:

  • aspect: generate aspects
  • guideline: generate guidelines
  • prompts: generate image descriptions
  • images: generate images based on the descriptions
  • alignment: test the alignment between images and descriptions via VQA
  • questions: generate questions to test LVLMs
  • answers: answer the questions and score them
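The stage ordering above can be sketched as a small driver. This is only an illustration: pipeline.py's actual interface is not documented here, so run_stage is a hypothetical callback standing in for whatever the script does per stage (the real script may read these values as in-file variables instead).

```python
# Hypothetical driver sketching the stage order described above.
USER_INPUTS = [
    "basic_understanding",
    "spatial_understanding",
    "semantic_understanding",
    "reasoning_capacity",
    "atmospheric_understanding",
]
STAGES = [
    "aspect", "guideline", "prompts", "images",
    "alignment", "questions", "answers",
]

def run_full_pipeline(user_input, run_stage):
    """Run every generate_type stage in order for one user_input."""
    if user_input not in USER_INPUTS:
        raise ValueError(f"unknown user_input: {user_input}")
    for stage in STAGES:
        run_stage(user_input, stage)
```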

🔎 Cite AutoBench-V

@misc{bao2024autobenchvlargevisionlanguagemodels,
      title={AutoBench-V: Can Large Vision-Language Models Benchmark Themselves?}, 
      author={Han Bao and Yue Huang and Yanbo Wang and Jiayi Ye and Xiangqi Wang and Xiuying Chen and Mohamed Elhoseiny and Xiangliang Zhang},
      year={2024},
      eprint={2410.21259},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.21259}, 
}

📬 Contact

If you have any questions, suggestions, or would like to collaborate, please feel free to reach out to us via email at wad3ahhh@gmail.com.

About

An automated framework for benchmarking LVLMs.
