.
├── README.md
├── config
├── document
├── figure
├── process
├── requirements.txt
├── run_script.sh
└── tools
config
: Configuration files, including the prompts to use and the parameters to set.
document
: Results, including the models' final performance, examiner priority, and position bias.
figure
: Figures used in the paper.
process
: The core code of AutoBench-V.
tools
: Common utilities, such as image-to-base64 conversion and data visualization; a minimal sketch of the base64 conversion is shown after this list.
run_script.sh
: Specifies which API to use.
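For reference, most vision-language model APIs expect images as base64 strings; the sketch below shows how such a conversion could look. It is an illustrative example only, and the actual helper in tools may use a different name and interface.

```python
import base64

def encode_image_to_base64(image_path: str) -> str:
    """Read an image file and return it as a base64-encoded string."""
    # Illustrative helper; the real utility in `tools` may differ.
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

# Example: build a data URL for an OpenAI-style vision request
# (the path below is hypothetical).
# image_b64 = encode_image_to_base64("figure/example.png")
# data_url = f"data:image/png;base64,{image_b64}"
```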
pip install -r requirements.txt
./run_script.sh
python pipeline.py
Remember to set the parameters user_input and generate_type when running pipeline.py.
There are five options for user_input:
basic_understanding
spatial_understanding
semantic_understanding
reasoning_capacity
atmospheric_understanding
For a complete pipeline, you only need to run the seven values of generate_type in the following order:
aspect
: generate aspects
guideline
: generate guidelines
prompts
: generate image descriptions
images
: generate images based on the descriptions
alignment
: test the alignment of images and descriptions via VQA
questions
: generate questions to test LVLMs
answers
: answer questions and score
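As a rough end-to-end sketch, the seven stages could be driven as follows. This assumes pipeline.py accepts --user_input and --generate_type as command-line flags, which is only an assumption; check pipeline.py for how these parameters are actually set.

```python
import subprocess

# Hypothetical driver for a full AutoBench-V run on one user_input aspect.
# Assumes pipeline.py takes --user_input and --generate_type flags; the real
# interface may instead require editing variables inside pipeline.py or config.
USER_INPUT = "spatial_understanding"
GENERATE_TYPES = [
    "aspect", "guideline", "prompts", "images",
    "alignment", "questions", "answers",
]

for generate_type in GENERATE_TYPES:
    subprocess.run(
        ["python", "pipeline.py",
         "--user_input", USER_INPUT,
         "--generate_type", generate_type],
        check=True,  # stop the sequence if any stage fails
    )
```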
@misc{bao2024autobenchvlargevisionlanguagemodels,
title={AutoBench-V: Can Large Vision-Language Models Benchmark Themselves?},
author={Han Bao and Yue Huang and Yanbo Wang and Jiayi Ye and Xiangqi Wang and Xiuying Chen and Mohamed Elhoseiny and Xiangliang Zhang},
year={2024},
eprint={2410.21259},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2410.21259},
}
If you have any questions, suggestions, or would like to collaborate, please feel free to reach out to us via email at wad3ahhh@gmail.com.