While abundant research has been conducted on improving the high-level visual understanding and reasoning capabilities of large multimodal models (LMMs), their image quality assessment (IQA) ability has been relatively under-explored. Here we take initial steps towards this goal by employing two-alternative forced choice (2AFC) prompting, as 2AFC is widely regarded as the most reliable way of collecting human opinions of visual quality. Subsequently, the global quality score of each image estimated by a particular LMM can be efficiently aggregated using maximum a posteriori estimation. Meanwhile, we introduce three evaluation criteria, namely consistency, accuracy, and correlation, to provide comprehensive quantifications and deeper insights into the IQA capability of five LMMs. Extensive experiments show that existing LMMs exhibit remarkable IQA ability on coarse-grained quality comparison, but there is room for improvement on fine-grained quality discrimination. The proposed dataset sheds light on the future development of IQA models based on LMMs.
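For intuition on the aggregation step, the sketch below shows one way to turn a matrix of pairwise 2AFC outcomes into global quality scores via maximum a posteriori estimation, assuming a Thurstone Case V observer model with a zero-mean Gaussian prior on the latent scores; the function name `map_aggregate` and these modeling choices are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def map_aggregate(counts, prior_var=1.0):
    """MAP estimate of global quality scores from 2AFC outcomes.

    counts[i, j] is how often image i was preferred over image j.
    Assumes Thurstone Case V: P(i preferred over j) = Phi(q_i - q_j),
    with a zero-mean Gaussian prior on the latent scores q.
    """
    n = counts.shape[0]

    def neg_log_posterior(q):
        diff = q[:, None] - q[None, :]       # q_i - q_j for every pair
        log_lik = np.sum(counts * norm.logcdf(diff))
        log_prior = -np.sum(q ** 2) / (2.0 * prior_var)
        return -(log_lik + log_prior)

    res = minimize(neg_log_posterior, x0=np.zeros(n), method="L-BFGS-B")
    return res.x - res.x.mean()              # scores are defined up to a constant shift

# Toy example: counts[i, j] = wins of image i over image j across 2AFC trials.
counts = np.array([[0, 8, 9],
                   [2, 0, 7],
                   [1, 3, 0]], dtype=float)
print(map_aggregate(counts))                 # higher score = better predicted quality
```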
We assess four open-source LMMs and one closed-source LMM in both coarse-grained and fine-grained scenarios. The selected images are listed in the corresponding JSON files; a loading sketch follows the lists below.
Coarse-grained scenario (`data/Coarse_grained_mixed.json`):
- Synthetic distortion: CSIQ, KADID-10k, MM21, KADIS-700k
- Realistic distortion: LIVE Challenge, KonIQ-10k, SPAQ, SQAD
Fine-grained scenario:
- Synthetic distortion (`Fine_grained_CSIQ_levels.json` and `Fine_grained_CSIQ_types.json`): CSIQ
- Realistic distortion (`data/Fine_graind_SPAQ.json`): SPAQ
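As a rough illustration of how these pair lists could be consumed, the snippet below loads one JSON file and builds a 2AFC prompt per pair; the field names `image_A`/`image_B` and the prompt wording are hypothetical placeholders, not the repo's actual schema.

```python
import json
import os

DATA_DIR = "path/IQA_datasets"               # root folder holding the IQA datasets
STAGE = "data/Coarse_grained_mixed.json"     # one of the pair lists above

with open(STAGE) as f:
    pairs = json.load(f)                     # hypothetical: a list of pair records

PROMPT = ("You will see two images, A and B. "
          "Which one has better visual quality? Answer with a single letter, A or B.")

for pair in pairs:
    img_a = os.path.join(DATA_DIR, pair["image_A"])   # assumed field name
    img_b = os.path.join(DATA_DIR, pair["image_B"])   # assumed field name
    # ... send PROMPT together with img_a and img_b to the LMM and record its choice ...
```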
Here is an example for GPT-4V:

```bash
python main.py --data_dir path/IQA_datasets --stage_name Fine_grain_SPAQ.json --model_name GPT4V
```
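To sweep every stage with the same model, a small driver can loop over the JSON files named above, reusing only the three flags from the example command; the list of stage names is copied from this README, and whether `main.py` accepts each of them unchanged is an assumption.

```python
import subprocess

# Stage files listed earlier in this README; GPT4V as in the example command.
STAGES = [
    "Coarse_grained_mixed.json",
    "Fine_grained_CSIQ_levels.json",
    "Fine_grained_CSIQ_types.json",
    "Fine_grain_SPAQ.json",
]

for stage in STAGES:
    subprocess.run(
        ["python", "main.py",
         "--data_dir", "path/IQA_datasets",
         "--stage_name", stage,
         "--model_name", "GPT4V"],
        check=True,   # abort if any stage fails
    )
```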
If you find this work useful, please cite:

```bibtex
@article{zhu20242afc,
  title={2AFC Prompting of Large Multimodal Models for Image Quality Assessment},
  author={Zhu, Hanwei and Sui, Xiangjie and Chen, Baoliang and Liu, Xuelin and Fang, Yuming and Wang, Shiqi},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  year={2024}
}
```
I would like to thank Kede Ma for his insightful discussions and diligent efforts in revising the paper.
If you have any questions, please contact hanwei.zhu@my.cityu.edu.hk.