Fei Gao*, Yuhao Lin*, Jiaqi Shi, Maoying Qiao, Nannan Wang**, AesMamba: Universal Image Aesthetic Assessment with State Space Models, Proceedings of the 32nd ACM International Conference on Multimedia, 7444–7453, 2024.
- Add inference code and config files
- Add checkpoint and script for IAA task
Requirements:
- Linux
- NVIDIA GPU
- PyTorch 1.12+
- CUDA 11.6+
conda create -n Aesmamba python=3.8
conda activate Aesmamba
pip install --upgrade pip setuptools wheel
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
git clone https://github.com/state-spaces/mamba.git
cd mamba
MAMBA_FORCE_BUILD=TRUE pip install .
cd ../Aesmamba
pip install -r requirements.txt
cd AesMamba_v && python train_viaa.py
cd AesMamba_m && python train_miaa.py
cd AesMamba_f && python train_multi_attr_add_balce.py
cd AesMamba_p && python multi_attr_pred_model_add_human_attr.py
You can change the configuration in the corresponding .py file of each task. We plan to combine the four tasks in future work.
In our code, images are classified by their score in each dataset. We upload the CSV label files for some datasets; for the others, we only describe the classification method, because the CSV files are too large.
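As a reference for the classification method, here is a minimal sketch of turning an aesthetic score into a class label for a CSV file. The bin edges below are illustrative assumptions (not the exact thresholds used in our code), and the file names and scores are made up for the example.

```python
# Hedged sketch: map a mean opinion score (e.g. AVA's 1-10 scale) to a
# class index by counting how many bin edges the score reaches.
# The edges (4.0, 5.0, 6.0) are illustrative assumptions only.
def score_to_class(score: float, edges=(4.0, 5.0, 6.0)) -> int:
    label = 0
    for edge in edges:
        if score >= edge:
            label += 1
    return label

# Emit CSV-style rows: image name, score, class label.
rows = [("img_001.jpg", 5.32), ("img_002.jpg", 3.87), ("img_003.jpg", 6.41)]
for name, score in rows:
    print(f"{name},{score},{score_to_class(score)}")
```

Adjust the bin edges to match the score scale of the target dataset before generating its CSV file.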
@inproceedings{Gao2024AesMamba,
author = {Gao, Fei and Lin, Yuhao and Shi, Jiaqi and Qiao, Maoying and Wang, Nannan},
title = {AesMamba: Universal Image Aesthetic Assessment with State Space Models},
booktitle = {Proceedings of the 32nd ACM International Conference on Multimedia},
pages = {7444--7453},
location = {Melbourne VIC, Australia},
year = {2024},
address = {New York, NY, USA},
doi = {10.1145/3664647.3681011}
}
Visual encoder: VMamba-Tiny. Text encoder: BERT-base. We use an old version of VMamba; the checkpoint is available here:
Link: https://pan.baidu.com/s/1REVTVD4w20G7lKnIM-Btjg Password: c1mk
For VMamba-Base and its conda environment, please refer to https://github.com/MzeroMiko/VMamba