[SCIS2024] The official implementation of the paper "Modality-experts coordinated adaptation for large multimodal models" by Yan Zhang, Zhong Ji, Yanwei Pang, Jungong Han, and Xuelong Li, published in SCIENCE CHINA Information Sciences. It is built on top of the LAVIS library in PyTorch. Paper: https://doi.org/10.1007/s11432-024-4234-4
Follow the installation instructions to create the environment.
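A typical setup, assuming the LAVIS-style workflow, might look like the sketch below; the environment name and Python version are assumptions, so follow the repository's installation instructions for the exact steps:

```shell
# Environment setup sketch. ENV_NAME is a hypothetical name and the Python
# version is an assumption -- check the repo's instructions for exact values.
ENV_NAME="mecoa"
# conda create -n "$ENV_NAME" python=3.8 -y
# conda activate "$ENV_NAME"
# pip install -e .   # editable install of this repo and its dependencies
echo "conda env: $ENV_NAME"
```

The actual `conda create` / `pip install` lines are commented out so the sketch can be read before running; uncomment them once you have cloned the repository.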
Common vision-language datasets can be downloaded and organized with the automatic download tools.
Then, modify the corresponding paths in the dataset configs and in default.yaml.
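As an illustration, the storage root in default.yaml might look like the fragment below; the `cache_root` key follows LAVIS's convention, but the exact keys and paths here are assumptions, so mirror whatever appears in your local configs:

```yaml
# default.yaml fragment (illustrative; cache_root follows LAVIS's convention)
env:
  cache_root: "/path/to/your/cache"   # datasets and checkpoints resolve under this root
```

Per-dataset configs typically reference locations relative to this root, so setting it once is usually enough.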
Run the scripts in run_scripts for training and evaluation.
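The launch might look like the following sketch; the script names are assumptions, so list run_scripts to find the real ones:

```shell
# Hypothetical launch commands -- the actual script names live in run_scripts/;
# list that directory and substitute the real names before running.
TRAIN_SCRIPT="run_scripts/train.sh"   # assumed name for the training script
EVAL_SCRIPT="run_scripts/eval.sh"     # assumed name for the evaluation script
# Uncomment once the environment and datasets are prepared:
# bash "$TRAIN_SCRIPT"
# bash "$EVAL_SCRIPT"
echo "train=$TRAIN_SCRIPT eval=$EVAL_SCRIPT"
```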
For more details and advanced usage, please refer to the documentation.
@article{zhang2024modality,
  author = "Yan Zhang and Zhong Ji and Yanwei Pang and Jungong Han and Xuelong Li",
  title = "Modality-experts coordinated adaptation for large multimodal models",
  journal = "SCIENCE CHINA Information Sciences",
  year = "2024",
  volume = "67",
  number = "12",
  pages = "220107",
  url = "http://www.sciengine.com/publisher/Science China Press/journal/SCIENCE CHINA Information Sciences/67/12/10.1007/s11432-024-4234-4",
  doi = "10.1007/s11432-024-4234-4"
}
Our codebase is built on the popular LAVIS repository, which is released under the BSD 3-Clause License.