This repository provides the PyTorch source code for the ICASSP 2025 paper LAVCap, optimized for use with the Intel Gaudi-v2 accelerator.
You can find the CUDA-compatible version of LAVCap in this repository.
LAVCap leverages the llama-2-7b-chat-hf variant of the LLaMA-2 model as its backbone. You can download the model from here.
The AudioCaps dataset is used for training and evaluation. You can download the dataset from here.
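Before training, it can help to confirm the dataset is laid out where the code expects it. The sketch below is only an illustration: the split directory names (`train`, `val`, `test`) are assumptions, not a layout LAVCap documents.

```python
import os

# Hedged sketch: report which assumed AudioCaps split directories are
# missing under a dataset root. Directory names are assumptions.
def check_audiocaps_root(root):
    """Return the list of expected split directories missing under root."""
    expected = ["train", "val", "test"]
    return [s for s in expected if not os.path.isdir(os.path.join(root, s))]
```

An empty return value means all assumed splits are present.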
Run the following command to create a Docker container:

```shell
bash ./sllm_docker.sh
```
For more details, refer to the Intel Gaudi documentation.
Install the necessary dependencies by running:

```shell
pip install -r ./requirements.txt
```
The training configuration is specified in the `./configs/lavcap.yaml` file. Modify this file as needed before proceeding.
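For orientation, a hypothetical excerpt of such a config is shown below. Only the two keys that appear in this README's evaluation command (`run.do_eval`, `model.resume_from`) are grounded in the repository; the layout and default values are illustrative assumptions.

```yaml
# Illustrative excerpt only; apart from run.do_eval and model.resume_from
# (used by the commands in this README), field names are assumptions.
model:
  resume_from: null   # path to a trained checkpoint, or null to train from scratch
run:
  do_eval: false      # set to true (or override via --options) to run evaluation
```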
To train the model on a single node, run:

```shell
PT_HPU_LAZY_MODE=0 python train.py --cfg-path configs/lavcap.yaml
```
To train on multiple nodes, run:

```shell
PT_HPU_LAZY_MODE=0 python gaudi_spawn.py --world_size 8 --use_mpi train.py --cfg-path configs/lavcap.yaml
```
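The `PT_HPU_LAZY_MODE=0` prefix runs the Gaudi PyTorch bridge in eager mode. If you want a script that degrades gracefully on machines without an HPU, a fallback like the following can be used; this is a generic sketch, not LAVCap's actual device handling, and the Gaudi plugin import is an assumption about the installed `habana_frameworks` package.

```python
# Hedged sketch: pick the best available accelerator, falling back to CPU.
# Not LAVCap's actual device-selection logic.
def pick_device():
    try:
        import habana_frameworks.torch.core  # noqa: F401  Gaudi plugin (assumed installed)
        return "hpu"
    except ImportError:
        pass
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"
```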
To test the model using a trained checkpoint (the result of the training process above), replace /path/to/ckpt with the actual path to your checkpoint and run:

```shell
PT_HPU_LAZY_MODE=0 python train.py --cfg-path configs/lavcap.yaml --options run.do_eval=True model.resume_from=/path/to/ckpt
```
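The `--options` flag overrides config entries using dotted keys. As a rough mental model (LAVCap's actual parser may behave differently, e.g. in how it types values), such overrides can be folded into a nested config dict like this:

```python
# Hedged sketch: fold dotted key=value overrides (e.g. run.do_eval=True)
# into a nested config dict. Not LAVCap's actual implementation.
def apply_overrides(cfg, options):
    for opt in options:
        key, _, raw = opt.partition("=")
        # Interpret a few common literals; everything else stays a string.
        value = {"True": True, "False": False, "null": None}.get(raw, raw)
        node = cfg
        parts = key.split(".")
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return cfg
```

For example, `run.do_eval=True` sets `cfg["run"]["do_eval"]` to `True`, creating the `run` section if it is absent.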
- This project was developed with support from the NAVER-Intel Co-Lab.