
LAVCap: LLM-based Audio-Visual Captioning using Optimal Transport

This repository provides the PyTorch source code for the ICASSP 2025 paper LAVCap, optimized for use with the Intel Gaudi-v2 accelerator.

You can find the CUDA-compatible version of LAVCap in this repository.

Prerequisites

1. Download pre-trained LLaMA-2

LAVCap uses the llama-2-7b-chat-hf variant of LLaMA-2 as its language backbone. You can download the model from here.
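If you use the Hugging Face Hub, a download along the following lines should work (the meta-llama/Llama-2-7b-chat-hf repository is gated, so request access and log in first; the target directory is just an example):

huggingface-cli login
huggingface-cli download meta-llama/Llama-2-7b-chat-hf --local-dir ./llama-2-7b-chat-hf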

2. Download AudioCaps dataset

The AudioCaps dataset is used for training and evaluation. You can download the dataset from here.

3. Create a Docker container

Run the following command to create a Docker container.

bash ./sllm_docker.sh 

For more details, refer to the Intel Gaudi documentation.
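The script presumably wraps the standard Gaudi container launch; a minimal sketch, assuming a recent Gaudi PyTorch image (the image tag is illustrative and should match your installed driver release):

docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest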

4. Install dependencies

Install the necessary dependencies by running:

pip install -r ./requirements.txt

5. Set the configuration

The configuration for training is specified in the ./configs/lavcap.yaml file. Modify this file as needed before proceeding.
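As a rough sketch of the expected layout, the file groups run- and model-level options; run.do_eval and model.resume_from appear in the commands below, while the other keys here are hypothetical placeholders:

run:
  do_eval: False          # set True for evaluation-only runs
model:
  resume_from: null       # path to a checkpoint, if resuming
  llama_path: /path/to/llama-2-7b-chat-hf   # hypothetical key for the LLaMA-2 weights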

How to train

Single-card training

To train the model on a single HPU, run:

PT_HPU_LAZY_MODE=0 python train.py --cfg-path configs/lavcap.yaml
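Here PT_HPU_LAZY_MODE=0 selects the Gaudi eager execution mode rather than the default lazy mode; the same flag is used in all commands below.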

Multi-card training

To train with multiple HPUs (here, the 8 cards of a single Gaudi-v2 node), run:

PT_HPU_LAZY_MODE=0 python gaudi_spawn.py --world_size 8 --use_mpi train.py --cfg-path configs/lavcap.yaml
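Before launching, you can check that all eight HPUs are visible inside the container with Gaudi's device utility:

hl-smi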

How to test

To test the model using a trained checkpoint (the result of the training process above), replace /path/to/ckpt with the actual path to your checkpoint and run:

PT_HPU_LAZY_MODE=0 python train.py --cfg-path configs/lavcap.yaml --options run.do_eval=True model.resume_from=/path/to/ckpt
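The --options flag overrides keys from lavcap.yaml on the command line using dotted paths: run.do_eval=True switches to evaluation and model.resume_from points at the checkpoint.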

Acknowledgement

  • This project was developed with support from the NAVER-Intel Co-Lab.
