Text-Audio Retrieval

This repository provides the implementation of a dual-encoder model for Text-Audio Cross-Modal Retrieval using PaSST (audio encoder) and RoBERTa (text encoder).
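At retrieval time, a dual encoder like this one embeds the text query and each audio clip separately, then ranks the clips by similarity. The sketch below assumes the two encoders have already produced the embeddings. The toy vectors are placeholders, not real PaSST or RoBERTa outputs. It also assumes cosine similarity as the scoring function, which is a common choice but is not confirmed by this README.

```python
import math


def l2_normalize(vec):
    """Scale a (non-zero) vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def rank_audio(text_emb, audio_embs):
    """Return audio indices sorted from most to least similar to the text query."""
    query = l2_normalize(text_emb)
    scores = []
    for idx, audio in enumerate(audio_embs):
        audio = l2_normalize(audio)
        scores.append((sum(q * a for q, a in zip(query, audio)), idx))
    # Highest cosine similarity first
    return [idx for _, idx in sorted(scores, reverse=True)]


# Toy 3-dimensional embeddings standing in for text-encoder / audio-encoder outputs
text_query = [1.0, 0.0, 0.5]
audio_bank = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.4], [0.5, 0.5, 0.5]]
print(rank_audio(text_query, audio_bank))  # [1, 2, 0]
```

Audio embeddings do not depend on the query. They can therefore be computed once and reused for every text query, so only the text side has to run per query.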

Quick Start Guide

This repository is developed with Python 3.9 and PyTorch 1.13.1.

Check out the source code and install the required Python libraries.

git clone https://github.com/xieh97/text-audio-retrieval.git
cd text-audio-retrieval
pip install -r requirements.txt
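The repository was developed with Python 3.9 and PyTorch 1.13.1. An optional check like the following can report where a local environment differs from those versions. The helpers `installed_version` and `environment_warnings` are hypothetical and not part of the repository. Other versions may still work, but a version mismatch is a reasonable first thing to rule out when something breaks.

```python
import sys
from importlib import metadata

EXPECTED_PYTHON = (3, 9)     # version stated in the Quick Start Guide
EXPECTED_TORCH = "1.13.1"    # version stated in the Quick Start Guide


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_warnings(python_version, torch_version):
    """List human-readable mismatches against the versions the repository was developed with."""
    warnings = []
    if tuple(python_version[:2]) != EXPECTED_PYTHON:
        warnings.append(
            f"Python {python_version[0]}.{python_version[1]} found, "
            f"expected {EXPECTED_PYTHON[0]}.{EXPECTED_PYTHON[1]}"
        )
    if torch_version is None:
        warnings.append("PyTorch is not installed")
    elif torch_version.split("+")[0] != EXPECTED_TORCH:  # ignore local tags such as "+cu117"
        warnings.append(f"PyTorch {torch_version} found, expected {EXPECTED_TORCH}")
    return warnings


if __name__ == "__main__":
    for message in environment_warnings(sys.version_info, installed_version("torch")):
        print("warning:", message)
```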

Download audio-caption datasets. Note: AudioCaps is sourced from YouTube, and some of the original videos are no longer available.

Dataset     Train    Validation   Test   Link
AudioCaps   48548    380          940    GitHub
Clotho      19195    1045         1045   Zenodo
WavCaps     401195   N/A          N/A    Hugging Face
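Some AudioCaps videos have been removed from YouTube, so a local copy may be incomplete. The sketch below lists which expected clips are missing. It assumes a hypothetical layout in which clips are stored as `<clip_id>.wav` in a single directory and the expected IDs come from the dataset's metadata files. `find_missing_clips` is an illustrative name, not a repository function.

```python
import tempfile
from pathlib import Path


def find_missing_clips(expected_ids, audio_dir, suffix=".wav"):
    """Return the sorted IDs from expected_ids that have no matching audio file in audio_dir."""
    present = {path.stem for path in Path(audio_dir).glob(f"*{suffix}")}
    return sorted(set(expected_ids) - present)


# Demo with a throwaway directory: clips "a" and "b" exist, "c" was never downloaded
with tempfile.TemporaryDirectory() as tmp:
    for clip_id in ("a", "b"):
        (Path(tmp) / f"{clip_id}.wav").touch()
    missing = find_missing_clips(["a", "b", "c"], tmp)

print(missing)  # ['c']
```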

Preprocess audio and caption data.

preprocess
├─ audiocaps.py                 # generate AudioCaps caption embeddings
├─ clothov2.py                  # generate Clotho caption embeddings
├─ wavcaps.py                   # generate WavCaps mp3 audio files
└─ wavcaps2.py                  # generate WavCaps caption embeddings
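Caption embeddings are generated once during preprocessing rather than on every training step. The sketch below shows the general compute-once-and-cache pattern. `encode_caption` is a hypothetical stand-in that hashes the text instead of running a language model. The pickle layout is also an assumption and does not reproduce the format that the preprocess scripts write.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def encode_caption(caption):
    """Hypothetical stand-in for a real text encoder; returns a tiny deterministic fake embedding."""
    digest = hashlib.sha256(caption.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:4]]


def build_caption_cache(captions, cache_file):
    """Encode each caption not already cached, then pickle the {caption: embedding} mapping."""
    cache_path = Path(cache_file)
    if cache_path.exists():
        with cache_path.open("rb") as f:
            cache = pickle.load(f)
    else:
        cache = {}
    for caption in captions:
        if caption not in cache:  # duplicates and previously cached captions are skipped
            cache[caption] = encode_caption(caption)
    with cache_path.open("wb") as f:
        pickle.dump(cache, f)
    return cache


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "captions.pkl"
    first = build_caption_cache(["a dog barks", "rain falls", "a dog barks"], cache_file)
    second = build_caption_cache(["rain falls"], cache_file)  # loaded from the pickle, nothing re-encoded

print(len(first), len(second))  # 2 2
```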

Train the model.

datasets
├─ audioset.py                  # load WavCaps (AudioSet)
├─ audio_caps.py                # load AudioCaps
├─ clotho_v2.py                 # load Clotho
├─ dataset_base_classes.py      # cache data
├─ wavcaps.py                   # load WavCaps
└─ __init__.py
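`dataset_base_classes.py` is described as caching data. One simple way to do this is to memoize decoded items in a map-style dataset, which implements the `__len__`/`__getitem__` protocol that a PyTorch `DataLoader` accepts. The sketch below assumes the items fit in memory. `CachedDataset` and `fake_load` are hypothetical names, not the repository's classes.

```python
class CachedDataset:
    """Map-style dataset that loads each item on first access and serves it from memory afterwards."""

    def __init__(self, items, load_fn):
        self.items = list(items)  # e.g. (audio_path, caption) pairs; hypothetical format
        self.load_fn = load_fn    # hypothetical loader, e.g. one that decodes an audio file
        self._cache = {}

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        if index not in self._cache:
            self._cache[index] = self.load_fn(self.items[index])
        return self._cache[index]


calls = []


def fake_load(item):
    """Record each real load so the caching effect is visible."""
    calls.append(item)
    path, caption = item
    return {"audio": f"<waveform of {path}>", "caption": caption}


ds = CachedDataset([("clip1.wav", "a dog barks"), ("clip2.wav", "rain falls")], fake_load)
first_access = ds[0]
second_access = ds[0]  # served from the cache, no new load
ds[1]
print(len(calls))  # 2
```

With `num_workers > 0`, each `DataLoader` worker process holds its own copy of the dataset, so an in-memory cache like this one is not shared across workers.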

utils
├─ criterion_utils.py           # cross-entropy losses
├─ data_utils.py                # load datasets
├─ directories.py               # dataset and cache directories
├─ model_utils.py               # model.train(), model.test(), etc.
└─ optim_utils.py               # learning rate schedulers
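`criterion_utils.py` contains cross-entropy losses. A widely used cross-entropy objective for dual encoders is the symmetric (InfoNCE-style) loss over a batch similarity matrix, in which the i-th caption and the i-th clip form the matching pair. The sketch below assumes that formulation and a temperature of 0.07. This README confirms neither, so treat both as assumptions.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def symmetric_cross_entropy(sim, temperature=0.07):
    """Average of text->audio and audio->text cross-entropy.

    sim[i][j] is the similarity of caption i and audio clip j; matching pairs lie on the diagonal.
    """
    n = len(sim)
    logits = [[s / temperature for s in row] for row in sim]
    # Text-to-audio: row i should assign the highest probability to column i
    t2a = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Audio-to-text: column j should assign the highest probability to row j
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    a2t = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return (t2a + a2t) / 2


aligned = [[0.9, 0.1], [0.2, 0.8]]   # matching pairs score highest
shuffled = [[0.1, 0.9], [0.8, 0.2]]  # mismatched pairs score highest
print(symmetric_cross_entropy(aligned) < symmetric_cross_entropy(shuffled))  # True
```

Averaging both directions trains the model for text-to-audio and audio-to-text retrieval at the same time. A uniform similarity matrix gives a loss of log(n), the chance level for a batch of n pairs.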

data_loader.py                  # PyTorch dataloaders
models.py                       # dual-encoder models
ex_baseline.py                  # main()
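`model.test()` in `model_utils.py` evaluates the trained model. Text-to-audio retrieval is commonly scored with recall@k and mAP@10, and the sketch below computes those two metrics. It assumes a square similarity matrix in which caption i's only relevant clip is clip i. With several captions per clip, as in Clotho, the matrix would instead be captions × clips, and each caption's clip index would need to be looked up. `retrieval_metrics` is a hypothetical helper, not the repository's code.

```python
def retrieval_metrics(sim, ks=(1, 5, 10)):
    """Text-to-audio recall@k and mAP@10, assuming caption i's only relevant clip is clip i.

    sim[i][j] is the similarity between caption i and audio clip j.
    """
    n = len(sim)
    ranks = []
    for i, row in enumerate(sim):
        # 1-based rank of the correct clip: one plus the number of clips scoring strictly higher
        ranks.append(1 + sum(1 for s in row if s > row[i]))
    metrics = {f"R@{k}": sum(r <= k for r in ranks) / n for k in ks}
    # With a single relevant clip, average precision@10 reduces to 1/rank (0 beyond rank 10)
    metrics["mAP@10"] = sum(1.0 / r if r <= 10 else 0.0 for r in ranks) / n
    return metrics


sim = [[0.9, 0.2, 0.1],
       [0.3, 0.4, 0.8],
       [0.5, 0.6, 0.7]]
print({k: round(v, 3) for k, v in retrieval_metrics(sim).items()})
# {'R@1': 0.667, 'R@5': 1.0, 'R@10': 1.0, 'mAP@10': 0.833}
```

Ties are resolved in the model's favour here, because only strictly higher scores push the correct clip down the ranking.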

Attribution and Acknowledgment

This repository contains code adapted from "Estimated Audio–Caption Correspondences Improve Language-Based Audio Retrieval". The original code has been modified to suit the specific requirements of this project. Special thanks to the authors of [1] and [2] for their contributions to the open-source community.
