VLog

VLog explores a new perspective on video-language understanding.

👇 Click the branch to see more instructions.

| | VLog (CVPR'25) | VLog-Agent |
|---|---|---|
| TL;DR | Video Narration as Vocabulary | Video as Long Document |
| Method | A novel, efficient video narrator (GPT-2-based) with a Narration Vocabulary via Generative Retrieval. | Given a video, we turn it into a textual document containing visual and audio information. Sending this document to an LLM lets us chat about the video. |
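The "Video as Long Document" idea can be illustrated with a small sketch. It is not the repository's implementation: the `Segment` structure, the `build_document` and `build_prompt` helpers, and the timestamped line format are all hypothetical, and the captions and transcript are hard-coded rather than produced by any captioning or speech model. It only shows how visual and audio text might be merged into one chronological document and wrapped into a prompt for an LLM.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timestamped observation from the video (hypothetical structure)."""
    start: float   # seconds
    end: float     # seconds
    source: str    # "visual" or "audio"
    text: str      # caption or transcript for this span


def format_time(seconds: float) -> str:
    """Render seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def build_document(segments: list[Segment]) -> str:
    """Merge visual and audio segments into one chronological text document."""
    # Sort by start time; ties are broken alphabetically by source name.
    ordered = sorted(segments, key=lambda s: (s.start, s.source))
    lines = [
        f"[{format_time(s.start)}-{format_time(s.end)}] {s.source}: {s.text}"
        for s in ordered
    ]
    return "\n".join(lines)


def build_prompt(document: str, question: str) -> str:
    """Wrap the document and a user question into a single LLM prompt."""
    return (
        "The following is a textual log of a video.\n\n"
        f"{document}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hard-coded example inputs; in practice these would come from
# captioning and speech-recognition models (an assumption, not shown here).
segments = [
    Segment(0.0, 5.0, "visual", "A person opens a laptop on a desk."),
    Segment(0.0, 5.0, "audio", "Let's get started with the demo."),
    Segment(65.0, 70.0, "visual", "The screen shows a code editor."),
]
document = build_document(segments)
prompt = build_prompt(document, "What happens first?")
print(prompt)
```

The resulting `prompt` string could then be passed to any chat or completion model. Which LLM to use and how to call it is left open here.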

🎓 BibTeX

If you find our work helpful, please consider citing our paper.

@misc{lin2025vlog,
      title={VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary}, 
      author={Kevin Qinghong Lin and Mike Zheng Shou},
      year={2025},
      eprint={2503.09402},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.09402}, 
}

