[CVPR 2025] Video Narration as Vocabulary & Video as Long Document

showlab/VLog

VLog explores a new perspective on video-language understanding.

👇 Click the branch to see more instructions.

| | VLog (CVPR'25) | VLog-Agent |
|---|---|---|
| TL;DR | Video Narration as Vocabulary | Video as Long Document |
| Method | A novel, efficient video narrator (GPT2-based) with Narration Vocabulary via Generative Retrieval. | Given a video, we turn it into a textual document containing visual + audio info. By sending this doc to LLM, we can chat over the video! |
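
As a rough illustration of the "Video as Long Document" idea, the sketch below merges timestamped visual captions and audio transcripts into one text document and wraps it in a question prompt. It is a minimal sketch under stated assumptions, not the VLog-Agent implementation: `Segment`, `build_document`, and `build_prompt` are hypothetical names, and the captions and transcripts are assumed to come from separate captioning and speech-recognition steps that are not shown.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One time span of the video (hypothetical structure for illustration)."""
    start: float      # segment start, in seconds
    end: float        # segment end, in seconds
    caption: str      # visual description, assumed to come from a captioner
    transcript: str   # speech in the segment, assumed to come from ASR


def fmt(t: float) -> str:
    """Format a time in seconds as MM:SS."""
    minutes, seconds = divmod(int(t), 60)
    return f"{minutes:02d}:{seconds:02d}"


def build_document(segments: list[Segment]) -> str:
    """Turn segments into a plain-text document ordered by start time."""
    lines = []
    for seg in sorted(segments, key=lambda s: s.start):
        lines.append(f"[{fmt(seg.start)}-{fmt(seg.end)}]")
        lines.append(f"Visual: {seg.caption}")
        if seg.transcript:  # skip the audio line for silent segments
            lines.append(f"Audio: {seg.transcript}")
    return "\n".join(lines)


def build_prompt(document: str, question: str) -> str:
    """Wrap the document and a user question into a single LLM prompt."""
    return (
        "The following document describes a video.\n\n"
        f"{document}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The resulting prompt string can be passed to any chat or completion model. Which model to use, and how to split very long videos across several prompts, is left open here.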

🎓 BibTeX

If you find our work helpful, please kindly consider citing our paper.

@misc{lin2025vlog,
      title={VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary}, 
      author={Kevin Qinghong Lin and Mike Zheng Shou},
      year={2025},
      eprint={2503.09402},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.09402}, 
}