I'm a Ph.D. student at Show Lab, National University of Singapore.
I work in Vision+Language, Video Understanding, and AI Agents.
🌐 Homepage: qhlin.me
📧 Email: kevin.qh.lin@gmail.com
[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
[CVPR 2025] Video Narration as Vocabulary & Video as Long Document
[ICLR 2025] Repository for Show-o: One Single Transformer to Unify Multimodal Understanding and Generation.
[NeurIPS 2022] Egocentric Video-Language Pretraining
Out-of-the-box (OOTB) GUI Agent for Windows and macOS
[ICCV 2023] UniVTG: Towards Unified Video-Language Temporal Grounding