🔭 I’m currently working on video understanding and large multimodal models
📫 How to reach me: shehan.munasinghe@mbzuai.ac.ae
[CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.