Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Updated Feb 6, 2025 - Python
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Reasoning in LLMs: Papers and Resources, including Chain-of-Thought, OpenAI o1, and DeepSeek-R1 🍓
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
[ICLR 2025] Agent S: an open agentic framework that uses computers like a human
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
✨✨ Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
OpenEMMA, a permissively licensed open source "reproduction" of Waymo’s EMMA model.
NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D, and audio).
Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train your own 8B/14B LLaVA-training-like MLLM on RTX3090/4090 24GB.
This project is the official implementation of 'LLMGA: Multimodal Large Language Model based Generation Assistant', ECCV2024 Oral