ORYX

26 repositories

LLMVoX
Public
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM
audio text-to-speech streaming transformers tts codec omni voice-assistant neural-speech-synthesis mbzuai
Python
•8•125•1•0•Updated Mar 13, 2025
GeoPixel
Public
GeoPixel: A Pixel Grounding Large Multimodal Model for Remote Sensing is specifically developed for high-resolution remote sensing image analysis, offering advanced multi-target pixel grounding capabilities.
remote-sensing segmentation-models foundation-models large-vision-language-models large-multimodal-models vision-language-models grounding-llms
Python
•
Apache License 2.0
•1•64•2•0•Updated Mar 12, 2025
Awesome-LLM-Post-training
Public
Awesome Reasoning LLM Tutorial/Survey/Guide
reinforcement-learning scaling reasoning fine post-training large-language-models
Python
•60•985•0•0•Updated Mar 11, 2025
CoVR-VidLLM-CVPRW25
Public
Composed Video Retrieval Challenge CVPR Workshop 2025
Python
•1•2•0•0•Updated Mar 9, 2025
AIN
Public
AIN - The First Arabic Inclusive Large Multimodal Model. It is a versatile bilingual LMM excelling in visual and contextual understanding across diverse domains.
ocr culture remote-sensing vqa vlm vision-and-language lmm multi-images
HTML
•
MIT License
•0•32•0•0•Updated Mar 4, 2025
VideoGLaMM
Public
[CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
vision-and-language lmm foundation-models vision-language-model llm-agent cvpr2025
Python
•1•47•3•0•Updated Mar 3, 2025
KITAB-Bench
Public
A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding
benchmark ocr vqa pdf-to-text arabic table-detection layout-detection vlms
Python
•
MIT License
•0•29•0•0•Updated Mar 2, 2025
ALM-Bench
Public
[CVPR 2025 🔥] ALM-Bench is a multilingual, multimodal, culturally diverse benchmark for 100 languages across 19 categories. It assesses the next generation of LMMs on cultural inclusivity.
multilingual benchmarking multi-modal cultural gpt-4 multimodal-large-language-models
Python
•
Other
•2•33•0•0•Updated Feb 28, 2025
TimeTravel
Public
Time Travel is a Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts
benchmark historical cultural lmm
Python
•
MIT License
•0•17•0•0•Updated Feb 24, 2025
PALO
Public
(WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, Bengali and Urdu.
Python
•
Apache License 2.0
•5•84•5•0•Updated Feb 17, 2025
LlamaV-o1
Public
Rethinking Step-by-step Visual Reasoning in LLMs
Python
•
Apache License 2.0
•16•270•4•2•Updated Jan 24, 2025
Camel-Bench
Public
[NAACL 2025 🔥] CAMEL-Bench is an Arabic benchmark for evaluating multimodal models across eight domains with 29,000 questions.
benchmark vqa arabic multimodal-learning visual-question-answering mbzuai large-multimodal-models
Python
•
MIT License
•1•31•0•0•Updated Jan 23, 2025
UniMed-CLIP
Public
Official repository of paper titled "UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities".
Python
•
Other
•7•88•1•0•Updated Dec 26, 2024
BiMediX2
Public
Bio-Medical EXpert LMM with English and Arabic Language Capabilities
•6•63•1•0•Updated Dec 15, 2024
GeoChat
Public
[CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing
remote-sensing vlm
Python
•46•534•35•1•Updated Nov 28, 2024
BiMediX
Public
Bilingual Medical Mixture of Experts LLM
Other
•1•31•1•0•Updated Nov 23, 2024
groundingLMM
Public
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
vision-and-language lmm foundation-models vision-language-model llm-agent
Python
•43•845•29•0•Updated Nov 23, 2024
ClimateGPT
Public
[EMNLP'23] ClimateGPT: a specialized LLM for conversations related to Climate Change and Sustainability topics in both English and Arabic languages.
Python
•10•79•0•0•Updated Sep 24, 2024
Video-ChatGPT
Public
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
chatbot llama clip mulit-modal vision-language vicuna gpt-4 vision-language-pretraining llava video-chatboat
Python
•
Creative Commons Attribution 4.0 International
•111•1.3k•22•0•Updated Aug 27, 2024
CVRR-Evaluation-Suite
Public
Official repository of paper titled "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs".
Python
•
Creative Commons Attribution 4.0 International
•4•45•0•0•Updated Aug 23, 2024
VideoGPT-plus
Public
Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
chatbot clip image-encoder video-encoder multimodal dual-encoder vision-language vicuna gpt4 vision-language-pretraining
Python
•
Creative Commons Attribution 4.0 International
•16•262•16•1•Updated Aug 11, 2024
XrayGPT
Public
[BIONLP@ACL 2024] XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models.
Python
•61•497•19•2•Updated Aug 8, 2024
LLaVA-pp
Public
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
conversation lmms vision-language llm llava llama3 phi3 llava-llama3 llava-phi3 llama3-llava
Python
•61•835•17•2•Updated Jul 10, 2024
MobiLlama
Public
MobiLlama: Small Language Model tailored for edge devices
slm llm efficient-llm mobile-llm tiny-llm
Python
•
Apache License 2.0
•48•626•13•1•Updated Mar 3, 2024
Video-LLaVA
Public
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
video transcription lmm grounding video-grounding llm video-conversation
Python
•12•255•15•0•Updated Jan 2, 2024
Awesome-CV-Foundational-Models
Public
•31•8•0•0•Updated Jul 31, 2023