llama
Here are 43 public repositories matching this topic...
High-speed Large Language Model Serving for Local Deployment
Updated Feb 19, 2025 - C++
Lightweight inference library for ONNX files, written in C++. It can run Stable Diffusion XL 1.0 on a Raspberry Pi Zero 2 (or in 298 MB of RAM), as well as Mistral 7B on desktops and servers. Supports ARM, x86, WASM, and RISC-V. Accelerated by XNNPACK.
Updated Apr 10, 2025 - C++
A highly optimized LLM inference acceleration engine for Llama and its variants.
Updated Apr 14, 2025 - C++
Fast Multimodal LLM on Mobile Devices
Updated Mar 21, 2025 - C++
🤘 TT-NN operator library and TT-Metalium low-level kernel programming model.
Updated Apr 14, 2025 - C++
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
Updated Jan 21, 2025 - C++
A high-performance inference system for large language models, designed for production environments.
Updated Apr 12, 2025 - C++
CPU inference for the DeepSeek family of large language models in pure C++
Updated Apr 14, 2025 - C++
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
Updated Jan 15, 2025 - C++
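For a sense of what "no libraries except for I/O" entails in a project like this: the hot loop of a from-scratch inference engine is hand-written kernels, and matrix-vector multiplication dominates single-batch decoding. Below is a minimal CUDA C++ sketch of such a kernel; all names, sizes, and launch parameters are illustrative assumptions, not code from the repository.

```cpp
// Illustrative sketch only: the kind of hand-written kernel a dependency-free
// C++/CUDA inference engine spends most of its decode time in.
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// y = W * x, with W row-major [rows x cols]; one thread block reduces one output row.
__global__ void matvec(const float* W, const float* x, float* y, int rows, int cols) {
    __shared__ float partial[256];
    int row = blockIdx.x;
    float sum = 0.0f;
    for (int j = threadIdx.x; j < cols; j += blockDim.x)
        sum += W[row * cols + j] * x[j];
    partial[threadIdx.x] = sum;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {  // tree reduction within the block
        if (threadIdx.x < s) partial[threadIdx.x] += partial[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) y[row] = partial[0];
}

int main() {
    const int rows = 4, cols = 8;  // toy sizes; real layers are thousands wide
    std::vector<float> hW(rows * cols, 1.0f), hx(cols, 2.0f), hy(rows);
    float *dW, *dx, *dy;
    cudaMalloc(&dW, hW.size() * sizeof(float));
    cudaMalloc(&dx, hx.size() * sizeof(float));
    cudaMalloc(&dy, hy.size() * sizeof(float));
    cudaMemcpy(dW, hW.data(), hW.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dx, hx.data(), hx.size() * sizeof(float), cudaMemcpyHostToDevice);
    matvec<<<rows, 256>>>(dW, dx, dy, rows, cols);  // one block per output row
    cudaMemcpy(hy.data(), dy, hy.size() * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < rows; ++i) printf("y[%d] = %.1f\n", i, hy[i]);  // expect 16.0
    cudaFree(dW); cudaFree(dx); cudaFree(dy);
    return 0;
}
```

The one-block-per-row layout keeps the reduction entirely in shared memory, which is a common starting point before fancier tiling or tensor-core paths.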
LLaVA server (llama.cpp).
Updated Oct 20, 2023 - C++
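For context, llama.cpp's bundled HTTP server exposes a POST /completion endpoint that takes a JSON body, and LLaVA-enabled builds additionally accept base64 image data referenced from the prompt. A minimal client sketch using libcurl follows; the port, prompt, and field values are placeholders, and this targets llama.cpp's server generally rather than this specific repository.

```cpp
// Minimal sketch of querying a llama.cpp-style server over HTTP.
// Build with: g++ query.cpp -lcurl
#include <curl/curl.h>
#include <iostream>
#include <string>

// libcurl write callback: accumulate the response body into a std::string.
static size_t on_body(char* data, size_t size, size_t nmemb, void* out) {
    static_cast<std::string*>(out)->append(data, size * nmemb);
    return size * nmemb;
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL* curl = curl_easy_init();
    if (!curl) return 1;

    std::string response;
    // For LLaVA, the server also accepts an "image_data" array of base64 images
    // referenced from the prompt as [img-<id>] (omitted here for brevity).
    const char* body =
        R"({"prompt": "Describe this image: [img-10]", "n_predict": 64})";

    struct curl_slist* headers =
        curl_slist_append(nullptr, "Content-Type: application/json");
    curl_easy_setopt(curl, CURLOPT_URL, "http://localhost:8080/completion");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, on_body);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);

    if (curl_easy_perform(curl) == CURLE_OK)
        std::cout << response << "\n";  // JSON whose "content" field holds the text

    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}
```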
UnrealMCP is here! Automatic blueprint and scene generation from AI. An Unreal Engine plugin for LLM/GenAI models and an MCP UE5 server. Supports the Claude Desktop App, Windsurf, and Cursor; also includes OpenAI's GPT-4o, DeepSeek R1, Claude 3.7 Sonnet, and Grok 3 APIs, with plans to add Gemini, audio, and realtime APIs soon.
Updated Apr 12, 2025 - C++