Inferless
Popular repositories
- triton-co-pilot Public
Generate glue code in seconds to simplify your NVIDIA Triton Inference Server deployments.
- qwq-32b-preview Public template
A 32B experimental reasoning model for advanced text generation and robust instruction following. <metadata> gpu: A100 | collections: ["vLLM"] </metadata>
- whisper-large-v3 Public
State-of-the-art speech recognition model for English, delivering accurate transcription across diverse audio scenarios. <metadata> gpu: T4 | collections: ["CTranslate2"] </metadata>
- deepseek-r1-distill-qwen-32b Public template
A distilled DeepSeek-R1 variant built on Qwen2.5-32B, fine-tuned with curated data for enhanced performance and efficiency. <metadata> gpu: A100 | collections: ["vLLM"] </metadata>
Repositories
- Phi-3.5-MoE-instruct-8bit Public
Phi-3.5-MoE is a compact yet powerful model designed for instruction-following tasks. It is part of the Phi-3 family, known for its efficiency and high performance; the Phi-3 Mini-128K-Instruct exhibited robust, state-of-the-art performance among models with fewer than 13B parameters.
- idefics-9b-instruct-8bit Public
IDEFICS (Image-aware Decoder Enhanced à la Flamingo with Interleaved Cross-attentionS) is an open-access reproduction of Flamingo, a closed-source visual language model developed by DeepMind. Like GPT-4, this multimodal model accepts arbitrary sequences of image and text inputs and produces text outputs.
- Book-Audio-Summary-Generator Public
- TenyxChat-8x7B-v1 Public
- Command-r-v01 Public
A 35B model delivering high performance in reasoning, summarization, and question answering. <metadata> gpu: A100 | collections: ["HF Transformers"] </metadata>
- InternVL2-Llama3-76B-AWQ Public
- Stable-Diffusion-3.5-large Public
- realvis-xl_v4.0_lightning Public
A lightweight, accelerated variant of RealVisXL V4.0, engineered for real-time, high-quality image generation with enhanced efficiency. <metadata> gpu: T4 | collections: ["Diffusers"] </metadata>
- tinyllama-1.1b-chat-vllm-gguf Public
Deploy the GGUF-quantized version of TinyLlama-1.1B with vLLM for efficient inference. <metadata> gpu: A100 | collections: ["Using NFS Volumes", "vLLM"] </metadata>