This repository deploys YOLOv4 as an optimized TensorRT engine to Triton Inference Server
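As a rough illustration of how such a deployment is typically queried, here is a minimal sketch using Triton's Python HTTP client. The server address, the model name `yolov4`, the tensor names `input`/`detections`, and the 608x608 input shape are assumptions and would need to match the actual model configuration.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a locally running Triton Inference Server (assumed address/port).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Hypothetical input tensor name, shape, and dtype; these must match the
# config.pbtxt of the deployed TensorRT YOLOv4 engine.
image = np.zeros((1, 3, 608, 608), dtype=np.float32)  # placeholder input
infer_input = httpclient.InferInput("input", list(image.shape), "FP32")
infer_input.set_data_from_numpy(image)

# Hypothetical output tensor name.
requested_output = httpclient.InferRequestedOutput("detections")

# Run inference on the assumed "yolov4" model and read back the result.
response = client.infer(
    model_name="yolov4",
    inputs=[infer_input],
    outputs=[requested_output],
)
detections = response.as_numpy("detections")
print(detections.shape)
```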
Serving inside PyTorch
Deep learning deployment framework: supports tf/torch/trt/trtllm/vllm and other NN frameworks, as well as dynamic batching and streaming modes. It is dual-language compatible with Python and C++, offering scalability, extensibility, and high performance, and it helps users quickly deploy models and expose them as services through HTTP/RPC interfaces.
The Triton backend for the ONNX Runtime.
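For context, a model served through this backend lives in a standard Triton model repository (e.g. `model_repository/<model_name>/1/model.onnx` alongside a `config.pbtxt`), and its status can be inspected from the Python HTTP client as sketched below; the model name `densenet_onnx` and the server address are assumptions.

```python
import tritonclient.http as httpclient

# Assumed server address; Triton's HTTP endpoint defaults to port 8000.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Basic health checks against the running server.
print("server live: ", client.is_server_live())
print("server ready:", client.is_server_ready())

# Hypothetical name of a model loaded by the ONNX Runtime backend.
model_name = "densenet_onnx"
print("model ready: ", client.is_model_ready(model_name))

# Metadata reports the model's declared input/output names, dtypes, and shapes.
metadata = client.get_model_metadata(model_name)
for tensor in metadata["inputs"]:
    print("input: ", tensor["name"], tensor["datatype"], tensor["shape"])
for tensor in metadata["outputs"]:
    print("output:", tensor["name"], tensor["datatype"], tensor["shape"])
```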
ROS 2 packages for NVIDIA-accelerated DNN model inference using NVIDIA Triton/TensorRT, supporting both Jetson and x86_64 platforms with a CUDA-capable GPU
C++ application to perform computer vision tasks using Nvidia Triton Server for model inference
TensorFlow Lite backend with ArmNN delegate support for Nvidia Triton
MLModelService wrapping Nvidia's Triton Server
A high-performance multi-object tracking system using a quantized YOLOv11 model deployed on Triton Inference Server, integrated with a CUDA-accelerated particle filter for robust tracking of multiple objects.
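As background on the tracking half of such a pipeline, the sketch below shows one generic predict-weight-resample cycle of a particle filter driven by detector outputs. It is a simplified NumPy illustration under assumed Gaussian motion and measurement noise, not the repository's CUDA-accelerated implementation; all names, shapes, and noise scales are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Particles track a single object's (x, y) position; weights start uniform.
num_particles = 1024
particles = rng.normal(loc=[320.0, 240.0], scale=20.0, size=(num_particles, 2))
weights = np.full(num_particles, 1.0 / num_particles)

def step(particles, weights, detection, motion_std=5.0, meas_std=10.0):
    # Predict: diffuse particles with a random-walk motion model.
    particles = particles + rng.normal(scale=motion_std, size=particles.shape)

    # Update: weight particles by the likelihood of the detector measurement
    # (e.g. the centre of a YOLO bounding box) under Gaussian noise.
    sq_dist = np.sum((particles - detection) ** 2, axis=1)
    weights = weights * np.exp(-0.5 * sq_dist / meas_std**2)
    weights = weights / np.sum(weights)

    # Resample in proportion to the weights (multinomial resampling keeps
    # the sketch short; systematic resampling is a common alternative).
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    particles = particles[idx]
    weights = np.full(len(particles), 1.0 / len(particles))

    # The state estimate is the mean of the resampled particles.
    return particles, weights, particles.mean(axis=0)

particles, weights, estimate = step(
    particles, weights, detection=np.array([330.0, 236.0])
)
print(estimate)
```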
Web Services for Machine Learning in C++
Cassandra plugin for NVIDIA DALI