This repository deploys YOLOv4 as an optimized TensorRT engine to Triton Inference Server
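As a rough illustration of how such a deployment is typically queried, here is a minimal sketch using Triton's Python HTTP client. The server address, the model name `yolov4`, the tensor names `input`/`detections`, and the 608x608 input shape are assumptions and would need to match the actual model configuration.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a locally running Triton Inference Server (assumed address/port).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Hypothetical input tensor name, shape, and dtype; these must match the
# config.pbtxt of the deployed TensorRT YOLOv4 engine.
image = np.zeros((1, 3, 608, 608), dtype=np.float32)  # placeholder input
infer_input = httpclient.InferInput("input", list(image.shape), "FP32")
infer_input.set_data_from_numpy(image)

# Hypothetical output tensor name.
requested_output = httpclient.InferRequestedOutput("detections")

# Run inference on the assumed "yolov4" model and read back the result.
response = client.infer(
    model_name="yolov4",
    inputs=[infer_input],
    outputs=[requested_output],
)
detections = response.as_numpy("detections")
print(detections.shape)
```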
Serving inside PyTorch
Deep learning deployment framework: supports tf/torch/trt/trtllm/vllm and other NN frameworks, as well as dynamic batching and streaming modes. It is dual-language compatible with Python and C++, offering scalability, extensibility, and high performance, and it helps users quickly deploy models and expose them as services through HTTP/RPC interfaces.
The Triton backend for the ONNX Runtime.
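For context, a model served through this backend lives in a standard Triton model repository (e.g. `model_repository/<model_name>/1/model.onnx` alongside a `config.pbtxt`), and its status can be inspected from the Python HTTP client as sketched below; the model name `densenet_onnx` and the server address are assumptions.

```python
import tritonclient.http as httpclient

# Assumed server address; Triton's HTTP endpoint defaults to port 8000.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Basic health checks against the running server.
print("server live: ", client.is_server_live())
print("server ready:", client.is_server_ready())

# Hypothetical name of a model loaded by the ONNX Runtime backend.
model_name = "densenet_onnx"
print("model ready: ", client.is_model_ready(model_name))

# Metadata reports the model's declared input/output names, dtypes, and shapes.
metadata = client.get_model_metadata(model_name)
for tensor in metadata["inputs"]:
    print("input: ", tensor["name"], tensor["datatype"], tensor["shape"])
for tensor in metadata["outputs"]:
    print("output:", tensor["name"], tensor["datatype"], tensor["shape"])
```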
ROS 2 packages for NVIDIA-accelerated DNN model inference using NVIDIA Triton/TensorRT, supporting both Jetson and x86_64 platforms with a CUDA-capable GPU
C++ application to perform computer vision tasks using Nvidia Triton Server for model inference
TensorFlow Lite backend with ArmNN delegate support for Nvidia Triton
MLModelService wrapping Nvidia's Triton Server
A high-performance multi-object tracking system using a quantized YOLOv11 model deployed on Triton Inference Server, integrated with a CUDA-accelerated particle filter for robust tracking of multiple objects.
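As background on the tracking half of such a pipeline, the sketch below shows one generic predict-weight-resample cycle of a particle filter driven by detector outputs. It is a simplified NumPy illustration under assumed Gaussian motion and measurement noise, not the repository's CUDA-accelerated implementation; all names, shapes, and noise scales are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Particles track a single object's (x, y) position; weights start uniform.
num_particles = 1024
particles = rng.normal(loc=[320.0, 240.0], scale=20.0, size=(num_particles, 2))
weights = np.full(num_particles, 1.0 / num_particles)

def step(particles, weights, detection, motion_std=5.0, meas_std=10.0):
    # Predict: diffuse particles with a random-walk motion model.
    particles = particles + rng.normal(scale=motion_std, size=particles.shape)

    # Update: weight particles by the likelihood of the detector measurement
    # (e.g. the centre of a YOLO bounding box) under Gaussian noise.
    sq_dist = np.sum((particles - detection) ** 2, axis=1)
    weights = weights * np.exp(-0.5 * sq_dist / meas_std**2)
    weights = weights / np.sum(weights)

    # Resample in proportion to the weights (multinomial resampling keeps
    # the sketch short; systematic resampling is a common alternative).
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    particles = particles[idx]
    weights = np.full(len(particles), 1.0 / len(particles))

    # The state estimate is the mean of the resampled particles.
    return particles, weights, particles.mean(axis=0)

particles, weights, estimate = step(
    particles, weights, detection=np.array([330.0, 236.0])
)
print(estimate)
```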
Web Services for Machine Learning in C++
Cassandra plugin for NVIDIA DALI