A curated list for Efficient Large Language Models
[ICML 2023] This project is the official implementation of our accepted ICML 2023 paper BiBench: Benchmarking and Analyzing Network Binarization.
[NeurIPS 2023 Spotlight] This project is the official implementation of our accepted NeurIPS 2023 (spotlight) paper QuantSR: Accurate Low-bit Quantization for Efficient Image Super-Resolution.
The official implementation of the ICML 2023 paper OFQ-ViT
Chat with LLaMA 2, with responses grounded in reference documents retrieved from a vector database. Runs locally using GPTQ 4-bit quantization.
A tutorial on model quantization using TensorFlow
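The core idea behind such quantization tutorials is mapping floating-point weights onto a small integer grid and back. A minimal sketch of uniform affine (asymmetric) quantization in pure Python, illustrative only — real tutorials use framework APIs such as TensorFlow Lite's converter:

```python
# Illustrative post-training uniform affine quantization (not framework code).

def quantize(weights, num_bits=8):
    """Map floats to unsigned integers in [0, 2**num_bits - 1]."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / (qmax - qmin) or 1.0   # step size of the integer grid
    zero_point = round(qmin - lo / scale)       # integer that represents 0.0
    q = [max(qmin, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats from the integer codes."""
    return [(qi - zero_point) * scale for qi in q]

w = [-1.0, -0.25, 0.0, 0.5, 1.0]
q, s, z = quantize(w)
w_hat = dequantize(q, s, z)
# per-weight reconstruction error is bounded by scale / 2
```

The scale/zero-point pair is exactly what quantized checkpoints store alongside the integer tensors; lower bit widths (e.g. the 4-bit GPTQ scheme mentioned above) simply use a coarser grid.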
PyTorch implementation of "BiDense: Binarization for Dense Prediction," a binary neural network for dense prediction tasks.
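Binarization is the extreme 1-bit case: each weight keeps only its sign, with a per-tensor scaling factor absorbing the magnitude. A hedged sketch of the XNOR-Net-style scheme commonly used by binary networks (an illustration of the general technique, not BiDense's actual code):

```python
# Illustrative 1-bit weight binarization: W ≈ alpha * sign(W),
# with alpha = mean(|W|) minimizing the L2 reconstruction error.

def binarize(weights):
    """Return the sign vector and the optimal scaling factor alpha."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1.0 if w >= 0 else -1.0 for w in weights]
    return signs, alpha

w = [1.0, -0.5, 0.25, -0.25]
signs, alpha = binarize(w)
w_bin = [alpha * s for s in signs]   # the 1-bit approximation of w
```

At inference time the sign vector can be packed into bits and multiplications replaced by sign flips, which is where binary networks get their speed and memory savings.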
Enterprise multi-agent framework for secure, borderless data collaboration with zero-trust security and federated learning; lightweight and edge-ready.
This project explores generating high-quality images from depth maps and conditioning techniques such as Canny edges, leveraging Stable Diffusion and ControlNet models. It focuses on optimizing image generation across different aspect ratios and inference-step counts to balance speed and quality.
Unofficial implementation of NCNet using Flax and JAX