Pinned Repositories
- flash-linear-attention Public
🚀 Efficient implementations of state-of-the-art linear attention models in Torch and Triton
- distillation-fla Public Forked from OpenSparseLLMs/Linearization
Distillation pipeline from pretrained Transformers to customized FLA models
- vllm Public Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
- fla-rl Public
A minimal RL framework for scaling FLA models in long-horizon reasoning and agentic scenarios
- native-sparse-attention Public
🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"