# cutlass
Here are 4 public repositories matching this topic...
Benchmarks the performance of the C++ interfaces of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios.

gpu cuda inference nvidia cutlass mha multi-head-attention llm tensor-core large-language-model flash-attention flash-attention-2

Updated Feb 27, 2025 - C++
Multiple GEMM operators built with CUTLASS to support LLM inference; a minimal sketch of such an operator is shown below.

Updated Sep 27, 2024 - C++
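As a rough illustration of what a CUTLASS-built GEMM operator looks like, here is a minimal sketch using the CUTLASS 2.x device-level `cutlass::gemm::device::Gemm` template. The single-precision, column-major configuration and the `run_sgemm` wrapper are illustrative assumptions, not the configuration used by the repository above:

```cpp
#include "cutlass/gemm/device/gemm.h"

// Illustrative CUTLASS 2.x device-level GEMM: single-precision,
// column-major A, B, and C (assumptions for this sketch, not the
// repository's actual operator configuration).
using Gemm = cutlass::gemm::device::Gemm<
    float, cutlass::layout::ColumnMajor,   // element and layout of A
    float, cutlass::layout::ColumnMajor,   // element and layout of B
    float, cutlass::layout::ColumnMajor>;  // element and layout of C

// Computes C = alpha * A * B + beta * C on the device.
// A, B, and C are device pointers; lda/ldb/ldc are leading dimensions.
cutlass::Status run_sgemm(int M, int N, int K,
                          float alpha,
                          float const *A, int lda,
                          float const *B, int ldb,
                          float beta,
                          float *C, int ldc) {
  Gemm gemm_op;
  // Arguments: problem size, tensor refs for A, B, C (source),
  // D (destination, aliased to C here), and the epilogue scalars.
  return gemm_op({{M, N, K},
                  {A, lda},
                  {B, ldb},
                  {C, ldc},
                  {C, ldc},
                  {alpha, beta}});
}
```

Instantiating the `Gemm` template lets CUTLASS pick default tile sizes and pipelining for the target architecture; production operators typically also specialize the epilogue and data types (e.g. half precision with tensor cores) per workload.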
A PyTorch implementation of block-sparse operations.

Updated May 13, 2023 - C++