Fabian Schuetze (FabianSchuetze)

19 followers · 76 following

Stars

kthohr / gcem

A C++ compile-time math library using generalized constant expressions

C++ · 767 stars · 65 forks · Updated Jun 22, 2024

NVIDIA / multi-gpu-programming-models

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda · 663 stars · 119 forks · Updated Feb 21, 2025

ashvardanian / less_slow.cpp

Learning how to write "Less Slow" code in C++ 20, C 99, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception handling, networking and user-space IO

C++ · 495 stars · 34 forks · Updated Feb 27, 2025

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda · 5,056 stars · 520 forks · Updated Mar 16, 2025

deepseek-ai / open-infra-index

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

6,910 stars · 220 forks · Updated Mar 4, 2025

Kobzol / hardware-effects

Demonstration of various hardware effects.

C++ · 2,891 stars · 161 forks · Updated Feb 29, 2024

andravin / spio

Memory-Efficient CUDA kernels for training ConvNets with PyTorch.

Python · 38 stars · 1 fork · Updated Feb 25, 2025

quic / toolchain_for_hexagon

Shell · 27 stars · 12 forks · Updated Mar 11, 2025

XiaoMi / nnlib

Fork of https://source.codeaurora.org/quic/hexagon_nn/nnlib

C · 57 stars · 11 forks · Updated Apr 10, 2023

zml / zml

Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild

Zig · 2,156 stars · 78 forks · Updated Mar 21, 2025

blace-ai / blace-ai

Cross-platform C++ SDK & model hub for easy AI inference

C++ · 4 stars · 1 fork · Updated Mar 5, 2025

iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.

C++ · 3,059 stars · 673 forks · Updated Mar 22, 2025

flame / blislab

BLISlab: A Sandbox for Optimizing GEMM

C · 507 stars · 108 forks · Updated Jun 17, 2021

microsoft / BitNet

Official inference framework for 1-bit LLMs

C++ · 12,820 stars · 906 forks · Updated Feb 18, 2025

zhouchenlin2096 / QKFormer

Official code of "QKFormer: Hierarchical Spiking Transformer using Q-K Attention" (NeurIPS 2024, Spotlight 3%)

Python · 101 stars · 4 forks · Updated Jan 2, 2025

mtmucha / coros

An easy-to-use and fast library for task-based parallelism, utilizing coroutines.

C++ · 322 stars · 6 forks · Updated Sep 13, 2024

dicksites / KUtrace

Low-overhead tracing of all Linux kernel-user transitions, for serious performance analysis. Includes kernel patches, loadable module, and post-processing software. Output is HTML/SVG per-CPU-core …

HTML · 653 stars · 65 forks · Updated Sep 1, 2024