FabianSchuetze

A C++ compile-time math library using generalized constant expressions

C++ 767 65 Updated Jun 22, 2024

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda 663 119 Updated Feb 21, 2025

Learning how to write "Less Slow" code in C++20, C99, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception handling, networking and user-space IO

C++ 495 34 Updated Feb 27, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,056 520 Updated Mar 16, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

6,910 220 Updated Mar 4, 2025

Demonstration of various hardware effects.

C++ 2,891 161 Updated Feb 29, 2024

Memory-Efficient CUDA kernels for training ConvNets with PyTorch.

Python 38 1 Updated Feb 25, 2025

Fork of https://source.codeaurora.org/quic/hexagon_nn/nnlib

C 57 11 Updated Apr 10, 2023

Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild

Zig 2,156 78 Updated Mar 21, 2025

Cross-platform C++ SDK & model hub for easy AI inference

C++ 4 1 Updated Mar 5, 2025

A retargetable MLIR-based machine learning compiler and runtime toolkit.

C++ 3,059 673 Updated Mar 22, 2025

BLISlab: A Sandbox for Optimizing GEMM

C 507 108 Updated Jun 17, 2021

Official inference framework for 1-bit LLMs

C++ 12,820 906 Updated Feb 18, 2025

Official code of "QKFormer: Hierarchical Spiking Transformer using Q-K Attention" (NeurIPS 2024, Spotlight 3%)

Python 101 4 Updated Jan 2, 2025

An easy-to-use and fast library for task-based parallelism, utilizing coroutines.

C++ 322 6 Updated Sep 13, 2024

Low-overhead tracing of all Linux kernel-user transitions, for serious performance analysis. Includes kernel patches, loadable module, and post-processing software. Output is HTML/SVG per-CPU-core …

HTML 653 65 Updated Sep 1, 2024

[CVPR 2024] Confronting Ambiguity in 6D Object Pose Estimation via Score-Based Diffusion on SE(3)

C++ 35 1 Updated Sep 7, 2024

A minimal GPU design in Verilog to learn how GPUs work from the ground up

SystemVerilog 8,007 612 Updated Aug 18, 2024

CPU INFOrmation library (x86/x86-64/ARM/ARM64, Linux/Windows/Android/macOS/iOS)

C 1,059 344 Updated Mar 21, 2025

A CPU tool for benchmarking peak floating-point performance

Assembly 528 130 Updated Oct 4, 2024

Utility to explode a tflite pipeline into individual ops for testing.

C++ 2 Updated Apr 20, 2021

A language for fast, portable data-parallel computation

C++ 6,009 1,078 Updated Mar 22, 2025

Library for specialized dense and sparse matrix operations, and deep learning primitives.

C 867 191 Updated Mar 22, 2025

Collection of benchmarks to measure basic GPU capabilities

C++ 308 45 Updated Feb 11, 2025

Tensor Core Multiplication at the Speed of CuBLAS in Three Simple Steps

Cuda 2 1 Updated Mar 17, 2024

Large World Model -- Modeling Text and Video with Millions Context

Python 7,258 558 Updated Oct 19, 2024

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.

C++ 2,937 790 Updated Mar 21, 2025

GPU programming related news and material links

1,421 84 Updated Jan 6, 2025

VMamba: Visual State Space Models, code is based on mamba

Python 2,475 168 Updated Mar 7, 2025