General Matrix Multiplication using NVIDIA Tensor Cores
Updated Jan 25, 2025 - Cuda
NVIDIA Corporation is a company that manufactures graphics processors, mobile technologies, and desktop computers. It is known for developing integrated circuits, which are used in everything from electronic game consoles to personal computers (PCs). The company is a leading manufacturer of high-end graphics processing units (GPUs).
Efficient implementations of Merge Sort and Bitonic Sort algorithms using CUDA for GPU parallel processing, resulting in accelerated sorting of large arrays. Includes both CPU and GPU versions, along with a performance comparison.
Simple CUDA C application for NVIDIA GPUs
Measures the bandwidth of multiple simultaneously started cudaMemcpyAsync calls
CUDA script to check NVIDIA GPU device properties and available memory
Modelling parallel processing with GPU
The general idea of this project is to generate a Fibonacci sequence and sort it with hand-written sorting algorithms (bubble sort, quick sort, merge sort, and heap sort) and, at the same time, with Thrust library routines (‘thrust::sort’ and ‘thrust::transform’).
High-performance computing on clusters using NVIDIA graphics accelerators. GPU Programming on NVIDIA CUDA, profiling and optimization with NVVP.
Distributed MPI-based Heterogeneous GPU Solver for Markov Decision Processes (MDP)
BitCaine5 the mining engine meets æternity blockchain technology
MicroCoin miner for NVIDIA CUDA compatible GPUs
An implementation of Principal Direction Divisive Partitioning in CUDA. University project for the course "Software & Programming of High Performance Systems". Course Code: CEID_NE5407
CUDA implementation of vector addition, matrix multiplication, reduction and sorting
Parallel Heterogeneous CPU/GPU computing
Benchmark based on the addition operation for CPUs and NVIDIA GPUs with CUDA technology.
Code from my Tutorial series on Hive about Nvidia's CUDA API
Calculates the difference (subtraction) of 2D arrays on the GPU
RMAT Graph Generator for NVIDIA CUDA
Implementation of LeNet-1 Forward Propagation algorithm in CUDA C and profiling of the possible optimisation solutions
Created by Jensen Huang, Curtis Priem, Chris Malachowsky
Released April 5, 1993