AccelMoE: Accelerated Mixture-of-Experts model

AccelMoE ports a CPU-based mixture-of-experts (MoE) model to GPU-accelerated code, using CUDA kernel programming to execute its computations efficiently on the GPU. The project was awarded 3rd place at the Accelerator Programming School competition.

Note

This project was conducted as part of the Accelerator Programming School at Seoul National University.

Optimization Techniques

  • Porting the CPU implementation to the GPU with CUDA kernel programming
  • Kernel fusion to combine Conv1D or Linear operations with the following ReLU (see the first sketch below)
  • CUDA streams for efficient parallel execution (see the second sketch below)
  • Batch processing to maximize throughput
  • Warp occupancy optimization
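
To illustrate the kernel-fusion item above, here is a minimal sketch of a fused Linear + ReLU CUDA kernel. The function name, signature, and dimensions are illustrative only and do not correspond to the repository's actual kernels.

```cuda
#include <cuda_runtime.h>

// Fused y = ReLU(W x + b): each thread computes one output element and applies
// the activation in registers, avoiding a separate element-wise ReLU kernel
// and an extra round trip to global memory.
__global__ void linear_relu_fused(const float *W, const float *x, const float *b,
                                  float *y, int in_features, int out_features) {
  int row = blockIdx.x * blockDim.x + threadIdx.x;
  if (row >= out_features) return;

  float acc = b[row];
  for (int k = 0; k < in_features; ++k) {
    acc += W[row * in_features + k] * x[k];
  }
  y[row] = fmaxf(acc, 0.0f);  // fused ReLU
}

// Example launch (placeholder sizes):
//   int threads = 256;
//   int blocks  = (out_features + threads - 1) / threads;
//   linear_relu_fused<<<blocks, threads>>>(d_W, d_x, d_b, d_y, in_features, out_features);
```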

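Similarly, the CUDA-streams item can be sketched as below, assuming batches are copied and processed independently. Buffer names, sizes, and the kernel launch are placeholders, not the project's actual code.

```cuda
#include <cuda_runtime.h>

// Round-robin two streams so that the host-to-device copy of one batch can
// overlap with the kernel working on another batch. For the async copy to
// actually overlap, h_in should be pinned host memory.
void run_batches(const float *h_in, float *d_in, int num_batches, size_t batch_elems) {
  cudaStream_t streams[2];
  for (int i = 0; i < 2; ++i) cudaStreamCreate(&streams[i]);

  for (int i = 0; i < num_batches; ++i) {
    cudaStream_t s = streams[i % 2];
    cudaMemcpyAsync(d_in + i * batch_elems, h_in + i * batch_elems,
                    batch_elems * sizeof(float), cudaMemcpyHostToDevice, s);
    // expert_kernel<<<grid, block, 0, s>>>(d_in + i * batch_elems, ...);  // placeholder
  }

  for (int i = 0; i < 2; ++i) {
    cudaStreamSynchronize(streams[i]);
    cudaStreamDestroy(streams[i]);
  }
}
```
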
Improved Performance

The GPU version achieves a roughly 630× throughput improvement over the CPU baseline (0.68 → 432 sentences/sec in the runs below).

CPU version

Initializing inputs and parameters...Done!
Predicting sentiment...Done!
Elapsed time: 1.467701 (sec)
Throughput: 0.681338 (sentences/sec)
Finalizing...Done!
Saving outputs to ./data/outputs.bin...Done!
Validating...PASSED!

GPU version

Initializing inputs and parameters...Done!
Predicting sentiment...Done!
Elapsed time: 0.074036 (sec)
Throughput: 432.224966 (sentences/sec)
Finalizing...Done!
Saving outputs to ./data/outputs.bin...Done!
Validating...PASSED!

Contributors

  • Haeseung Jeon (Ewha Womans Univ.)
  • Suyeon Jo (Myongji Univ.)
