AccelMoE accelerates a CPU-based mixture-of-experts (MoE) model by porting it to GPU code written with CUDA kernels, so the model's computations run efficiently on the GPU. The project was awarded 3rd place at the Accelerator Programming School competition.
> **Note:** This work was conducted as part of the Accelerator Programming School at Seoul National University.
- GPU formatting using CUDA kernel programming
- Kernel fusion to combine Conv1D or Linear and ReLU operations
- CUDA streaming for efficient parallel processing
- Batch processing to maximize throughput
- Warp occupancy optimization
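As a concrete illustration of the kernel-fusion item above, the sketch below fuses a Linear (fully connected) layer with its ReLU activation in a single CUDA kernel, so the activation is applied in-register instead of in a second kernel launch with an extra global-memory round trip. This is a minimal sketch, not the AccelMoE source: the names (`x`, `W`, `b`, `y`) and the row-major layout are assumptions.

```cuda
#include <cuda_runtime.h>

// Hypothetical fused Linear + ReLU kernel. Each thread computes one
// output element: y[row][col] = relu(dot(x[row], W[:, col]) + b[col]).
__global__ void linear_relu_fused(const float *x, const float *W,
                                  const float *b, float *y,
                                  int batch, int in_dim, int out_dim) {
  int row = blockIdx.y * blockDim.y + threadIdx.y;  // batch index
  int col = blockIdx.x * blockDim.x + threadIdx.x;  // output feature
  if (row >= batch || col >= out_dim) return;

  float acc = b[col];
  for (int k = 0; k < in_dim; ++k)
    acc += x[row * in_dim + k] * W[k * out_dim + col];

  // ReLU applied in-register: no separate activation kernel, no extra
  // write/read of the pre-activation values to global memory.
  y[row * out_dim + col] = fmaxf(acc, 0.0f);
}
```

The same pattern applies to fusing Conv1D with ReLU: the convolution's accumulator is clamped before the single store to global memory.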
Achieved roughly a 650× throughput speedup on the GPU (about 0.68 → 432 sentences/sec in the runs below).
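The CUDA-streaming item above amounts to pipelining host-to-device copies, kernel launches, and device-to-host copies across batches so they overlap. Below is a minimal sketch of that pattern with two streams and double-buffered device memory; the buffer names and `predict_kernel` are placeholders, not identifiers from the AccelMoE source.

```cuda
#include <cuda_runtime.h>

// Hypothetical inference kernel; stands in for the real MoE pipeline.
__global__ void predict_kernel(const float *in, float *out, int n);

// Process `num_batches` input batches, overlapping copies and compute.
// h_in/h_out must be pinned (cudaMallocHost) for async copies to overlap.
void run_pipelined(const float *h_in, float *h_out,
                   int num_batches, int batch_elems) {
  size_t bytes = batch_elems * sizeof(float);
  cudaStream_t streams[2];
  float *d_in[2], *d_out[2];
  for (int i = 0; i < 2; ++i) {
    cudaStreamCreate(&streams[i]);
    cudaMalloc(&d_in[i], bytes);   // double-buffered device memory
    cudaMalloc(&d_out[i], bytes);
  }

  for (int b = 0; b < num_batches; ++b) {
    int s = b % 2;  // alternate streams so batch b+1 copies while b computes
    cudaMemcpyAsync(d_in[s], h_in + (size_t)b * batch_elems, bytes,
                    cudaMemcpyHostToDevice, streams[s]);
    predict_kernel<<<(batch_elems + 255) / 256, 256, 0, streams[s]>>>(
        d_in[s], d_out[s], batch_elems);
    cudaMemcpyAsync(h_out + (size_t)b * batch_elems, d_out[s], bytes,
                    cudaMemcpyDeviceToHost, streams[s]);
  }
  for (int i = 0; i < 2; ++i) {
    cudaStreamSynchronize(streams[i]);
    cudaFree(d_in[i]);
    cudaFree(d_out[i]);
    cudaStreamDestroy(streams[i]);
  }
}
```

Two streams are enough to hide most transfer latency here; adding more helps only when copies and compute per batch are imbalanced.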
CPU baseline:

```
Initializing inputs and parameters...Done!
Predicting sentiment...Done!
Elapsed time: 1.467701 (sec)
Throughput: 0.681338 (sentences/sec)
Finalizing...Done!
Saving outputs to ./data/outputs.bin...Done!
Validating...PASSED!
```

GPU-accelerated:

```
Initializing inputs and parameters...Done!
Predicting sentiment...Done!
Elapsed time: 0.074036 (sec)
Throughput: 432.224966 (sentences/sec)
Finalizing...Done!
Saving outputs to ./data/outputs.bin...Done!
Validating...PASSED!
```
| Haeseung Jeon | Suyeon Jo |
| --- | --- |
| @Ewha Womans Univ. | @Myongji Univ. |