graphy

CUDA

Binary analysis on CUDA

cuobjdump -ptx <file> | cu++filt

TODO: add a small binary analysis section on the kernels :D

TODO: cudaGraphDebugDotPrint()

Use __noinline__ to perform binary analysis on __device__ functions (otherwise the compiler may inline them into the calling kernel).

Binary analysis on DS::join

See dump header for information about compilation

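$L__BB3_4:
max.u32 %r19, %r22, %r21;        // %r19 = larger of the two IDs
min.u32 %r22, %r22, %r21;        // %r22 = smaller of the two IDs
mul.wide.u32 %rd20, %r19, 4;     // byte offset of the 4-byte entry at index %r19
add.s64 %rd19, %rd6, %rd20;      // entry address; %rd6 is presumably the array base
//
fence.sc.gpu;
//
//
atom.cas.acquire.gpu.b32 %r21,[%rd19],%r19,%r22;  // if [%rd19] == %r19, store %r22; %r21 = old value
//
setp.ne.s32 %p4, %r19, %r21;     // old value != %r19: the CAS did not take effect
@%p4 bra $L__BB3_4;              // retry with the observed value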
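
The loop above appears to be a lock-free disjoint-set join: it tries to hang the larger ID under the smaller one with a compare-and-swap, and on failure retries with whatever parent it observed. The following is a minimal host-side C++ sketch of that control flow, not the actual DS::join source. It assumes that %rd6 is the base of an array of 32-bit parent entries and that the fence.sc.gpu / atom.cas.acquire.gpu pair comes from a sequentially consistent compare-exchange. The names Parents, make_singletons, join_model and find_root are hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical host-side stand-in for the device-side parent array.
using Parents = std::vector<std::atomic<std::uint32_t>>;

// Every node starts as its own root: parent[i] == i.
inline Parents make_singletons(std::uint32_t n) {
    Parents parent(n);
    for (std::uint32_t i = 0; i < n; ++i) parent[i].store(i);
    return parent;
}

// Model of the $L__BB3_4 loop. Assumption: a and b play the roles of
// %r22 and %r21 on loop entry.
inline void join_model(Parents& parent, std::uint32_t a, std::uint32_t b) {
    assert(a < parent.size() && b < parent.size());
    std::uint32_t hi;
    std::uint32_t observed;
    do {
        hi = std::max(a, b);            // max.u32 %r19
        a  = std::min(a, b);            // min.u32 %r22
        std::uint32_t expected = hi;
        // Sequentially consistent CAS: if parent[hi] == hi, store the smaller ID.
        // On failure, expected is overwritten with the current parent[hi].
        parent[hi].compare_exchange_strong(expected, a);
        observed = expected;            // old value, %r21 in the PTX
        b = observed;                   // next iteration compares a with it
    } while (observed != hi);           // setp.ne.s32 + bra
}

// Helper for checking results: follow parent links up to the root.
inline std::uint32_t find_root(const Parents& parent, std::uint32_t v) {
    while (parent[v].load() != v) v = parent[v].load();
    return v;
}
```

Calling join_model(p, 4, 2) on a fresh forest sets the parent of 4 to 2. A later join_model(p, 4, 1) first fails the CAS on entry 4 (its parent is already 2), then retries with 2 and 1 and links 2 under 1. The sketch relies on parents never having a larger ID than their children, which is what makes the retry loop terminate.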

References

  • Fallin, A., Gonzalez, A., Seo, J., & Burtscher, M. (2023, November). A High-Performance MST Implementation for GPUs. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 1-13).