graphy

CUDA

Binary analysis on CUDA

Dump the PTX embedded in a compiled binary and demangle the C++ symbol names:

```sh
cuobjdump -ptx <file> | cu++filt
```

TODO: write a short binary-analysis section on the kernels :D

TODO: `cudaGraphDebugDotPrint()`

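Mark `__device__` functions as `__noinline__` so that they are emitted as separate functions in the PTX instead of being inlined into their callers, which makes them much easier to find and analyse.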

Binary analysis on `DS::join`

See the header of the PTX dump for the compilation details (CUDA toolkit release, PTX ISA version and target architecture).

```
$L__BB3_4:
max.u32 %r19, %r22, %r21;
min.u32 %r22, %r22, %r21;
mul.wide.u32 %rd20, %r19, 4;
add.s64 %rd19, %rd6, %rd20;
//
fence.sc.gpu;
//
//
atom.cas.acquire.gpu.b32 %r21,[%rd19],%r19,%r22;
//
setp.ne.s32 %p4, %r19, %r21;
@%p4 bra $L__BB3_4;
```
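Read top to bottom, the loop at `$L__BB3_4` takes the larger (`%r19`) and smaller (`%r22`) of the two indices, tries to CAS `parent[max]` from `max` to `min`, and, when the CAS fails, retries with the value it found there (`%r21`). The sketch below is a host-side C++ model of that loop, meant for reading along with the PTX. It is not the project's `DS::join`: `find_root`, `join`, `components` and the `parent` array are hypothetical names. `std::atomic` operations default to `memory_order_seq_cst`, which is assumed here to correspond to the `fence.sc.gpu` + `atom.cas.acquire.gpu` pair.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical disjoint-set storage: parent[i] == i means i is a representative.
using Parent = std::vector<std::atomic<std::uint32_t>>;

// Follow parent pointers up to the representative. join() only ever stores a
// smaller index into parent[hi], so the representative is the smallest vertex.
std::uint32_t find_root(const Parent& parent, std::uint32_t x) {
    std::uint32_t p;
    while ((p = parent[x].load()) != x) x = p;
    return x;
}

// Host-side model of the CAS loop at $L__BB3_4 (assumption, not DS::join itself).
void join(Parent& parent, std::uint32_t a, std::uint32_t b) {
    assert(a < parent.size() && b < parent.size());
    std::uint32_t hi;
    do {
        hi = std::max(a, b);  // max.u32 %r19, %r22, %r21
        a = std::min(a, b);   // min.u32 %r22, %r22, %r21
        b = hi;               // expected value: parent[hi] == hi (hi is a root)
        // seq_cst CAS; on failure b receives the value currently stored in
        // parent[hi], like %r21 receiving the old value from atom.cas.
        parent[hi].compare_exchange_strong(b, a);
    } while (b != hi);        // setp.ne.s32 %p4 + @%p4 bra
}

// Hypothetical usage helper: one std::thread per edge stands in for one GPU
// thread, then every vertex is mapped to its representative.
std::vector<std::uint32_t> components(
        std::uint32_t n,
        const std::vector<std::pair<std::uint32_t, std::uint32_t>>& edges) {
    Parent parent(n);
    for (std::uint32_t i = 0; i < n; ++i) parent[i].store(i);
    std::vector<std::thread> workers;
    for (const auto& e : edges) {
        const std::uint32_t u = e.first, v = e.second;
        workers.emplace_back([&parent, u, v] {
            join(parent, find_root(parent, u), find_root(parent, v));
        });
    }
    for (auto& t : workers) t.join();
    std::vector<std::uint32_t> rep(n);
    for (std::uint32_t i = 0; i < n; ++i) rep[i] = find_root(parent, i);
    return rep;
}
```

Because a successful CAS only ever links a root to a smaller index, parent pointers always point downward, no cycle can form, and each set ends up represented by its smallest vertex. For example, `components(8, {{0, 1}, {2, 3}, {1, 3}, {5, 6}})` maps vertices 0 to 3 to 0, vertices 5 and 6 to 5, and leaves 4 and 7 on their own.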

References

Fallin, A., Gonzalez, A., Seo, J., & Burtscher, M. (2023, November). A High-Performance MST Implementation for GPUs. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 1-13).
