[Paddle Inference]cutlass kernel compile optimization #64641
Your PR was submitted successfully. Thank you for contributing to this open-source project!
@@ -29,5 +29,5 @@ gpu_cc="80"

 cd $build_directory
 cmake .. -DPYTHON_EXECUTABLE=$python_exe_path -DCUDA_TOOLKIT_ROOT_DIR=$cuda_root_path -DCOMPUTE_CAPABILITY=$gpu_cc
-make -j
+make -j10
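The thread count here is still to be decided; it may end up as something like $(nproc)/4. If there are too few threads, we hit the awkward situation where Paddle has finished compiling while the cutlass operator shared libraries are still building.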
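The `$(nproc)/4` idea from this comment could be sketched roughly as follows. This is only an illustration under assumptions: the helper name `pick_build_jobs` and the floor of one job are hypothetical and not part of the PR, and the divisor 4 is the reviewer's tentative suggestion rather than a settled value.

```shell
# Hypothetical helper: derive a make job count from the core count.
# Uses a quarter of the cores, but never fewer than one job.
pick_build_jobs() {
  cores="$1"
  build_jobs=$(( cores / 4 ))
  if [ "$build_jobs" -lt 1 ]; then
    build_jobs=1
  fi
  echo "$build_jobs"
}

# Fall back to 1 core if nproc is unavailable on this machine.
detected_cores=$(nproc 2>/dev/null || echo 1)
echo "would run: make -j$(pick_build_jobs "$detected_cores")"
```

Integer division rounds down, so the explicit floor matters on machines with fewer than four cores, where the result would otherwise be `-j0`.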
python_exe_path="${1:-$default_python_exe_path}"
cuda_root_path="${2:-$default_cuda_root_path}"
gpu_cc="${3:-$default_gpu_cc}"

cd $build_directory
cmake .. -DPYTHON_EXECUTABLE=$python_exe_path -DCUDA_TOOLKIT_ROOT_DIR=$cuda_root_path -DCOMPUTE_CAPABILITY=$gpu_cc
Should this use Paddle's default cmake here?
Got it! I'll change it right away.
… cutlass_compile_optimization 'merge develop for push pr'
LSTM
PR Category
Inference
PR Types
Others
Description
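Add custom targets to CMakeLists so that the make stage can run the build scripts for fused_conv2d and gemm_epilogue, each of which produces its own .so.
Reason for not compiling directly in CMakeLists: the cutlass version used by these kernels may differ from the version in Paddle's submodule, and that in turn may require C++17. So we chose to run the corresponding scripts instead.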
P-card-71501
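As a rough illustration of how such a build script can accept settings passed in from the make stage, the positional-argument defaults shown in the diff can be exercised in isolation. This is a sketch under assumptions: the default values for the Python and CUDA paths and the wrapper function `resolve_build_args` are hypothetical; only the `${N:-...}` pattern, the variable names, and `gpu_cc="80"` come from the diff.

```shell
# Hypothetical defaults; the real values live in the PR's build scripts.
default_python_exe_path="python3"
default_cuda_root_path="/usr/local/cuda"
default_gpu_cc="80"

# Hypothetical wrapper around the pattern used in the diff:
# ${N:-fallback} expands to the fallback when argument N is unset or empty.
resolve_build_args() {
  python_exe_path="${1:-$default_python_exe_path}"
  cuda_root_path="${2:-$default_cuda_root_path}"
  gpu_cc="${3:-$default_gpu_cc}"
}

# Override only the compute capability; the first two arguments fall back.
resolve_build_args "" "" "86"
echo "cmake .. -DPYTHON_EXECUTABLE=$python_exe_path -DCUDA_TOOLKIT_ROOT_DIR=$cuda_root_path -DCOMPUTE_CAPABILITY=$gpu_cc"
```

Because empty strings also trigger the fallback, a caller such as a CMake custom target can pass an unset variable for any position without breaking the script.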