[Paddle Inference]cutlass kernel compile optimization #64641

Merged

Conversation

YKTian-x2b
Contributor

@YKTian-x2b YKTian-x2b commented May 27, 2024

PR Category

Inference

PR Types

Others

Description

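Adds custom targets to CMakeLists so that the make stage runs the build scripts for fused_conv2d and gemm_epilogue, each of which produces its own shared library (.so).

The kernels are not compiled directly in CMakeLists because the CUTLASS version they use may differ from the one in Paddle's submodule, which in turn may require C++17. The PR therefore runs the corresponding scripts instead.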

P-card-71501


paddle-bot bot commented May 27, 2024

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@@ -29,5 +29,5 @@ gpu_cc="80"

 cd $build_directory
 cmake .. -DPYTHON_EXECUTABLE=$python_exe_path -DCUDA_TOOLKIT_ROOT_DIR=$cuda_root_path -DCOMPUTE_CAPABILITY=$gpu_cc
-make -j
+make -j10
Contributor Author

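The thread count here is still to be decided; it might end up as something like $(nproc)/4. With too few threads you get the awkward situation where Paddle itself has finished building while the CUTLASS operator shared libraries are still compiling.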
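
The $(nproc)/4 idea mentioned in this comment could be sketched roughly as follows. This is only an illustration and not what the PR merged (the diff uses a fixed `make -j10`); the helper name `compute_jobs`, the default divisor of 4, and the lower bound of one job are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the PR: derive a make job count from the core count.
# $1 = number of cores
# $2 = divisor (defaults to 4, mirroring the "$(nproc)/4" suggestion)
compute_jobs() {
  local cores="$1"
  local divisor="${2:-4}"
  local jobs=$(( cores / divisor ))
  # Never drop below one job, e.g. on machines with fewer cores than the divisor.
  if [ "$jobs" -lt 1 ]; then
    jobs=1
  fi
  echo "$jobs"
}

# Fall back to a single core if nproc is unavailable on this host.
cores="$(nproc 2>/dev/null || echo 1)"
jobs_for_this_host="$(compute_jobs "$cores")"

# Print the command instead of running it, so the sketch is safe to execute anywhere.
echo "would run: make -j${jobs_for_this_host}"
```

In the compile script itself, the last line would presumably become `make -j"$jobs_for_this_host"`. A smaller divisor gives the CUTLASS build more threads, which addresses the concern about it finishing after the main Paddle build.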

@zhoutianzi666 zhoutianzi666 changed the title cutlass kernel compile optimization [Paddle Inference]cutlass kernel compile optimization May 29, 2024

python_exe_path="${1:-$default_python_exe_path}"
cuda_root_path="${2:-$default_cuda_root_path}"
gpu_cc="${3:-$default_gpu_cc}"

cd $build_directory
cmake .. -DPYTHON_EXECUTABLE=$python_exe_path -DCUDA_TOOLKIT_ROOT_DIR=$cuda_root_path -DCOMPUTE_CAPABILITY=$gpu_cc
Contributor

Should this use Paddle's default cmake instead?

Contributor Author

Got it! I'll change it right away.

… cutlass_compile_optimization

'merge develop for push pr'
yuanlehome
yuanlehome previously approved these changes May 30, 2024
Contributor

@zhoutianzi666 zhoutianzi666 left a comment

LGTM

@zhoutianzi666 zhoutianzi666 merged commit c2836d0 into PaddlePaddle:develop May 31, 2024
32 checks passed
co63oc pushed a commit to co63oc/Paddle that referenced this pull request Jun 3, 2024

4 participants