GemmEpilogueOp with series of CUTLASS kernel #61925

YKTian-x2b · 2024-02-21T08:43:58Z

PR Category

Others

PR Types

Others

Description

P-card-71501

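The goal is to fuse patterns of the form matmul + add + act. GemmEpilogueOp is implemented with CUTLASS and instantiates multiple kernel configurations to look for a better-performing fused implementation.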

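matmul_add_act_fuse_pass supports two paths: cublasLt (FcOp) and CUTLASS (GemmEpilogueOp). Users call the Exp_EnableUseCutlass() API to modify analysis_config and thereby choose whether to enable the CUTLASS-based op (GemmEpilogueOp). When create_predictor is called, it reads analysis_config, sets the use_cutlass attribute on matmul_add_act_fuse_pass, and adds that pass to the passManager. When the passManager runs, the InitializePatterns method of matmul_add_act_fuse_pass is invoked; depending on the use_cutlass value it reads, the pass generates either the pattern for GemmEpilogueOp or the pattern for FcOp. This is how the choice between the two paths is made.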
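A minimal sketch of the dual-path selection described above, in plain Python. The class and function names (AnalysisConfigSketch, MatmulAddActFusePassSketch, create_predictor_sketch, run_passes) are hypothetical stand-ins, not Paddle's C++ or Python API. Only the flow follows the description: a config flag becomes a pass attribute, and the attribute decides the pattern when InitializePatterns runs. A plain list stands in for the passManager.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnalysisConfigSketch:
    """Hypothetical stand-in for analysis_config; models only the CUTLASS switch."""
    use_cutlass: bool = False

    def exp_enable_use_cutlass(self) -> None:
        # Plays the role that Exp_EnableUseCutlass() has in the description.
        self.use_cutlass = True


@dataclass
class MatmulAddActFusePassSketch:
    """Hypothetical model of matmul_add_act_fuse_pass and its use_cutlass attribute."""
    use_cutlass: bool = False

    def initialize_patterns(self) -> str:
        # The rewrite target is chosen from the attribute when the pass runs.
        return "gemm_epilogue" if self.use_cutlass else "fc"


def create_predictor_sketch(config: AnalysisConfigSketch) -> List[MatmulAddActFusePassSketch]:
    """Reads the config, sets use_cutlass on the pass, and registers the pass."""
    pass_manager: List[MatmulAddActFusePassSketch] = []
    pass_manager.append(MatmulAddActFusePassSketch(use_cutlass=config.use_cutlass))
    return pass_manager


def run_passes(pass_manager: List[MatmulAddActFusePassSketch]) -> List[str]:
    # Running the pass manager calls initialize_patterns on every registered pass.
    return [p.initialize_patterns() for p in pass_manager]


config = AnalysisConfigSketch()
print(run_passes(create_predictor_sketch(config)))  # ['fc']
config.exp_enable_use_cutlass()
print(run_passes(create_predictor_sketch(config)))  # ['gemm_epilogue']
```

The design keeps both paths behind a single pass: the choice is made once, when patterns are initialized, rather than per matched subgraph.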

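For the elementwise add, the new op (GemmEpilogueOp) supports a bias of shape [1, N] or [M, N] ([M, N] is the shape of the matmul output).
The new op supports both paddle.add(paddle.matmul(x, w), y) and paddle.add(y, paddle.matmul(x, w)), i.e. either operand order of the add.
The new op supports Relu and Gelu activations.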
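For reference, the semantics the fused op is meant to reproduce can be sketched in pure Python, assuming row-major nested lists for x [M, K], w [K, N] and the bias. gemm_epilogue_reference is a hypothetical helper, not part of Paddle. The GELU below is the exact erf form; this PR does not state whether the CUTLASS epilogue uses that form or an approximation.

```python
import math
from typing import Callable, Dict, List

Matrix = List[List[float]]


def relu(v: float) -> float:
    return max(v, 0.0)


def gelu(v: float) -> float:
    # Exact (erf-based) GELU; a kernel may use a tanh approximation instead.
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


ACTS: Dict[str, Callable[[float], float]] = {"relu": relu, "gelu": gelu, "none": lambda v: v}


def gemm_epilogue_reference(x: Matrix, w: Matrix, bias: Matrix, act: str = "none") -> Matrix:
    """Hypothetical reference for act(matmul(x, w) + bias), bias of shape [1, N] or [M, N]."""
    m, k = len(x), len(x[0])
    if len(w) != k:
        raise ValueError("inner dimensions of x and w must match")
    n = len(w[0])
    if len(bias) not in (1, m) or any(len(row) != n for row in bias):
        raise ValueError("bias must have shape [1, N] or [M, N]")
    f = ACTS[act]
    out: Matrix = []
    for i in range(m):
        b_row = bias[0] if len(bias) == 1 else bias[i]  # broadcast a [1, N] bias over all rows
        row = []
        for j in range(n):
            acc = sum(x[i][p] * w[p][j] for p in range(k))
            row.append(f(acc + b_row[j]))
        out.append(row)
    return out


x = [[1.0, 2.0], [3.0, 4.0]]
w = [[1.0, 0.0], [0.0, -1.0]]
print(gemm_epilogue_reference(x, w, [[0.5, 0.5]], "relu"))  # [[1.5, 0.0], [3.5, 0.0]]
```

Because floating-point addition is commutative, add(matmul(x, w), y) and add(y, matmul(x, w)) produce identical values, so supporting both operand orders is purely a pattern-matching concern in the pass.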

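The new op shares the FCInferMeta function with the existing FcOp, and I relaxed that function's constraints so that it matches the additional patterns. In other words, patterns that FcOp cannot handle are now filtered out only by the pass's constraints; the corresponding checks in FCInferMeta have been removed.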
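The resulting split of responsibilities can be sketched as two hypothetical functions: a relaxed shape inference that only validates the matmul, and a pass-side filter that decides which bias shapes may be rewritten. The GemmEpilogueOp rule follows the description; the FcOp rule in the last branch is an assumption made purely for illustration, since this PR does not spell out which patterns FcOp rejects.

```python
from typing import Sequence, Tuple


def infer_out_shape_relaxed(x_shape: Sequence[int], w_shape: Sequence[int]) -> Tuple[int, int]:
    """Hypothetical relaxed shape inference: validates only the matmul, not the bias."""
    m, k = x_shape
    k_w, n = w_shape
    if k != k_w:
        raise ValueError(f"inner dimensions differ: {k} vs {k_w}")
    return (m, n)


def pass_accepts(out_shape: Tuple[int, int], bias_shape: Sequence[int], use_cutlass: bool) -> bool:
    """Hypothetical pass-side filter: may this matched subgraph be rewritten?"""
    m, n = out_shape
    bias = tuple(bias_shape)
    if use_cutlass:
        # GemmEpilogueOp: bias of shape [1, N] or [M, N], per the PR description.
        return bias in ((1, n), (m, n))
    # ASSUMPTION (illustration only): the FcOp path accepts a per-column bias only.
    return bias in ((n,), (1, n))


out = infer_out_shape_relaxed((4, 8), (8, 16))
print(out, pass_accepts(out, (4, 16), True), pass_accepts(out, (4, 16), False))  # (4, 16) True False
```

With the check moved out of shape inference, the same inference routine can serve both ops, and each path's restrictions live in one place: the pass.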

Performance:
End-to-end speed of GemmEpilogueOp compared with the unfused ops, measured on large models with batch size 2:
about 2.0% faster on llama
about 8.5% faster on chatglm2

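TODO:
The pass currently fuses Relu and Gelu activations. Three more activations are already implemented in the kernel (currently commented out) but are not yet supported by the pass; at the kernel level they can be enabled by uncommenting them.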

paddle-bot · 2024-02-21T08:44:03Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

CLAassistant · 2024-02-21T08:44:04Z

All committers have signed the CLA.

CLAassistant · 2024-02-21T08:44:04Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

vivienfanghuagood · 2024-03-26T09:33:18Z

paddle/phi/kernels/fusion/cutlass/fused_fc_add_act_kernel.cu

+}  // namespace fusion
+}  // namespace phi
+
+PD_REGISTER_KERNEL(fc,


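This operator shouldn't be called fc; in principle it should be named fused_***.

This will be updated in the next push.

Can this be changed right away? With the kernel registered under this name, it is hard to say whether it will cause conflicts wherever fc is used.

The kernel and the operator have now been renamed to gemm_epilogue.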

vivienfanghuagood · 2024-03-26T09:38:29Z

paddle/phi/kernels/fusion/cutlass/fused_fc_add_act_kernel.cu

+  // Temporarily moved the next two parameters from the parameter list to here, to align with FCKernel
+  // const std::string& data_format,
+  // float leaky_alpha,
+  const std::string data_format("RRR");


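Can the RCR layout be supported?

RCR will not be supported for now. The focus is on overall performance first, and support may be extended later. (I discussed with my mentor whether to support multiple layouts first.)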
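The RRR/RCR question is about memory layouts. Assuming the three letters give the layouts of A, B and C in that order (R = row-major, C = column-major), the pure-Python sketch below shows the only difference between the two: how element B[p][j] is located in B's flat buffer. A column-major [K, N] matrix occupies the same memory as a row-major [N, K] one, i.e. a transposed weight. gemm_flat is a hypothetical illustration and says nothing about how the CUTLASS kernels in this PR are configured.

```python
from typing import List


def gemm_flat(a: List[float], b: List[float], m: int, n: int, k: int, layout: str = "RRR") -> List[float]:
    """Hypothetical C = A @ B on flat buffers; layout letters are for A, B, C."""
    if layout not in ("RRR", "RCR"):
        raise ValueError("only RRR and RCR are modelled here")
    c = [0.0] * (m * n)
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                a_ip = a[i * k + p]  # A is row-major [M, K] in both layouts
                # B[p][j]: row-major [K, N] (R) vs column-major [K, N] (C)
                b_pj = b[p * n + j] if layout[1] == "R" else b[j * k + p]
                acc += a_ip * b_pj
            c[i * n + j] = acc  # C is row-major [M, N]
    return c


# Same logical B = [[5, 6], [7, 8]], stored row-major and column-major.
print(gemm_flat([1, 2, 3, 4], [5, 6, 7, 8], 2, 2, 2, "RRR"))  # [19.0, 22.0, 43.0, 50.0]
print(gemm_flat([1, 2, 3, 4], [5, 7, 6, 8], 2, 2, 2, "RCR"))  # [19.0, 22.0, 43.0, 50.0]
```

Both calls compute the same product; supporting RCR would let a transposed weight be consumed directly, at the cost of instantiating and tuning a second set of kernel configurations.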

paddle-ci-bot · 2024-03-27T03:08:16Z

Sorry to inform you that f959b87's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

paddle-ci-bot · 2024-04-06T03:16:35Z

Sorry to inform you that 519a02b's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

paddle-ci-bot · 2024-05-02T03:18:28Z

Sorry to inform you that 008e268's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

zhhsplendid

After offline discussion: this operator does not support double by design, so approving the data-type check.

Aurelius84

LGTM for only register float

XiaoguangHu01

LGTM

YKTian-x2b added 2 commits February 21, 2024 08:32

split seq_len to improve mmha perf

24470f3

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

e428611

… my-cool-stuff before kai first pr, update current branch

YKTian-x2b added 9 commits March 1, 2024 03:03

update postProcessKernel and some HyperParam

c3c8e1d

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

536831c

… my-cool-stuff for second pr

Update mmha Kernel

f63e995

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

f93b21c

… my-cool-stuff Update mmha kernel

add cutlass fused fc ops

e1b65fe

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

cf5264f

… my-cool-stuff merge upstream for add cutlass fused fc ops

mod of mmha was restored to the state of three months ago, and unites…

6f5cd32

…t of fc ops passed, now this PR is only for fc

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

5f8e222

… my-cool-stuff The modification of mmha was restored to the state of three months ago, and the single test of fc operator passed Now, this PR is only for fc. some files are not needed, will update soon

update fc ops

f959b87

YKTian-x2b changed the title from "split seq_len to improve mmha perf" to "series of fuse_gemm" Mar 19, 2024

vivienfanghuagood reviewed Mar 26, 2024

yuanlehome and others added 7 commits March 27, 2024 14:18

refine some code

3f24f7f

update

a8deb08

update

7ff49ba

fix drr rewrite

e3c1981

update fc_fuse_pass with 2D_elementwiseAdd, will fix a bug with new pr

e071f57

merge yuanlehome

8fd8d3e

merge upstream/develop

519a02b

YKTian-x2b added 5 commits April 8, 2024 10:17

llm perf test completed

d04df38

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

5ddee1f

… my-cool-stuff 'llm perf test completed, pull for push'

a little bit of mod

6d2f530

rm non-related files

f486821

try to recover mmha files that were deleted by mistake, my intention …

c48a0a6

…is remove the two files from this pr

with cutlass download

008e268

YKTian-x2b added 4 commits May 6, 2024 06:42

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

b31b878

… my-cool-stuff 'merge develop for push gemm_epilogue'

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

9bc0e06

… my-cool-stuff 'merge develop'

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

2efeb6f

… my-cool-stuff 'merge develop for pr merge'

use PADDLE_ENFORCE && unitest conflict fix

5531403

tianshuo78520a approved these changes May 9, 2024

XieYunshen previously approved these changes May 9, 2024

zhhsplendid previously approved these changes May 9, 2024

for unitest timeout

7cc423e

YKTian-x2b dismissed stale reviews from zhhsplendid and XieYunshen via 7cc423e May 9, 2024 06:58

YKTian-x2b requested review from wanghuancoder, luotao1, Aurelius84, XiaoguangHu01 and qili93 as code owners May 9, 2024 06:58

XieYunshen previously approved these changes May 10, 2024

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

cc6a938

… my-cool-stuff 'merger develop for push'

YKTian-x2b dismissed XieYunshen’s stale review via 275988d May 10, 2024 04:53

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

2b657ad

… my-cool-stuff

YKTian-x2b force-pushed the my-cool-stuff branch from 275988d to 2b657ad May 10, 2024 05:21

XieYunshen approved these changes May 10, 2024

yuanlehome approved these changes May 11, 2024

tianshuo78520a approved these changes May 11, 2024

Aurelius84 approved these changes May 11, 2024

zhhsplendid approved these changes May 11, 2024

XiaoguangHu01 approved these changes May 11, 2024

zhoutianzi666 merged commit 9a825f2 into PaddlePaddle:develop May 11, 2024
30 of 31 checks passed

co63oc pushed a commit to co63oc/Paddle that referenced this pull request May 13, 2024

[Paddle Inference] GemmEpilogueOp with series of CUTLASS kernel (Padd…

cba6824

…lePaddle#61925)

Review thread comments: vivienfanghuagood, Mar 26, 2024; YKTian-x2b, Mar 27, 2024; zhhsplendid, May 9, 2024; YKTian-x2b, May 9, 2024; vivienfanghuagood, Mar 26, 2024; YKTian-x2b, Mar 27, 2024.