GemmEpilogueOp with series of CUTLASS kernel #61925

Merged
63 commits merged into PaddlePaddle:develop on May 11, 2024

Conversation

Contributor

@YKTian-x2b YKTian-x2b commented Feb 21, 2024

PR Category

Others

PR Types

Others

Description

P-card-71501

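The goal is to fuse patterns of the form matmul + add + act. GemmEpilogueOp is written with CUTLASS and generates multiple kernel configurations in search of a better fused implementation.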

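matmul_add_act_fuse_pass supports two paths: cublasLt (FcOp) and CUTLASS (GemmEpilogueOp). Users choose whether to enable the CUTLASS-implemented op (GemmEpilogueOp) by modifying analysis_config through the Exp_EnableUseCutlass() API. At create_predictor time, analysis_config is read, the use_cutlass attribute is set on matmul_add_act_fuse_pass, and the pass is added to the passManager. When the passManager runs, the InitializePatterns method of the matmul_add_act_fuse_pass object is called, and the pass generates either the GemmEpilogueOp pattern or the FcOp pattern according to the use_cutlass value it reads. This is how the two-path selection is achieved.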
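To make the control flow above concrete, here is a minimal toy model of it in plain Python. It is only a sketch: `AnalysisConfig`, `MatmulAddActFusePass`, and `create_predictor` below are hypothetical stand-ins written for illustration, not the real Paddle classes or signatures, and the "pattern" is just a string.

```python
from dataclasses import dataclass, field


@dataclass
class AnalysisConfig:
    """Hypothetical stand-in for the inference config (not the Paddle class)."""
    use_cutlass: bool = False

    def exp_enable_use_cutlass(self) -> None:
        # Plays the role the description gives to Exp_EnableUseCutlass().
        self.use_cutlass = True


@dataclass
class MatmulAddActFusePass:
    """Hypothetical pass object that picks its target op from use_cutlass."""
    use_cutlass: bool = False
    patterns: list = field(default_factory=list)

    def initialize_patterns(self) -> None:
        # One attribute decides which of the two fused ops the pass emits.
        target = "GemmEpilogueOp" if self.use_cutlass else "FcOp"
        self.patterns = [f"matmul + add + act -> {target}"]


def create_predictor(config: AnalysisConfig) -> list:
    """Read the config, set the pass attribute, then run the pass manager."""
    fuse_pass = MatmulAddActFusePass(use_cutlass=config.use_cutlass)
    pass_manager = [fuse_pass]
    for p in pass_manager:
        p.initialize_patterns()
    return fuse_pass.patterns
```

With the default config this toy returns the FcOp pattern. After calling `exp_enable_use_cutlass()` it returns the GemmEpilogueOp pattern, which mirrors the opt-in behaviour described above.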

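For the elementwise add, the new op (GemmEpilogueOp) supports two bias shapes: [1, N] and [M, N] (where [M, N] is the output shape of the matmul).
The new op supports both paddle.add(paddle.matmul(x, w), y) and paddle.add(y, paddle.matmul(x, w)) (the add operands swapped).
The new op supports ReLU and GELU activations.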
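As a reference for what the fused op is expected to compute, the following pure-Python sketch evaluates act(matmul(x, w) + bias) for both bias shapes. It is not the CUTLASS kernel. The function names are made up for illustration, and the erf-based GELU is an assumption (the kernel may use a different GELU variant). Because addition is commutative, both operand orders of the add map to the same computation.

```python
import math


def matmul(x, w):
    """Plain matmul on nested lists: x is [M, K], w is [K, N]."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def relu(v):
    return max(v, 0.0)


def gelu(v):
    # Exact erf-based GELU; an assumption, the kernel may use another variant.
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def gemm_epilogue_ref(x, w, bias, act=relu):
    """Reference for act(matmul(x, w) + bias), bias shaped [1, N] or [M, N]."""
    out = matmul(x, w)
    m, n = len(out), len(out[0])
    if len(bias) not in (1, m) or any(len(r) != n for r in bias):
        raise ValueError("bias must have shape [1, N] or [M, N]")
    # A [1, N] bias is broadcast over rows; an [M, N] bias is added elementwise.
    return [[act(out[i][j] + bias[i if len(bias) == m else 0][j]) for j in range(n)]
            for i in range(m)]
```

A reference like this can serve as an oracle when comparing the fused output against the unfused ops, within a floating-point tolerance.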

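The new op shares the FCInferMeta function with the existing FcOp. I relaxed that function's constraints to match the additional patterns. In other words, patterns that FcOp cannot handle are currently filtered out only by the pass constraints; the corresponding checks in FCInferMeta have been removed.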

Performance:
Compared with the unfused ops, end-to-end timing of GemmEpilogueOp on large models with 2 batches:
about a 2.0% speedup on llama
about an 8.5% speedup on chatglm2

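TODO:
The pass currently fuses ReLU and GELU activations. Three more activations are already implemented in the kernel (currently commented out) but are not yet supported in the pass. At the kernel level, uncommenting them is enough to use them.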

paddle-bot bot commented Feb 21, 2024

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@CLAassistant

CLAassistant commented Feb 21, 2024

CLA assistant check
All committers have signed the CLA.

… my-cool-stuff

merge upstream for add cutlass fused fc ops
…t of fc ops passed, now this PR is only for fc
… my-cool-stuff

The modification of mmha was restored to the state of three months ago, and the single test of fc operator passed
Now, this PR is only for fc.
some files are not needed, will update soon
@YKTian-x2b YKTian-x2b changed the title from "split seq_len to improve mmha perf" to "series of fuse_gemm" on Mar 19, 2024
} // namespace fusion
} // namespace phi

PD_REGISTER_KERNEL(fc,
Contributor

This operator shouldn't be called fc, should it? In principle it should be named fused_***.

Contributor Author

This will be updated in the next push.

Member

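Would it be convenient to change this right away? With the kernel registered like this, it is hard to say whether it will conflict wherever fc is used.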

Contributor Author

The kernel and the operator have now been renamed to gemm_epilogue.

// Temporarily move the following two parameters out of the parameter list to here, to align with FCKernel
// const std::string& data_format,
// float leaky_alpha,
const std::string data_format("RRR");
Contributor

Can the RCR format be supported?

Contributor Author

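RCR will not be supported for now. We will probably focus on overall performance first and may extend to it later. (I discussed with my mentor whether to support multiple layouts first.)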
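For readers unfamiliar with the layout shorthand, the sketch below assumes the usual CUTLASS-style reading in which the three letters give the layouts of A, B, and C (R for row-major, C for column-major). Under that assumption, "RRR" and "RCR" differ only in how B is stored. The function name and flat-buffer interface are illustrative, not taken from this PR.

```python
def gemm_flat(a, b, m, n, k, b_col_major=False):
    """C = A @ B on flat buffers; A is [M, K] and C is [M, N], both row-major.

    B is [K, N]. Row-major B ("RRR") stores element (p, j) at p * n + j;
    column-major B ("RCR") stores it at j * k + p.
    """
    c = [0.0] * (m * n)
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                b_idx = j * k + p if b_col_major else p * n + j
                acc += a[i * k + p] * b[b_idx]
            c[i * n + j] = acc
    return c
```

Both layouts produce the same C as long as the indexing matches how B was stored. What changes is the memory access pattern over B, which is why each layout needs its own kernel configuration.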

paddle-ci-bot bot commented Mar 27, 2024

Sorry to inform you that f959b87's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

paddle-ci-bot bot commented Apr 6, 2024

Sorry to inform you that 519a02b's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

paddle-ci-bot bot commented May 2, 2024

Sorry to inform you that 008e268's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

XieYunshen previously approved these changes May 9, 2024
zhhsplendid previously approved these changes May 9, 2024
Member

@zhhsplendid zhhsplendid left a comment

After an offline discussion: this operator does not support double by design, so I approve with respect to the data types.

XieYunshen previously approved these changes May 10, 2024
Contributor

@Aurelius84 Aurelius84 left a comment

LGTM for registering float only

Contributor

@XiaoguangHu01 XiaoguangHu01 left a comment

LGTM

@zhoutianzi666 zhoutianzi666 merged commit 9a825f2 into PaddlePaddle:develop May 11, 2024
30 of 31 checks passed