GemmEpilogueOp with series of CUTLASS kernel #61925
… my-cool-stuff before kai first pr, update current branch
Your PR was submitted successfully. Thank you for contributing to this open source project!
… my-cool-stuff for second pr
… my-cool-stuff Update mmha kernel
… my-cool-stuff merge upstream for add cutlass fused fc ops
…t of fc ops passed, now this PR is only for fc
… my-cool-stuff The modification of mmha was restored to the state of three months ago, and the single test of fc operator passed Now, this PR is only for fc. some files are not needed, will update soon
}  // namespace fusion
}  // namespace phi

PD_REGISTER_KERNEL(fc,
This operator shouldn't be called fc, should it? In principle it should be named fused_***.
This will be updated in the next push.
Can this be changed right away? With the kernel registered under this name, it is hard to say whether it will conflict wherever fc is used.
The kernel and the operator have now been renamed to gemm_epilogue.
// Temporarily move the following two parameters out of the parameter list to here, to align with FCKernel
// const std::string& data_format,
// float leaky_alpha,
const std::string data_format("RRR");
Can the RCR format be supported?
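RCR will not be supported for now. The focus will probably be on overall performance first, and the layouts may be extended later. (I discussed with my mentor whether to support multiple layouts first.)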
Sorry to inform you that f959b87's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
Sorry to inform you that 519a02b's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
… my-cool-stuff 'llm perf test completed, pull for push'
…is remove the two files from this pr
Sorry to inform you that 008e268's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
… my-cool-stuff 'merge develop for push gemm_epilogue'
… my-cool-stuff 'merge develop'
… my-cool-stuff 'merge develop for pr merge'
As discussed offline, this operator itself does not support double. Approving for the data types.
… my-cool-stuff 'merger develop for push'
LGTM for only register float
LGTM
PR Category
Others
PR Types
Others
Description
P-card-71501
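The goal is to fuse patterns of the form matmul + add + act. GemmEpilogueOp is written with CUTLASS and generates multiple kernel configurations in search of a better fused implementation.
matmul_add_act_fuse_pass supports two paths: cublasLt (FcOp) and cutlass (GemmEpilogueOp). Users choose whether to enable the CUTLASS-implemented op (GemmEpilogueOp) by modifying analysis_config through the Exp_EnableUseCutlass() API. At create_predictor time, analysis_config is read, the use_cutlass attribute is set on matmul_add_act_fuse_pass, and the pass is added to the passManager. When the passManager is run, the InitializePatterns method of the matmul_add_act_fuse_pass object is called, and the pass generates either the GemmEpilogueOp pattern or the FcOp pattern according to the use_cutlass attribute value it reads. This is how the two paths are selected.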
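The selection flow described above can be sketched in a few lines of plain Python. This is only an illustration of the control flow, not Paddle code: `ToyConfig`, `ToyMatmulAddActFusePass`, `create_toy_pass_manager` and `run_toy_pass_manager` are hypothetical stand-ins for analysis_config, matmul_add_act_fuse_pass, create_predictor and the passManager run, and the returned pattern names are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class ToyConfig:
    """Hypothetical stand-in for analysis_config."""
    use_cutlass: bool = False

    def enable_use_cutlass(self):
        # Plays the role of Exp_EnableUseCutlass(): it only flips a flag.
        self.use_cutlass = True


@dataclass
class ToyMatmulAddActFusePass:
    """Hypothetical stand-in for matmul_add_act_fuse_pass."""
    attrs: dict = field(default_factory=dict)

    def set(self, name, value):
        self.attrs[name] = value

    def initialize_patterns(self):
        # The pass reads the attribute and registers exactly one family of patterns.
        if self.attrs.get("use_cutlass", False):
            return ["gemm_epilogue"]  # CUTLASS path (GemmEpilogueOp)
        return ["fc"]                 # cublasLt path (FcOp)


def create_toy_pass_manager(config):
    # Mirrors the create_predictor step: copy the flag onto the pass, then enqueue it.
    fuse_pass = ToyMatmulAddActFusePass()
    fuse_pass.set("use_cutlass", config.use_cutlass)
    return [fuse_pass]


def run_toy_pass_manager(passes):
    # Running the manager is the point at which patterns are initialized.
    return [name for p in passes for name in p.initialize_patterns()]
```

With the default config, `run_toy_pass_manager(create_toy_pass_manager(ToyConfig()))` yields the FcOp pattern. Calling `enable_use_cutlass()` on the config before creating the manager yields the GemmEpilogueOp pattern instead.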
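In the elementwise add, the new op (GemmEpilogueOp) supports two bias shapes: [1, N] and [M, N] (where [M, N] is the shape of the matmul output).
The new op supports both paddle.add(paddle.matmul(x, w), y) and paddle.add(y, paddle.matmul(x, w)) (the add arguments swapped).
The new op supports Relu and Gelu activations.
The new op shares the FCInferMeta function with the original FcOp. I relaxed the constraints in that function so that it matches the additional patterns. In other words, patterns that FcOp cannot handle are currently filtered out only by the constraints in the pass; the corresponding checks in FCInferMeta have been removed.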
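As a reading aid, here is a minimal pure-Python reference for the math that the fused pattern stands for: act(x @ w + bias), with bias of shape [1, N] or [M, N]. It is not the CUTLASS kernel. The function name `gemm_epilogue_reference` is hypothetical, and the erf-based Gelu is an assumption; the PR text does not say which Gelu formulation the kernel uses.

```python
import math


def relu(v):
    return max(v, 0.0)


def gelu(v):
    # Assumption: exact erf form; the real kernel may use a different approximation.
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


ACTIVATIONS = {"relu": relu, "gelu": gelu}


def gemm_epilogue_reference(x, w, bias, activation="relu"):
    """Compute act(x @ w + bias) on nested lists.

    x: [M, K], w: [K, N], bias: [1, N] or [M, N].
    """
    m, k, n = len(x), len(w), len(w[0])
    if any(len(row) != k for row in x):
        raise ValueError("x must have shape [M, K] matching w of shape [K, N]")
    if len(bias) not in (1, m) or any(len(row) != n for row in bias):
        raise ValueError("bias must have shape [1, N] or [M, N]")
    act = ACTIVATIONS[activation]
    out = []
    for i in range(m):
        # A [1, N] bias is broadcast across all M rows.
        bias_row = bias[0] if len(bias) == 1 else bias[i]
        out.append([
            act(sum(x[i][p] * w[p][j] for p in range(k)) + bias_row[j])
            for j in range(n)
        ])
    return out
```

Because addition is commutative, paddle.add(paddle.matmul(x, w), y) and paddle.add(y, paddle.matmul(x, w)) both map to this same computation, which is why one fused op can serve both operand orders.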
On performance:
End-to-end timing of GemmEpilogueOp against the unfused ops, measured on large models with batch size 2:
about 2.0% speedup on llama
about 8.5% speedup on chatglm2
TODO:
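The pass currently fuses the Relu and Gelu activations. Three more activations are already implemented in the kernel (currently commented out) but are not yet supported in the pass. At the kernel level, they can be used by uncommenting them.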