[pir+auto parallel] add reshard op for input when needed #63072
Conversation
Your PR has been submitted successfully. Thank you for your contribution to this open source project!
```diff
@@ -66,6 +66,7 @@ def __init__(self, mesh):
         )

     def forward(self, x):
+        x.stop_gradient = False
```
No need to make x require a gradient; the relu_grad in the backward pass will trigger the partial --> replicated allreduce.
It is needed; otherwise, relu_grad is not executed.
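For context on the exchange above, a minimal dygraph sketch (not taken from this PR's test code) of the underlying rule: an op's backward only runs if some input requires a gradient, which is why the test sets x.stop_gradient = False.

```python
import paddle

x = paddle.randn([2, 4])
# x must require a gradient; otherwise nothing downstream needs a gradient
# and relu's backward is never built or executed.
x.stop_gradient = False
y = paddle.nn.functional.relu(x)
y.sum().backward()
print(x.grad.shape)  # [2, 4]
```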
```python
                op.operands(), op.dist_attr().operand_dist_attrs()
            ):
                if (
                    var.source().is_dist_dense_tensor_type()
```
In scenarios where src_dist_attr and dst_dist_attr have different meshes (e.g. pipeline parallelism), it would be better to insert two reshard ops:
one reshard op whose mesh = src_dist_attr's mesh,
the other whose mesh = dst_dist_attr's mesh.
Then, in the subsequent (pipeline stage) pruning pass, each stage keeps the reshard op whose mesh it needs and removes the other one.
It could be refined in the next PR
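A rough, illustration-only sketch of the two-reshard-op idea suggested above; every name here (DistAttr, insert_reshard_op, prune_for_stage, the program list) is a made-up stand-in, not the actual pir or auto-parallel API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DistAttr:
    process_mesh: tuple  # e.g. the ranks of one pipeline stage

program = []  # stand-in for the IR: a list of (op_name, mesh) records

def insert_reshard_op(src_attr, dst_attr, mesh):
    # Record one reshard op that is bound to a single mesh.
    program.append(("reshard", mesh))

def add_cross_mesh_reshard(src_attr, dst_attr):
    if src_attr.process_mesh == dst_attr.process_mesh:
        insert_reshard_op(src_attr, dst_attr, src_attr.process_mesh)
    else:
        # Two reshard ops, one per mesh, so the later pipeline-stage
        # pruning pass keeps the op whose mesh matches the stage and
        # drops the other one.
        insert_reshard_op(src_attr, dst_attr, src_attr.process_mesh)
        insert_reshard_op(src_attr, dst_attr, dst_attr.process_mesh)

def prune_for_stage(stage_mesh):
    # Keep only the ops whose mesh matches this pipeline stage.
    return [op for op in program if op[1] == stage_mesh]

add_cross_mesh_reshard(DistAttr((0, 1)), DistAttr((2, 3)))
print(prune_for_stage((0, 1)))  # [('reshard', (0, 1))]
print(prune_for_stage((2, 3)))  # [('reshard', (2, 3))]
```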
LGTM for spmd rule
LGTM for API
LGTM
PR Category
Auto Parallel
PR Types
New features
Description
[pir+auto parallel] add reshard op for input when needed
This PR adds a pass named apply_partition_pass, which inserts a reshard op for an input when the value's dist_attr is not equal to the use op's operand dist_attr.

Pcard-76459
The program before:
The program after:
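A simplified, illustration-only model of what the pass described above does conceptually; every class and helper below (DistAttr, Value, Op, reshard, apply_partition) is a stand-in invented for this sketch, not the actual pir Python API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DistAttr:
    process_mesh: tuple
    dims_mapping: tuple

@dataclass
class Value:
    name: str
    dist_attr: DistAttr

@dataclass
class Op:
    name: str
    operands: list            # input Values
    operand_dist_attrs: list  # dist_attr the op expects for each operand

def reshard(value, dst_attr):
    # Stand-in for inserting a reshard op that yields a new value
    # carrying the expected dist_attr.
    return Value(f"reshard({value.name})", dst_attr)

def apply_partition(ops):
    # Insert a reshard op wherever a value's dist_attr does not match
    # the dist_attr its consumer op expects for that operand.
    for op in ops:
        for i, (val, expected) in enumerate(
            zip(op.operands, op.operand_dist_attrs)
        ):
            if val.dist_attr != expected:
                op.operands[i] = reshard(val, expected)

mesh = (0, 1)
x = Value("x", DistAttr(mesh, (0, -1)))                  # sharded on dim 0
matmul = Op("matmul", [x], [DistAttr(mesh, (-1, -1))])   # expects replicated
apply_partition([matmul])
print(matmul.operands[0].name)  # reshard(x)
```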