revert sharding grouping logic for vbe #2216

Closed
wants to merge 2 commits

Conversation

joshuadeng (Contributor)

Summary:
Reverting the sharding grouping logic in EBC/VLE modules that supported specific UVM caching + prefetch pipeline use cases to circumvent VBE TBE output concatenation.

Since concatenation is implemented in the preceding diff, this diff cleans up the leftover logic that grouped sharding by UVM caching kernel conditions to avoid VBE TBE output concatenation.

Differential Revision: D58989195

@facebook-github-bot added the CLA Signed label on Jul 9, 2024
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D58989195

joshuadeng and others added 2 commits July 10, 2024 13:31
Summary:
Pull Request resolved: pytorch#2215

Previously, to handle multiple VBE TBE outputs, each a 1D tensor ordered by rank, we grouped sharding info so that only one TBE was created per sharding module. This avoided having to concatenate multiple rank-ordered 1D tensors (not a problem for non-VBE, because its 2D output can simply be concatenated on dim 1).
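For context, a minimal sketch of the layout difference (tensor shapes and names below are illustrative assumptions, not the actual TorchRec/FBGEMM tensors): non-VBE TBE outputs are 2D `[batch, dim]` tensors that concatenate trivially on dim 1, while each VBE TBE output is a flat 1D tensor laid out rank by rank, so a naive `torch.cat` would break the rank ordering.

```python
# Illustrative sketch only; not the actual TorchRec/FBGEMM tensors.
import torch

# Non-VBE: each TBE emits a 2D [batch_size, emb_dim_sum] tensor,
# so multiple TBE outputs concatenate trivially along dim 1.
non_vbe_out_a = torch.randn(4, 8)   # TBE A: batch 4, dim sum 8
non_vbe_out_b = torch.randn(4, 16)  # TBE B: batch 4, dim sum 16
non_vbe_combined = torch.cat([non_vbe_out_a, non_vbe_out_b], dim=1)  # [4, 24]

# VBE: each TBE emits a flat 1D tensor laid out rank by rank
# (with possibly different batch sizes per rank).
vbe_out_a = torch.randn(2 * 3 * 8)   # 2 ranks, 3 samples/rank, 8 dims
vbe_out_b = torch.randn(2 * 3 * 16)  # 2 ranks, 3 samples/rank, 16 dims

# A naive torch.cat([vbe_out_a, vbe_out_b]) would place all of A's ranks
# before all of B's ranks, breaking the required rank-major ordering.
```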

This grouping only applied to specific UVM caching setups that use the prefetch pipeline, since each sharding type could require multiple TBEs to handle both HBM and UVM caching kernels. In most cases the TBEs could be fused per sharding type, so we grouped accordingly.

Each sharding module handles its own input dist, lookup, and output dist, so creating a sharding module per TBE in EMO setups would cause a regression: the additional input dists and output dists increase comms.

This diff implements VBE TBE output concatenation directly, removing the need for the specialized sharding grouping logic in these EMO cases.
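As a rough illustration of the idea (a hypothetical helper, not the implementation in this diff), rank-ordered 1D outputs can be combined by splitting each output into its per-rank segments and re-concatenating rank by rank, so the combined output stays ordered by rank:

```python
# Hypothetical sketch of per-rank interleaved concatenation.
import torch
from typing import List


def concat_vbe_outputs(outputs: List[torch.Tensor],
                       splits_per_output: List[List[int]]) -> torch.Tensor:
    """outputs[i] is a flat 1D VBE output; splits_per_output[i] gives the
    number of elements that output holds for each rank, in rank order."""
    per_rank_chunks = [torch.split(out, splits)
                       for out, splits in zip(outputs, splits_per_output)]
    world_size = len(splits_per_output[0])
    # For each rank, take that rank's chunk from every output, then flatten.
    return torch.cat([chunk
                      for rank in range(world_size)
                      for chunk in (chunks[rank] for chunks in per_rank_chunks)])


# Example: two outputs, two ranks, variable per-rank sizes.
out_a = torch.arange(10.0)          # rank0: 6 elements, rank1: 4 elements
out_b = torch.arange(100.0, 106.0)  # rank0: 2 elements, rank1: 4 elements
combined = concat_vbe_outputs([out_a, out_b], [[6, 4], [2, 4]])
# combined layout: [A-rank0, B-rank0, A-rank1, B-rank1]
```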

Differential Revision: D58894728

Reviewed By: dstaay-fb, levythu
Summary:
Pull Request resolved: pytorch#2216

Reverting the sharding grouping logic in EBC/VLE modules that supported specific UVM caching + prefetch pipeline use cases to circumvent VBE TBE output concatenation.

Since concatenation is implemented in the preceding diff, this diff cleans up the leftover logic that grouped sharding by UVM caching kernel conditions to avoid VBE TBE output concatenation.

Reviewed By: dstaay-fb, levythu

Differential Revision: D58989195
