revert sharding grouping logic for vbe #2216

Closed
wants to merge 2 commits

Conversation

joshuadeng (Contributor)

Summary:
Reverting the sharding grouping logic in EBC/VLE modules that supported specific UVM caching + prefetch pipeline use cases to circumvent VBE TBE output concatenation.

Since concatenation is implemented in the preceding diff, this diff cleans up the leftover logic that grouped sharding by UVM caching kernel conditions to avoid VBE TBE output concatenation.

Differential Revision: D58989195

@facebook-github-bot added the CLA Signed label on Jul 9, 2024
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D58989195

joshuadeng and others added 2 commits July 10, 2024 13:31
Summary:
Pull Request resolved: pytorch#2215

Previously, to handle multiple VBE TBE outputs, each a 1D tensor ordered by rank, we grouped sharding info so that only one TBE was created per sharding module. This avoided having to concatenate multiple rank-ordered 1D tensors (not a problem for non-VBE, because its 2D output can simply be concatenated on dim 1).
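For context, a minimal sketch of the layout difference (tensor shapes and names below are illustrative assumptions, not the actual TorchRec/FBGEMM tensors): non-VBE TBE outputs are 2D `[batch, dim]` tensors that concatenate trivially on dim 1, while each VBE TBE output is a flat 1D tensor laid out rank by rank, so a naive `torch.cat` would break the rank ordering.

```python
# Illustrative sketch only; not the actual TorchRec/FBGEMM tensors.
import torch

# Non-VBE: each TBE emits a 2D [batch_size, emb_dim_sum] tensor,
# so multiple TBE outputs concatenate trivially along dim 1.
non_vbe_out_a = torch.randn(4, 8)   # TBE A: batch 4, dim sum 8
non_vbe_out_b = torch.randn(4, 16)  # TBE B: batch 4, dim sum 16
non_vbe_combined = torch.cat([non_vbe_out_a, non_vbe_out_b], dim=1)  # [4, 24]

# VBE: each TBE emits a flat 1D tensor laid out rank by rank
# (with possibly different batch sizes per rank).
vbe_out_a = torch.randn(2 * 3 * 8)   # 2 ranks, 3 samples/rank, 8 dims
vbe_out_b = torch.randn(2 * 3 * 16)  # 2 ranks, 3 samples/rank, 16 dims

# A naive torch.cat([vbe_out_a, vbe_out_b]) would place all of A's ranks
# before all of B's ranks, breaking the required rank-major ordering.
```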

This grouping only applied to specific UVM caching setups that use the prefetch pipeline, since each sharding type could require multiple TBEs to handle both HBM and UVM caching kernels. In most cases the TBEs could be fused per sharding type, so we grouped accordingly.

Each sharding module handles its own input dist, lookup, and output dist, so creating a sharding module per TBE in EMO setups would cause a regression: the additional input dists and output dists increase comms.

This diff implements VBE TBE output concatenation directly, removing the need for the specialized sharding grouping logic in these EMO cases.
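As a rough illustration of the idea (a hypothetical helper, not the implementation in this diff), rank-ordered 1D outputs can be combined by splitting each output into its per-rank segments and re-concatenating rank by rank, so the combined output stays ordered by rank:

```python
# Hypothetical sketch of per-rank interleaved concatenation.
import torch
from typing import List


def concat_vbe_outputs(outputs: List[torch.Tensor],
                       splits_per_output: List[List[int]]) -> torch.Tensor:
    """outputs[i] is a flat 1D VBE output; splits_per_output[i] gives the
    number of elements that output holds for each rank, in rank order."""
    per_rank_chunks = [torch.split(out, splits)
                       for out, splits in zip(outputs, splits_per_output)]
    world_size = len(splits_per_output[0])
    # For each rank, take that rank's chunk from every output, then flatten.
    return torch.cat([chunk
                      for rank in range(world_size)
                      for chunk in (chunks[rank] for chunks in per_rank_chunks)])


# Example: two outputs, two ranks, variable per-rank sizes.
out_a = torch.arange(10.0)          # rank0: 6 elements, rank1: 4 elements
out_b = torch.arange(100.0, 106.0)  # rank0: 2 elements, rank1: 4 elements
combined = concat_vbe_outputs([out_a, out_b], [[6, 4], [2, 4]])
# combined layout: [A-rank0, B-rank0, A-rank1, B-rank1]
```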

Differential Revision: D58894728

Reviewed By: dstaay-fb, levythu
Summary:
Pull Request resolved: pytorch#2216

Reverting the sharding grouping logic in EBC/VLE modules that supported specific UVM caching + prefetch pipeline use cases to circumvent VBE TBE output concatenation.

Since concatenation is implemented in the preceding diff, this diff cleans up the leftover logic that grouped sharding by UVM caching kernel conditions to avoid VBE TBE output concatenation.

Reviewed By: dstaay-fb, levythu

Differential Revision: D58989195
