Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(shuffles): Determination logic for pre shuffle merge #3674

Merged
merged 4 commits into from
Jan 30, 2025

Conversation

colin-ho
Copy link
Contributor

@colin-ho colin-ho commented Jan 13, 2025

Use pre shuffle merge if geometric mean of total number of partitions: sqrt(input * output) is >= 200.

I chose 200 to match shuffle_aggregation_default_partitions, which means shuffles for large aggregations will use pre-shuffle merge.

@github-actions github-actions bot added the feat label Jan 13, 2025
Copy link

codspeed-hq bot commented Jan 13, 2025

CodSpeed Performance Report

Merging #3674 will improve performances by 47.23%

Comparing colin/turn-on-pre-shuffle-merge (062143b) with main (a8d63dd)

Summary

⚡ 1 improvements
✅ 26 untouched benchmarks

Benchmarks breakdown

Benchmark BASE HEAD Change
test_show[100 Small Files] 23.5 ms 16 ms +47.23%

Copy link

codecov bot commented Jan 13, 2025

Codecov Report

Attention: Patch coverage is 72.30769% with 18 lines in your changes missing coverage. Please review.

Project coverage is 76.27%. Comparing base (a8d63dd) to head (062143b).
Report is 6 commits behind head on main.

Files with missing lines Patch % Lines
src/daft-physical-plan/src/ops/shuffle_exchange.rs 71.66% 17 Missing ⚠️
src/common/daft-config/src/python.rs 75.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3674      +/-   ##
==========================================
- Coverage   77.58%   76.27%   -1.31%     
==========================================
  Files         729      728       -1     
  Lines       92035    95211    +3176     
==========================================
+ Hits        71409    72626    +1217     
- Misses      20626    22585    +1959     
Files with missing lines Coverage Δ
daft/context.py 87.65% <ø> (ø)
src/common/daft-config/src/lib.rs 82.27% <100.00%> (ø)
src/common/daft-config/src/python.rs 66.51% <75.00%> (+0.30%) ⬆️
src/daft-physical-plan/src/ops/shuffle_exchange.rs 59.03% <71.66%> (+8.25%) ⬆️

... and 33 files with indirect coverage changes

@colin-ho colin-ho marked this pull request as ready for review January 15, 2025 18:24
@colin-ho colin-ho requested review from samster25 and jaychia and removed request for samster25 January 30, 2025 00:05
@colin-ho colin-ho merged commit 5e980ce into main Jan 30, 2025
43 of 45 checks passed
@colin-ho colin-ho deleted the colin/turn-on-pre-shuffle-merge branch January 30, 2025 01:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants