feat: Improve error message with lazy data frame by explicitly materializing before falling back to dplyr #456

Merged: 10 commits, Jan 11, 2025

@krlmlr (Member) commented Jan 11, 2025

Closes #432.

@krlmlr changed the title from "f 432 mat error" to "Squashed commit of the following:" Jan 11, 2025
@krlmlr enabled auto-merge (squash) January 11, 2025 13:51
@krlmlr marked this pull request as draft January 11, 2025 13:52
auto-merge was automatically disabled January 11, 2025 13:52

Pull request was converted to draft

@krlmlr changed the title from "Squashed commit of the following:" to "feat: Improve error message with lazy data frame by explicitly materializing before falling back to dplyr" Jan 11, 2025
@krlmlr marked this pull request as ready for review January 11, 2025 14:12
@krlmlr enabled auto-merge (squash) January 11, 2025 14:13

This is how benchmark results would change (along with a 95% confidence interval in relative change) if aeb96f3 is merged into main:

  • ✔️001_tpch_01: 23.3ms -> 22.9ms [-5.93%, +2.72%]
  • ✔️001_tpch_02: 67ms -> 66.5ms [-2.01%, +0.39%]
  • ✔️001_tpch_03: 38.4ms -> 38.4ms [-0.66%, +0.99%]
  • ✔️001_tpch_04: 22.1ms -> 21.9ms [-3.05%, +0.92%]
  • ✔️001_tpch_05: 56ms -> 56.3ms [-0.36%, +1.56%]
  • ✔️001_tpch_06: 13.6ms -> 13.9ms [-0.87%, +4.23%]
  • ✔️001_tpch_07: 73.7ms -> 73.5ms [-1.09%, +0.47%]
  • ✔️001_tpch_08: 95.4ms -> 95.5ms [-0.57%, +0.62%]
  • ✔️001_tpch_09: 71.3ms -> 71.6ms [-1.19%, +1.97%]
  • ✔️001_tpch_10: 47.9ms -> 47.6ms [-1.87%, +0.56%]
  • ✔️001_tpch_11: 32.1ms -> 31.9ms [-2.32%, +0.96%]
  • ✔️001_tpch_12: 27ms -> 27ms [-1.79%, +2%]
  • ✔️001_tpch_13: 24.6ms -> 24.8ms [-0.87%, +2.92%]
  • ✔️001_tpch_14: 19.8ms -> 19.8ms [-1.45%, +1.44%]
  • ✔️001_tpch_15: 31.1ms -> 31.3ms [-0.13%, +1.72%]
  • ✔️001_tpch_16: 39.1ms -> 39.4ms [-0.42%, +1.95%]
  • ✔️001_tpch_17: 25.6ms -> 25.8ms [-1.34%, +2.63%]
  • ✔️001_tpch_18: 22.2ms -> 22.1ms [-3.05%, +2.48%]
  • ✔️001_tpch_19: 65.9ms -> 65.7ms [-1.03%, +0.35%]
  • ✔️001_tpch_20: 49.8ms -> 49.6ms [-1.51%, +0.37%]
  • ✔️001_tpch_21: 78.4ms -> 78.5ms [-0.85%, +0.96%]
  • ❗🐌001_tpch_22: 65.2ms -> 66ms [+0.14%, +2.19%]
  • ❗🐌010_tpch_01: 79ms -> 82.4ms [+0.64%, +8%]
  • ✔️010_tpch_02: 71.3ms -> 69.6ms [-6%, +1.2%]
  • ✔️010_tpch_03: 58.1ms -> 58.7ms [-1.17%, +3.3%]
  • ✔️010_tpch_04: 43.6ms -> 42.2ms [-8.5%, +2.06%]
  • ✔️010_tpch_05: 88.7ms -> 88.2ms [-1.93%, +0.84%]
  • ✔️010_tpch_06: 32.3ms -> 31ms [-11.2%, +2.97%]
  • ✔️010_tpch_07: 105ms -> 106ms [-0.41%, +2.07%]
  • ✔️010_tpch_08: 126ms -> 128ms [-1.32%, +3.84%]
  • ✔️010_tpch_09: 115ms -> 114ms [-1.59%, +0.33%]
  • ✔️010_tpch_10: 73.4ms -> 73.4ms [-1.05%, +1.09%]
  • ✔️010_tpch_11: 37.9ms -> 38ms [-4.57%, +5.46%]
  • ✔️010_tpch_12: 59ms -> 57.9ms [-9.95%, +6.25%]
  • ✔️010_tpch_13: 52.9ms -> 52.3ms [-6.64%, +4.14%]
  • ✔️010_tpch_14: 37.9ms -> 37.2ms [-5.68%, +2.28%]
  • ✔️010_tpch_15: 55ms -> 55.9ms [-4.15%, +7.37%]
  • ✔️010_tpch_16: 43ms -> 44.3ms [-1.19%, +6.99%]
  • ✔️010_tpch_17: 54.8ms -> 53ms [-9.91%, +3.42%]
  • ✔️010_tpch_18: 53.2ms -> 52.2ms [-8.79%, +5.11%]
  • ✔️010_tpch_19: 116ms -> 117ms [-2.55%, +4.44%]
  • ✔️010_tpch_20: 66.1ms -> 64.9ms [-6.17%, +2.37%]
  • ✔️010_tpch_21: 239ms -> 242ms [-1.44%, +3.99%]
  • 🚀010_tpch_22: 76.2ms -> 74.9ms [-3.51%, -0.05%]
  • ✔️100_tpch_01: 325ms -> 330ms [-9.28%, +12.38%]
  • ✔️100_tpch_02: 124ms -> 125ms [-1.65%, +4.15%]
  • ✔️100_tpch_03: 181ms -> 185ms [-3.03%, +7.41%]
  • ✔️100_tpch_04: 155ms -> 161ms [-6.33%, +13.89%]
  • ✔️100_tpch_05: 257ms -> 268ms [-8.11%, +15.94%]
  • ✔️100_tpch_06: 112ms -> 106ms [-18.1%, +7.08%]
  • ✔️100_tpch_07: 227ms -> 238ms [-2.54%, +12.72%]
  • ✔️100_tpch_08: 266ms -> 255ms [-11.86%, +4.13%]
  • 🚀100_tpch_09: 350ms -> 317ms [-15.28%, -3.73%]
  • ✔️100_tpch_10: 213ms -> 221ms [-6.47%, +13.23%]
  • ✔️100_tpch_11: 91ms -> 96.4ms [-20.13%, +32.14%]
  • ✔️100_tpch_12: 189ms -> 188ms [-14.12%, +13.84%]
  • ✔️100_tpch_13: 306ms -> 304ms [-4.42%, +2.8%]
  • 🚀100_tpch_14: 126ms -> 116ms [-15.14%, -1.11%]
  • ✔️100_tpch_15: 220ms -> 206ms [-16.28%, +3.39%]
  • ✔️100_tpch_16: 129ms -> 123ms [-10.7%, +1.28%]
  • ✔️100_tpch_17: 186ms -> 175ms [-14.38%, +2.72%]
  • ✔️100_tpch_18: 199ms -> 194ms [-8.86%, +3.52%]
  • ✔️100_tpch_19: 283ms -> 284ms [-4.9%, +5.92%]
  • ✔️100_tpch_20: 181ms -> 173ms [-11.48%, +1.97%]
  • ✔️100_tpch_21: 1.31s -> 1.31s [-6.93%, +6.76%]
  • ✔️100_tpch_22: 173ms -> 175ms [-4.75%, +6.96%]

Further explanation regarding interpretation and methodology can be found in the documentation.
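
For readers skimming the list above, the markers appear to track whether the 95% confidence interval excludes zero: the slowdown marker shows up when both bounds are positive, the rocket when both are negative, and the check mark when the interval straddles zero. The sketch below encodes that reading. It is an assumption inferred from the listed results, not the benchmark tool's documented rule, and `classify` and `INTERVAL` are hypothetical names introduced for illustration.

```python
import re

# Matches the bracketed interval at the end of a result line, e.g.
# "001_tpch_22: 65.2ms -> 66ms [+0.14%, +2.19%]".
INTERVAL = re.compile(r"\[([+-]?\d+(?:\.\d+)?)%,\s*([+-]?\d+(?:\.\d+)?)%\]")


def classify(line: str) -> str:
    """Classify one benchmark line by where its interval sits relative to zero.

    Assumed rule (inferred from the results above, not from the tool's docs):
    an interval entirely above zero is a slowdown, entirely below zero is a
    speedup, and anything that includes zero is treated as no clear change.
    """
    match = INTERVAL.search(line)
    if match is None:
        raise ValueError(f"no confidence interval found in: {line!r}")
    low, high = float(match.group(1)), float(match.group(2))
    if low > 0:
        return "slower"
    if high < 0:
        return "faster"
    return "no significant change"


if __name__ == "__main__":
    samples = [
        "001_tpch_22: 65.2ms -> 66ms [+0.14%, +2.19%]",
        "010_tpch_22: 76.2ms -> 74.9ms [-3.51%, -0.05%]",
        "001_tpch_12: 27ms -> 27ms [-1.79%, +2%]",
    ]
    for sample in samples:
        print(classify(sample), "<-", sample)
```

Under this reading, a flagged query such as 001_tpch_22 only indicates that the measured difference is unlikely to be noise. Its interval tops out at about 2%, so the flag says little about practical impact.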

@krlmlr merged commit 6d432e1 into main Jan 11, 2025
20 checks passed
@krlmlr deleted the f-432-mat-error branch January 11, 2025 18:34

This is how benchmark results would change (along with a 95% confidence interval in relative change) if c48a9d8 is merged into main:

  • ✔️001_tpch_01: 23.4ms -> 23.9ms [-1.6%, +6.3%]
  • ✔️001_tpch_02: 67.2ms -> 67.3ms [-0.44%, +0.97%]
  • ✔️001_tpch_03: 40.1ms -> 40.3ms [-1.07%, +1.73%]
  • ✔️001_tpch_04: 22.2ms -> 22.3ms [-1.22%, +2.24%]
  • ✔️001_tpch_05: 57ms -> 57.3ms [-0.35%, +1.22%]
  • ✔️001_tpch_06: 14.3ms -> 14.1ms [-4.24%, +1.99%]
  • ✔️001_tpch_07: 74ms -> 74.3ms [-0.2%, +0.99%]
  • ✔️001_tpch_08: 97.6ms -> 97.4ms [-1.07%, +0.77%]
  • ✔️001_tpch_09: 72.2ms -> 72.7ms [-0.41%, +1.6%]
  • ✔️001_tpch_10: 48.9ms -> 48ms [-4.66%, +0.8%]
  • ✔️001_tpch_11: 32.1ms -> 32.2ms [-2.03%, +2.55%]
  • 🚀001_tpch_12: 27.5ms -> 26.8ms [-5.59%, -0.16%]
  • ✔️001_tpch_13: 25.3ms -> 25.3ms [-3.38%, +2.99%]
  • ✔️001_tpch_14: 20.2ms -> 20.1ms [-2.25%, +1.89%]
  • ✔️001_tpch_15: 31.4ms -> 31.3ms [-1.7%, +1.3%]
  • ✔️001_tpch_16: 39.5ms -> 39.8ms [-0.8%, +2.33%]
  • ✔️001_tpch_17: 25.9ms -> 26.2ms [-0.51%, +3.11%]
  • ✔️001_tpch_18: 22.1ms -> 22.1ms [-1.79%, +2.02%]
  • ✔️001_tpch_19: 66.1ms -> 66ms [-0.91%, +0.61%]
  • ✔️001_tpch_20: 50.5ms -> 50.2ms [-1.67%, +0.2%]
  • ✔️001_tpch_21: 78.4ms -> 77.9ms [-1.48%, +0.17%]
  • ✔️001_tpch_22: 66.2ms -> 66.4ms [-0.44%, +1.22%]
  • ✔️010_tpch_01: 84.6ms -> 81.2ms [-13.49%, +5.41%]
  • ✔️010_tpch_02: 73.1ms -> 74ms [-1.08%, +3.6%]
  • ✔️010_tpch_03: 60.8ms -> 60.2ms [-3.92%, +2.07%]
  • ✔️010_tpch_04: 43.3ms -> 43.9ms [-1.2%, +4.4%]
  • ✔️010_tpch_05: 90ms -> 89.4ms [-2.11%, +0.62%]
  • ✔️010_tpch_06: 31.7ms -> 32.2ms [-3.71%, +6.91%]
  • ✔️010_tpch_07: 106ms -> 106ms [-1.18%, +1.43%]
  • ✔️010_tpch_08: 128ms -> 128ms [-3.8%, +3.52%]
  • ✔️010_tpch_09: 116ms -> 116ms [-2.75%, +2.65%]
  • ✔️010_tpch_10: 76.3ms -> 73.7ms [-7.74%, +0.81%]
  • ✔️010_tpch_11: 37.9ms -> 37.2ms [-4.31%, +0.6%]
  • ✔️010_tpch_12: 57ms -> 57.2ms [-2.55%, +3.24%]
  • ✔️010_tpch_13: 52.5ms -> 53.9ms [-6.54%, +11.55%]
  • ✔️010_tpch_14: 37.1ms -> 37ms [-1.25%, +0.87%]
  • ✔️010_tpch_15: 55.4ms -> 54.3ms [-7.48%, +3.77%]
  • ✔️010_tpch_16: 44ms -> 43.6ms [-3.39%, +1.75%]
  • ✔️010_tpch_17: 55.2ms -> 53.6ms [-7.32%, +1.75%]
  • ✔️010_tpch_18: 53.9ms -> 52.9ms [-8.62%, +4.92%]
  • ✔️010_tpch_19: 116ms -> 116ms [-1.97%, +1.38%]
  • ✔️010_tpch_20: 64.9ms -> 65.7ms [-1.23%, +3.62%]
  • ✔️010_tpch_21: 240ms -> 236ms [-5.06%, +1.48%]
  • ✔️010_tpch_22: 75.6ms -> 74.8ms [-2.52%, +0.41%]
  • ✔️100_tpch_01: 333ms -> 334ms [-10.04%, +10.25%]
  • ✔️100_tpch_02: 130ms -> 129ms [-10.7%, +8.24%]
  • ✔️100_tpch_03: 189ms -> 185ms [-10.6%, +6.16%]
  • ✔️100_tpch_04: 162ms -> 151ms [-15.63%, +2.64%]
  • ✔️100_tpch_05: 278ms -> 283ms [-10.74%, +14.41%]
  • ✔️100_tpch_06: 106ms -> 105ms [-26.42%, +24.25%]
  • ✔️100_tpch_07: 249ms -> 251ms [-2.61%, +4.56%]
  • ✔️100_tpch_08: 266ms -> 252ms [-17.78%, +7.16%]
  • ✔️100_tpch_09: 361ms -> 375ms [-7.89%, +15.73%]
  • ✔️100_tpch_10: 217ms -> 224ms [-6.92%, +13.17%]
  • ✔️100_tpch_11: 84.7ms -> 93ms [-18.46%, +38.2%]
  • ✔️100_tpch_12: 195ms -> 195ms [-14.62%, +14.62%]
  • ✔️100_tpch_13: 340ms -> 333ms [-10.06%, +5.95%]
  • ✔️100_tpch_14: 122ms -> 120ms [-14.48%, +11.64%]
  • ✔️100_tpch_15: 221ms -> 214ms [-14.17%, +7.74%]
  • ✔️100_tpch_16: 125ms -> 129ms [-8.52%, +13.95%]
  • ✔️100_tpch_17: 176ms -> 174ms [-14.67%, +13.01%]
  • ✔️100_tpch_18: 198ms -> 209ms [-2.88%, +14.4%]
  • ✔️100_tpch_19: 302ms -> 296ms [-16.57%, +12.94%]
  • ✔️100_tpch_20: 191ms -> 182ms [-17.44%, +7.63%]
  • ✔️100_tpch_21: 1.33s -> 1.3s [-5.2%, +1.08%]
  • ✔️100_tpch_22: 174ms -> 171ms [-4.56%, +1.53%]

Further explanation regarding interpretation and methodology can be found in the documentation.

Successfully merging this pull request may close these issues.

Show source when trying to materialize a lazy data frame