Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: simplify boolean expression rules #3731

Merged
merged 9 commits into from
Jan 29, 2025
Merged

Conversation

kevinzwang
Copy link
Member

@kevinzwang kevinzwang commented Jan 28, 2025

This PR adds the following:

  1. simplify now recursively simplifies the entire expression tree
  2. Simplification rules simplify_and and simplify_or to simplify boolean expressions
  3. A simplify expressions step to the logical optimizer before scan task materialization

I also did some refactoring work that did not change behavior:

  1. Converted various functions to take in Arc instead of T, which is more ergonomic to use and avoids a potential clone when calling it with an Arc variable using Arc::unwrap_or_clone
  2. Split up simplify code into multiple files

Combined with #3728, this PR brings major performance improvements to TPCH Q19 by pushing down and simplifying filter predicates.

Copy link

codspeed-hq bot commented Jan 28, 2025

CodSpeed Performance Report

Merging #3731 will degrade performances by 12.28%

Comparing kevin/simplify-bool-expr (da200f7) with main (d00e444)

Summary

⚡ 3 improvements
❌ 1 regressions
✅ 23 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark BASE HEAD Change
test_count[1 Small File] 4 ms 3.5 ms +13.89%
test_iter_rows_first_row[100 Small Files] 319.9 ms 245.6 ms +30.26%
test_show[100 Small Files] 23.8 ms 16.2 ms +47.37%
test_tpch_sql[1-in-memory-native-8] 134.6 ms 153.4 ms -12.28%

// e IN () --> false
Expr::IsIn(_, list) if list.is_empty() => Transformed::yes(lit(false)),
// CAST(e AS dtype) -> e if e.dtype == dtype
Expr::Cast(e, dtype) if e.get_type(schema)? == *dtype => Transformed::yes(e.clone()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of erroring out, should we instead return Transformed::no if we can't get_type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there should be a reason why get_type should fail for a valid expression, right?

Comment on lines 139 to 140
vec![Box::new(SimplifyExpressionsRule::new())],
RuleExecutionStrategy::FixedPoint(Some(3)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since exprs can be pretty poorly written due to generated code, etc. I'm wondering if we should do more than 3 passes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gonna just use the configurable default_max_optimizer_passes then.

Copy link

codecov bot commented Jan 28, 2025

Codecov Report

Attention: Patch coverage is 92.10526% with 33 lines in your changes missing coverage. Please review.

Project coverage is 77.17%. Comparing base (d00e444) to head (da200f7).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
src/daft-logical-plan/src/treenode.rs 87.38% 14 Missing ⚠️
src/daft-algebra/src/simplify/null.rs 73.33% 8 Missing ⚠️
src/daft-algebra/src/simplify/boolean.rs 96.11% 7 Missing ⚠️
src/daft-algebra/src/simplify/numeric.rs 87.09% 4 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3731      +/-   ##
==========================================
- Coverage   77.57%   77.17%   -0.41%     
==========================================
  Files         728      731       +3     
  Lines       92054    93132    +1078     
==========================================
+ Hits        71408    71871     +463     
- Misses      20646    21261     +615     
Files with missing lines Coverage Δ
src/common/treenode/src/lib.rs 94.22% <100.00%> (-0.07%) ⬇️
src/daft-algebra/src/simplify/mod.rs 100.00% <100.00%> (ø)
...rc/daft-logical-plan/src/optimization/optimizer.rs 95.03% <100.00%> (+0.06%) ⬆️
...lan/src/optimization/rules/simplify_expressions.rs 92.18% <100.00%> (-0.57%) ⬇️
src/daft-algebra/src/simplify/numeric.rs 87.09% <87.09%> (ø)
src/daft-algebra/src/simplify/boolean.rs 96.11% <96.11%> (ø)
src/daft-algebra/src/simplify/null.rs 73.33% <73.33%> (ø)
src/daft-logical-plan/src/treenode.rs 91.39% <87.38%> (-1.37%) ⬇️

... and 17 files with indirect coverage changes

Comment on lines +95 to +96
for (i, left_exprs) in left_and_of_or_exprs.iter().enumerate() {
for (j, right_exprs) in right_and_of_or_exprs.iter().enumerate() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High level question: is there ever a case where we want to self-eliminate left exprs with left exprs (that are not itself)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chatted offline. Since the optimizer rule is applied bottom up, we've already self-eliminated. LGTM

Comment on lines +95 to +96
for (i, left_exprs) in left_and_of_or_exprs.iter().enumerate() {
for (j, right_exprs) in right_and_of_or_exprs.iter().enumerate() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chatted offline. Since the optimizer rule is applied bottom up, we've already self-eliminated. LGTM

@kevinzwang kevinzwang enabled auto-merge (squash) January 29, 2025 20:04
@kevinzwang kevinzwang disabled auto-merge January 29, 2025 20:05
@kevinzwang kevinzwang enabled auto-merge (squash) January 29, 2025 20:05
@kevinzwang kevinzwang merged commit 78beff4 into main Jan 29, 2025
40 of 41 checks passed
@kevinzwang kevinzwang deleted the kevin/simplify-bool-expr branch January 29, 2025 20:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants