perf: simplify boolean expression rules #3731

kevinzwang · 2025-01-28T00:30:23Z

This PR adds the following:

`simplify` now recursively simplifies the entire expression tree
Simplification rules `simplify_and` and `simplify_or` for boolean expressions
A simplify-expressions step in the logical optimizer, before scan task materialization
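Below is a minimal sketch of recursive, bottom-up boolean simplification. The `Expr` enum and the `simplify`, `simplify_and`, and `simplify_or` functions here are hypothetical stand-ins, not Daft's actual `daft-algebra` code. Children are simplified first, then AND/OR identities are applied to the rebuilt node. Each identity shown also holds under SQL's three-valued logic.

```rust
use std::sync::Arc;

// Hypothetical, minimal expression type; Daft's real `Expr` is much richer.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Literal(bool),
    And(Arc<Expr>, Arc<Expr>),
    Or(Arc<Expr>, Arc<Expr>),
    Not(Arc<Expr>),
}

// Applies AND identities; returns None when no rule matches.
pub fn simplify_and(left: &Arc<Expr>, right: &Arc<Expr>) -> Option<Arc<Expr>> {
    match (left.as_ref(), right.as_ref()) {
        (Expr::Literal(true), _) => Some(right.clone()), // true AND e --> e
        (_, Expr::Literal(true)) => Some(left.clone()),  // e AND true --> e
        (Expr::Literal(false), _) | (_, Expr::Literal(false)) => {
            Some(Arc::new(Expr::Literal(false))) // false AND e --> false
        }
        _ if left == right => Some(left.clone()), // e AND e --> e
        _ => None,
    }
}

// Applies OR identities; returns None when no rule matches.
pub fn simplify_or(left: &Arc<Expr>, right: &Arc<Expr>) -> Option<Arc<Expr>> {
    match (left.as_ref(), right.as_ref()) {
        (Expr::Literal(false), _) => Some(right.clone()), // false OR e --> e
        (_, Expr::Literal(false)) => Some(left.clone()),  // e OR false --> e
        (Expr::Literal(true), _) | (_, Expr::Literal(true)) => {
            Some(Arc::new(Expr::Literal(true))) // true OR e --> true
        }
        _ if left == right => Some(left.clone()), // e OR e --> e
        _ => None,
    }
}

// Simplifies children first (bottom-up), then the node itself.
pub fn simplify(expr: &Arc<Expr>) -> Arc<Expr> {
    match expr.as_ref() {
        Expr::And(l, r) => {
            let (l, r) = (simplify(l), simplify(r));
            simplify_and(&l, &r).unwrap_or_else(|| Arc::new(Expr::And(l, r)))
        }
        Expr::Or(l, r) => {
            let (l, r) = (simplify(l), simplify(r));
            simplify_or(&l, &r).unwrap_or_else(|| Arc::new(Expr::Or(l, r)))
        }
        Expr::Not(inner) => {
            let inner = simplify(inner);
            match inner.as_ref() {
                Expr::Literal(b) => Arc::new(Expr::Literal(!b)), // NOT true --> false
                Expr::Not(e) => e.clone(),                       // NOT NOT e --> e
                _ => Arc::new(Expr::Not(inner.clone())),
            }
        }
        Expr::Column(_) | Expr::Literal(_) => expr.clone(),
    }
}
```

For example, `a AND (true OR a)` first becomes `a AND true` once the inner OR is simplified, and then becomes `a`.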

I also did some refactoring work that did not change behavior:

Converted various functions to take `Arc<T>` instead of `T`. This is more ergonomic and, when the caller already holds an `Arc`, avoids the potential clone from unwrapping it with `Arc::unwrap_or_clone`
Split the simplify code into multiple files
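The following sketch illustrates the `Arc<T>` refactor. A function that takes `T` forces a caller holding an `Arc<T>` to unwrap it, and `Arc::unwrap_or_clone` performs a deep clone whenever other references to the value exist. `Plan`, `describe_owned`, and `describe_arc` are hypothetical. The clone counter exists only to make the difference observable. `Arc::unwrap_or_clone` requires Rust 1.76 or later.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Counts deep clones of `Plan` so the cost of each signature is visible.
static DEEP_CLONES: AtomicUsize = AtomicUsize::new(0);

// Hypothetical plan node used only for illustration.
#[derive(Debug)]
pub struct Plan {
    pub name: String,
}

impl Clone for Plan {
    fn clone(&self) -> Self {
        DEEP_CLONES.fetch_add(1, Ordering::SeqCst);
        Plan { name: self.name.clone() }
    }
}

// Before: takes an owned `Plan`; callers holding an `Arc<Plan>` must unwrap it.
pub fn describe_owned(plan: Plan) -> String {
    format!("plan {}", plan.name)
}

// After: takes `Arc<Plan>`; callers pass their `Arc` with a refcount bump.
pub fn describe_arc(plan: Arc<Plan>) -> String {
    format!("plan {}", plan.name)
}

pub fn deep_clones() -> usize {
    DEEP_CLONES.load(Ordering::SeqCst)
}
```

With a shared `Arc`, `describe_arc(shared.clone())` only increments the reference count. In contrast, `describe_owned(Arc::unwrap_or_clone(shared))` must copy the whole `Plan`.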

Combined with #3728, this PR brings major performance improvements to TPCH Q19 by pushing down and simplifying filter predicates.

codspeed-hq · 2025-01-28T01:28:01Z

CodSpeed Performance Report

Merging #3731 will degrade performance by 12.28%

Comparing kevin/simplify-bool-expr (da200f7) with main (d00e444)

Summary

⚡ 3 improvements
❌ 1 regression
✅ 23 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

	Benchmark	`BASE`	`HEAD`	Change
⚡	`test_count[1 Small File]`	4 ms	3.5 ms	+13.89%
⚡	`test_iter_rows_first_row[100 Small Files]`	319.9 ms	245.6 ms	+30.26%
⚡	`test_show[100 Small Files]`	23.8 ms	16.2 ms	+47.37%
❌	`test_tpch_sql[1-in-memory-native-8]`	134.6 ms	153.4 ms	-12.28%

universalmind303 · 2025-01-28T16:43:19Z

src/daft-algebra/src/simplify/mod.rs

+        // e IN () --> false
+        Expr::IsIn(_, list) if list.is_empty() => Transformed::yes(lit(false)),
+        // CAST(e AS dtype) -> e if e.dtype == dtype
+        Expr::Cast(e, dtype) if e.get_type(schema)? == *dtype => Transformed::yes(e.clone()),


Instead of erroring out, should we return `Transformed::no` if `get_type` fails?

I don't think there should be a reason why get_type should fail for a valid expression, right?
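The following sketch implements the two rules from the diff along with the reviewer's proposed fallback: if the operand's type cannot be resolved, the node is left unchanged rather than the error being propagated. It does not necessarily reflect what was merged, since the author notes that `get_type` should not fail for a valid expression. The `Expr`, `DataType`, `Schema`, and `get_type` definitions are hypothetical. `Transformed` is a local stand-in that mimics the `Transformed::yes`/`Transformed::no` constructors visible in the diff.

```rust
use std::sync::Arc;

#[derive(Debug, Clone, PartialEq)]
pub enum DataType { Int64, Utf8, Boolean }

// Hypothetical expression type covering only the variants needed here.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Literal(bool),
    IsIn(Arc<Expr>, Vec<Arc<Expr>>),
    Cast(Arc<Expr>, DataType),
}

// Local stand-in for a tree-node `Transformed` wrapper.
#[derive(Debug, PartialEq)]
pub struct Transformed<T> { pub data: T, pub transformed: bool }

impl<T> Transformed<T> {
    pub fn yes(data: T) -> Self { Self { data, transformed: true } }
    pub fn no(data: T) -> Self { Self { data, transformed: false } }
}

// Hypothetical schema: a list of (column name, type) pairs.
pub struct Schema(pub Vec<(String, DataType)>);

pub fn get_type(expr: &Expr, schema: &Schema) -> Result<DataType, String> {
    match expr {
        Expr::Column(name) => schema.0.iter()
            .find(|(n, _)| n == name)
            .map(|(_, t)| t.clone())
            .ok_or_else(|| format!("column {name} not found")),
        Expr::Literal(_) | Expr::IsIn(..) => Ok(DataType::Boolean),
        Expr::Cast(_, dtype) => Ok(dtype.clone()),
    }
}

pub fn simplify_node(expr: Arc<Expr>, schema: &Schema) -> Transformed<Arc<Expr>> {
    match expr.as_ref() {
        // e IN () --> false
        Expr::IsIn(_, list) if list.is_empty() => {
            Transformed::yes(Arc::new(Expr::Literal(false)))
        }
        // CAST(e AS dtype) --> e if e already has type dtype.
        // If the type can't be resolved, keep the node instead of erroring.
        Expr::Cast(e, dtype) => match get_type(e, schema) {
            Ok(t) if t == *dtype => Transformed::yes(e.clone()),
            _ => Transformed::no(expr.clone()),
        },
        _ => Transformed::no(expr.clone()),
    }
}
```

The tradeoff is that a `Transformed::no` fallback can hide a genuinely malformed expression until a later stage, whereas propagating the error surfaces it immediately.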

src/daft-algebra/src/simplify/numeric.rs

universalmind303 · 2025-01-28T16:47:22Z

src/daft-logical-plan/src/optimization/optimizer.rs

+                    vec![Box::new(SimplifyExpressionsRule::new())],
+                    RuleExecutionStrategy::FixedPoint(Some(3)),


Since exprs can be pretty poorly written (e.g. in generated code), I'm wondering if we should do more than 3 passes.

Gonna just use the configurable default_max_optimizer_passes then.
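`RuleExecutionStrategy::FixedPoint(Some(3))` caps the rule batch at three passes, even if the expression is still changing. The following is a hypothetical sketch of fixed-point execution: a rule is reapplied until it reports no change or the pass limit is reached. Two details are assumptions about how the configurable `default_max_optimizer_passes` might be wired in: `FixedPoint(None)` falling back to a configured default, and the `run_batch` name.

```rust
// Hypothetical, simplified version of a rule-batch execution strategy.
#[derive(Debug, Clone, Copy)]
pub enum RuleExecutionStrategy {
    Once,
    FixedPoint(Option<usize>),
}

// Applies `rule` until it reports no change or the pass limit is hit.
// Returns the final plan and the number of passes actually run.
pub fn run_batch<T, F>(
    mut plan: T,
    strategy: RuleExecutionStrategy,
    default_max_passes: usize,
    rule: F,
) -> (T, usize)
where
    F: Fn(T) -> (T, bool),
{
    let max_passes = match strategy {
        RuleExecutionStrategy::Once => 1,
        RuleExecutionStrategy::FixedPoint(Some(n)) => n,
        // Assumption: no explicit limit means "use the configured default".
        RuleExecutionStrategy::FixedPoint(None) => default_max_passes,
    };
    let mut passes = 0;
    while passes < max_passes {
        let (next, changed) = rule(plan);
        plan = next;
        passes += 1;
        if !changed {
            break; // fixed point reached
        }
    }
    (plan, passes)
}
```

As a toy example, consider a rule that halves even numbers. Starting from 64, it stops at 8 after three passes when capped at 3, but it reaches the fixed point of 1 after seven passes when given a larger default.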

codecov · 2025-01-28T20:56:25Z

Codecov Report

Attention: Patch coverage is 92.10526% with 33 lines in your changes missing coverage. Please review.

Project coverage is 77.17%. Comparing base (d00e444) to head (da200f7).
Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
src/daft-logical-plan/src/treenode.rs	87.38%	14 Missing ⚠️
src/daft-algebra/src/simplify/null.rs	73.33%	8 Missing ⚠️
src/daft-algebra/src/simplify/boolean.rs	96.11%	7 Missing ⚠️
src/daft-algebra/src/simplify/numeric.rs	87.09%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3731      +/-   ##
==========================================
- Coverage   77.57%   77.17%   -0.41%     
==========================================
  Files         728      731       +3     
  Lines       92054    93132    +1078     
==========================================
+ Hits        71408    71871     +463     
- Misses      20646    21261     +615
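The Codecov figures can be cross-checked by hand. This small sketch assumes that coverage equals hits divided by coverable lines; the diff lists no partially covered lines. It reproduces the 77.57% and 77.17% project figures. It also shows that 92.10526% patch coverage with 33 missing lines (14 + 8 + 7 + 4 from the table) implies 418 coverable patch lines, of which 385 are hit. The function names are illustrative only.

```rust
/// Coverage as a percentage of coverable lines that were hit.
/// Assumes no partially covered lines, since the report lists none.
pub fn coverage_pct(hits: u64, lines: u64) -> f64 {
    100.0 * hits as f64 / lines as f64
}

/// Number of coverable lines in a patch implied by its coverage
/// percentage and its count of missing (unhit) lines.
pub fn implied_patch_lines(patch_pct: f64, missing: u64) -> u64 {
    (missing as f64 / (1.0 - patch_pct / 100.0)).round() as u64
}
```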

Files with missing lines	Coverage Δ
src/common/treenode/src/lib.rs	`94.22% <100.00%> (-0.07%)`	⬇️
src/daft-algebra/src/simplify/mod.rs	`100.00% <100.00%> (ø)`
...rc/daft-logical-plan/src/optimization/optimizer.rs	`95.03% <100.00%> (+0.06%)`	⬆️
...lan/src/optimization/rules/simplify_expressions.rs	`92.18% <100.00%> (-0.57%)`	⬇️
src/daft-algebra/src/simplify/numeric.rs	`87.09% <87.09%> (ø)`
src/daft-algebra/src/simplify/boolean.rs	`96.11% <96.11%> (ø)`
src/daft-algebra/src/simplify/null.rs	`73.33% <73.33%> (ø)`
src/daft-logical-plan/src/treenode.rs	`91.39% <87.38%> (-1.37%)`	⬇️

... and 17 files with indirect coverage changes

desmondcheongzx · 2025-01-29T19:47:38Z

src/daft-algebra/src/simplify/boolean.rs

+    for (i, left_exprs) in left_and_of_or_exprs.iter().enumerate() {
+        for (j, right_exprs) in right_and_of_or_exprs.iter().enumerate() {


High level question: is there ever a case where we want to self-eliminate left exprs with left exprs (that are not itself)?

src/daft-algebra/src/simplify/mod.rs

desmondcheongzx · 2025-01-29T19:54:50Z

src/daft-algebra/src/simplify/boolean.rs

Chatted offline. Since the optimizer rule is applied bottom up, we've already self-eliminated. LGTM
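The nested loop above compares subexpressions from the two sides of a boolean operator. The related issue #3664 asks for eliminating common expressions in OR conjunctions. TPC-H Q19's filter is an OR of three AND-branches that share predicates such as `p_partkey = l_partkey`, so pulling those predicates out makes them available for pushdown. Below is a hypothetical sketch of that general technique: `(c AND x) OR (c AND y) --> c AND (x OR y)`, with absorption when one side is entirely common. It is not the PR's implementation. Like the rule discussed in the thread, it assumes each side's conjuncts were already deduplicated by the bottom-up pass. `Expr`, `split_and`, `combine_and`, and `factor_or` are illustrative names.

```rust
use std::sync::Arc;

// Hypothetical expression type with just enough variants for this rule.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    And(Arc<Expr>, Arc<Expr>),
    Or(Arc<Expr>, Arc<Expr>),
}

// Flattens nested ANDs into a list of conjuncts.
fn split_and(e: &Arc<Expr>, out: &mut Vec<Arc<Expr>>) {
    match e.as_ref() {
        Expr::And(l, r) => {
            split_and(l, out);
            split_and(r, out);
        }
        _ => out.push(e.clone()),
    }
}

// Rebuilds a left-deep AND chain; None for an empty list.
fn combine_and(exprs: Vec<Arc<Expr>>) -> Option<Arc<Expr>> {
    exprs.into_iter().reduce(|acc, e| Arc::new(Expr::And(acc, e)))
}

/// Rewrites `left OR right` by factoring out shared conjuncts.
/// Returns None if the two sides share nothing.
pub fn factor_or(left: &Arc<Expr>, right: &Arc<Expr>) -> Option<Arc<Expr>> {
    let (mut l, mut r) = (Vec::new(), Vec::new());
    split_and(left, &mut l);
    split_and(right, &mut r);
    let common: Vec<Arc<Expr>> = l.iter().filter(|e| r.contains(*e)).cloned().collect();
    if common.is_empty() {
        return None;
    }
    let l_rest: Vec<_> = l.into_iter().filter(|e| !common.contains(e)).collect();
    let r_rest: Vec<_> = r.into_iter().filter(|e| !common.contains(e)).collect();
    let rest = match (combine_and(l_rest), combine_and(r_rest)) {
        (Some(x), Some(y)) => Some(Arc::new(Expr::Or(x, y))),
        // One side is only common terms: c OR (c AND y) --> c (absorption).
        _ => None,
    };
    let mut conjuncts = common;
    conjuncts.extend(rest);
    combine_and(conjuncts)
}
```

Both distributivity and absorption also hold under SQL's three-valued logic, so the rewrite does not change results when operands are NULL.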

perf: simplify boolean expression rules

fcd803a

kevinzwang requested review from desmondcheongzx and universalmind303 January 28, 2025 00:30

github-actions bot added the perf label Jan 28, 2025

kevinzwang added 2 commits January 27, 2025 16:43

simplify is_true is_false

a22a50b

fix

1a2c8f5

universalmind303 reviewed Jan 28, 2025

kevinzwang added 3 commits January 28, 2025 11:49

fix schema issues

c392b3c

fixes

2d3ae2f

Merge branch 'main' into kevin/simplify-bool-expr

2319733

remove partition filter simplification

057f661

desmondcheongzx reviewed Jan 29, 2025

Merge branch 'main' into kevin/simplify-bool-expr

6a2a82c

desmondcheongzx approved these changes Jan 29, 2025

add comments

da200f7

kevinzwang enabled auto-merge (squash) January 29, 2025 20:04

kevinzwang disabled auto-merge January 29, 2025 20:05

kevinzwang enabled auto-merge (squash) January 29, 2025 20:05

kevinzwang merged commit 78beff4 into main Jan 29, 2025
40 of 41 checks passed

kevinzwang deleted the kevin/simplify-bool-expr branch January 29, 2025 20:25

colin-ho mentioned this pull request Feb 13, 2025

Optimizer: Eliminate common expressions in OR conjunctions #3664

Closed
