-
Notifications
You must be signed in to change notification settings - Fork 186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: simplify boolean expression rules #3731
Conversation
CodSpeed Performance ReportMerging #3731 will degrade performances by 12.28%Comparing Summary
Benchmarks breakdown
|
// e IN () --> false | ||
Expr::IsIn(_, list) if list.is_empty() => Transformed::yes(lit(false)), | ||
// CAST(e AS dtype) -> e if e.dtype == dtype | ||
Expr::Cast(e, dtype) if e.get_type(schema)? == *dtype => Transformed::yes(e.clone()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of erroring out, should we instead return Transformed::no
if we can't get_type
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there should be a reason why get_type should fail for a valid expression, right?
vec![Box::new(SimplifyExpressionsRule::new())], | ||
RuleExecutionStrategy::FixedPoint(Some(3)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since exprs can be pretty poorly written due to generated code, etc. I'm wondering if we should do more than 3 passes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gonna just use the configurable default_max_optimizer_passes
then.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3731 +/- ##
==========================================
- Coverage 77.57% 77.17% -0.41%
==========================================
Files 728 731 +3
Lines 92054 93132 +1078
==========================================
+ Hits 71408 71871 +463
- Misses 20646 21261 +615
|
for (i, left_exprs) in left_and_of_or_exprs.iter().enumerate() { | ||
for (j, right_exprs) in right_and_of_or_exprs.iter().enumerate() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
High level question: is there ever a case where we want to self-eliminate left exprs with left exprs (that are not itself)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chatted offline. Since the optimizer rule is applied bottom up, we've already self-eliminated. LGTM
for (i, left_exprs) in left_and_of_or_exprs.iter().enumerate() { | ||
for (j, right_exprs) in right_and_of_or_exprs.iter().enumerate() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chatted offline. Since the optimizer rule is applied bottom up, we've already self-eliminated. LGTM
This PR adds the following:
simplify
now recursively simplifies the entire expression treesimplify_and
andsimplify_or
to simplify boolean expressionsI also did some refactoring work that did not change behavior:
Arc::unwrap_or_clone
Combined with #3728, this PR brings major performance improvements to TPCH Q19 by pushing down and simplifying filter predicates.