Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: to_cnf and to_dnf functions #3728

Merged
merged 5 commits into from
Jan 28, 2025
Merged

fix: to_cnf and to_dnf functions #3728

merged 5 commits into from
Jan 28, 2025

Conversation

kevinzwang
Copy link
Member

@kevinzwang kevinzwang commented Jan 27, 2025

The pre-order expression tree traversal algorithm I used for to_dnf turned out not to actually fully convert a statement to disjunctive normal form. Since it was also used in to_cnf, that did not fully work either.

This new algorithm recursively converts subtrees to DNF form and propagates that upwards. In each boolean AND, the subtrees are assumed to be fully DNF, and it uses that assumption to merge the two sides into one DNF statement.

Also a quick fix for de Morgan's law on the case of triple negatives.

Copy link

codspeed-hq bot commented Jan 27, 2025

CodSpeed Performance Report

Merging #3728 will degrade performances by 36.53%

Comparing kevin/to-dnf-fix (5ae59bf) with main (fec399e)

Summary

❌ 2 regressions
✅ 25 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark BASE HEAD Change
test_iter_rows_first_row[100 Small Files] 188.1 ms 296.4 ms -36.53%
test_show[100 Small Files] 15.9 ms 23.9 ms -33.64%

Copy link

codecov bot commented Jan 27, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.58%. Comparing base (603199f) to head (5ae59bf).
Report is 2 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3728      +/-   ##
==========================================
+ Coverage   77.55%   77.58%   +0.03%     
==========================================
  Files         729      729              
  Lines       91917    92005      +88     
==========================================
+ Hits        71284    71383      +99     
+ Misses      20633    20622      -11     
Files with missing lines Coverage Δ
src/daft-algebra/src/boolean.rs 100.00% <100.00%> (ø)
...al-plan/src/optimization/rules/push_down_filter.rs 97.35% <ø> (ø)

... and 5 files with indirect coverage changes

Copy link
Contributor

@universalmind303 universalmind303 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approving, but i'd prefer to merge the mod/tests into single file.

@kevinzwang kevinzwang enabled auto-merge (squash) January 28, 2025 19:55
@kevinzwang kevinzwang merged commit 960f144 into main Jan 28, 2025
42 of 43 checks passed
@kevinzwang kevinzwang deleted the kevin/to-dnf-fix branch January 28, 2025 20:23
kevinzwang added a commit that referenced this pull request Jan 29, 2025
This PR adds the following:
1. `simplify` now recursively simplifies the entire expression tree
2. Simplification rules `simplify_and` and `simplify_or` to simplify
boolean expressions
3. A simplify expressions step to the logical optimizer before scan task
materialization

I also did some refactoring work that did not change behavior:
1. Converted various functions to take in `Arc<T>` instead of `T`, which is
more ergonomic to use and avoids a potential clone when calling it with
an Arc variable using `Arc::unwrap_or_clone`
2. Split up simplify code into multiple files

Combined with #3728, this PR brings major performance improvements to
TPCH Q19 by pushing down and simplifying filter predicates.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants