-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Push down more predicates into ParquetExec
#4279
Conversation
ParquetExec
a31928f
to
fa1e18d
Compare
@@ -284,6 +288,7 @@ impl ExecutionPlan for ParquetExec { | |||
partition_index, | |||
projection: Arc::from(projection), | |||
batch_size: ctx.session_config().batch_size(), | |||
predicate: self.predicate.clone(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like all the cloning -- I have a follow on PR (#4283) to stop the copying
fa1e18d
to
ec6de25
Compare
@@ -98,6 +100,7 @@ impl ParquetExec { | |||
MetricBuilder::new(&metrics).global_counter("num_predicate_creation_errors"); | |||
|
|||
let pruning_predicate = predicate | |||
.clone() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the key difference here is that the original predicate is kept, even if the creation of the pruning predicate fails
@@ -312,6 +317,12 @@ impl ExecutionPlan for ParquetExec { | |||
) -> std::fmt::Result { | |||
match t { | |||
DisplayFormatType::Default => { | |||
let predicate_string = self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The predicate is now explicitly shown as well in EXPLAIN (as it is difference)
@@ -1564,6 +1575,9 @@ mod tests { | |||
&display, | |||
"pruning_predicate=c1_min@0 != bar OR bar != c1_max@1" | |||
); | |||
|
|||
assert_contains!(&display, r#"predicate=c1 != Utf8("bar")"#); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb Thanks for solving this!
Benchmark runs are scheduled for baseline = bcd6248 and contender = afe2333. afe2333 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Draft as it builds onParquetExec
#4280Which issue does this PR close?
Closes #4020
connects to #3463
I suggest viewing with whitespace blind diff to see the differences clearly: https://github.com/apache/arrow-datafusion/pull/4279/files?w=1
Rationale for this change
Pruning predicates are for eliminating row groups or pages based on min/max values. Only some predicates can be converted into Pruning Predicates.
Filter predicates are evaluated directly on decoded values during the scan.
However, the only predicates that the parquet exec will attempt to push down as filter predicate are the predicates that can be made into pruning predicates. It should attempt to push down all of them.
What changes are included in this PR?
Pass entire pushed down predicate (not just the one attached to pruning predicate)
Are these changes tested?
Yes
Are there any user-facing changes?
More predicates can be pushed as row filters