-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: show statistics in explain verbose #8113
Conversation
@alamb : Instead of adding lines with stats, I just add stats to all physical plans which I think more reasonable. Let me know what you think |
I think it makes sense, thank you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @NGA-TRAN and @berkaysynnada I think adding statistics to all explain plans is going to make verbose plans almost unreadable (they are already pretty hard to read), especially as we improve the statistics over time -- like imagine a plan with 100 columns after #8112 -- it will be almost impossible to read the verbose output
I would like to suggest rather than adding statistics to every plan in explain verbose, instead add only a single new line
Perhaps by adding a new entry in addition to
if verbose {
// add initial physical plan and its statistics in verbose mode
stringified_plans.push(
displayable(input.as_ref())
.set_show_statistics(true)
.to_stringified(e.verbose, InitialPhysicalPlanWithStats),
);
Something like this perhaps: #8111 (comment) |
Sounds good. I will add stats for initial and final physical plans only |
// We do not want to add proto plan type for these two becasue they are just the same as | ||
// InitialPhysicalPlan and FinalPhysicalPlan | ||
datafusion_expr::PlanType::InitialPhysicalPlanWithStats | ||
| datafusion_expr::PlanType::FinalPhysicalPlanWithStats => todo!(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb Cargo suggested me to do this. Do you want to go with this or create 2 new types in photo enum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think adding two new types in the proto would be the most consistent with the existing code
However, as long as this doesn't panic I think it would be ok to merge -- for example, if it returned an internal error.
I don't really understand why there is special serialization for this to be honest 🤔 maybe we could simplify that too as a follow on PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb The todo!
returns a panic of not yet implemented
. Let me try to add new types to proto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have add 2 new types to proto instead
set datafusion.execution.collect_statistics = false; | ||
# explain verbose with both collect & show statistics on | ||
query TT | ||
EXPLAIN VERBOSE SELECT * FROM alltypes_plain limit 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All plans have stats because show_stats is on. This works the same as before this PR
|
||
# explain verbose with collect on and & show statistics off: still has stats | ||
query TT | ||
EXPLAIN VERBOSE SELECT * FROM alltypes_plain limit 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 new lines added for this one because show_stats is off
@alamb I have addressed your comments but I have a new question in the code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @NGA-TRAN -- this looks good to me except for the panic in proto serialization.
// We do not want to add proto plan type for these two becasue they are just the same as | ||
// InitialPhysicalPlan and FinalPhysicalPlan | ||
datafusion_expr::PlanType::InitialPhysicalPlanWithStats | ||
| datafusion_expr::PlanType::FinalPhysicalPlanWithStats => todo!(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think adding two new types in the proto would be the most consistent with the existing code
However, as long as this doesn't panic I think it would be ok to merge -- for example, if it returned an internal error.
I don't really understand why there is special serialization for this to be honest 🤔 maybe we could simplify that too as a follow on PR
@@ -245,6 +245,7 @@ logical_plan after eliminate_projection SAME TEXT AS ABOVE | |||
logical_plan after push_down_limit SAME TEXT AS ABOVE | |||
logical_plan TableScan: simple_explain_test projection=[a, b, c] | |||
initial_physical_plan CsvExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/core/tests/data/example.csv]]}, projection=[a, b, c], has_header=true | |||
initial_physical_plan_with_stats CsvExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/core/tests/data/example.csv]]}, projection=[a, b, c], has_header=true, statistics=[Rows=Absent, Bytes=Absent] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
@@ -1900,6 +1900,18 @@ impl DefaultPhysicalPlanner { | |||
.to_stringified(e.verbose, InitialPhysicalPlan), | |||
); | |||
|
|||
// Add plan with stats for verbose case even if show_statistics is false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment seems confusing as the next line checks for !config.show_statistics
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made the comments clearer
Looks like the CI failures are due to a logical merge conflict with #8112 -- I'll push an update |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again @NGA-TRAN !
Which issue does this PR close?
Closes ##8111
Rationale for this change
Always show statistics in explain verbose to get used by other DB such as IOx that do not want to use setting
set datafusion.explain.show_statistics = true;
What changes are included in this PR?
Always show statistics in explain verbose
Are these changes tested?
Yes
Are there any user-facing changes?
Explain verbose now includes statistics