You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Describe the bug
While analyzing the Logical Plan for TPCH-DS query 10 to find the cause of the issue #4795, I found another issue with the Logical Plan generated. I noticed that the below subquery disappeared from Logical Plan.
exists (select *
from catalog_sales,date_dim
where c.c_customer_sk = cs_ship_customer_sk and
cs_sold_date_sk = d_date_sk and
d_year = 2002 and
d_moy between 4 and 4+3)
To Reproduce
Enable log tracing and run the test tpcds_logical_q10 or tpcds_physical_q10. Check the logs for Logical Plan generated.
Expected behavior
The subquery should not disappear from Logical Plan.
Additional context
I did some additional analysis and found that the rule common_sub_expression_eliminate removes one of the EXIST (<subquery>). Below are detailed steps
logical plan before decorrelate_where_exists ->
...
Filter: ... EXISTS(<subquery>) AND (EXISTS(<subquery>) OR EXISTS(<subquery>))
...
the rule decorrelate_where_exists mutates the first EXIST into a LeftSemi Join
logical plan after decorrelate_where_exists and before common_sub_expression_eliminate ->
...
Filter: ... (EXISTS(<subquery>) OR EXISTS(<subquery>))
LeftSemi Join: ...
...
the rule common_sub_expression_eliminate runs:
The rule maintains a HashMap of expressions to no. of occurrences. The expression EXISTS(<subquery>) is recognized as a single expression occurring twice.
Describe the bug
While analyzing the Logical Plan for TPCH-DS query 10 to find the cause of the issue #4795, I found another issue with the Logical Plan generated. I noticed that the below subquery disappeared from Logical Plan.
To Reproduce
Enable log tracing and run the test
tpcds_logical_q10
ortpcds_physical_q10
. Check the logs for Logical Plan generated.Expected behavior
The subquery should not disappear from Logical Plan.
Additional context
I did some additional analysis and found that the rule
common_sub_expression_eliminate
removes one of theEXIST (<subquery>)
. Below are detailed stepslogical plan before
decorrelate_where_exists
->the rule
decorrelate_where_exists
mutates the firstEXIST
into aLeftSemi Join
logical plan after
decorrelate_where_exists
and beforecommon_sub_expression_eliminate
->the rule
common_sub_expression_eliminate
runs:EXISTS(<subquery>)
is recognized as a single expression occurring twice.https://github.com/apache/arrow-datafusion/blob/1844d39eb92f04e483095f491ff07da3a2f67f25/datafusion/optimizer/src/common_subexpr_eliminate.rs#L451-L454
logical plan after
common_sub_expression_eliminate
:The text was updated successfully, but these errors were encountered: