Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The rule common_sub_expression_eliminate removes non-duplicate expressions #4887

Closed
Tracked by #4685
askoa opened this issue Jan 13, 2023 · 2 comments
Closed
Tracked by #4685
Labels
bug Something isn't working

Comments

@askoa
Copy link
Contributor

askoa commented Jan 13, 2023

Describe the bug
While analyzing the Logical Plan for TPCH-DS query 10 to find the cause of the issue #4795, I found another issue with the Logical Plan generated. I noticed that the below subquery disappeared from Logical Plan.

    exists (select * 
            from catalog_sales,date_dim
            where c.c_customer_sk = cs_ship_customer_sk and
                  cs_sold_date_sk = d_date_sk and
                  d_year = 2002 and
                  d_moy between 4 and 4+3)

To Reproduce
Enable log tracing and run the test tpcds_logical_q10 or tpcds_physical_q10. Check the logs for Logical Plan generated.

Expected behavior
The subquery should not disappear from Logical Plan.

Additional context
I did some additional analysis and found that the rule common_sub_expression_eliminate removes one of the EXIST (<subquery>). Below are detailed steps

logical plan before decorrelate_where_exists ->

...
Filter: ... EXISTS(<subquery>) AND (EXISTS(<subquery>) OR EXISTS(<subquery>))
...

the rule decorrelate_where_exists mutates the first EXIST into a LeftSemi Join

logical plan after decorrelate_where_exists and before common_sub_expression_eliminate ->

...
Filter: ... (EXISTS(<subquery>) OR EXISTS(<subquery>))
    LeftSemi Join: ...
...

the rule common_sub_expression_eliminate runs:

  • The rule maintains a HashMap of expressions to no. of occurrences. The expression EXISTS(<subquery>) is recognized as a single expression occurring twice.

https://github.com/apache/arrow-datafusion/blob/1844d39eb92f04e483095f491ff07da3a2f67f25/datafusion/optimizer/src/common_subexpr_eliminate.rs#L451-L454

  • The rule creates a projection that merges the two similar expressions and thus making the second subquery irrelevant.

logical plan after common_sub_expression_eliminate:

...
Filter ...(EXISTS(<subquery>) OR EXISTS(<subquery>))...
    Projection: EXISTS<subquery>... 
        LefftSemi Join: ...
    LeftSemi Join: ...
...
@alamb
Copy link
Contributor

alamb commented Sep 28, 2024

I believe we have fixed this as part of the extensive common subexpr refactoring. Can someone check if this is still an issue?

@alamb
Copy link
Contributor

alamb commented Sep 28, 2024

Please reopen if this is still an issue

@alamb alamb closed this as completed Sep 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants