Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected GROUP BY when using multi_stage and referencing measures with {} and #9241

Open
victormorenoi opened this issue Feb 19, 2025 · 0 comments

Comments

@victormorenoi
Copy link

victormorenoi commented Feb 19, 2025

Describe the bug
When using {} to reference a dimension in Cube with multi_stage: true, the generated query introduces an unnecessary GROUP BY, which alters the final aggregation results.

To Reproduce
I have the following dimension definitions:

- name: count  
  type: sum  
  sql: "{multibrand_weight}"  

- name: count_total  
  type: sum  
  multi_stage: true  
  sql: "{count}"  
  group_by:  
    - retailer_name  
    - category_name

If I define count as sql: multibrand_weight (without {}), referencing the table column directly, the query works correctly, applying the aggregation directly at the table level.
However, when using sql: "{multibrand_weight}", referencing the Cube dimensions, Cube first executes a SELECT on the dimension and introduces an unnecessary GROUP BY that changes the final result.

Expected behavior
I would expect the same behavior when calling the Cube dimension, {}, as when directly referencing the table column.

Example: beginning of the Queries:
✅ Correct (without {}):

WITH cte_0 AS (  
  SELECT  
    "test_cube"."BRAND_NAME" "test_cube__brand_name",  
    sum("test_cube".multibrand_weight) "test_cube__count"  
  FROM "SH_CMI"."MART_MEDIA_HASHTAG" AS "test_cube"  
  WHERE ...  
  GROUP BY 1  
  ORDER BY 2 DESC  
)

❌ Incorrect (with {}):

WITH cte_0 AS (  
  SELECT  
    "test_cube"."BRAND_NAME" "test_cube__brand_name",  
    "test_cube"."MULTIBRAND_WEIGHT" "test_cube__multibrand_weight"  
  FROM "SH_CMI"."MART_MEDIA_HASHTAG" AS "test_cube"  
  WHERE ...  
  GROUP BY 1, 2  
  ORDER BY 1 ASC  
),  
cte_1 AS (  
  SELECT  
    "test_cube__brand_name",  
    sum("test_cube__multibrand_weight") "test_cube__count"  
  FROM cte_0  
  GROUP BY 1  
  ORDER BY 2 DESC  
)

The GROUP BY performed in the initial SELECT causes the data to be grouped as if applying a DISTINCT, leading to incorrect final results.
I would expect the same behavior when calling the Cube dimension, {}, as when directly referencing the table column.
I get the following questions

  • Is this expected behavior when referencing dimensions using {}?
  • Is there a way to avoid the extra GROUP BY while still referencing the dimension?
  • What would be the best practice to define a dimension that depends on another without altering the query logic?

This question is crucial for us, as our modeling heavily relies on multi_stage, both for total calculations and for time_shift usage. The same issue occurs with any measure type, including count, avg, sum, etc.
Thank you so much! 🙌

Version:
[Latest 1.2.5]

@victormorenoi victormorenoi changed the title Unexpected GROUP BY when using multi_stage: true and referencing measures with {} and Unexpected GROUP BY when using multi_stage and referencing measures with {} and Feb 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant