Unexpected GROUP BY when using multi_stage and referencing measures with {} and #9241

victormorenoi · 2025-02-19T16:06:03Z

Describe the bug
When using {} to reference a dimension in Cube with multi_stage: true, the generated query introduces an unnecessary GROUP BY, which alters the final aggregation results.

To Reproduce
I have the following dimension definitions:

- name: count  
  type: sum  
  sql: "{multibrand_weight}"  

- name: count_total  
  type: sum  
  multi_stage: true  
  sql: "{count}"  
  group_by:  
    - retailer_name  
    - category_name

If I define count as sql: multibrand_weight (without {}), referencing the table column directly, the query works correctly, applying the aggregation directly at the table level.
However, when using sql: "{multibrand_weight}", referencing the Cube dimensions, Cube first executes a SELECT on the dimension and introduces an unnecessary GROUP BY that changes the final result.

Expected behavior
I would expect the same behavior when calling the Cube dimension, {}, as when directly referencing the table column.

Example: beginning of the Queries:
✅ Correct (without {}):

WITH cte_0 AS (  
  SELECT  
    "test_cube"."BRAND_NAME" "test_cube__brand_name",  
    sum("test_cube".multibrand_weight) "test_cube__count"  
  FROM "SH_CMI"."MART_MEDIA_HASHTAG" AS "test_cube"  
  WHERE ...  
  GROUP BY 1  
  ORDER BY 2 DESC  
)

❌ Incorrect (with {}):

WITH cte_0 AS (  
  SELECT  
    "test_cube"."BRAND_NAME" "test_cube__brand_name",  
    "test_cube"."MULTIBRAND_WEIGHT" "test_cube__multibrand_weight"  
  FROM "SH_CMI"."MART_MEDIA_HASHTAG" AS "test_cube"  
  WHERE ...  
  GROUP BY 1, 2  
  ORDER BY 1 ASC  
),  
cte_1 AS (  
  SELECT  
    "test_cube__brand_name",  
    sum("test_cube__multibrand_weight") "test_cube__count"  
  FROM cte_0  
  GROUP BY 1  
  ORDER BY 2 DESC  
)

The GROUP BY performed in the initial SELECT causes the data to be grouped as if applying a DISTINCT, leading to incorrect final results.
I would expect the same behavior when calling the Cube dimension, {}, as when directly referencing the table column.
I get the following questions

Is this expected behavior when referencing dimensions using {}?
Is there a way to avoid the extra GROUP BY while still referencing the dimension?
What would be the best practice to define a dimension that depends on another without altering the query logic?

This question is crucial for us, as our modeling heavily relies on multi_stage, both for total calculations and for time_shift usage. The same issue occurs with any measure type, including count, avg, sum, etc.
Thank you so much! 🙌

Version:
[Latest 1.2.5]

The text was updated successfully, but these errors were encountered:

victormorenoi changed the title ~~Unexpected GROUP BY when using multi_stage: true and referencing measures with {} and~~ Unexpected GROUP BY when using multi_stage and referencing measures with {} and Feb 19, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unexpected GROUP BY when using multi_stage and referencing measures with {} and #9241

Unexpected GROUP BY when using multi_stage and referencing measures with {} and #9241

victormorenoi commented Feb 19, 2025 •

edited

Loading

Unexpected GROUP BY when using multi_stage and referencing measures with {} and #9241

Unexpected GROUP BY when using multi_stage and referencing measures with {} and #9241

Comments

victormorenoi commented Feb 19, 2025 • edited Loading

victormorenoi commented Feb 19, 2025 •

edited

Loading