-
Notifications
You must be signed in to change notification settings - Fork 534
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds MAX and COUNT DISTINCT aggregations to mean Fact Metrics #3384
Conversation
Deploy preview for docs ready! ✅ Preview Built with commit 4213412. |
} | ||
abstract hllAggregate(col: string): string; | ||
abstract hllReaggregate(col: string): string; | ||
abstract hllCardinality(col: string): string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would default hasCountDistinctHLL
to return false and instead of abstract methods, implement default ones here that throw. So basically the same thing you're doing in Mysql, etc.. Then those data sources that don't support this will have no changes.
// TODO test null | ||
return this.hllCardinality(this.hllAggregate(valueColumn)); | ||
} else if (columnRef?.aggregation === "max") { | ||
return `MAX(COALESCE(${valueColumn}, 0))`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will technically break if the column has all negative values with at least 1 null, this will force the max value to be 0
instead of the true max value. Could swap it to COALESCE(MAX(${valueColumn}), 0)
. I don't think this is worth fixing, just calling it out as something interesting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that is interesting. I think the idea that NULL is 0 always made sense until now. I'll think about this some more.
): string { | ||
if (columnRef?.aggregation === "count distinct") { | ||
return this.hllCardinality(col); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can probably refactor some of the user filter code to use this since it's similar (selects a SUM or COUNT, but then does a CASE WHEN at the end to get the final value). Nothing needed in this PR, just making a note for later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, in reality we are measuring the final temp value, which in most cases is just 1<->1 but here it's cardinality of the state and for user filter it's applying the threshold. The reason it didn't work to just duplicate it there is that I need to measure the state value before capping is applied, which isn't an issue for the threshold metrics.
So I either need to carry around the state AND its value, or I need to handle it in logical statements like the above.
Features and Changes
Adds COUNT DISTINCT and MAX to mean fact metrics.
COUNT DISTINCT uses HLL estimation methods (or ClickHouse's
uniq
default estimation method) exclusively and is not available for: non-SQL warehouses (Mixpanel), Postgres, Vertica, MySQL, and MS SQL (SQL SERVER).Does so by adding a new
aggregation
field to column ref, and default isSUM
if undefined, otherwise one of defined aggregations.TODO:
Testing
Tested all but Presto/Athena with dummy queries.
Tested Create and Update API endpoints, as well as errors when trying to use the wrong aggregation.
Screenshots
https://www.loom.com/share/dc78491dbeff475da17d0d197d4eb989