You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Right now, when we do state_agg, we limit what we can calculate, namely, we only calculate the duration in a given state, while this is nice, for many types of, say, machine metrics, you want to know the actual periods that you were in a state, with all duplicate values of the state reporting removed and only the state change events captured. So you could then easily return all the periods where you were in a given state.
So, we'd want a couple new accessors: state_periods(state_agg, 'state') -> setof state, tstzrange and the related interpolated accessor.
I wonder also if it's useful to have one that can fetch all periods, so something like state_timeline(state_agg) -> setof state, tstzrange
one open question is whether we want to use tstzranges or if we want to use start and end times, I can't remember if/how we've made that choice in the past.
While there are some memory concerns with the current implementation of state_agg and specifically around using text or integers, I think we could support both, I think if we want to limit memory usage, we could just set a limit on the total number of states we can track, probably something like 100 or 1000 or the like, and error if we have more than that. This function is meant for tracking a small number of distinct states, not for tracking a whole lot of long tail data. It's also relatively easy to write a query that specifies the allowed states to pass in to something like this, so if you're running into problems with messy data you can always do something like CASE WHEN state IN ('list', 'of', 'allowed', 'states') THEN state ELSE 'UNKNOWN_STATE' or the like.
I wonder if we want to / can explicitly support ENUM types as well? use them in their int forms
Another question is whether we want to try to deal with the fact that if states are oscillating quickly you can end up with a lot of data in one of these aggregates. Perhaps we could add a flag for the minimum time in a state to track? And could keep the duration counters as well as the larger ranges and some utility functions for figuring out if you're getting lots of oscillations? Or maybe we just wait for it to become a problem and don't bother with it for now. I do wonder whether it's worth it to store the duration anyway to make the calculations faster?
The text was updated successfully, but these errors were encountered:
Supporting ENUM types would cause problems when an enum is the return value. For state_agg() /duration_in()/interpolated_duration_in() we would just need to alter them to take an ANYELEMENT (or we could create one function for TEXT inputs and one for ANYENUM inputs).
But into_values(), which currently has a type signature of into_values(agg StateAgg) RETURNS (TEXT, BIGINT), wouldn't as simple to add support to since you can't return an ANYELEMENT without an ANYELEMENT input. We'd have to use the ugly hack where you pass a NULL of the type you want, like into_values(agg, NULL::myenum).
636: Add timeline_agg r=Smittyvb a=Smittyvb
Adds a `timeline_agg` aggregate that can be used in the same ways as `state_agg` but also allows getting the entire state timeline with `state_timeline`.
Note that this PR returns start times and end times instead of `tstzrange`s. pgx 0.6.0 will add support for `tstzrange`s so when that release comes out I'll update the relevant functions to return ranges.
This partially addresses #619 (adding `rollup` and supporting integer values will be separate PRs to make them easier to review).
Co-authored-by: Smitty <smitty@timescale.com>
Co-authored-by: Smittyvb <smitty@timescale.com>
Right now, when we do state_agg, we limit what we can calculate, namely, we only calculate the duration in a given state, while this is nice, for many types of, say, machine metrics, you want to know the actual periods that you were in a state, with all duplicate values of the state reporting removed and only the state change events captured. So you could then easily return all the periods where you were in a given state.
So, we'd want a couple new accessors:
state_periods(state_agg, 'state') -> setof state, tstzrange
and the related interpolated accessor.I wonder also if it's useful to have one that can fetch all periods, so something like
state_timeline(state_agg) -> setof state, tstzrange
one open question is whether we want to use tstzranges or if we want to use start and end times, I can't remember if/how we've made that choice in the past.
While there are some memory concerns with the current implementation of state_agg and specifically around using text or integers, I think we could support both, I think if we want to limit memory usage, we could just set a limit on the total number of states we can track, probably something like 100 or 1000 or the like, and error if we have more than that. This function is meant for tracking a small number of distinct states, not for tracking a whole lot of long tail data. It's also relatively easy to write a query that specifies the allowed states to pass in to something like this, so if you're running into problems with messy data you can always do something like
CASE WHEN state IN ('list', 'of', 'allowed', 'states') THEN state ELSE 'UNKNOWN_STATE'
or the like.I wonder if we want to / can explicitly support ENUM types as well? use them in their int forms
Another question is whether we want to try to deal with the fact that if states are oscillating quickly you can end up with a lot of data in one of these aggregates. Perhaps we could add a flag for the minimum time in a state to track? And could keep the duration counters as well as the larger ranges and some utility functions for figuring out if you're getting lots of oscillations? Or maybe we just wait for it to become a problem and don't bother with it for now. I do wonder whether it's worth it to store the duration anyway to make the calculations faster?
The text was updated successfully, but these errors were encountered: