You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In Flux we use a special storage method called TagValues to allow the user to discover the values of a tag key. Most of the time this works fine, but for a case like this test it does not:
We are seeking the fields that fulfill the predicate r._measurement == "sys" and r.host == "host.local". Storage's implementation tagValues will filter the measurements on the predicate (in this case only sys would match) and then return all of the fields in the measurement. So when using storage code to do this we get the four fields
load3
load7
load8
load9
When only load3 and load8 fulfill the r.host == "host.local" part of the predicate.
It doesn't seem to be possible to filter out fields based on predicates on tags using the index. This is at odds with Flux which wants to treat _field as a tag key like any other.
Current Status and User Impact
The current status on OSS 2.0 master is that we just don't push down tagValues or tagKeys operations to storage, so metaqueries will be slow for non-trivial amounts of data. In particular this has a big impact on the UI experience, especially the Query Builder.
Some More Details
Looking at the tag set as it's stored in the index I see this for the Telegraf cpu measurement:
Since _field or \0xff is not listed in the tag set, the index doesn't seem to actually store the field names in the index at all.
Proposed Solution
Update the implementation of tagValues to look for conditions that would cause it to get a wrong answer:
getting tag values for _field, where there is a predicate on anything except _measurement
getting tag values for any tag key (including _measurement, _field or any other tag key) where the predicate references _field
For these cases, we can treat the metaquery more similar to a normal read query, as readFilter does, but make the query into storage only request one point per series. For each series that has a point, collect the field name and then sort and dedupe the list of fields. Then return this field list.
Definition of Done
Work described above to fix tagValues is complete
Unit tests for this work? Currently there does not seem to be unit tests for tagValues, will end-to-end tests be enough?
PushDownReadTagValuesRule is re-enabled in the planner
All Flux end-to-end tests pass (more tests are being added to get better coverage here)
The text was updated successfully, but these errors were encountered:
wolffcm
changed the title
Using TagValues to get values of _field with tag predicates returns wrong answer
Using TagValues returns wrong with respect to "_field"
Oct 22, 2020
In Flux we use a special storage method called
TagValues
to allow the user to discover the values of a tag key. Most of the time this works fine, but for a case like this test it does not:https://github.com/influxdata/flux/blob/d115c75ebd33120b58bce1c697ea1f3df223ce4d/stdlib/influxdata/influxdb/schema/measurement_field_keys_test.flux#L71
We are seeking the fields that fulfill the predicate
r._measurement == "sys" and r.host == "host.local"
. Storage's implementationtagValues
will filter the measurements on the predicate (in this case onlysys
would match) and then return all of the fields in the measurement. So when using storage code to do this we get the four fieldsWhen only
load3
andload8
fulfill ther.host == "host.local"
part of the predicate.Note that there are also issues when we are looking for tag values on other tag keys, when there is a predicate in the
_field
column. Here is an example of that:https://github.com/influxdata/flux/blob/8b7b2cb5ea6973ad051f1d962d48a528161a9985/stdlib/influxdata/influxdb/schema/show_measurements_with_pred_test.flux
The above test would fail if
PushDownReadTagValuesRule
was re-enabled.It doesn't seem to be possible to filter out fields based on predicates on tags using the index. This is at odds with Flux which wants to treat
_field
as a tag key like any other.Current Status and User Impact
The current status on OSS 2.0 master is that we just don't push down
tagValues
ortagKeys
operations to storage, so metaqueries will be slow for non-trivial amounts of data. In particular this has a big impact on the UI experience, especially the Query Builder.Some More Details
Looking at the tag set as it's stored in the index I see this for the Telegraf
cpu
measurement:influxdb/tsdb/index/tsi1/log_file.go
Line 402 in ca2055c
Since
_field
or\0xff
is not listed in the tag set, the index doesn't seem to actually store the field names in the index at all.Proposed Solution
Update the implementation of
tagValues
to look for conditions that would cause it to get a wrong answer:_field
, where there is a predicate on anything except_measurement
_measurement
,_field
or any other tag key) where the predicate references_field
For these cases, we can treat the metaquery more similar to a normal read query, as
readFilter
does, but make the query into storage only request one point per series. For each series that has a point, collect the field name and then sort and dedupe the list of fields. Then return this field list.Definition of Done
tagValues
is completetagValues
, will end-to-end tests be enough?PushDownReadTagValuesRule
is re-enabled in the plannerThe text was updated successfully, but these errors were encountered: