Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using TagValues returns wrong with respect to "_field" #19794

Closed
wolffcm opened this issue Oct 21, 2020 · 0 comments · Fixed by #19856
Closed

Using TagValues returns wrong with respect to "_field" #19794

wolffcm opened this issue Oct 21, 2020 · 0 comments · Fixed by #19856

Comments

@wolffcm
Copy link

wolffcm commented Oct 21, 2020

In Flux we use a special storage method called TagValues to allow the user to discover the values of a tag key. Most of the time this works fine, but for a case like this test it does not:

https://github.com/influxdata/flux/blob/d115c75ebd33120b58bce1c697ea1f3df223ce4d/stdlib/influxdata/influxdb/schema/measurement_field_keys_test.flux#L71

We are seeking the fields that fulfill the predicate r._measurement == "sys" and r.host == "host.local". Storage's implementation tagValues will filter the measurements on the predicate (in this case only sys would match) and then return all of the fields in the measurement. So when using storage code to do this we get the four fields

load3
load7
load8
load9

When only load3 and load8 fulfill the r.host == "host.local" part of the predicate.

Note that there are also issues when we are looking for tag values on other tag keys, when there is a predicate in the _field column. Here is an example of that:
https://github.com/influxdata/flux/blob/8b7b2cb5ea6973ad051f1d962d48a528161a9985/stdlib/influxdata/influxdb/schema/show_measurements_with_pred_test.flux
The above test would fail if PushDownReadTagValuesRule was re-enabled.

It doesn't seem to be possible to filter out fields based on predicates on tags using the index. This is at odds with Flux which wants to treat _field as a tag key like any other.

Current Status and User Impact

The current status on OSS 2.0 master is that we just don't push down tagValues or tagKeys operations to storage, so metaqueries will be slow for non-trivial amounts of data. In particular this has a big impact on the UI experience, especially the Query Builder.

Some More Details

Looking at the tag set as it's stored in the index I see this for the Telegraf cpu measurement:

tk, ok := mm.tagSet[string(key)]

mm.tagSet: map[
  cpu:{
    name:[99 112 117]
    deleted:false
    tagValues:map[
      cpu0:{name:[99 112 117 48] deleted:false series:map[481:{}] seriesSet:<nil>}
      cpu2:{name:[99 112 117 50] deleted:false series:map[673:{}] seriesSet:<nil>}
      cpu6:{name:[99 112 117 54] deleted:false series:map[623:{}] seriesSet:<nil>}]}
  host:{
    name:[104 111 115 116]
    deleted:false
    tagValues:map[
      euterpe.local:{name:[101 117 116 101 114 112 101 46 108 111 99 97 108] deleted:false series:map[623:{} 673:{}] seriesSet:<nil>}
      ip-192-168-1-16.ec2.internal:{name:[105 112 45 49 57 50 45 49 54 56 45 49 45 49 54 46 101 99 50 46 105 110 116 101 114 110 97 108] deleted:false series:map[481:{}] seriesSet:<nil>}]}
  ]

Since _field or \0xff is not listed in the tag set, the index doesn't seem to actually store the field names in the index at all.

Proposed Solution

Update the implementation of tagValues to look for conditions that would cause it to get a wrong answer:

  • getting tag values for _field, where there is a predicate on anything except _measurement
  • getting tag values for any tag key (including _measurement, _field or any other tag key) where the predicate references _field

For these cases, we can treat the metaquery more similar to a normal read query, as readFilter does, but make the query into storage only request one point per series. For each series that has a point, collect the field name and then sort and dedupe the list of fields. Then return this field list.

Definition of Done

  • Work described above to fix tagValues is complete
  • Unit tests for this work? Currently there does not seem to be unit tests for tagValues, will end-to-end tests be enough?
  • PushDownReadTagValuesRule is re-enabled in the planner
  • All Flux end-to-end tests pass (more tests are being added to get better coverage here)
@wolffcm wolffcm changed the title Using TagValues to get values of _field with tag predicates returns wrong answer Using TagValues returns wrong with respect to "_field" Oct 22, 2020
@nathanielc nathanielc added this to the Sprint 20-Q4-2 milestone Oct 23, 2020
@wolffcm wolffcm self-assigned this Oct 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants