Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bugs and reenable PushDownReadTagValuesRule and PushDownReadTagKeysRule #19561

Closed
wolffcm opened this issue Sep 15, 2020 · 7 comments · Fixed by influxdata/flux#3272
Closed

Comments

@wolffcm
Copy link

wolffcm commented Sep 15, 2020

Once Flux is updated to 0.83.1, after the split, having this rule turned on makes these end-to-end test fail:

  • measurement_field_keys
  • measureemnt_tag_values

When the rules is turned off, the tests can pass. It seems that the predicate that may accompany the call to ReadTagKeys and ReadTagValues is not evaluated properly, especially when there is a predicate on the _field column.

DOD:

  • testing is very thin in this area, and this issue could have easily been missed, so more testing should be added, including tests with predicates on _measurement, _filter and _value for queries that use ReadTagKeys and ReadTagValues, in addition to any regression test
  • Rules should be reenabled in query/stdlib/influxdata/influxdb/rules.go
  • The above mention Flux end-to-end tests should be passing
@wolffcm
Copy link
Author

wolffcm commented Sep 17, 2020

Here's an easy reproducer for this.

Insert data like this:

package main

import "csv"

input = "
#datatype,string,long,dateTime:RFC3339,string,string,string,double
#group,false,false,false,true,true,true,false
#default,_result,,,,,,
,result,table,_time,_measurement,host,_field,_value
,,0,2018-05-22T19:53:26Z,system,host.local,load1,1.83
,,0,2018-05-22T19:53:36Z,system,host.local,load1,1.72
,,0,2018-05-22T19:53:46Z,system,host.local,load1,1.74
,,0,2018-05-22T19:53:56Z,system,host.local,load1,1.63
,,0,2018-05-22T19:54:06Z,system,host.local,load1,1.91
,,0,2018-05-22T19:54:16Z,system,host.local,load1,1.84

,,1,2018-05-22T19:53:26Z,sys,host.local,load3,1.98
,,1,2018-05-22T19:53:36Z,sys,host.local,load3,1.97
,,1,2018-05-22T19:53:46Z,sys,host.local,load3,1.97
,,1,2018-05-22T19:53:56Z,sys,host.local,load3,1.96
,,1,2018-05-22T19:54:06Z,sys,host.local,load3,1.98
,,1,2018-05-22T19:54:16Z,sys,host.local,load3,1.97

,,2,2018-05-22T19:53:26Z,system,host.local,load5,1.95
,,2,2018-05-22T19:53:36Z,system,host.local,load5,1.92
,,2,2018-05-22T19:53:46Z,system,host.local,load5,1.92
,,2,2018-05-22T19:53:56Z,system,host.local,load5,1.89
,,2,2018-05-22T19:54:06Z,system,host.local,load5,1.94
,,2,2018-05-22T19:54:16Z,system,host.local,load5,1.93

,,3,2018-05-22T19:53:26Z,swap,host.global,used_percent,82.98
,,3,2018-05-22T19:53:36Z,swap,host.global,used_percent,82.59
,,3,2018-05-22T19:53:46Z,swap,host.global,used_percent,82.59
,,3,2018-05-22T19:53:56Z,swap,host.global,used_percent,82.59
,,3,2018-05-22T19:54:06Z,swap,host.global,used_percent,82.59
,,3,2018-05-22T19:54:16Z,swap,host.global,used_percent,82.64

#datatype,string,long,dateTime:RFC3339,string,string,string,long
#group,false,false,false,true,true,true,false
#default,_result,,,,,,
,result,table,_time,_measurement,host,_field,_value
,,0,2018-05-22T19:53:26Z,sys,host.global,load7,183
,,0,2018-05-22T19:53:36Z,sys,host.global,load7,172
,,0,2018-05-22T19:53:46Z,sys,host.global,load7,174
,,0,2018-05-22T19:53:56Z,sys,host.global,load7,163
,,0,2018-05-22T19:54:06Z,sys,host.global,load7,191
,,0,2018-05-22T19:54:16Z,sys,host.global,load7,184

,,1,2018-05-22T19:53:26Z,sys,host.local,load8,198
,,1,2018-05-22T19:53:36Z,sys,host.local,load8,197
,,1,2018-05-22T19:53:46Z,sys,host.local,load8,197
,,1,2018-05-22T19:53:56Z,sys,host.local,load8,196
,,1,2018-05-22T19:54:06Z,sys,host.local,load8,198
,,1,2018-05-22T19:54:16Z,sys,host.local,load8,197

,,2,2018-05-22T19:53:26Z,sys,host.global,load9,195
,,2,2018-05-22T19:53:36Z,sys,host.global,load9,192
,,2,2018-05-22T19:53:46Z,sys,host.global,load9,192
,,2,2018-05-22T19:53:56Z,sys,host.global,load9,189
,,2,2018-05-22T19:54:06Z,sys,host.global,load9,194
,,2,2018-05-22T19:54:16Z,sys,host.global,load9,193

,,3,2018-05-22T19:53:26Z,swp,host.global,used_percent,8298
,,3,2018-05-22T19:53:36Z,swp,host.global,used_percent,8259
,,3,2018-05-22T19:53:46Z,swp,host.global,used_percent,8259
,,3,2018-05-22T19:53:56Z,swp,host.global,used_percent,8259
,,3,2018-05-22T19:54:06Z,swp,host.global,used_percent,8259
,,3,2018-05-22T19:54:16Z,swp,host.global,used_percent,8264
"

csv.from(csv: input) |> to(bucket: "telegraf")

And then query it like this (with the rules PushDownReadTagValuesRule and PushDownReadTagKeysRule enabled)

package main

import "csv"

from(bucket: "telegraf")
    |> range(start: 2017-01-01T00:00:00Z)
    |> filter(fn: (r) => r._measurement == "sys")
    |> filter(fn: (r) => r.host == "host.local")
    |> keep(columns: ["_field"])
    |> group()
    |> distinct(column: "_field")
    |> sort()

If this bug is present then 4 rows will be returned, even though there should only be 2.

@jsternberg jsternberg self-assigned this Sep 21, 2020
@jsternberg
Copy link
Contributor

I find that commenting out only PushDownReadTagValuesRule causes the tests to succeed so I'm not sure if there's an issue with PushDownReadTagKeysRule. Still investigating.

@wolffcm
Copy link
Author

wolffcm commented Sep 22, 2020

Yes, there may not be an issue with PushDownReadTagKeysRule, as it doesn't cause a test to fail. I'm still suspect that there is a bug there too, since it does something pretty similar, and our testing in this area is not so good. But I'm not all that clear on exactly how it works, so I'm not certain.

(This issue was present in the post-split, pre-Flux-upgrade, but it was not found at that time due to the failing test being added to Flux after algo-w)

Rather than just re-enable the rule because no tests fail with it, since you'll have the context, can you have a look at the code and check it, or at the very least, see if there is missing test coverage that should be added for it?

@jsternberg
Copy link
Contributor

Yea. I'll make sure some appropriate test cases are present for this.

@jsternberg
Copy link
Contributor

@wolffcm just to clarify, you believe this bug has always existed in the 1.x query engine and was only caught because a new test was added and it was fixed in v2 but it was never fixed in v1. Correct?

@wolffcm
Copy link
Author

wolffcm commented Sep 22, 2020

Yes, that's my understanding. The tests that started failing were added in this PR:
influxdata/flux#2805
Note that these tests are not regression tests, they are testing pure Flux functions.

So I don't think this bug was ever present in v2. But it seems to have existed in v1 as long as we've had those rules. I guess it will need to be backported to 1.x too.

@wolffcm
Copy link
Author

wolffcm commented Oct 22, 2020

Having looked at this in detail there seems to be a fundamental difference in how we understand _field and field keys in tsm2 vs tsm1. Both tagValues and tagKeys will need to be updated so that we avoid wrong answers. I've filed issues for this work:
#19794
#19806

I have also opened a PR to add tests that demonstrate the wrong answers:
influxdata/flux#3272
When that PR is closed, I will close this issue, but the two above issues will remain and need to be fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants