fix: field metaqueries take fast path if predicate is only on _measurement
#21962
Closes #21961
This PR updates the logic for detecting whether a query that is attempting to get values for `_field` contains a predicate on something other than `_measurement`.
Since the `influxql` expression will have had references to `_measurement` (or the bytes equivalent, `\x00`) replaced with `_name` by `reads.NodeToExpr` (influxdb/storage/reads/influxql_predicate.go, lines 137 to 143 in 5d84c60), we need to check for that remapped value when determining whether an expression node references anything other than the measurement.
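As a rough illustration of the check described above, here is a minimal, self-contained Go sketch. It does not use the real `influxql` package or the actual code from this PR: the `Expr`, `VarRef`, `StringLiteral`, and `BinaryExpr` types and the `remapKey` and `hasNonMeasurementRef` helpers are hypothetical stand-ins. The only facts taken from the description are that `_measurement` and `\x00` are rewritten to `_name`, and that the check has to compare against that rewritten name.

```go
package main

import "fmt"

// Hypothetical, simplified expression tree. These are NOT the influxql types.
type Expr interface{}

type VarRef struct{ Name string }

type StringLiteral struct{ Val string }

type BinaryExpr struct {
	Op       string
	LHS, RHS Expr
}

// remappedMeasurementKey is the name that measurement references carry
// after the remapping step described in the PR.
const remappedMeasurementKey = "_name"

// remapKey mimics (as an assumption) the rewrite of measurement keys:
// "_measurement" and its bytes equivalent "\x00" both become "_name".
func remapKey(key string) string {
	if key == "_measurement" || key == "\x00" {
		return remappedMeasurementKey
	}
	return key
}

// hasNonMeasurementRef reports whether expr references any key other than
// the remapped measurement key. Comparing against "_measurement" here would
// be wrong, because the expression has already been rewritten.
func hasNonMeasurementRef(expr Expr) bool {
	switch e := expr.(type) {
	case *BinaryExpr:
		return hasNonMeasurementRef(e.LHS) || hasNonMeasurementRef(e.RHS)
	case *VarRef:
		return e.Name != remappedMeasurementKey
	default:
		// Literals and nil nodes reference no keys.
		return false
	}
}

func main() {
	onlyMeasurement := &BinaryExpr{
		Op:  "=",
		LHS: &VarRef{Name: remapKey("_measurement")},
		RHS: &StringLiteral{Val: "cpu"},
	}
	mixed := &BinaryExpr{
		Op:  "AND",
		LHS: onlyMeasurement,
		RHS: &BinaryExpr{
			Op:  "=",
			LHS: &VarRef{Name: remapKey("host")},
			RHS: &StringLiteral{Val: "server01"},
		},
	}

	// A predicate only on the measurement can take the fast path.
	fmt.Println("only measurement, needs slow path:", hasNonMeasurementRef(onlyMeasurement))
	// A predicate that also touches another tag cannot.
	fmt.Println("mixed predicate, needs slow path:", hasNonMeasurementRef(mixed))
}
```

In this sketch, a predicate that only references the measurement yields `false`, so the fast path would be eligible, while a predicate that also references another key yields `true`.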
This will allow a query like the one in #21961 to avoid a performance-intensive block scan. The performance increase can be dramatic when querying a large number of series, and this kind of query is very common when exploring data via the UI.
A relevant existing test is https://github.com/influxdata/flux/blob/master/stdlib/influxdata/influxdb/schema/show_fields_with_pred_test.flux, which ensures that the correct result is produced when a query for fields does include a non-measurement predicate. I've also verified locally that a query for fields with a predicate on only `_measurement` produces the correct result; adding that test case would probably be a good idea as well.
After loading a database with ~1 million series and running the query listed in #21961 with query_benchmarker_influxdb locally on my machine, the average query time was ~10ms over 100 queries executed.
Running the same benchmark against the build from the commit just prior to this one results in a query time of ~4 seconds (note: the benchmark time was limited to 30 seconds, so only 9 queries ran).