get_docids_for_value_range is broken w/certain fast fields that use a GCD inverse #1757
Comments
confirmed as a bug
Thanks for the report! Currently inverse mapping is executed

Minimal Example

    #[test]
    fn test_gcd_bug_regression_1757() {
        let mut schema_builder = Schema::builder();
        let num_field = schema_builder.add_u64_field("url_norm_hash", FAST | INDEXED);
        let schema = schema_builder.build();
        let index = Index::create_in_ram(schema);
        {
            // Index two documents whose values are 100 and 200.
            let mut writer = index.writer_for_tests().unwrap();
            writer
                .add_document(doc! {
                    num_field => 100u64,
                })
                .unwrap();
            writer
                .add_document(doc! {
                    num_field => 200u64,
                })
                .unwrap();
            writer.commit().unwrap();
        }
        let reader = index.reader().unwrap();
        let searcher = reader.searcher();
        let segment = &searcher.segment_readers()[0];
        let field = segment.fast_fields().u64(num_field).unwrap();
        let mut vec = vec![];
        // 150 is not stored in any document, so no doc id should be returned.
        field.get_docids_for_value_range(150..=150, 0..u32::MAX, &mut vec);
        assert_eq!(vec.len(), 0);
    }

Buggy Code

This part of the code needs to be able to handle user data.

    fn get_docids_for_value_range(
        &self,
        range: RangeInclusive<Output>,
        doc_id_range: Range<u32>,
        positions: &mut Vec<u32>,
    ) {
        // The range bounds come from the caller and are passed through
        // `inverse` without checking that they are valid for the mapping.
        self.from_column.get_docids_for_value_range(
            self.monotonic_mapping.inverse(range.start().clone())
                ..=self.monotonic_mapping.inverse(range.end().clone()),
            doc_id_range,
            positions,
        )
    }
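The failure mode in the minimal example can be modelled outside tantivy. The sketch below is a toy, not tantivy's implementation: `GcdColumn`, `naive_lookup`, and `guarded_lookup` are hypothetical names, and the column is assumed to store each value as `(value - min) / gcd`. With the values 100 and 200 (min 100, GCD 100), the naive inverse maps 150 to the stored value 0 and therefore matches document 0. One possible guard, which is not necessarily the fix that was merged, is to round the lower bound up and the upper bound down to the nearest representable value and return nothing when the range collapses.

```rust
use std::ops::RangeInclusive;

/// Toy model of a GCD-compressed column (hypothetical, for illustration only).
/// Each value is assumed to be stored as `(value - min) / gcd`.
struct GcdColumn {
    min: u64,
    gcd: u64,
    stored: Vec<u64>, // compressed values, indexed by doc id
}

impl GcdColumn {
    fn new(values: &[u64], min: u64, gcd: u64) -> Self {
        let stored = values.iter().map(|v| (v - min) / gcd).collect();
        GcdColumn { min, gcd, stored }
    }

    /// Collects the doc ids whose stored (compressed) value lies in `range`.
    fn docids_in_stored_range(&self, range: RangeInclusive<u64>) -> Vec<u32> {
        let mut docs = Vec::new();
        for (doc, value) in self.stored.iter().enumerate() {
            if range.contains(value) {
                docs.push(doc as u32);
            }
        }
        docs
    }

    /// Naive inverse: only correct if `value >= min` and `value - min`
    /// is a multiple of `gcd`. Integer division silently rounds down otherwise.
    fn naive_inverse(&self, value: u64) -> u64 {
        (value - self.min) / self.gcd
    }

    /// Mirrors the shape of the buggy code: both bounds go through the naive inverse.
    fn naive_lookup(&self, range: RangeInclusive<u64>) -> Vec<u32> {
        let start = self.naive_inverse(*range.start());
        let end = self.naive_inverse(*range.end());
        self.docids_in_stored_range(start..=end)
    }

    /// Guarded variant: round the start up and the end down to representable
    /// values, and return no documents if nothing representable remains.
    fn guarded_lookup(&self, range: RangeInclusive<u64>) -> Vec<u32> {
        let (start, end) = (*range.start(), *range.end());
        if start > end || end < self.min {
            return Vec::new();
        }
        let start_offset = start.saturating_sub(self.min);
        // Ceiling division for the lower bound.
        let lo = start_offset / self.gcd + u64::from(start_offset % self.gcd != 0);
        // Floor division for the upper bound.
        let hi = (end - self.min) / self.gcd;
        if lo > hi {
            return Vec::new();
        }
        self.docids_in_stored_range(lo..=hi)
    }
}
```

For a column built with `GcdColumn::new(&[100, 200], 100, 100)`, `naive_lookup(150..=150)` returns document 0, while `guarded_lookup(150..=150)` returns no documents and `guarded_lookup(100..=200)` still returns both.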
* handle user input on get_docid_for_value_range
  fixes #1757
* pass range as parameter
Describe the bug
When inserting fast fields in a certain order, segments will occasionally end up with a column that matches extra documents when calling get_docids_for_value_range using nonexistent values. The testcase at the bottom of this report illustrates a failing case, though you need to run it repeatedly to cause it to fail.
Expected behaviour: passing fastfield values/ranges to get_docids_for_value_range that cannot possibly exist in a given segment reader should return no matching documents.
Which version of tantivy are you using?
0.19.0
To Reproduce
This code will reproduce the bug, but it won't happen every time. Running it repeatedly will eventually give you a SegmentReader with the min/max values [6341073085727221 3373930417471920086], and that SegmentReader will always match document zero when querying for the invalid value 7999999999999999.