JSON Field #1251
fulmicoton added a commit that referenced this issue on Feb 24, 2022:
- Removed useless copy when ingesting JSON.
- Disabled range query on default fields.
Closes #1251
fulmicoton added a commit that referenced this issue on Feb 24, 2022:
- Removed useless copy when ingesting JSON.
- Bugfix in phrase query with a missing field norms.
- Disabled range query on default fields.
Closes #1251
Hi @fulmicoton, range queries are not yet supported for the JSON field, only term queries, right? Do you have any plans to support them?
All queries should support JSON eventually. PRs welcome!
JSON field
This feature is motivated by the need for dynamic schemas in Quickwit.
A JSON field has a field value of type `serde_json::Value`.
Given a JSON object, we emit tokens made of the JSON path, followed by a typecode and the value.
The typecode uses the following:
- `s`: text
- `i`: i64
- `u`: u64
- `f`: f64
- `d`: Date

The JSON path requires two kinds of separators.
Here we use codepoint 0 as a segment separator.
Since 0 is the smallest possible byte, this choice has the benefit of aligning the lexicographical order with the depth-first search order.
Note that we end the path with a trailing `\0` separator for the same reason.
We also need a separator between the field path and the value.
To do so, we use the Record Separator codepoint (30, `\u001e`).
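To make the layout concrete, here is a minimal Rust sketch of how a single term could be encoded. It assumes the typecode sits right after the Record Separator and is immediately followed by the value bytes; `TypeCode`, `encode_term`, `SEGMENT_SEP` and `RECORD_SEP` are hypothetical names, not tantivy's API.

```rust
/// Typecodes from the issue: s = text, i = i64, u = u64, f = f64, d = date.
#[derive(Clone, Copy, Debug, PartialEq)]
enum TypeCode {
    Text,
    I64,
    U64,
    F64,
    Date,
}

impl TypeCode {
    fn byte(self) -> u8 {
        match self {
            TypeCode::Text => b's',
            TypeCode::I64 => b'i',
            TypeCode::U64 => b'u',
            TypeCode::F64 => b'f',
            TypeCode::Date => b'd',
        }
    }
}

/// Codepoint 0: terminates every path segment, including the last one.
const SEGMENT_SEP: u8 = 0x00;
/// Record Separator (codepoint 30): separates the path from the value.
const RECORD_SEP: u8 = 0x1e;

/// Encodes `seg1 \0 seg2 \0 ... \x1e typecode value`.
/// ASSUMPTION: the typecode is placed right after the record separator.
fn encode_term(path: &[&str], typecode: TypeCode, value: &[u8]) -> Vec<u8> {
    let mut term = Vec::new();
    for segment in path {
        term.extend_from_slice(segment.as_bytes());
        term.push(SEGMENT_SEP);
    }
    term.push(RECORD_SEP);
    term.push(typecode.byte());
    term.extend_from_slice(value);
    term
}
```

Sorting the terms for `a.b` and for a sibling key `a-b` puts `a.b` first, as a depth-first walk would, because `\0` sorts before every other byte. With `.` as the separator, `a-b` would sort before `a.b` instead, since `-` (45) is smaller than `.` (46).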
For instance, the following JSON generates the following `FieldValue`.

Important note: it would have been tempting to put the type before the path, e.g.
`sbody<\u0000><\u001e>tax`
However, keeping the type after the path has a benefit on the search side: all the types associated with a path are contiguous in the term dictionary, so they can be scanned in a single read.
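The single-read benefit can be illustrated with an ordered set standing in for the term dictionary. This Rust sketch assumes the same hypothetical layout as above (segments each followed by `\0`, then `\u001e`, the typecode and the value); a `BTreeSet` replaces tantivy's real term dictionary, and all helper names are hypothetical.

```rust
use std::collections::BTreeSet;

const SEGMENT_SEP: u8 = 0x00;
const RECORD_SEP: u8 = 0x1e;

/// Prefix shared by every term of `path`, whatever its type.
fn path_prefix(path: &[&str]) -> Vec<u8> {
    let mut prefix = Vec::new();
    for segment in path {
        prefix.extend_from_slice(segment.as_bytes());
        prefix.push(SEGMENT_SEP);
    }
    prefix.push(RECORD_SEP);
    prefix
}

/// Builds a full term: path prefix, then typecode, then value bytes.
fn term(path: &[&str], typecode: u8, value: &[u8]) -> Vec<u8> {
    let mut t = path_prefix(path);
    t.push(typecode);
    t.extend_from_slice(value);
    t
}

/// Returns the terms of every type for `path` with one contiguous range scan:
/// seek to the prefix, then read until a term no longer starts with it.
fn terms_for_path<'a>(dict: &'a BTreeSet<Vec<u8>>, path: &[&str]) -> Vec<&'a Vec<u8>> {
    let prefix = path_prefix(path);
    dict.range(prefix.clone()..)
        .take_while(|t| t.starts_with(&prefix))
        .collect()
}
```

With the type first, the text and `u64` terms of `body` would start with different bytes (`s` and `u`) and live in two separate ranges of the dictionary.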
Ambiguity: Write Side
Number
Numbers are ambiguous in JSON.
We try to interpret them as `u64`, `i64`, `f64`, in that order.
Unfortunately, this method is ambiguous.
For instance, one document could happen to have a positive value for a given field, which gets mapped as a `u64`.
A second document could then have a negative value for the very same field, which gets mapped as an `i64`.
This is a pitfall we will have to live with.
On the search side, we use the same logic to map the user value to the number type and value.
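The `u64`, then `i64`, then `f64` order can be sketched in Rust on a number's textual form, which is also what the search side receives from the user. `Number` and `interpret_number` are hypothetical names. Rejecting non-finite floats is an assumption of this sketch: Rust's `f64` parser accepts `inf` and `NaN`, while JSON does not.

```rust
/// Possible interpretations of a numeric literal.
#[derive(Debug, PartialEq)]
enum Number {
    U64(u64),
    I64(i64),
    F64(f64),
}

/// Maps a literal to the first type that can represent it: u64, then i64, then f64.
fn interpret_number(literal: &str) -> Option<Number> {
    if let Ok(v) = literal.parse::<u64>() {
        return Some(Number::U64(v));
    }
    if let Ok(v) = literal.parse::<i64>() {
        return Some(Number::I64(v));
    }
    match literal.parse::<f64>() {
        // ASSUMPTION: "inf" / "NaN" are not treated as numbers.
        Ok(v) if v.is_finite() => Some(Number::F64(v)),
        _ => None,
    }
}
```

With this order, `7` and `-7` sent for the same field end up as `u64` and `i64` respectively, which is exactly the pitfall described above.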
Date
Similarly to numbers, we cannot really know whether a string is a date or not.
Supporting the date type implies that we do date detection:
whatever value matches a datetime pattern will be interpreted as a date.
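The issue does not say which datetime patterns are detected. As an illustration only, this Rust sketch assumes RFC 3339 timestamps such as `2022-02-24T10:15:00Z` and performs a purely syntactic check with light range validation; `looks_like_datetime` is a hypothetical helper.

```rust
/// Syntactic check for an RFC 3339-like datetime (ASSUMED pattern), e.g.
/// "2022-02-24T10:15:00Z" or "2022-02-24T10:15:00.123+01:00".
fn looks_like_datetime(s: &str) -> bool {
    let b = s.as_bytes();
    // Shortest accepted form: "YYYY-MM-DDTHH:MM:SSZ" (20 bytes).
    if b.len() < 20 {
        return false;
    }
    // Fixed layout of the first 19 bytes; 'd' stands for any ASCII digit.
    let layout = b"dddd-dd-ddTdd:dd:dd";
    let layout_ok = layout.iter().zip(b).all(|(&expected, &actual)| {
        if expected == b'd' {
            actual.is_ascii_digit()
        } else {
            actual == expected
        }
    });
    if !layout_ok {
        return false;
    }
    // Light range validation of month, day, hour, minute and second.
    let num = |i: usize| u32::from(b[i] - b'0') * 10 + u32::from(b[i + 1] - b'0');
    let ranges_ok = (1..=12).contains(&num(5))
        && (1..=31).contains(&num(8))
        && num(11) <= 23
        && num(14) <= 59
        && num(17) <= 60; // 60 allows a leap second
    if !ranges_ok {
        return false;
    }
    // Optional fractional seconds: '.' followed by at least one digit.
    let mut rest = &b[19..];
    if rest.first() == Some(&b'.') {
        let digits = rest[1..].iter().take_while(|c| c.is_ascii_digit()).count();
        if digits == 0 {
            return false;
        }
        rest = &rest[1 + digits..];
    }
    // Mandatory offset: 'Z' or "+HH:MM" / "-HH:MM".
    match rest {
        [b'Z'] | [b'z'] => true,
        [sign, h1, h2, b':', m1, m2] if *sign == b'+' || *sign == b'-' => {
            [h1, h2, m1, m2].iter().all(|c| c.is_ascii_digit())
        }
        _ => false,
    }
}
```

Any string value that passes such a check would be indexed as a date, even if the user meant it as text.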
Ambiguity: Read side
We apply the same logic on the query side.
Unfortunately, combined with tokenization, ambiguity can hit here as well.
Conflate
Optionally, a user can flag the field as conflated. In that case, in addition to the indexing described above, we also index all token values at the root.
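A minimal Rust sketch of conflation. How the root is encoded is not specified here; this sketch assumes it is the empty path, so a root term is just `\u001e`, the typecode and the value. `encode_term` and `emit_terms` are hypothetical.

```rust
const SEGMENT_SEP: u8 = 0x00;
const RECORD_SEP: u8 = 0x1e;

/// Hypothetical layout: segments each ending with \0, then \x1e, typecode, value.
fn encode_term(path: &[&str], typecode: u8, value: &[u8]) -> Vec<u8> {
    let mut term = Vec::new();
    for segment in path {
        term.extend_from_slice(segment.as_bytes());
        term.push(SEGMENT_SEP);
    }
    term.push(RECORD_SEP);
    term.push(typecode);
    term.extend_from_slice(value);
    term
}

/// Terms emitted for one token. With `conflate`, the token is also indexed
/// at the root, modelled here (ASSUMPTION) as the empty path.
fn emit_terms(path: &[&str], typecode: u8, value: &[u8], conflate: bool) -> Vec<Vec<u8>> {
    let mut terms = vec![encode_term(path, typecode, value)];
    if conflate && !path.is_empty() {
        terms.push(encode_term(&[], typecode, value));
    }
    terms
}
```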
Query parsing
Explicit matching
The JSON field itself has a name.
The user can query its inner fields explicitly as follows:
`json.body:hello`
To target a nested object, the user can use `.` to extend the path:
`json.attr.color:blue`
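A possible way to split such a clause, sketched in Rust: the part before the first `:` is split on `.`, the first segment naming the JSON field and the remaining ones forming the path inside the object. `JsonClause` and `parse_clause` are hypothetical and do not reflect tantivy's actual query parser grammar.

```rust
/// A query clause addressed to a JSON field (hypothetical type).
#[derive(Debug, PartialEq)]
struct JsonClause<'a> {
    field: &'a str,     // name of the JSON field in the schema, e.g. "json"
    path: Vec<&'a str>, // segments inside the JSON object, e.g. ["attr", "color"]
    value: &'a str,     // the user value, e.g. "blue"
}

/// Splits "json.attr.color:blue" into field name, JSON path and value.
fn parse_clause(clause: &str) -> Option<JsonClause<'_>> {
    let (target, value) = clause.split_once(':')?;
    let mut segments = target.split('.');
    let field = segments.next()?; // `split` always yields at least one item
    let path: Vec<&str> = segments.collect();
    // Reject empty names, e.g. "json..color:blue" or "json.attr:".
    if field.is_empty() || value.is_empty() || path.iter().any(|s| s.is_empty()) {
        return None;
    }
    Some(JsonClause { field, path, value })
}
```

The resulting path segments are what would be joined with `\0` to look up the term.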
Default field
If the JSON field is defined as a default search field, then it behaves a tad differently from other default fields:
`attr.color:blue` is now a match.
`blue` is also a match.
Side effect
`.` becomes forbidden in a field name. `\u001E` and `\u0000` become forbidden in a field value.
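One way to see why the separators must be forbidden, sketched in Rust under the same assumed term layout and assuming object keys count as part of the field value: a key containing `\0` would encode exactly like two separate path segments. The helpers are hypothetical.

```rust
const SEGMENT_SEP: char = '\u{0}';
const RECORD_SEP: char = '\u{1e}';

/// Encodes a path as segments each followed by \0, then the record separator.
fn encode_path(path: &[&str]) -> String {
    let mut out = String::new();
    for segment in path {
        out.push_str(segment);
        out.push(SEGMENT_SEP);
    }
    out.push(RECORD_SEP);
    out
}

/// '.' extends the path in queries, so it cannot appear in a field name.
fn is_valid_field_name(name: &str) -> bool {
    !name.contains('.')
}

/// The two separators of the term encoding cannot appear in a field value.
fn is_valid_field_value(value: &str) -> bool {
    !value.contains(SEGMENT_SEP) && !value.contains(RECORD_SEP)
}
```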
Two schemas on the same field
We need to record both numeric and text terms within the same field.
They may need different indexing options (we don't want positions for ints), which tantivy does not support at the moment.
On the index writer side, we can have two posting writers: one for text and one for everything else.
The notion of posting writer is internal to the segment writer, so the extra complexity stays contained there.
On the segment serialization and read side, we simply avoid recording and serializing positions for ints.
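A hypothetical Rust sketch of the two-posting-writer idea: terms are routed on the typecode found right after the Record Separator (assumed layout as above), text postings keep positions, and the other postings keep only doc ids. These types are illustrative and are not tantivy's internal posting writer types.

```rust
use std::collections::BTreeMap;

/// Text postings: doc id plus token positions.
#[derive(Default)]
struct TextPostingWriter {
    postings: BTreeMap<Vec<u8>, Vec<(u32, Vec<u32>)>>,
}

/// Numeric/date postings: doc ids only, positions are never recorded.
#[derive(Default)]
struct NonTextPostingWriter {
    postings: BTreeMap<Vec<u8>, Vec<u32>>,
}

/// Per-field writer owning both posting writers.
#[derive(Default)]
struct JsonFieldWriter {
    text: TextPostingWriter,
    other: NonTextPostingWriter,
}

impl JsonFieldWriter {
    fn subscribe(&mut self, term: Vec<u8>, doc: u32, position: u32) {
        // ASSUMPTION: layout is path\0...\0 \x1e typecode value, so the first
        // \x1e is the separator and the next byte is the typecode.
        let is_text = term
            .iter()
            .position(|&b| b == 0x1e)
            .and_then(|i| term.get(i + 1))
            == Some(&b's');
        if is_text {
            let docs = self.text.postings.entry(term).or_default();
            let same_doc = matches!(docs.last(), Some((last_doc, _)) if *last_doc == doc);
            if same_doc {
                // `same_doc` guarantees a last entry exists.
                docs.last_mut().unwrap().1.push(position);
            } else {
                docs.push((doc, vec![position]));
            }
        } else {
            let docs = self.other.postings.entry(term).or_default();
            if docs.last() != Some(&doc) {
                docs.push(doc); // the position is deliberately dropped
            }
        }
    }
}
```

Keeping the split inside the field writer means the rest of the indexing pipeline only sees terms and doc ids, which matches the point that the complexity stays local to the segment writer.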