Missing fields for terms aggregations #3570

Closed
peacand opened this issue Jun 22, 2023 · 2 comments
Labels
elasticsearch-api enhancement New feature or request

Comments


peacand commented Jun 22, 2023

Hi,

I'm trying to perform some terms aggregations on multiple fields against documents having different formats and fields.
For example I would like to run terms aggregations on fields "user" and then "event_type".
If the documents matching my query don't have a field "user", then the result of the whole aggregation is empty, even if my documents have a field "event_type".

I would like to be able to configure a default value for fields that a document does not have.
For example, if I set the default value for missing fields to "NULL", the result of such an aggregation would be:

| user | event_type | docs_count |
|------|------------|------------|
| NULL | value1 | 789789 |
| NULL | value2 | 678678 |

If some of the documents have both fields and the others only have the field "event_type", then the result would be:

| user | event_type | docs_count |
|------|------------|------------|
| NULL | value1 | 456 |
| NULL | value2 | 6778 |
| user1 | value1 | 78675 |
| user2 | value1 | 45645 |

Note: All the docs_count values are completely random, there is absolutely no logic in these values used as examples.
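To make the requested behavior concrete, here is a minimal sketch in plain Python of a two-field terms aggregation that substitutes a default for absent fields. It only models the semantics described above; the function name `terms_with_missing` and the sample documents are made up for illustration and have nothing to do with how Quickwit or tantivy work internally.

```python
from collections import Counter

MISSING = "NULL"  # placeholder value chosen for this example


def terms_with_missing(docs, fields, missing=MISSING):
    """Count documents per combination of field values.

    Any field a document does not have is replaced by `missing`,
    so the document still lands in a bucket instead of being dropped.
    """
    counts = Counter()
    for doc in docs:
        key = tuple(doc.get(field, missing) for field in fields)
        counts[key] += 1
    return counts


# Hypothetical documents: some have no "user" field at all.
docs = [
    {"event_type": "value1"},
    {"event_type": "value2"},
    {"user": "user1", "event_type": "value1"},
    {"user": "user1", "event_type": "value1"},
    {"user": "user2", "event_type": "value1"},
]

result = terms_with_missing(docs, ["user", "event_type"])
for (user, event_type), docs_count in sorted(result.items()):
    print(user, event_type, docs_count)
```

Documents without a "user" field end up in the `NULL` buckets, as in the tables above, rather than making the whole result empty.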

This feature seems to be equivalent to the "missing" parameter in the Elasticsearch aggregation API: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_missing_value_5
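For reference, a request body using that parameter might look like the sketch below, built here as a Python dict and printed as JSON. The shape follows the Elasticsearch terms aggregation documentation; the aggregation names `by_user` and `by_event_type` are arbitrary labels picked for this example, and the field names come from the description above.

```python
import json

# Nested terms aggregation: bucket by "user" first, then by "event_type".
# "missing" tells the terms aggregation which value to use for documents
# that have no "user" field.
request_body = {
    "size": 0,  # only aggregation results are wanted, no hits
    "aggs": {
        "by_user": {
            "terms": {"field": "user", "missing": "NULL"},
            "aggs": {
                "by_event_type": {
                    "terms": {"field": "event_type"},
                },
            },
        },
    },
}

print(json.dumps(request_body, indent=2))
```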

@peacand peacand added the enhancement New feature or request label Jun 22, 2023
Contributor

PSeitz commented Jun 22, 2023

Thanks for the bug report.

The missing parameter is currently unsupported. I think we can add this in the next release.

Missing and Mixed Types

One issue we could have is that a terms aggregation may run on a field with mixed types, e.g. a field stored as two columns: numbers and text. Each column type has its own index recording which documents have a value. I think a missing parameter should apply across all of them.

Aggregations over multiple fields currently run independently, so this is not easy to handle.
We could just ignore this corner case for now.

To avoid duplicate missing results, we could apply the missing parameter depending on its type.
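The duplicate-count concern can be illustrated with a toy model. This is only a sketch under simplifying assumptions: a field is represented as a dict of typed columns, each mapping doc ids to values, and both function names are hypothetical. It is not tantivy's implementation.

```python
def count_missing_per_column(columns, num_docs):
    """Naive approach: every column counts the docs it has no value for."""
    return sum(num_docs - len(column) for column in columns.values())


def count_missing_across_columns(columns, num_docs):
    """A doc is missing only if no column of the field has a value for it."""
    present = set()
    for column in columns.values():
        present.update(column)  # adds the doc ids (dict keys)
    return num_docs - len(present)


# One field with mixed types, stored as two columns keyed by doc id.
# Docs 0 and 1 hold numbers, docs 2 and 3 hold text, doc 4 has no value.
columns = {
    "numbers": {0: 1, 1: 2},
    "text": {2: "a", 3: "b"},
}
num_docs = 5

naive = count_missing_per_column(columns, num_docs)
combined = count_missing_across_columns(columns, num_docs)
print(naive, combined)
```

In this model the naive per-column count reports 6 missing documents, because each column treats the other column's documents as missing, while only one document (doc 4) actually lacks the field.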

Related Issues: quickwit-oss/tantivy#1789 quickwit-oss/tantivy#1913

@PSeitz PSeitz moved this to In Progress in Quickwit 0.7 Aug 22, 2023
@PSeitz PSeitz moved this from In Progress to Done in Quickwit 0.7 Sep 11, 2023
Contributor

PSeitz commented Sep 13, 2023

This is implemented on the main branch and will be released with Quickwit 0.7.

quickwit-oss/tantivy#2149
quickwit-oss/tantivy#2103
quickwit-oss/tantivy#2150
quickwit-oss/tantivy#2151
quickwit-oss/tantivy#2157

@PSeitz PSeitz closed this as completed Sep 13, 2023