Handle missing columns or multiple columns in aggregation #1913

Closed
fulmicoton opened this issue Feb 27, 2023 · 1 comment

@fulmicoton
Collaborator

fulmicoton commented Feb 27, 2023

After PR#1912 has landed, tantivy will be able to run aggregation over JSON fields.

This means we need to handle the case where columns are missing.
The result should be the same as if the column were present but "empty".
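
A minimal sketch of that rule, assuming a hypothetical `Column` type (one optional value per document) rather than tantivy's actual columnar API: the aggregation substitutes an empty column when none exists, so both cases go through the same code path and give the same result.

```rust
/// Hypothetical stand-in for a fast-field column: one optional value per doc.
/// This is an illustration only, not tantivy's real column type.
#[derive(Debug, Clone, Default)]
struct Column {
    values: Vec<Option<f64>>,
}

/// Simplified stats result (hypothetical).
#[derive(Debug, PartialEq, Default)]
struct Stats {
    count: u64,
    sum: f64,
}

/// Computes stats over a column that may be absent from the segment.
fn stats(column: Option<&Column>) -> Stats {
    // A missing column is replaced by an empty one, so the result is
    // identical to aggregating over a column that holds no values.
    let empty = Column::default();
    let column = column.unwrap_or(&empty);
    let mut stats = Stats::default();
    for value in column.values.iter().flatten() {
        stats.count += 1;
        stats.sum += *value;
    }
    stats
}
```

With this shape, `stats(None)` and `stats(Some(&Column::default()))` are indistinguishable to the caller.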

We will also need to face the possibility that we have more than one column with the same name (but different types).

As a first step, we need to define the proper behavior. It is probably aggregation-specific.
For instance:

  • metric: pick the single numerical column (there can only be one) and ignore others.
  • term aggregation: an ideal implementation would run the aggregation over all of the columns and merge the results.
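
The two behaviors above could be sketched as follows. `TypedColumn`, `sum_metric` and `term_counts` are hypothetical names for illustration, not tantivy's API. Bucket keys are stringified here for brevity, so a numeric `1` and a string `"1"` would share a bucket; a real implementation would presumably keep the key type.

```rust
use std::collections::HashMap;

/// Hypothetical model of same-named columns that differ by type.
#[derive(Debug, Clone)]
enum TypedColumn {
    Numeric(Vec<f64>),
    Str(Vec<String>),
}

/// Metric: pick the numeric column and ignore the others.
/// Returns 0.0 when no numeric column exists (treated as empty).
fn sum_metric(columns: &[TypedColumn]) -> f64 {
    columns
        .iter()
        .find_map(|column| match column {
            TypedColumn::Numeric(values) => Some(values.iter().sum::<f64>()),
            _ => None,
        })
        .unwrap_or(0.0)
}

/// Term aggregation: count terms in every column, merging into one bucket map.
fn term_counts(columns: &[TypedColumn]) -> HashMap<String, u64> {
    let mut merged = HashMap::new();
    for column in columns {
        let keys: Vec<String> = match column {
            TypedColumn::Numeric(values) => values.iter().map(|v| v.to_string()).collect(),
            TypedColumn::Str(values) => values.clone(),
        };
        for key in keys {
            *merged.entry(key).or_insert(0) += 1;
        }
    }
    merged
}
```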
fulmicoton changed the title from "Handle missing columns in aggregation" to "Handle missing columns or multiple columns in aggregation" on Feb 27, 2023
PSeitz added a commit that referenced this issue Mar 28, 2023
- Improve support for mixed types in JSON field aggregations (pick the right field, #1913)
- Resolve the issue with JSON serialization for numeric keys (fixes #1967)
- Add JSON round-trip test for term buckets
- Remove `u64_lenient`, as this is a footgun without the type
- move aggregation benchmarks
PSeitz added a commit that referenced this issue Mar 31, 2023
* Better mixed types support in aggs and fix serialization issue

- Improve support for mixed types in JSON field aggregations (pick the right field, #1913)
- Resolve the issue with JSON serialization for numeric keys (fixes #1967)
- Add JSON round-trip test for term buckets
- Remove `u64_lenient`, as this is a footgun without the type
- move aggregation benchmarks

* remove shadowing
@PSeitz
Contributor

PSeitz commented Jun 26, 2023

As a first step, we need to define the proper behavior. It is probably aggregation-specific. For instance:

* metric: pick the single numerical column (there can only be one) and ignore others.

* term aggregation: an ideal implementation would run the aggregation over all of the columns and merge the results.

For the term aggregation with mixed types, we would probably need a separate virtual column built over all column indices (ColumnIndex) to collect the missing values. A term aggregation would then aggregate over up to 3 columns (term + number + null).
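
A rough sketch of the idea, under the assumption that each column can report which doc ids it has a value for. `SparseColumn`, `null_docs`, `term_counts_with_missing` and the `missing_key` parameter are hypothetical; tantivy's `ColumnIndex` is not used here. The "virtual null column" is the set of docs with no value in any real column, and it contributes one extra bucket.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical sparse column: maps a doc id to its (stringified) value.
type SparseColumn = BTreeMap<u32, String>;

/// The "virtual null column": doc ids that have no value in any real column.
fn null_docs(num_docs: u32, columns: &[SparseColumn]) -> BTreeSet<u32> {
    (0..num_docs)
        .filter(|doc| columns.iter().all(|column| !column.contains_key(doc)))
        .collect()
}

/// Term aggregation over the real columns (e.g. term + number) plus the
/// null column, whose docs are counted under `missing_key`.
fn term_counts_with_missing(
    num_docs: u32,
    columns: &[SparseColumn],
    missing_key: &str,
) -> BTreeMap<String, u64> {
    let mut buckets = BTreeMap::new();
    for column in columns {
        for value in column.values() {
            *buckets.entry(value.clone()).or_insert(0) += 1;
        }
    }
    let num_null = null_docs(num_docs, columns).len() as u64;
    if num_null > 0 {
        *buckets.entry(missing_key.to_string()).or_insert(0) += num_null;
    }
    buckets
}
```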

PSeitz added a commit that referenced this issue Aug 25, 2023
add missing parameter for stats,min,max,count,sum,avg
closes #1913
partially #1789
@PSeitz PSeitz closed this as completed in 73cb717 Aug 28, 2023
ethever pushed a commit to ethever/tantivy that referenced this issue Aug 30, 2023
)

* add missing parameter for stats,min,max,count,sum,avg

add missing parameter for stats,min,max,count,sum,avg
closes quickwit-oss#1913
partially quickwit-oss#1789

* Apply suggestions from code review

Co-authored-by: Paul Masurel <paul@quickwit.io>

---------

Co-authored-by: Paul Masurel <paul@quickwit.io>