Default ESQL data partitioning to DOC #99545

Closed
wants to merge 7 commits into from

dnhatn
Member

@dnhatn dnhatn commented Sep 13, 2023

The DOC data partitioning typically outperforms SEGMENT data partitioning. Since we've capped the number of concurrent operators to the number of threads in the ESQL worker as a safeguard, we should consider changing the default data partitioning from SEGMENT to DOC.

Relates #99189

@elasticsearchmachine
Collaborator

Pinging @elastic/es-ql (Team:QL)

@elasticsearchmachine
Collaborator

Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)

@dnhatn dnhatn added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Sep 13, 2023
@dnhatn
Member Author

dnhatn commented Sep 13, 2023

Thanks, Nik!

Contributor

@ChrisHegarty ChrisHegarty left a comment

LGTM

@jpountz
Contributor

jpountz commented Sep 13, 2023

I haven't run benchmarks to confirm, but in theory this change potentially improves latency while hurting throughput until we add support for doc partitioning to Lucene: apache/lucene#9721. This is because some queries, like range queries, have no option but to evaluate the filter across the entire segment anyway. If you have a segment that you want to split into, say, 10 partitions and evaluate a range filter across 10 threads (one per partition), each thread will actually evaluate the filter against the entire segment, which is not what we want.

Given how common range queries on the @timestamp field are, my preference would be to wait until we address this Lucene issue before we enable doc partitioning by default in ES|QL.
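
To make the throughput concern above concrete, here is a minimal back-of-the-envelope sketch in Python. It is not Elasticsearch or Lucene code. The names `doc_partition_slices`, `filter_work`, and `Work` are hypothetical, and the model only counts how many documents a filter is evaluated against. It assumes a filter that either respects a partition's doc-ID range ("partition aware") or has to scan the whole segment regardless, as described for range queries.

```python
from dataclasses import dataclass


@dataclass
class Work:
    per_thread: int  # most docs any single thread evaluates the filter against
    total: int       # docs evaluated summed over all threads


def doc_partition_slices(segment_docs: int, partitions: int) -> list[tuple[int, int]]:
    """Split the doc-ID range [0, segment_docs) into contiguous slices."""
    if segment_docs <= 0 or partitions <= 0:
        raise ValueError("segment_docs and partitions must be positive")
    size = -(-segment_docs // partitions)  # ceiling division
    return [
        (start, min(start + size, segment_docs))
        for start in range(0, segment_docs, size)
    ]


def filter_work(segment_docs: int, partitions: int, partition_aware: bool) -> Work:
    """Count filter evaluations when one thread handles each doc partition.

    Assumption: a filter that is not partition aware is evaluated against
    the entire segment by every thread, whatever slice that thread owns.
    """
    slices = doc_partition_slices(segment_docs, partitions)
    if partition_aware:
        per_slice = [end - start for start, end in slices]
    else:
        per_slice = [segment_docs for _ in slices]
    return Work(per_thread=max(per_slice), total=sum(per_slice))


if __name__ == "__main__":
    # One segment of 1,000,000 docs split into 10 doc partitions.
    print(filter_work(1_000_000, 10, partition_aware=False))
    print(filter_work(1_000_000, 10, partition_aware=True))
```

Under these assumptions, the filter that is not partition aware does ten times the total filter work for the 10-partition case, while the per-thread filter cost stays at the full segment size. The model covers filter evaluation only. Any latency gain from doc partitioning would have to come from work it does not model, such as loading values and aggregating.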

@dnhatn dnhatn removed the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Sep 13, 2023
@dnhatn
Member Author

dnhatn commented Sep 13, 2023

@jpountz Thank you for the feedback. I will take a look at the issue that you linked and leave this PR unmerged as you suggested.

Member

@costin costin left a comment

LGTM

@mattc58 mattc58 added v8.12.0 and removed v8.11.0 labels Oct 4, 2023
@wchaparro wchaparro added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jan 2, 2024
@elasticsearchmachine elasticsearchmachine removed the Team:QL (Deprecated) Meta label for query languages team label Jan 2, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@dnhatn dnhatn closed this Jan 2, 2024
@dnhatn dnhatn removed >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL v8.13.0 labels Jan 2, 2024
@dnhatn dnhatn deleted the default-doc-partitioning branch January 2, 2024 22:29
9 participants