Default ESQL data partitioning to DOC #99545
Pinging @elastic/es-ql (Team:QL)
Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)
Thanks, Nik!
LGTM
I haven't run benchmarks to confirm, but in theory this change could improve latency while making throughput worse until we add support for doc partitioning to Lucene: apache/lucene#9721. This is because some queries, such as range queries, have no choice but to evaluate the filter across the entire segment anyway. If you split a segment into, say, 10 partitions and evaluate a range filter on 10 threads (one per partition), each thread will actually evaluate the filter against the entire segment, which is not what we want. Given how range queries on the
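To make the cost argument above concrete, here is a minimal back-of-the-envelope model in Python. It is a sketch under stated assumptions, not Elasticsearch or Lucene code: `split_doc_range` and `filter_work` are hypothetical helpers, a segment is modeled only by its `max_doc`, and "work" is simply the number of documents the filter has to evaluate.

```python
def split_doc_range(max_doc: int, partitions: int) -> list[tuple[int, int]]:
    """Hypothetical helper: split doc IDs [0, max_doc) into contiguous, near-equal ranges."""
    base, extra = divmod(max_doc, partitions)
    slices, start = [], 0
    for i in range(partitions):
        # The first `extra` slices get one additional doc so sizes differ by at most 1.
        end = start + base + (1 if i < extra else 0)
        slices.append((start, end))
        start = end
    return slices


def filter_work(max_doc: int, slices: list[tuple[int, int]], supports_doc_ranges: bool) -> int:
    """Total docs the filter evaluates, summed over all slices (simplified cost model)."""
    if supports_doc_ranges:
        # Each slice only evaluates the filter on its own doc range.
        return sum(end - start for start, end in slices)
    # Assumption: the filter cannot be restricted to a doc range,
    # so every slice pays for the whole segment.
    return len(slices) * max_doc
```

With `max_doc = 1_000_000` and 10 slices, the model gives 1,000,000 evaluated docs when the filter can be restricted to a doc range, but 10,000,000 when it cannot: each thread finishes sooner, yet total CPU work grows tenfold, which is the latency-versus-throughput trade-off described in the comment.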
@jpountz Thank you for the feedback. I will take a look at the issue that you linked and leave this PR unmerged as you suggested.
LGTM
Pinging @elastic/es-analytics-geo (Team:Analytics)
DOC data partitioning typically outperforms SEGMENT data partitioning. Since we've capped the number of concurrent operators at the number of threads in the ESQL worker as a safeguard, we should consider changing the default data partitioning from SEGMENT to DOC.
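One intuition for why DOC can beat SEGMENT is load balance, and the following Python sketch illustrates it. The model is a simplification and the function names are hypothetical, not ES|QL internals: SEGMENT yields one slice per segment, while DOC is modeled as splitting the total doc count into at most `max_threads` near-equal slices, mirroring the concurrency cap mentioned above. The largest slice serves as a lower bound on wall-clock work, because no slice is split further across threads.

```python
def segment_slices(segment_sizes: list[int]) -> list[int]:
    """SEGMENT (modeled): one slice per segment, sized in docs."""
    return list(segment_sizes)


def doc_partition_sizes(segment_sizes: list[int], max_threads: int) -> list[int]:
    """DOC (simplified): at most max_threads near-equal slices over all docs."""
    total = sum(segment_sizes)
    # Cap the slice count at the thread count; always produce at least one slice.
    n = max(1, min(max_threads, total))
    base, extra = divmod(total, n)
    return [base + (1 if i < extra else 0) for i in range(n)]


def largest_slice(slices: list[int]) -> int:
    """Lower bound on wall-clock work when each slice runs on a single thread."""
    return max(slices)
```

For an index with one large segment and two small ones, `[900_000, 50_000, 50_000]`, and 4 threads, SEGMENT's largest slice is 900,000 docs, while the DOC model produces four slices of 250,000 docs each. This ignores the per-segment filter cost raised in the review, which can offset the gain for queries such as range queries.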
Relates #99189