-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a targetSearchConcurrency
parameter to LogMergePolicy
.
#13517
Conversation
This adds the same `targetSearchConcurrency` parameter to `LogMergePolicy` that apache#13430 is adding to `TieredMergePolicy`. The implementation is simpler here since `LogMergePolicy` is constrained to only merging adjacent segments. The downside is that this option results is significantly more write amplification on this merge policy compared with `TieredMergePolicy`, which is allowed to merge non-adjacent segments. From simulating merging on an index that gets appended 555 equal-size segments (like nightly benchmarks) and a `targetSearchConcurrency` of 8, I get the following results: - 19.1 segments in the index on average, vs. 11.1 with a `targetSearchConcurrency` of 1. - Maximum number of segments in the index of 28, vs. 22 with a `targetSearchConcurrency` of 1. - Write amplification through merging of 3.6 instead of 2.9. For comparison, 3.6 is about the write amplification that you get with a merge factor of 7 otherwise. - The resulting index has 24 segments, including 11 segments on the highest tier, vs. 15 segments and 5 segments on the highest tier with a `targetSearchConcurrency` of 1.
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thank you for following up on #13430. Do you want to add a CHANGES entry?
Thanks for looking @stefanvodita! I added a CHANGES entry and improved the logic to not apply a threshold on the doc count for merges below the min merge size. I will merge soon. |
Woops I need to revert this for now. This made sense for TieredMergePolicy but this causes some test failures with |
This adds the same `targetSearchConcurrency` parameter to `LogMergePolicy` that #13430 is adding to `TieredMergePolicy`.
This adds the same
targetSearchConcurrency
parameter toLogMergePolicy
that #13430 is adding toTieredMergePolicy
. The implementation is simpler here sinceLogMergePolicy
is constrained to only merging adjacent segments.From simulating merging on an index that gets appended 555 equal-size segments (like nightly benchmarks) and a
targetSearchConcurrency
of 8, I get the following results:targetSearchConcurrency
of 1.targetSearchConcurrency
of 1.targetSearchConcurrency
of 1.