Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a targetSearchConcurrency parameter to LogMergePolicy. #13517

Merged
merged 4 commits into from
Jul 18, 2024

Conversation

jpountz
Copy link
Contributor

@jpountz jpountz commented Jun 25, 2024

This adds the same targetSearchConcurrency parameter to LogMergePolicy that #13430 is adding to TieredMergePolicy. The implementation is simpler here since LogMergePolicy is constrained to only merging adjacent segments.

From simulating merging on an index that gets appended 555 equal-size segments (like nightly benchmarks) and a targetSearchConcurrency of 8, I get the following results:

  • 19.1 segments in the index on average, vs. 11.1 with a targetSearchConcurrency of 1.
  • Maximum number of segments in the index of 28, vs. 22 with a targetSearchConcurrency of 1.
  • Write amplification through merging of 3.6 instead of 2.9. For comparison, 3.6 is about the write amplification that you get with a merge factor of 7 otherwise.
  • The resulting index has 24 segments, including 11 segments on the highest tier, vs. 15 segments and 5 segments on the highest tier with a targetSearchConcurrency of 1.

This adds the same `targetSearchConcurrency` parameter to `LogMergePolicy`
that apache#13430 is adding to `TieredMergePolicy`. The implementation is simpler
here since `LogMergePolicy` is constrained to only merging adjacent segments.
The downside is that this option results is significantly more write
amplification on this merge policy compared with `TieredMergePolicy`, which is
allowed to merge non-adjacent segments.

From simulating merging on an index that gets appended 555 equal-size segments
(like nightly benchmarks) and a `targetSearchConcurrency` of 8, I get the
following results:
 - 19.1 segments in the index on average, vs. 11.1 with a
   `targetSearchConcurrency` of 1.
 - Maximum number of segments in the index of 28, vs. 22 with a
   `targetSearchConcurrency` of 1.
 - Write amplification through merging of 3.6 instead of 2.9. For comparison,
   3.6 is about the write amplification that you get with a merge factor of 7
   otherwise.
 - The resulting index has 24 segments, including 11 segments on the highest
   tier, vs. 15 segments and 5 segments on the highest tier with a
   `targetSearchConcurrency` of 1.
@jpountz jpountz added this to the 9.12.0 milestone Jun 25, 2024
Copy link

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Jul 11, 2024
Copy link
Contributor

@stefanvodita stefanvodita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you for following up on #13430. Do you want to add a CHANGES entry?

@github-actions github-actions bot removed the Stale label Jul 12, 2024
@jpountz
Copy link
Contributor Author

jpountz commented Jul 18, 2024

Thanks for looking @stefanvodita! I added a CHANGES entry and improved the logic to not apply a threshold on the doc count for merges below the min merge size. I will merge soon.

@jpountz jpountz merged commit 9f04086 into apache:main Jul 18, 2024
2 of 3 checks passed
@jpountz jpountz deleted the logmp_target_search_concurrency branch July 18, 2024 09:28
@jpountz
Copy link
Contributor Author

jpountz commented Jul 18, 2024

improved the logic to not apply a threshold on the doc count for merges below the min merge size

Woops I need to revert this for now. This made sense for TieredMergePolicy but this causes some test failures with LogMergePolicy in the corner case when minMergeSize maps to a bigger merge than maxMergeDocs.

jpountz added a commit that referenced this pull request Jul 18, 2024
This adds the same `targetSearchConcurrency` parameter to `LogMergePolicy`
that #13430 is adding to `TieredMergePolicy`.
jpountz added a commit that referenced this pull request Jul 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants