fix(node): Wait for checkpoint service to stop during reconfig #5391

Merged
jkrvivian merged 1 commit into develop from node/await-chk-service-on-reconfig
Feb 18, 2025

jkrvivian
Contributor

Description of change

Merge the fix from upstream: MystenLabs/sui@8d2bb84

Currently, during reconfig, the CheckpointService tasks, including CheckpointBuilder and CheckpointAggregator, are notified to shut down, but reconfig does not wait for them to finish shutting down. This leaves a race: the reconfig loop can proceed to drop the epoch db handle while CheckpointBuilder is still reading from the epoch db to create a new checkpoint. The race can result in panics. With this fix, reconfig waits for the checkpoint service to stop before continuing.
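
To illustrate the ordering this fix enforces, here is a minimal sketch using only the Rust standard library. It is not the code from the upstream commit: `EpochDb`, `CheckpointTasks`, `spawn_tasks`, `stop_and_wait`, and `reconfigure` are hypothetical stand-ins, plain threads replace the node's async tasks, and an explicit `close()` stands in for dropping the epoch db handle.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for the per-epoch database handle.
struct EpochDb {
    open: AtomicBool,
}

impl EpochDb {
    /// Reads fail once the handle has been closed.
    fn read(&self) -> Result<u64, &'static str> {
        if self.open.load(Ordering::SeqCst) {
            Ok(42)
        } else {
            Err("epoch db already closed")
        }
    }

    /// Stands in for the reconfig loop dropping the epoch db handle.
    fn close(&self) {
        self.open.store(false, Ordering::SeqCst);
    }
}

/// Hypothetical stand-in for the checkpoint service's background tasks.
struct CheckpointTasks {
    shutdown: Arc<AtomicBool>,
    handles: Vec<JoinHandle<usize>>,
}

/// Spawns two workers (think builder and aggregator) that keep reading
/// from the epoch db until they observe the shutdown flag.
fn spawn_tasks(db: Arc<EpochDb>) -> CheckpointTasks {
    let shutdown = Arc::new(AtomicBool::new(false));
    let handles: Vec<JoinHandle<usize>> = (0..2)
        .map(|_| {
            let db = Arc::clone(&db);
            let shutdown = Arc::clone(&shutdown);
            thread::spawn(move || {
                let mut failed_reads = 0usize;
                while !shutdown.load(Ordering::SeqCst) {
                    if db.read().is_err() {
                        failed_reads += 1;
                    }
                    thread::sleep(Duration::from_millis(1));
                }
                failed_reads
            })
        })
        .collect();
    CheckpointTasks { shutdown, handles }
}

impl CheckpointTasks {
    /// Notifies the workers AND waits for them to exit.
    /// Returns how many reads hit a closed db across all workers.
    fn stop_and_wait(self) -> usize {
        self.shutdown.store(true, Ordering::SeqCst);
        self.handles
            .into_iter()
            .map(|h| h.join().expect("checkpoint task panicked"))
            .sum()
    }
}

/// Reconfig in the fixed order: stop tasks, wait, then release the db.
fn reconfigure(db: Arc<EpochDb>, tasks: CheckpointTasks) -> usize {
    let failed_reads = tasks.stop_and_wait();
    db.close();
    failed_reads
}

fn main() {
    let db = Arc::new(EpochDb { open: AtomicBool::new(true) });
    let tasks = spawn_tasks(Arc::clone(&db));
    thread::sleep(Duration::from_millis(20));
    let failed_reads = reconfigure(Arc::clone(&db), tasks);
    println!("reads against a closed epoch db: {failed_reads}");
}
```

Because `stop_and_wait` joins every worker before `close()` runs, no worker in this sketch can observe a closed db. Calling `close()` right after setting the shutdown flag, without joining, would reintroduce the window described above.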

Links to any relevant issues

Close #4691

Type of change

  • Bug fix

Change checklist

  • I have followed the contribution guidelines for this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have checked that new and existing unit tests pass locally with my changes

@jkrvivian jkrvivian added the node (Issues related to the Core Node team) label Feb 14, 2025
@jkrvivian jkrvivian self-assigned this Feb 14, 2025
@jkrvivian jkrvivian requested review from a team as code owners February 14, 2025 07:06
@jkrvivian jkrvivian merged commit 0943984 into develop Feb 18, 2025
41 checks passed
@jkrvivian jkrvivian deleted the node/await-chk-service-on-reconfig branch February 18, 2025 07:37
Labels
core-protocol, node (Issues related to the Core Node team)
Development

Successfully merging this pull request may close these issues.

[Checkpoint] wait for checkpoint service to stop during reconfig (#17… · MystenLabs/sui@8d2bb84
5 participants