Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix snapshot transaction segfaults after storage truncation #4786

Merged

Conversation

garyschulte
Copy link
Contributor

@garyschulte garyschulte commented Dec 7, 2022

PR description

This PR addresses a race where snap and checkpoint syncs can cause a segfault when starting a worldstate download. The segfault occurs when rocksdb snapshot transactions are read/written after a storage truncation. To prevent this and to ensure future uses of clear() and clearFlatDatabase() cannot cause segfaults, this PR:

  • adds BonsaiStorageSubscriber type to handle storage events
  • removes addCachedLayer from TrieLogManager, moves to AbstractTrieLogManager
  • TrieLogManager takes a snapshot immediately in addCachedLayer rather than deferring it
  • notifies subscribers on events that can affect the worldstate, specifically:
    • clear (truncate)
    • clearFlatDatabase (truncating a subset of storage)
    • close (closing the worldstate storage)

Additionally, only drop cached layers by distance from head when the worldstate archive has more than the configured number of retained layers.

Fixed Issue(s)

addresses #4765

Documentation

  • I thought about documentation and added the doc-change-required label to this PR if
    updates are required.

Changelog

@garyschulte garyschulte force-pushed the bugfix/snapshot-archive-eviction branch from 50ee949 to cf85130 Compare December 7, 2022 21:26
@garyschulte garyschulte changed the title Bugfix startup worldstate race with checkpoint sync Bugfix startup race with checkpoint sync Dec 7, 2022
@garyschulte garyschulte changed the title Bugfix startup race with checkpoint sync Bugfix startup race with checkpoint sync and bonsai snapshots Dec 7, 2022
@garyschulte garyschulte force-pushed the bugfix/snapshot-archive-eviction branch from 9fd132a to 8e8a72f Compare December 8, 2022 00:41
@garyschulte garyschulte marked this pull request as ready for review December 8, 2022 00:41
@garyschulte garyschulte marked this pull request as draft December 8, 2022 02:05
@garyschulte garyschulte force-pushed the bugfix/snapshot-archive-eviction branch 2 times, most recently from 61169d0 to 8e8a72f Compare December 8, 2022 17:10
@garyschulte garyschulte marked this pull request as ready for review December 9, 2022 00:46
@garyschulte garyschulte requested a review from matkt December 9, 2022 00:47
@garyschulte garyschulte force-pushed the bugfix/snapshot-archive-eviction branch from 376705b to 8558206 Compare December 9, 2022 06:41
@garyschulte garyschulte requested a review from ahamlat December 9, 2022 06:43
@garyschulte garyschulte changed the title Bugfix startup race with checkpoint sync and bonsai snapshots Bugfix snapshot transaction segfaults after storage truncation Dec 9, 2022
@garyschulte garyschulte force-pushed the bugfix/snapshot-archive-eviction branch from d2e0f82 to f6cd5f8 Compare December 9, 2022 07:58
@matkt
Copy link
Contributor

matkt commented Dec 9, 2022

we must check the impact on performance at the time of persistence because we are creating the snapshot at this time

@ahamlat
Copy link
Contributor

ahamlat commented Dec 12, 2022

we must check the impact on performance at the time of persistence because we are creating the snapshot at this time

I didn't see any performance impact on block processing time, after 3 hours of execution

@garyschulte garyschulte force-pushed the bugfix/snapshot-archive-eviction branch from f6cd5f8 to 5a2d7d1 Compare December 13, 2022 00:17
Copy link
Contributor

@ahamlat ahamlat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve performance overhead. This PR does not have an effect on block processing performance

@garyschulte garyschulte force-pushed the bugfix/snapshot-archive-eviction branch 3 times, most recently from 4564c26 to 5d3a53a Compare December 15, 2022 06:50
@garyschulte garyschulte force-pushed the bugfix/snapshot-archive-eviction branch 5 times, most recently from 9f48fba to 8890a82 Compare December 21, 2022 19:28
…napshot

remove MutableWorldState parameter from saveTrieLog
make addCachedLayer protected in AbstractTrieLogManager, remove from TrieLogManager interface

Signed-off-by: garyschulte <garyschulte@gmail.com>
… KeyValue storage

Signed-off-by: garyschulte <garyschulte@gmail.com>
… trie logs

Signed-off-by: garyschulte <garyschulte@gmail.com>
Signed-off-by: garyschulte <garyschulte@gmail.com>
Signed-off-by: garyschulte <garyschulte@gmail.com>
…ke clear and clearFlatDatabase and close as appropriate

protect RocksDBSnapshotTransactions to prevent reading or writing to a closed transaction

Signed-off-by: garyschulte <garyschulte@gmail.com>
…R the worldstate is committed.

Signed-off-by: garyschulte <garyschulte@gmail.com>
Signed-off-by: garyschulte <garyschulte@gmail.com>
…rsist()

Signed-off-by: garyschulte <garyschulte@gmail.com>
Signed-off-by: garyschulte <garyschulte@gmail.com>
Signed-off-by: garyschulte <garyschulte@gmail.com>
…BlockCreator

Signed-off-by: garyschulte <garyschulte@gmail.com>
Signed-off-by: garyschulte <garyschulte@gmail.com>
   adds subscriptions to BonsaiPersistedWorldState
   use worldstate accessor in AbstractTrieLogManager rather than member persisted state
   cached snapshots subscribe to their respective worldstates
   prevent closing of snapshots unless subscriber count is 0

Signed-off-by: garyschulte <garyschulte@gmail.com>
…er than in the snapshot itself, so we can more easily handle truncate events

Signed-off-by: garyschulte <garyschulte@gmail.com>
…ssertions

Signed-off-by: garyschulte <garyschulte@gmail.com>
Signed-off-by: garyschulte <garyschulte@gmail.com>
@garyschulte garyschulte force-pushed the bugfix/snapshot-archive-eviction branch from 8890a82 to f781b32 Compare December 22, 2022 20:10
@garyschulte garyschulte dismissed matkt’s stale review December 22, 2022 20:10

resolved issues

@garyschulte garyschulte enabled auto-merge (squash) December 22, 2022 20:14
@garyschulte garyschulte merged commit 2b17e04 into hyperledger:main Dec 22, 2022
@garyschulte garyschulte deleted the bugfix/snapshot-archive-eviction branch December 22, 2022 22:22
macfarla pushed a commit to macfarla/besu that referenced this pull request Jan 10, 2023
…ledger#4786)

* subscribe snapshot worldstates to parent worldstate storage events like clear and clearFlatDatabase and close as appropriate to avoid segfaults
* fix for direct snapshot creation when using snapshot archive
* ensure we only prune bonsai worldstates if we have more than the configured number of retained states

Signed-off-by: garyschulte <garyschulte@gmail.com>
Signed-off-by: Sally MacFarlane <macfarla.github@gmail.com>
garyschulte added a commit to garyschulte/besu that referenced this pull request Jan 14, 2023
…ledger#4786)

* subscribe snapshot worldstates to parent worldstate storage events like clear and clearFlatDatabase and close as appropriate to avoid segfaults
* fix for direct snapshot creation when using snapshot archive
* ensure we only prune bonsai worldstates if we have more than the configured number of retained states

Signed-off-by: garyschulte <garyschulte@gmail.com>
garyschulte added a commit to garyschulte/besu that referenced this pull request Jan 14, 2023
…ledger#4786)

* subscribe snapshot worldstates to parent worldstate storage events like clear and clearFlatDatabase and close as appropriate to avoid segfaults
* fix for direct snapshot creation when using snapshot archive
* ensure we only prune bonsai worldstates if we have more than the configured number of retained states

Signed-off-by: garyschulte <garyschulte@gmail.com>
garyschulte added a commit to garyschulte/besu that referenced this pull request Jan 14, 2023
…ledger#4786)

* subscribe snapshot worldstates to parent worldstate storage events like clear and clearFlatDatabase and close as appropriate to avoid segfaults
* fix for direct snapshot creation when using snapshot archive
* ensure we only prune bonsai worldstates if we have more than the configured number of retained states

Signed-off-by: garyschulte <garyschulte@gmail.com>
garyschulte added a commit that referenced this pull request Jan 14, 2023
* Use safe block as pivot block suring snapsync (#4819)
* Bugfix snapshot transaction segfaults after storage truncation (#4786)
* Bugfix potential chain head and worldstate inconsistency (#4862)
* Peering - disconnect worst peer (#4888)
* Attempt to fix CPU spikes issue (#4867)
* not block subscribe when the worldstate storage is open (#4912)

Signed-off-by: Sally MacFarlane <macfarla.github@gmail.com>
Signed-off-by: garyschulte <garyschulte@gmail.com>
Signed-off-by: Ameziane H <ameziane.hamlat@consensys.net>
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
eum602 pushed a commit to lacchain/besu that referenced this pull request Nov 3, 2023
…ledger#4786)

* subscribe snapshot worldstates to parent worldstate storage events like clear and clearFlatDatabase and close as appropriate to avoid segfaults
* fix for direct snapshot creation when using snapshot archive
* ensure we only prune bonsai worldstates if we have more than the configured number of retained states

Signed-off-by: garyschulte <garyschulte@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants