Unable to copy layered worldstate on 22.10.2 - Snapshots=False #4784

Closed
non-fungible-nelson opened this issue Dec 7, 2022 · 17 comments · Fixed by #4906
Labels
bug, mainnet

Comments

@non-fungible-nelson
Contributor

non-fungible-nelson commented Dec 7, 2022

Description

A Besu user is experiencing the following bug:

2022-12-07 08:38:20.491-07:00 | vert.x-worker-thread-0 | INFO | MainnetBlockValidator | Optional[Unable to copy Layered Worldstate for 0x3c2a249675dca59a4477c9a3ffdcc51e203617266f1a0f6434fa1a1dc5fec3f0]. Block 16130345 (0xa8b549ea57d83df60aec71c6fb40ab8391b0f640d7a2d2a013fce3a296800ad0), caused by org.hyperledger.besu.plugin.services.exception.StorageException: Unable to copy Layered Worldstate for 0x3c2a249675dca59a4477c9a3ffdcc51e203617266f1a0f6434fa1a1dc5fec3f0...

This occurs directly after a restart, and the node is stuck; a further restart does not fix it. Initially the user experienced the error below, then, upon restart, the error above.

2022-12-07 08:09:44.904-07:00 | vert.x-worker-thread-0 | WARN | EngineNewPayload | Invalid new payload: number: 16130345, hash: 0xa8b549ea57d83df60aec71c6fb40ab8391b0f640d7a2d2a013fce3a296800ad0, parentHash: 0x3c2a249675dca59a4477c9a3ffdcc51e203617266f1a0f6434fa1a1dc5fec3f0, latestValidHash: 0x3c2a249675dca59a4477c9a3ffdcc51e203617266f1a0f6434fa1a1dc5fec3f0, status: INVALID, validationError: Unable to process block because parent world state 0x3b34cb5ab01b1b590a73ac002acbf60790360952ed1679bd6d9f9b608856b29a is not available

User does not have snapshots enabled. User is on 22.10.2.

Discord context

Acceptance Criteria

  • Bonsai is resilient to this concurrency bug

Steps to Reproduce (Bug)

  1. Run Besu
  2. Halt or restart
  3. Experience error (specific DB state)

Expected behavior:
Bonsai resumes as normal and the node continues processing blocks.

Actual behavior:
The node is stuck, repeatedly logging "Unable to copy Layered Worldstate" StorageExceptions; restarting does not recover it.

Frequency:
Infrequent

Versions (Add all that apply)

  • Software version: 22.10.2

Additional Information (Add any of the following or anything else that may be relevant)

  • Besu setup info - snapshots false (see the flag sketch after this list)
  • System info - memory, CPU
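For reference, a minimal sketch of the setup described above as CLI flags. The Bonsai snapshots option was experimental at the time, so the spelling of --Xbonsai-use-snapshots is an assumption here, and --data-path is a placeholder; verify flag names against besu --help for your version.

# Sketch of a 22.10.x-era Bonsai node with snapshots disabled.
# --Xbonsai-use-snapshots is assumed to be the experimental flag name;
# --data-path is a placeholder for your own data directory.
besu \
  --data-path=/var/lib/besu \
  --data-storage-format=BONSAI \
  --Xbonsai-use-snapshots=false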
@yorickdowne

yorickdowne commented Dec 20, 2022

Yep, ran into this. Restart doesn't fix it; no Bonsai snapshots. Running Bonsai tries with snap sync.

eth-main-execution-1  | 2022-12-20 13:41:14.414+00:00 | vert.x-worker-thread-0 | INFO  | MainnetBlockValidator | Optional[Unable to copy Layered Worldstate for 0x1551f6d040298ec23c52f0fcf6bab2ee80cff9b958e1bdc27c11ba98b612b1d4]. Block 16224004 (0xe6f16e7d0103a45734a8d744de0b1c57c143fac4e750ebd9afd6e0ee708e94fc), caused by org.hyperledger.besu.plugin.services.exception.StorageException: Unable to copy Layered Worldstate for 0x1551f6d040298ec23c52f0fcf6bab2ee80cff9b958e1bdc27c11ba98b612b1d4
eth-main-execution-1  | 2022-12-20 13:41:19.451+00:00 | vert.x-worker-thread-0 | INFO  | MainnetBlockValidator | Optional[Unable to copy Layered Worldstate for 0x1551f6d040298ec23c52f0fcf6bab2ee80cff9b958e1bdc27c11ba98b612b1d4]. Block 16224004 (0xe6f16e7d0103a45734a8d744de0b1c57c143fac4e750ebd9afd6e0ee708e94fc), caused by org.hyperledger.besu.plugin.services.exception.StorageException: Unable to copy Layered Worldstate for 0x1551f6d040298ec23c52f0fcf6bab2ee80cff9b958e1bdc27c11ba98b612b1d4

Frequency: Reasonably frequent. Saw it in 2 out of 4 fresh syncs so far.

@bryson-m

I ran into this today as well. Same hashes and block number as @yorickdowne

@AegeanDad

AegeanDad commented Dec 22, 2022

I am having this exact issue following a proper shutdown and reboot. Prior to this, I had deleted the Besu DB and completed a fresh sync. Following the sync, everything was normal until the reboot. Running 22.10.3.

Dec 22 15:31:26 GHBeelink2 besu[4272]: 2022-12-22 15:31:26.971-08:00 | vert.x-worker-thread-0 | INFO | MainnetBlockValidator | Optional[Unable to copy Layered Worldstate for 0x3bc82334b7902026f2e9ed64122ccf224899f014dc25823a8b30568e588f20cd]. Block 16243159 (0xd25d6b550954f0b0358297170861224b6136e17d54180fb3ddfa0b04c7964f06), caused by org.hyperledger.besu.plugin.services.exception.StorageException: Unable to copy Layered Worldstate for 0x3bc82334b7902026f2e9ed64122ccf224899f014dc25823a8b30568e588f20cd

@0xcd2c6

0xcd2c6 commented Dec 22, 2022

Experiencing the same issue after having to transfer validators to a different machine earlier this week because of an unrelated issue. Running 22.10.3 combined with Teku. Have deleted db and trying to resync currently.

@garyschulte
Contributor

This issue is mitigated in the 23.1.0-beta release coming on the 28th. The underlying data problem that is the cause of this issue persists, but 23.1.0 should be able to recover from these kinds of errors via backward sync of subsequent blocks.

@timjrobinson

timjrobinson commented Jan 1, 2023

I started getting this issue and tried the 23.1.0 release candidate docker image (23.1.0-RC1-SNAPSHOT-openjdk-latest) to mitigate it and am getting these errors instead:

rocketpool_eth1  | 2023-01-01 02:26:03.390+00:00 | vert.x-eventloop-thread-17 | ERROR | ExecutionEngineJsonRpcMethod | failed to exec consensus method engine_newPayloadV1
rocketpool_eth1  | java.lang.RuntimeException: java.lang.RuntimeException: org.hyperledger.besu.plugin.services.exception.StorageException: Unable to copy Layered Worldstate for 0x334a634a192f87ae76c23c1b8a1f40619382d00825877d07ebef1947d777f637
rocketpool_eth1  | 	at org.hyperledger.besu.ethereum.MainnetBlockValidator.validateAndProcessBlock(MainnetBlockValidator.java:175)
rocketpool_eth1  | 	at org.hyperledger.besu.consensus.merge.blockcreation.MergeCoordinator.validateBlock(MergeCoordinator.java:441)
rocketpool_eth1  | 	at org.hyperledger.besu.consensus.merge.blockcreation.MergeCoordinator.rememberBlock(MergeCoordinator.java:455)
rocketpool_eth1  | 	at org.hyperledger.besu.consensus.merge.blockcreation.TransitionCoordinator.rememberBlock(TransitionCoordinator.java:142)
rocketpool_eth1  | 	at org.hyperledger.besu.ethereum.api.jsonrpc.internal.methods.engine.EngineNewPayload.syncResponse(EngineNewPayload.java:219)
rocketpool_eth1  | 	at org.hyperledger.besu.ethereum.api.jsonrpc.internal.methods.ExecutionEngineJsonRpcMethod.lambda$response$0(ExecutionEngineJsonRpcMethod.java:73)
rocketpool_eth1  | 	at io.vertx.core.impl.ContextImpl.lambda$null$0(ContextImpl.java:159)
rocketpool_eth1  | 	at io.vertx.core.impl.AbstractContext.dispatch(AbstractContext.java:100)
rocketpool_eth1  | 	at io.vertx.core.impl.ContextImpl.lambda$executeBlocking$1(ContextImpl.java:157)
rocketpool_eth1  | 	at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76)
rocketpool_eth1  | 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
rocketpool_eth1  | 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
rocketpool_eth1  | 	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
rocketpool_eth1  | 	at java.base/java.lang.Thread.run(Thread.java:833)
rocketpool_eth1  | Caused by: java.lang.RuntimeException: org.hyperledger.besu.plugin.services.exception.StorageException: Unable to copy Layered Worldstate for 0x334a634a192f87ae76c23c1b8a1f40619382d00825877d07ebef1947d777f637
rocketpool_eth1  | 	at org.hyperledger.besu.ethereum.bonsai.BonsaiLayeredWorldState.copy(BonsaiLayeredWorldState.java:275)
rocketpool_eth1  | 	at org.hyperledger.besu.ethereum.MainnetBlockValidator.lambda$validateAndProcessBlock$0(MainnetBlockValidator.java:120)
rocketpool_eth1  | 	at java.base/java.util.Optional.map(Optional.java:260)
rocketpool_eth1  | 	at org.hyperledger.besu.ethereum.MainnetBlockValidator.validateAndProcessBlock(MainnetBlockValidator.java:117)
rocketpool_eth1  | 	... 13 more
rocketpool_eth1  | Caused by: org.hyperledger.besu.plugin.services.exception.StorageException: Unable to copy Layered Worldstate for 0x334a634a192f87ae76c23c1b8a1f40619382d00825877d07ebef1947d777f637
rocketpool_eth1  | 	at org.hyperledger.besu.ethereum.bonsai.BonsaiLayeredWorldState.lambda$copy$4(BonsaiLayeredWorldState.java:272)
rocketpool_eth1  | 	at java.base/java.util.Optional.orElseThrow(Optional.java:403)
rocketpool_eth1  | 	at org.hyperledger.besu.ethereum.bonsai.BonsaiLayeredWorldState.copy(BonsaiLayeredWorldState.java:269)
rocketpool_eth1  | 	... 16 more

@non-fungible-nelson
Contributor Author

@timjrobinson - can you provide us your config?
@garyschulte - see above and let's discuss tomorrow.

@garyschulte
Contributor

@timjrobinson, 23.1.0 mitigates the issue by preventing database corruption when this error occurs. If you ran into it on 22.10.x, the database is already in an inconsistent state. I am working on a standalone CLI tool that can repair the state, if you still have a Besu database in that condition (i.e. you have not tried to resync).

@timjrobinson

@non-fungible-nelson the config is all rocketpool defaults but with snap sync enabled.

I deleted my besu data to switch back to geth so unfortunately I don't have that database any more.
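(For context, a minimal sketch of the snap sync setting mentioned above. Snap sync was still experimental in 22.10.x-era builds, so the X_SNAP spelling is an assumption; later releases accept SNAP.)

# Assumed experimental spelling of the snap sync mode at the time;
# check besu --help for your version before relying on it.
besu --sync-mode=X_SNAP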

@j4cko

j4cko commented Jan 4, 2023

I ran into the same issue a few weeks back and admittedly switched to geth at the time. But I still have the corrupted database. If there is something I can try, let me know (could also share).

@felix-halim

I also ran into the same issue today:

2023-01-12 11:22:18.852-08:00 | vert.x-worker-thread-0 | INFO | MainnetBlockValidator | Optional[Unable to copy Layered Worldstate for 0x8a8d1fbd1f40bea01fd7a4ec7e7c50661f3e5f30baf16de5db99fc660c53f882]. Block 16387938 (0x8c0993a6ee49b3e8f4726400117d377c1c358037b28701c4955ad58a303f2dcd), caused by org.hyperledger.besu.plugin.services.exception.StorageException: Unable to copy Layered Worldstate for 0x8a8d1fbd1f40bea01fd7a4ec7e7c50661f3e5f30baf16de5db99fc660c53f882

I am using 22.10.3 with Bonsai and checkpoint sync.

Is there a way to fix this using command line yet?
Or should I delete everything and sync from the start?
This will happen again, right?

Thanks!

@non-fungible-nelson
Contributor Author

Hi folks - this bug is a "symptom" of three underlying issues we have patched:
#4786 Bugfix snapshot transaction segfaults after storage truncation
#4862 Bugfix potential chain head and worldstate inconsistency
#4906 Bugfix for selfdestruct and bonsai during heal step.

We will be testing these over the weekend for a release next week. The upcoming release will also include a boot-time check for these types of errors that corrects them or resyncs the worldstate (much shorter than a full resync). But if your node is experiencing this on 22.10.3, a resync is the best way to address it for now. The trigger for the majority of these bugs is very rare (fixed in #4906), and a resync should fix it. If you happen to be unlucky and complete your snap sync under the narrow conditions for this bug, it won't manifest immediately, and should be rectified when we release 22.10.4. Thanks for the patience.
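For anyone asking how to resync: a minimal sketch, assuming a systemd-managed node; the service name besu.service and the path /var/lib/besu are placeholders for your own setup.

# Stop the node, remove the database under the data path, and restart;
# Besu rebuilds it by syncing from scratch. Service name and path are
# assumptions - substitute your own values.
sudo systemctl stop besu.service
rm -rf /var/lib/besu/database
sudo systemctl start besu.service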

@AegeanDad

AegeanDad commented Jan 12, 2023 via email

@non-fungible-nelson
Contributor Author

Definitely leave your node running. The problem can actually begin during the sync phase (i.e. one of the bugs we patched was in the "heal" step of sync, which improperly filled the database, leading to these worldstate errors). If your node is running fine for now, definitely let it run while we get the release ready for next week.

@b-m-f

b-m-f commented Jan 29, 2023

This happened to me twice in one week.

The first time it was triggered by a full SSD. The second time there was enough storage, and it occurred 2 days after a successful resync.

Bonsai with snapshots.

Edit:

I have updated to the 23.1-SNAPSHOT image from Docker Hub and started a resync. Hope there is no problem with this.

@non-fungible-nelson
Contributor Author

non-fungible-nelson commented Jan 29, 2023

@b-m-f make sure to disable snapshots if they are enabled. At this time, they are exacerbating these issues; we are tracking this in #4768. These bugs all manifest a bit differently, but share the same root cause.

@non-fungible-nelson
Contributor Author

Tracking in multiple other locations. Closing here.
