Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DO-NOT-MERGE] .ef optimization experiments #12907

Open
wants to merge 27 commits into
base: main
Choose a base branch
from

Conversation

wmitsuda
Copy link
Member

No description provided.

@wmitsuda wmitsuda added the do-not-merge PR that is in a merge-able state but is waiting for something else to take place before merging label Nov 28, 2024
@wmitsuda
Copy link
Member Author

note: code quality is "hacky", pls do not consider for review yet, only for evaluation of the overall solution + validating the strategy

@wmitsuda
Copy link
Member Author

now that the prototype 1 is out, let me rebase it to main as it is 20 days outdated...

@wmitsuda wmitsuda force-pushed the wmitsuda/ef-optimization-experiments branch 5 times, most recently from 3ea3702 to 9e5031a Compare December 7, 2024 00:33
@wmitsuda wmitsuda force-pushed the wmitsuda/ef-optimization-experiments branch 3 times, most recently from 9e7c1b2 to ae6c0bf Compare December 10, 2024 00:26
AskAlexSharov pushed a commit that referenced this pull request Dec 10, 2024
As part of #12907 I'll have
other iterator-like sequence implementations.

It makes sense to generalize the ErrEliasFanoIterExhausted and move it
to a common package and reuse it, rather than making a bunch of
IteratorExhausted-like for each implementation.
@wmitsuda wmitsuda force-pushed the wmitsuda/ef-optimization-experiments branch from ae6c0bf to 6d7f604 Compare December 10, 2024 03:52
@wmitsuda
Copy link
Member Author

Current status:

  • Implemented collate/merge support in addition to read from snapshots. In theory it is feature complete, now need to do a full validation of correctness + optimizations, polish the code, etc...
  • Hid everything behind a feature flag: --experimental.ef-optimization
    • That means if someone runs this branch as-is against an existing node, it should behave like this PR is not applied and the data should be written/read like regular E3.
    • If the feature flag is activated, new data moved from mdbx to snapshots will be written in the new optimized format (simple sequences/rebased elias-fano)
    • New merged files will be converted during the merge process.
    • Existing snapshots will not be touched, read support is backwards compatible.
    • New data is NOT backwards compatible, can't disable the flag and go back to regular E3. DO BACKUP YOUR NODE BEFORE TRYING THIS PR.
  • Did some simple tests so far on holesky:
    • Ran this PR with the flag DISABLED, waited for 1 collation, compared with regular node in order to validate feature flag disabled == working as before.
    • Ran this PR with the flag ENABLED, waited for 1 collation, 1 step was moved to new format.

Next steps:

  • Do longer tests (to validate multiple merges) + data deep comparison with regular E3.
  • Bootstrapping a new holesky node with --no-downloader in other to simulate a snapshotter + full comparison with regular E3 node. Not sure if my hardware will do it in a sane timeframe, but I'll try it.

@AskAlexSharov
Copy link
Collaborator

@wmitsuda FYI: maybe next commands may help to test files erigon snapshots integrity --check=InvertedIndex and erigon snapshots integrity --check=HistoryNoSystemTxs

@wmitsuda wmitsuda force-pushed the wmitsuda/ef-optimization-experiments branch 2 times, most recently from 928dc9d to b0c3db9 Compare December 15, 2024 04:59
@wmitsuda
Copy link
Member Author

wmitsuda commented Dec 16, 2024

Current status:

  • Regenerated snapshots on holesky, there is now a idx_verify tool for comparing two node's .ef file integrity:
time build/bin/integration idx_verify --sourcedir /var/lib/ethereum/erigon3-holesky/erig
on --targetdir /var/lib/ethereum/erigon3-holesky-experimental/erigon
2024/12/16 04:53:56 Comparing idx files:
2024/12/16 04:53:56 Deep checking file v1-accounts.0-64.ef...
2024/12/16 04:59:56 Deep checking file v1-accounts.64-68.ef...
2024/12/16 05:00:05 Deep checking file v1-accounts.68-70.ef...
2024/12/16 05:00:10 Deep checking file v1-accounts.70-71.ef...
2024/12/16 05:00:13 Deep checking file v1-code.0-64.ef...
2024/12/16 05:01:47 Deep checking file v1-code.64-68.ef...
2024/12/16 05:01:48 Deep checking file v1-code.68-70.ef...
2024/12/16 05:01:48 Deep checking file v1-code.70-71.ef...
2024/12/16 05:01:48 Deep checking file v1-logaddrs.0-64.ef...
2024/12/16 05:02:03 Deep checking file v1-logaddrs.64-68.ef...
2024/12/16 05:02:06 Deep checking file v1-logaddrs.68-70.ef...
2024/12/16 05:02:07 Deep checking file v1-logaddrs.70-71.ef...
2024/12/16 05:02:07 Deep checking file v1-logtopics.0-64.ef...
2024/12/16 05:13:25 Deep checking file v1-logtopics.64-68.ef...
2024/12/16 05:13:55 Deep checking file v1-logtopics.68-70.ef...
2024/12/16 05:14:03 Deep checking file v1-logtopics.70-71.ef...
2024/12/16 05:14:07 Deep checking file v1-receipt.0-64.ef...
2024/12/16 05:14:07 Optimized eliasfano is longer
2024/12/16 05:14:07 key=0x00
2024/12/16 05:14:07 source min=0 max=99999999 count=67244080
2024/12/16 05:14:07 target min=0 max=99999999 count=97250285
2024/12/16 05:14:07 value mismatch!
2024/12/16 05:14:07 skipping to next file...
2024/12/16 05:14:07 Deep checking file v1-receipt.64-68.ef...
2024/12/16 05:14:08 Deep checking file v1-receipt.68-70.ef...
2024/12/16 05:14:08 Deep checking file v1-receipt.70-71.ef...
2024/12/16 05:14:08 values mismatch: source=109375000 target=109375019
2024/12/16 05:14:08 key=0x01
2024/12/16 05:14:08 source min=109375000 max=110937468 count=97974
2024/12/16 05:14:08 target min=109375019 max=110937468 count=97974
2024/12/16 05:14:08 value mismatch!
2024/12/16 05:14:08 skipping to next file...
2024/12/16 05:14:08 Deep checking file v1-storage.0-64.ef...
2024/12/16 05:34:50 Deep checking file v1-storage.64-68.ef...
2024/12/16 05:38:57 Deep checking file v1-storage.68-70.ef...
2024/12/16 05:41:05 Deep checking file v1-storage.70-71.ef...
2024/12/16 05:41:44 Deep checking file v1-tracesfrom.0-64.ef...
2024/12/16 05:45:46 Deep checking file v1-tracesfrom.64-68.ef...
2024/12/16 05:45:54 Deep checking file v1-tracesfrom.68-70.ef...
2024/12/16 05:45:57 Deep checking file v1-tracesfrom.70-71.ef...
2024/12/16 05:45:59 Deep checking file v1-tracesto.0-64.ef...
2024/12/16 05:50:10 Deep checking file v1-tracesto.64-68.ef...
2024/12/16 05:50:19 Deep checking file v1-tracesto.68-70.ef...
2024/12/16 05:50:23 Deep checking file v1-tracesto.70-71.ef...
build/bin/integration idx_verify --sourcedir  --targetdir   1242.30s user 302.65s system 45% cpu 56:28.69 total
  • Except for v1-receipt-*.ef, deep comparison between a node synced from current snapshots vs genesis sync + this PR optimizations showed data is equivalent.
  • Receipts seems buggy according to our conversation in discord, so let's consider this test was successful as everything else worked as expected.
  • The /idx size was reduced by 29GB as forecasted by the initial calculations:
du -hd1 /var/lib/ethereum/erigon3-holesky/erigon/snapshots
11G     /var/lib/ethereum/erigon3-holesky/erigon/snapshots/history
4.0K    /var/lib/ethereum/erigon3-holesky/erigon/snapshots/caplin
7.1G    /var/lib/ethereum/erigon3-holesky/erigon/snapshots/accessor
36G     /var/lib/ethereum/erigon3-holesky/erigon/snapshots/domain
51G     /var/lib/ethereum/erigon3-holesky/erigon/snapshots/idx
164G    /var/lib/ethereum/erigon3-holesky/erigon/snapshots

du -hd1 /var/lib/ethereum/erigon3-holesky-experimental/erigon/snapshots
11G     /var/lib/ethereum/erigon3-holesky-experimental/erigon/snapshots/history
4.0K    /var/lib/ethereum/erigon3-holesky-experimental/erigon/snapshots/caplin
7.2G    /var/lib/ethereum/erigon3-holesky-experimental/erigon/snapshots/accessor
36G     /var/lib/ethereum/erigon3-holesky-experimental/erigon/snapshots/domain
22G     /var/lib/ethereum/erigon3-holesky-experimental/erigon/snapshots/idx
136G    /var/lib/ethereum/erigon3-holesky-experimental/erigon/snapshots

so, we can conclude data is correct + size reduction is the expected one so far.

@wmitsuda
Copy link
Member Author

from now on, I think we are ready to advance one more step towards making it stable:

  • We could use some preliminary review to fix obvious design decisions (@AskAlexSharov when you have some time), but not merge yet.
  • Do a genesis sync on a real chain, i.e. mainnet and observe the results. Let it run for a longer period.

@wmitsuda wmitsuda force-pushed the wmitsuda/ef-optimization-experiments branch from f5dfa8c to b6db9b9 Compare December 16, 2024 04:46
}

// simple encoding
if data[0]&0b11110000 == 0b10000000 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use const

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forgot to use the existing SimpleEncoding const... will also add one for the comparison mask

// from raw data
func Count(baseNum uint64, data []byte) uint64 {
// plain elias fano (legacy)
if data[0]&0b10000000 == 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats Simple Encoding constant value, probably zero bit could be just compared to 0b0 const

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, because if the 8th bit is 0, meaning it is legacy elias fano encoding type, I'm using that entire byte as part of the following elias fano struct, so in theory the remaining 7 bits of data[0] will be used to decode elias fano.

anyway, will add more consts to clarify that comparison.

}

// simple encoding
if raw[0]&0b11110000 == 0b10000000 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whats that mask 0xf0? using const is better, also seems can use just 2 bits instead of 8, have something to encode up to 2^6? :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because I'm reserving the range [0b10000000, 0b100001111] to SimpleEncoding. I'm using the it to encode at the same time the type (SimpleEncoding) + the number of elements (up to 16) in the same byte and save space.

So, the masking + comparison is saying: ignore the less significant 4 bits, if it is 0b1000????, then it is SimpleEncoding.

yeah, will add consts, forgot it :)

I didn't understand the part about 2^6?

it.Seek(uint64(v))
}
return it
} else if s.currentEnc == PlainEliasFano || s.currentEnc == RebasedEliasFano {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw since constants are growing could just do s.currentEnc < threshold. So maybe reorder them as PlainEF, RebasedEF, and then SimpleEnc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, problem here is that the consts are at the same time the type itself and the bit fragments that are used when encoding them, i.e., they are not 1, 2 and 3, but 0, 128 and 144.

I think I could do it, but I think the code now is clearer to read and understand?

Also, if we are going to merge it now, maybe when can even remove the plain elias-fano choice since it would not be used anymore in this part of code and have one less comparison.

@wmitsuda
Copy link
Member Author

Regarding comparing converted holesky vs. generated holesky (only snapshots dir), it seems now everything matches, except:

  • v1-receipt.* -> probably due to concerns I mentioned in the previous comment
  • v1-commitment.* -> will ignore, known as not deterministic
  • domains/v1-*.0-64.bt -> strange, all domains, only the 0-64 range .bt doesn't match
  • *.kvei -> not sure it is not deterministic, this run I made sure to copy salts before the execution, maybe there is something else that is making it not deterministic.
  • There are a few transactions.idx (only the latest ones) which are not being deterministically regenerated. No idea why, salt files were copied, and all < 3100- files were regenerated correctly. The change in this PR has nothing to do with transactions .idx, apparently safe to ignore, but not sure why the diff...
diff -rq ~/eth-nodes/erigon3-holesky-converted/snapshots ~/eth-nodes/erigon3-holesky-experimental/snapshots |
grep -v torrent | grep -v \.efi | grep -v \.ef
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/accessor/v1-receipt.0-64.vi and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/accessor/v1-receipt.0-64.vi differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/accessor/v1-receipt.72-74.vi and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/accessor/v1-receipt.72-74.vi differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/domain/v1-accounts.0-64.bt and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/domain/v1-accounts.0-64.bt differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/domain/v1-accounts.0-64.kvei and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/domain/v1-accounts.0-64.kvei differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/domain/v1-accounts.64-72.kvei and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/domain/v1-accounts.64-72.kvei differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/domain/v1-accounts.72-74.kvei and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/domain/v1-accounts.72-74.kvei differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/domain/v1-code.0-64.bt and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/domain/v1-code.0-64.bt differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/domain/v1-code.0-64.kvei and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/domain/v1-code.0-64.kvei differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/domain/v1-code.64-72.kvei and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/domain/v1-code.64-72.kvei differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/domain/v1-code.72-74.kvei and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/domain/v1-code.72-74.kvei differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/domain/v1-commitment.0-64.kv and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/domain/v1-commitment.0-64.kv differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/domain/v1-commitment.0-64.kvi and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/domain/v1-commitment.0-64.kvi differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/domain/v1-commitment.64-72.kv and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/domain/v1-commitment.64-72.kv differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/domain/v1-commitment.64-72.kvi and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/domain/v1-commitment.64-72.kvi differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/domain/v1-commitment.72-74.kv and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/domain/v1-commitment.72-74.kv differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/domain/v1-commitment.72-74.kvi and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/domain/v1-commitment.72-74.kvi differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/domain/v1-receipt.0-64.kvei and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/domain/v1-receipt.0-64.kvei differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/domain/v1-receipt.64-72.kvei and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/domain/v1-receipt.64-72.kvei differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/domain/v1-receipt.72-74.kvei and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/domain/v1-receipt.72-74.kvei differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/domain/v1-storage.0-64.bt and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/domain/v1-storage.0-64.bt differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/domain/v1-storage.0-64.kvei and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/domain/v1-storage.0-64.kvei differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/domain/v1-storage.64-72.kvei and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/domain/v1-storage.64-72.kvei differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/domain/v1-storage.72-74.kvei and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/domain/v1-storage.72-74.kvei differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/history/v1-receipt.0-64.v and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/history/v1-receipt.0-64.v differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/history/v1-receipt.72-74.v and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/history/v1-receipt.72-74.v differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/v1-003180-003190-transactions-to-block.idx and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/v1-003180-003190-transactions-t
o-block.idx differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/v1-003180-003190-transactions.idx and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/v1-003180-003190-transactions.idx differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/v1-003190-003191-transactions-to-block.idx and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/v1-003190-003191-transactions-t
o-block.idx differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/v1-003190-003191-transactions.idx and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/v1-003190-003191-transactions.idx differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/v1-003191-003192-transactions-to-block.idx and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/v1-003191-003192-transactions-t
o-block.idx differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/v1-003191-003192-transactions.idx and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/v1-003191-003192-transactions.idx differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/v1-003192-003193-transactions-to-block.idx and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/v1-003192-003193-transactions-t
o-block.idx differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/v1-003192-003193-transactions.idx and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/v1-003192-003193-transactions.idx differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/v1-003193-003194-transactions-to-block.idx and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/v1-003193-003194-transactions-t
o-block.idx differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/v1-003193-003194-transactions.idx and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/v1-003193-003194-transactions.idx differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/v1-003194-003195-transactions-to-block.idx and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/v1-003194-003195-transactions-t
o-block.idx differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/v1-003194-003195-transactions.idx and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/v1-003194-003195-transactions.idx differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/v1-003195-003196-transactions-to-block.idx and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/v1-003195-003196-transactions-t
o-block.idx differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/v1-003195-003196-transactions.idx and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/v1-003195-003196-transactions.idx differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/v1-003196-003197-transactions-to-block.idx and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/v1-003196-003197-transactions-t
o-block.idx differ
Files /Users/wmitsuda/eth-nodes/erigon3-holesky-converted/snapshots/v1-003196-003197-transactions.idx and /Users/wmitsuda/eth-nodes/erigon3-holesky-experimental/snapshots/v1-003196-003197-transactions.idx differ

@wmitsuda
Copy link
Member Author

Given the overall results on holesky now seems satisfactory, now triggering a regeneration run on mainnet, should be done in < 1 week

@wmitsuda
Copy link
Member Author

Status: mainnet sync from scratch still running after 1 week, at block 17.9M. note: mainnet sync performance noticeably drops after block 16M, going from +700Mgas/s -> 400-500Mgas/s.

@wmitsuda
Copy link
Member Author

wmitsuda commented Feb 3, 2025

Status: mainnet sync from scratch done after 11 days, now proceeding to check data integrity.

FYI: a few hours after reaching the tip I got: #13676 , but I think it has nothing to do with this change, so ignoring in this context.

@wmitsuda
Copy link
Member Author

wmitsuda commented Feb 3, 2025

data seems good, diffs are similar to what I found on holesky.

I have another node that is not converted, but new data is being generated in the new format, and new .ef files are equivalent to plain E3 ones.

@wmitsuda
Copy link
Member Author

wmitsuda commented Feb 3, 2025

given our previous conversation about leaving this one post-E3 final, I'll keep merging main and running multiple rounds of sync from scratch on mainnet from now on in order to catch any regressions.

@VBulikov VBulikov linked an issue Feb 11, 2025 that may be closed by this pull request
@VBulikov VBulikov removed a link to an issue Feb 11, 2025
@wmitsuda
Copy link
Member Author

Status:

  • last mainnet sync test resulted in .ef data being semantically correct compared to plain E3.
  • however the data generated by this PR also differs in some other few files that were not expected.
  • so the question is: is this PR also generating incorrect data for let's say a few domains? or this PR is correct and for some unknown reason the data coming from published snapshots is different? (it may be incorrect due to some past bug, but also it could be just format slightly changed over time but it is still semantically equivalent, so can't assume snapshots are incorrect).

in order to answer that question I conducted another experiment:

  • regenerated the state snapshots using the same base branch of this PR (beta2)
  • then I ended up with the following 3 equivalent mainnet instances:
    • (1) mainnet plain -> just E3 beta 2 as-is
    • (2) mainnet experimental -> this PR regenerated locally on top of beta 2
    • (3) mainnet regen -> beta 2 regenerated locally

The expected result is:

  • diffing (2) and (3) must be the same except:
    • idx/*.ef -> because of this PR's optimizations
    • accessor/*.efi -> because .ef contents changed, hence offsets changed
    • we are ignoring domains/commitment-* -> because it is known it is not deterministic
    • we are ignoring domains?receipts-* -> because of recent bugs at the time of beta2

if the above is true, then it means there are not errors introduced in other files by this PR and the currently published snapshots are not 100% correspondent to what should be produced by the current code (but it doesn't mean they are incorrect).

the following comments will contain the result of this experiment for future reference.

@wmitsuda
Copy link
Member Author

Results: (2) vs (3)

diff -r -x '*.torrent' -x 'preverified.toml' -x '*.seg' -x '*.idx' erigon3-mainnet-regen/snapshots/ erigon3-mainnet-experimental/snapshots/ > experimental-vs-regen.txt

the attached diff shows the result is what we expected, hence validating our hypothesis this PR is correct.

the only thing not expected was all the .kvei differs, but we'll assume it is not deterministic or it uses some seed/salt/etc we didn't properly configured at chain generation.

experimental-vs-regen.txt

@wmitsuda
Copy link
Member Author

Results: (1) vs (3)

diff -r -x '*.torrent' -x 'preverified.toml' -x '*.seg' -x '*.idx' erigon3-mainnet-regen/snapshots/ erigon3-mainnet/snapshots/ > original-vs-regen.txt

noticed how many data files seems different, like regenerated .ef and .v files, making a direct comparison between published snapshots and regenerated ones unreliable, so I'll rely on (2) vs (3) to validate this PR.

original-vs-regen.txt

@wmitsuda wmitsuda marked this pull request as ready for review February 28, 2025 09:45
@wmitsuda
Copy link
Member Author

I'm taking this PR out of draft state in order to trigger the CI pipeline and because after the latest round of tests on mainnet I'm now confident about it's correctness.

I'm leaving the do-not-merge tags because it is a one-way operation and that should be properly planned with the team.

@wmitsuda
Copy link
Member Author

also, next step is merge recent main as current test was run against beta2.

@wmitsuda
Copy link
Member Author

wmitsuda commented Mar 2, 2025

@AskAlexSharov I think I could use some preliminary review now if you have some time.

We are not going to merge it yet, I've proposed an in-person session to discuss how the rollout will be coordinated for 3.1, but I think the code is good enough to be reviewed now so we don't have to rush last minute modifications later.

@wmitsuda wmitsuda requested a review from AskAlexSharov March 2, 2025 08:09
@AskAlexSharov
Copy link
Collaborator

commitment .kvei and .kvi - is index of .kv - also expected to be different

@AskAlexSharov
Copy link
Collaborator

If your PR changing .ef then .efi will also changed

Copy link
Collaborator

@AskAlexSharov AskAlexSharov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great.
left some cosmetic comments.

log.Println("Sumarizing idx files...")
cEF := 0
for _, file := range files {
if file.IsDir() || !strings.HasSuffix(file.Name(), ".ef") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if file.IsDir() || file.Type().IsRegular() ||`

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: we have dir.ListFiles helper

logger := debug.SetupCobra(cmd, "integration")

// accessorDir := filepath.Join(datadirCli, "snapshots", "accessor")
idxPath := filepath.Join(datadirCli, "snapshots", "idx")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: we have dirs := datadir.New(datadirCli)
and dirs object will have all paths.
For example: ilepath.Join(datadirCli, "snapshots", "idx") can replace by dirs.SnapIdx


import (
"io/fs"
"log"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use "github.com/erigontech/erigon-lib/log/v3" package for logging.
can be used by log.Info("message", "some_key", "some_value", "another_key", "another_value").
and remove log package use

return uint64(data[0]&SimpleEncodingSizeMask) + 1
}

panic("unknown encoding")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add a bit info to this message - to make it easier to debug in future

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge PR that is in a merge-able state but is waiting for something else to take place before merging
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants