fix: improve sync committee updates #7456

Merged
merged 7 commits into from
Feb 12, 2025

Conversation

twoeths
Contributor

@twoeths twoeths commented Feb 12, 2025

Motivation

  • Since Electra, processSyncCommitteeUpdates() can take more than 15 s, as observed on the devnet.

Description

  • The main fix is in computeShuffledIndex, where the pivot and source computations are now cached.
  • Other optimizations:
    • compute the hash only once every 16 iterations
    • compute the integer manually instead of using bytesToInt, to avoid BigInt
    • cache the shuffled index
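
To illustrate the main fix, here is a minimal sketch of a swap-or-not shuffle with cached pivots and source hashes. It is not the code from this PR: the function names, the use of `node:crypto` for SHA-256, and the cache layout are assumptions chosen for illustration. The round count of 90 is the mainnet preset value. The pivot depends only on the seed and the round, and the source hash depends only on the seed, the round, and `position / 256`, so both can be reused across calls that share a seed.

```typescript
import {createHash} from "node:crypto";

const SHUFFLE_ROUND_COUNT = 90; // mainnet preset value

function sha256(data: Uint8Array): Buffer {
  return createHash("sha256").update(data).digest();
}

/** Spec-style version: recomputes both hashes in every round of every call. */
function naiveShuffledIndex(index: number, indexCount: number, seed: Uint8Array): number {
  let permuted = index;
  for (let round = 0; round < SHUFFLE_ROUND_COUNT; round++) {
    const roundByte = Buffer.from([round]);
    const pivotHash = sha256(Buffer.concat([seed, roundByte]));
    const pivot = Number(pivotHash.readBigUInt64LE(0) % BigInt(indexCount));
    const flip = (pivot + indexCount - permuted) % indexCount;
    const position = Math.max(permuted, flip);
    const chunk = Buffer.alloc(4);
    chunk.writeUInt32LE(Math.floor(position / 256), 0);
    const source = sha256(Buffer.concat([seed, roundByte, chunk]));
    const byte = source.readUInt8(Math.floor((position % 256) / 8));
    const bit = (byte >> (position % 8)) % 2;
    permuted = bit ? flip : permuted;
  }
  return permuted;
}

/** Hypothetical cached variant: pivots and source hashes are computed once per seed. */
function getCachedShuffledIndexFn(indexCount: number, seed: Uint8Array): (index: number) => number {
  const pivots: (number | undefined)[] = [];
  const sources: (Map<number, Buffer> | undefined)[] = [];
  // Reusable hash inputs: seed + round byte, and seed + round byte + 4-byte chunk index.
  const pivotInput = Buffer.alloc(32 + 1);
  pivotInput.set(seed, 0);
  const sourceInput = Buffer.alloc(32 + 1 + 4);
  sourceInput.set(seed, 0);

  return (index: number): number => {
    let permuted = index;
    for (let round = 0; round < SHUFFLE_ROUND_COUNT; round++) {
      let pivot = pivots[round];
      if (pivot === undefined) {
        pivotInput[32] = round;
        pivot = Number(sha256(pivotInput).readBigUInt64LE(0) % BigInt(indexCount));
        pivots[round] = pivot;
      }
      const flip = (pivot + indexCount - permuted) % indexCount;
      const position = Math.max(permuted, flip);
      const chunk = Math.floor(position / 256);
      let roundSources = sources[round];
      if (roundSources === undefined) {
        roundSources = new Map<number, Buffer>();
        sources[round] = roundSources;
      }
      let source = roundSources.get(chunk);
      if (source === undefined) {
        sourceInput[32] = round;
        sourceInput.writeUInt32LE(chunk, 33);
        source = sha256(sourceInput);
        roundSources.set(chunk, source);
      }
      const byte = source.readUInt8(Math.floor((position % 256) / 8));
      const bit = (byte >> (position % 8)) % 2;
      permuted = bit ? flip : permuted;
    }
    return permuted;
  };
}
```

Comparing the two functions over every index for a fixed seed is one way to check that the cache does not change the result, in the same spirit as the naive-versus-optimized unit tests mentioned under Tests.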

Using hashtree could probably improve this further, but the diff is already large and the main optimization is in computeShuffledIndex(), not the hash function. We can consider that in the future.

We could also improve the pre-Electra path, but its performance has been acceptable for a long time, so this PR focuses only on Electra.

Closes #7366

Tests

  • Added unit tests that compare the naive version against the optimized version.
  • Local benchmarks show a >1000x speedup for the main function of concern, getNextSyncCommitteeIndices(), over naiveGetNextSyncCommitteeIndices() (about 4.7 s/op versus about 4.7 ms/op), while CI shows only a >20x difference. These are my local results:
computeProposerIndex
    ✔ naive computeProposerIndex 100000 validators                        31.86491 ops/s    31.38248 ms/op        -         10 runs   34.5 s
    ✔ computeProposerIndex 100000 validators                              106.2267 ops/s    9.413833 ms/op        -         10 runs   10.4 s

  getNextSyncCommitteeIndices electra
    ✔ naiveGetNextSyncCommitteeIndices 1000 validators                   0.2121840 ops/s    4.712890  s/op        -         10 runs   51.7 s
    ✔ getNextSyncCommitteeIndices 1000 validators                         214.9251 ops/s    4.652783 ms/op        -         45 runs  0.714 s
    ✔ naiveGetNextSyncCommitteeIndices 10000 validators                  0.2122278 ops/s    4.711918  s/op        -         10 runs   51.8 s
    ✔ getNextSyncCommitteeIndices 10000 validators                        220.2337 ops/s    4.540632 ms/op        -         46 runs  0.710 s
    ✔ naiveGetNextSyncCommitteeIndices 100000 validators                 0.2117828 ops/s    4.721820  s/op        -         10 runs   52.2 s
    ✔ getNextSyncCommitteeIndices 100000 validators                       204.7383 ops/s    4.884283 ms/op        -         43 runs  0.714 s

  computeShuffledIndex
    ✔ naive computeShuffledIndex 100000 validators                      0.06638498 ops/s    15.06365  s/op        -          3 runs   60.3 s
    ✔ cached computeShuffledIndex 100000 validators                       1.932706 ops/s    517.4092 ms/op        -         10 runs   5.72 s

codecov bot commented Feb 12, 2025

Codecov Report

Attention: Patch coverage is 94.67456% with 9 lines in your changes missing coverage. Please review.

Project coverage is 50.44%. Comparing base (2247c16) to head (8713832).
Report is 3 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #7456      +/-   ##
============================================
+ Coverage     50.25%   50.44%   +0.19%     
============================================
  Files           602      602              
  Lines         40401    40583     +182     
  Branches       2204     2229      +25     
============================================
+ Hits          20305    20474     +169     
- Misses        20056    20069      +13     
  Partials         40       40              

Contributor

github-actions bot commented Feb 12, 2025

Performance Report

🚀🚀 Significant benchmark improvement detected

Benchmark suite Current: 5ce0389 Previous: e45e0eb Ratio
forkChoice updateHead vc 600000 bc 64 eq 300000 17.732 ms/op 56.830 ms/op 0.31
Full benchmark results
Benchmark suite Current: 5ce0389 Previous: e45e0eb Ratio
getPubkeys - index2pubkey - req 1000 vs - 250000 vc 1.2612 ms/op 967.80 us/op 1.30
getPubkeys - validatorsArr - req 1000 vs - 250000 vc 41.295 us/op 35.981 us/op 1.15
BLS verify - blst 985.25 us/op 817.77 us/op 1.20
BLS verifyMultipleSignatures 3 - blst 1.3825 ms/op 1.2220 ms/op 1.13
BLS verifyMultipleSignatures 8 - blst 2.0204 ms/op 1.7056 ms/op 1.18
BLS verifyMultipleSignatures 32 - blst 6.0422 ms/op 4.9317 ms/op 1.23
BLS verifyMultipleSignatures 64 - blst 11.843 ms/op 9.1516 ms/op 1.29
BLS verifyMultipleSignatures 128 - blst 18.820 ms/op 18.359 ms/op 1.03
BLS deserializing 10000 signatures 788.90 ms/op 706.46 ms/op 1.12
BLS deserializing 100000 signatures 7.5122 s/op 7.0843 s/op 1.06
BLS verifyMultipleSignatures - same message - 3 - blst 1.1192 ms/op 1.1020 ms/op 1.02
BLS verifyMultipleSignatures - same message - 8 - blst 1.1281 ms/op 1.2250 ms/op 0.92
BLS verifyMultipleSignatures - same message - 32 - blst 1.8808 ms/op 2.0611 ms/op 0.91
BLS verifyMultipleSignatures - same message - 64 - blst 2.8436 ms/op 2.9255 ms/op 0.97
BLS verifyMultipleSignatures - same message - 128 - blst 5.4133 ms/op 4.8755 ms/op 1.11
BLS aggregatePubkeys 32 - blst 21.486 us/op 20.260 us/op 1.06
BLS aggregatePubkeys 128 - blst 88.746 us/op 73.051 us/op 1.21
notSeenSlots=1 numMissedVotes=1 numBadVotes=10 55.174 ms/op 69.307 ms/op 0.80
notSeenSlots=1 numMissedVotes=0 numBadVotes=4 54.259 ms/op 58.467 ms/op 0.93
notSeenSlots=2 numMissedVotes=1 numBadVotes=10 43.652 ms/op 43.737 ms/op 1.00
getSlashingsAndExits - default max 78.136 us/op 79.359 us/op 0.98
getSlashingsAndExits - 2k 349.31 us/op 436.31 us/op 0.80
proposeBlockBody type=full, size=empty 6.1992 ms/op 7.5658 ms/op 0.82
isKnown best case - 1 super set check 223.00 ns/op 205.00 ns/op 1.09
isKnown normal case - 2 super set checks 212.00 ns/op 202.00 ns/op 1.05
isKnown worse case - 16 super set checks 208.00 ns/op 195.00 ns/op 1.07
InMemoryCheckpointStateCache - add get delete 2.5580 us/op 2.5090 us/op 1.02
validate api signedAggregateAndProof - struct 1.4940 ms/op 1.5124 ms/op 0.99
validate gossip signedAggregateAndProof - struct 1.5562 ms/op 1.9862 ms/op 0.78
batch validate gossip attestation - vc 640000 - chunk 32 141.13 us/op 161.40 us/op 0.87
batch validate gossip attestation - vc 640000 - chunk 64 128.25 us/op 130.27 us/op 0.98
batch validate gossip attestation - vc 640000 - chunk 128 124.12 us/op 129.12 us/op 0.96
batch validate gossip attestation - vc 640000 - chunk 256 122.51 us/op 137.43 us/op 0.89
pickEth1Vote - no votes 1.1353 ms/op 2.0641 ms/op 0.55
pickEth1Vote - max votes 8.5509 ms/op 11.642 ms/op 0.73
pickEth1Vote - Eth1Data hashTreeRoot value x2048 17.268 ms/op 20.411 ms/op 0.85
pickEth1Vote - Eth1Data hashTreeRoot tree x2048 24.436 ms/op 31.101 ms/op 0.79
pickEth1Vote - Eth1Data fastSerialize value x2048 484.96 us/op 548.83 us/op 0.88
pickEth1Vote - Eth1Data fastSerialize tree x2048 2.4735 ms/op 5.0010 ms/op 0.49
bytes32 toHexString 384.00 ns/op 406.00 ns/op 0.95
bytes32 Buffer.toString(hex) 283.00 ns/op 243.00 ns/op 1.16
bytes32 Buffer.toString(hex) from Uint8Array 409.00 ns/op 447.00 ns/op 0.91
bytes32 Buffer.toString(hex) + 0x 252.00 ns/op 425.00 ns/op 0.59
Object access 1 prop 0.12200 ns/op 0.16200 ns/op 0.75
Map access 1 prop 0.12400 ns/op 0.19200 ns/op 0.65
Object get x1000 6.2420 ns/op 8.2330 ns/op 0.76
Map get x1000 6.7810 ns/op 6.7410 ns/op 1.01
Object set x1000 29.527 ns/op 54.004 ns/op 0.55
Map set x1000 20.647 ns/op 24.415 ns/op 0.85
Return object 10000 times 0.30210 ns/op 0.32000 ns/op 0.94
Throw Error 10000 times 4.5244 us/op 6.3540 us/op 0.71
toHex 150.45 ns/op 148.09 ns/op 1.02
Buffer.from 130.30 ns/op 179.05 ns/op 0.73
shared Buffer 87.409 ns/op 81.836 ns/op 1.07
fastMsgIdFn sha256 / 200 bytes 2.3540 us/op 2.2940 us/op 1.03
fastMsgIdFn h32 xxhash / 200 bytes 242.00 ns/op 210.00 ns/op 1.15
fastMsgIdFn h64 xxhash / 200 bytes 308.00 ns/op 282.00 ns/op 1.09
fastMsgIdFn sha256 / 1000 bytes 7.7080 us/op 7.6310 us/op 1.01
fastMsgIdFn h32 xxhash / 1000 bytes 345.00 ns/op 346.00 ns/op 1.00
fastMsgIdFn h64 xxhash / 1000 bytes 354.00 ns/op 363.00 ns/op 0.98
fastMsgIdFn sha256 / 10000 bytes 68.202 us/op 68.758 us/op 0.99
fastMsgIdFn h32 xxhash / 10000 bytes 1.9280 us/op 1.9000 us/op 1.01
fastMsgIdFn h64 xxhash / 10000 bytes 1.3060 us/op 1.2580 us/op 1.04
send data - 1000 256B messages 13.670 ms/op 18.147 ms/op 0.75
send data - 1000 512B messages 18.776 ms/op 23.192 ms/op 0.81
send data - 1000 1024B messages 25.909 ms/op 30.094 ms/op 0.86
send data - 1000 1200B messages 26.848 ms/op 26.176 ms/op 1.03
send data - 1000 2048B messages 29.220 ms/op 25.956 ms/op 1.13
send data - 1000 4096B messages 30.785 ms/op 29.951 ms/op 1.03
send data - 1000 16384B messages 64.273 ms/op 78.352 ms/op 0.82
send data - 1000 65536B messages 227.77 ms/op 231.64 ms/op 0.98
enrSubnets - fastDeserialize 64 bits 1.0520 us/op 948.00 ns/op 1.11
enrSubnets - ssz BitVector 64 bits 366.00 ns/op 327.00 ns/op 1.12
enrSubnets - fastDeserialize 4 bits 164.00 ns/op 135.00 ns/op 1.21
enrSubnets - ssz BitVector 4 bits 451.00 ns/op 329.00 ns/op 1.37
prioritizePeers score -10:0 att 32-0.1 sync 2-0 163.17 us/op 122.08 us/op 1.34
prioritizePeers score 0:0 att 32-0.25 sync 2-0.25 190.48 us/op 149.63 us/op 1.27
prioritizePeers score 0:0 att 32-0.5 sync 2-0.5 272.24 us/op 212.68 us/op 1.28
prioritizePeers score 0:0 att 64-0.75 sync 4-0.75 521.79 us/op 395.31 us/op 1.32
prioritizePeers score 0:0 att 64-1 sync 4-1 640.64 us/op 478.74 us/op 1.34
array of 16000 items push then shift 1.8231 us/op 1.6753 us/op 1.09
LinkedList of 16000 items push then shift 8.7280 ns/op 8.3880 ns/op 1.04
array of 16000 items push then pop 99.394 ns/op 83.222 ns/op 1.19
LinkedList of 16000 items push then pop 8.1980 ns/op 7.7230 ns/op 1.06
array of 24000 items push then shift 2.7253 us/op 2.4817 us/op 1.10
LinkedList of 24000 items push then shift 8.1550 ns/op 9.0650 ns/op 0.90
array of 24000 items push then pop 111.99 ns/op 129.97 ns/op 0.86
LinkedList of 24000 items push then pop 7.3920 ns/op 8.3300 ns/op 0.89
intersect bitArray bitLen 8 6.9620 ns/op 6.4060 ns/op 1.09
intersect array and set length 8 41.382 ns/op 48.086 ns/op 0.86
intersect bitArray bitLen 128 31.810 ns/op 30.320 ns/op 1.05
intersect array and set length 128 681.48 ns/op 635.75 ns/op 1.07
bitArray.getTrueBitIndexes() bitLen 128 1.2960 us/op 1.0900 us/op 1.19
bitArray.getTrueBitIndexes() bitLen 248 2.5890 us/op 2.2140 us/op 1.17
bitArray.getTrueBitIndexes() bitLen 512 4.9720 us/op 3.9620 us/op 1.25
Buffer.concat 32 items 748.00 ns/op 780.00 ns/op 0.96
Uint8Array.set 32 items 1.7720 us/op 1.9450 us/op 0.91
Buffer.copy 3.3500 us/op 2.5020 us/op 1.34
Uint8Array.set - with subarray 3.1830 us/op 1.6710 us/op 1.90
Uint8Array.set - without subarray 1.9640 us/op 1.2530 us/op 1.57
getUint32 - dataview 207.00 ns/op 194.00 ns/op 1.07
getUint32 - manual 130.00 ns/op 155.00 ns/op 0.84
Set add up to 64 items then delete first 2.8547 us/op 3.1324 us/op 0.91
OrderedSet add up to 64 items then delete first 3.9448 us/op 4.2082 us/op 0.94
Set add up to 64 items then delete last 3.2016 us/op 3.7277 us/op 0.86
OrderedSet add up to 64 items then delete last 5.3586 us/op 4.8845 us/op 1.10
Set add up to 64 items then delete middle 3.2149 us/op 3.2099 us/op 1.00
OrderedSet add up to 64 items then delete middle 7.6479 us/op 6.0673 us/op 1.26
Set add up to 128 items then delete first 6.4990 us/op 6.1757 us/op 1.05
OrderedSet add up to 128 items then delete first 9.7111 us/op 12.986 us/op 0.75
Set add up to 128 items then delete last 6.8958 us/op 6.8848 us/op 1.00
OrderedSet add up to 128 items then delete last 9.1439 us/op 8.2299 us/op 1.11
Set add up to 128 items then delete middle 6.4963 us/op 6.5891 us/op 0.99
OrderedSet add up to 128 items then delete middle 18.109 us/op 18.690 us/op 0.97
Set add up to 256 items then delete first 13.934 us/op 13.208 us/op 1.05
OrderedSet add up to 256 items then delete first 19.983 us/op 23.646 us/op 0.85
Set add up to 256 items then delete last 12.754 us/op 13.629 us/op 0.94
OrderedSet add up to 256 items then delete last 18.459 us/op 28.226 us/op 0.65
Set add up to 256 items then delete middle 13.751 us/op 15.621 us/op 0.88
OrderedSet add up to 256 items then delete middle 52.515 us/op 48.645 us/op 1.08
transfer serialized Status (84 B) 2.6000 us/op 2.3130 us/op 1.12
copy serialized Status (84 B) 1.7290 us/op 1.1980 us/op 1.44
transfer serialized SignedVoluntaryExit (112 B) 2.6180 us/op 3.0560 us/op 0.86
copy serialized SignedVoluntaryExit (112 B) 1.5060 us/op 2.2670 us/op 0.66
transfer serialized ProposerSlashing (416 B) 3.9270 us/op 4.0420 us/op 0.97
copy serialized ProposerSlashing (416 B) 2.8910 us/op 2.0060 us/op 1.44
transfer serialized Attestation (485 B) 3.2320 us/op 3.1420 us/op 1.03
copy serialized Attestation (485 B) 1.4470 us/op 1.3210 us/op 1.10
transfer serialized AttesterSlashing (33232 B) 2.6930 us/op 3.2760 us/op 0.82
copy serialized AttesterSlashing (33232 B) 3.7250 us/op 5.3970 us/op 0.69
transfer serialized Small SignedBeaconBlock (128000 B) 3.4030 us/op 5.0780 us/op 0.67
copy serialized Small SignedBeaconBlock (128000 B) 12.715 us/op 17.829 us/op 0.71
transfer serialized Avg SignedBeaconBlock (200000 B) 3.7040 us/op 6.0880 us/op 0.61
copy serialized Avg SignedBeaconBlock (200000 B) 14.821 us/op 24.069 us/op 0.62
transfer serialized BlobsSidecar (524380 B) 5.1720 us/op 8.2070 us/op 0.63
copy serialized BlobsSidecar (524380 B) 65.804 us/op 96.949 us/op 0.68
transfer serialized Big SignedBeaconBlock (1000000 B) 4.1550 us/op 7.7920 us/op 0.53
copy serialized Big SignedBeaconBlock (1000000 B) 175.63 us/op 362.84 us/op 0.48
pass gossip attestations to forkchoice per slot 3.3543 ms/op 5.0025 ms/op 0.67
forkChoice updateHead vc 100000 bc 64 eq 0 493.21 us/op 713.50 us/op 0.69
forkChoice updateHead vc 600000 bc 64 eq 0 2.9853 ms/op 6.3461 ms/op 0.47
forkChoice updateHead vc 1000000 bc 64 eq 0 5.2416 ms/op 10.787 ms/op 0.49
forkChoice updateHead vc 600000 bc 320 eq 0 3.2569 ms/op 6.0719 ms/op 0.54
forkChoice updateHead vc 600000 bc 1200 eq 0 3.4893 ms/op 5.6614 ms/op 0.62
forkChoice updateHead vc 600000 bc 7200 eq 0 3.9425 ms/op 6.0747 ms/op 0.65
forkChoice updateHead vc 600000 bc 64 eq 1000 12.113 ms/op 11.817 ms/op 1.02
forkChoice updateHead vc 600000 bc 64 eq 10000 12.375 ms/op 12.531 ms/op 0.99
forkChoice updateHead vc 600000 bc 64 eq 300000 17.732 ms/op 56.830 ms/op 0.31
computeDeltas 500000 validators 300 proto nodes 4.3263 ms/op 6.2421 ms/op 0.69
computeDeltas 500000 validators 1200 proto nodes 4.3824 ms/op 6.1386 ms/op 0.71
computeDeltas 500000 validators 7200 proto nodes 4.2442 ms/op 6.3498 ms/op 0.67
computeDeltas 750000 validators 300 proto nodes 6.4660 ms/op 8.4366 ms/op 0.77
computeDeltas 750000 validators 1200 proto nodes 6.6405 ms/op 7.8741 ms/op 0.84
computeDeltas 750000 validators 7200 proto nodes 6.8088 ms/op 6.7574 ms/op 1.01
computeDeltas 1400000 validators 300 proto nodes 12.611 ms/op 12.393 ms/op 1.02
computeDeltas 1400000 validators 1200 proto nodes 12.666 ms/op 17.566 ms/op 0.72
computeDeltas 1400000 validators 7200 proto nodes 12.680 ms/op 12.606 ms/op 1.01
computeDeltas 2100000 validators 300 proto nodes 19.141 ms/op 21.572 ms/op 0.89
computeDeltas 2100000 validators 1200 proto nodes 20.412 ms/op 19.246 ms/op 1.06
computeDeltas 2100000 validators 7200 proto nodes 17.953 ms/op 21.455 ms/op 0.84
altair processAttestation - 250000 vs - 7PWei normalcase 2.2323 ms/op 4.2598 ms/op 0.52
altair processAttestation - 250000 vs - 7PWei worstcase 3.1412 ms/op 4.8564 ms/op 0.65
altair processAttestation - setStatus - 1/6 committees join 130.19 us/op 141.35 us/op 0.92
altair processAttestation - setStatus - 1/3 committees join 253.57 us/op 296.83 us/op 0.85
altair processAttestation - setStatus - 1/2 committees join 358.53 us/op 398.72 us/op 0.90
altair processAttestation - setStatus - 2/3 committees join 467.47 us/op 807.52 us/op 0.58
altair processAttestation - setStatus - 4/5 committees join 632.01 us/op 644.38 us/op 0.98
altair processAttestation - setStatus - 100% committees join 792.66 us/op 1.2432 ms/op 0.64
altair processBlock - 250000 vs - 7PWei normalcase 5.2667 ms/op 8.5889 ms/op 0.61
altair processBlock - 250000 vs - 7PWei normalcase hashState 37.039 ms/op 48.711 ms/op 0.76
altair processBlock - 250000 vs - 7PWei worstcase 44.423 ms/op 58.829 ms/op 0.76
altair processBlock - 250000 vs - 7PWei worstcase hashState 89.856 ms/op 120.64 ms/op 0.74
phase0 processBlock - 250000 vs - 7PWei normalcase 2.1187 ms/op 3.8707 ms/op 0.55
phase0 processBlock - 250000 vs - 7PWei worstcase 25.910 ms/op 31.420 ms/op 0.82
altair processEth1Data - 250000 vs - 7PWei normalcase 378.09 us/op 428.43 us/op 0.88
getExpectedWithdrawals 250000 eb:1,eth1:1,we:0,wn:0,smpl:15 5.1140 us/op 9.3930 us/op 0.54
getExpectedWithdrawals 250000 eb:0.95,eth1:0.1,we:0.05,wn:0,smpl:219 31.246 us/op 47.874 us/op 0.65
getExpectedWithdrawals 250000 eb:0.95,eth1:0.3,we:0.05,wn:0,smpl:42 8.4400 us/op 16.374 us/op 0.52
getExpectedWithdrawals 250000 eb:0.95,eth1:0.7,we:0.05,wn:0,smpl:18 5.4240 us/op 8.7880 us/op 0.62
getExpectedWithdrawals 250000 eb:0.1,eth1:0.1,we:0,wn:0,smpl:1020 120.25 us/op 178.53 us/op 0.67
getExpectedWithdrawals 250000 eb:0.03,eth1:0.03,we:0,wn:0,smpl:11777 1.2658 ms/op 1.0913 ms/op 1.16
getExpectedWithdrawals 250000 eb:0.01,eth1:0.01,we:0,wn:0,smpl:16384 1.7424 ms/op 1.4398 ms/op 1.21
getExpectedWithdrawals 250000 eb:0,eth1:0,we:0,wn:0,smpl:16384 1.7457 ms/op 1.4339 ms/op 1.22
getExpectedWithdrawals 250000 eb:0,eth1:0,we:0,wn:0,nocache,smpl:16384 3.7548 ms/op 3.9726 ms/op 0.95
getExpectedWithdrawals 250000 eb:0,eth1:1,we:0,wn:0,smpl:16384 1.7566 ms/op 1.7316 ms/op 1.01
getExpectedWithdrawals 250000 eb:0,eth1:1,we:0,wn:0,nocache,smpl:16384 3.9238 ms/op 4.5066 ms/op 0.87
Tree 40 250000 create 452.92 ms/op 969.40 ms/op 0.47
Tree 40 250000 get(125000) 160.37 ns/op 149.24 ns/op 1.07
Tree 40 250000 set(125000) 1.5744 us/op 2.8848 us/op 0.55
Tree 40 250000 toArray() 20.236 ms/op 28.480 ms/op 0.71
Tree 40 250000 iterate all - toArray() + loop 23.147 ms/op 26.099 ms/op 0.89
Tree 40 250000 iterate all - get(i) 58.598 ms/op 66.952 ms/op 0.88
Array 250000 create 3.4392 ms/op 4.7325 ms/op 0.73
Array 250000 clone - spread 1.5172 ms/op 4.4526 ms/op 0.34
Array 250000 get(125000) 0.43100 ns/op 0.41100 ns/op 1.05
Array 250000 set(125000) 0.51000 ns/op 0.43500 ns/op 1.17
Array 250000 iterate all - loop 86.622 us/op 106.08 us/op 0.82
phase0 afterProcessEpoch - 250000 vs - 7PWei 54.222 ms/op 53.980 ms/op 1.00
Array.fill - length 1000000 4.0021 ms/op 9.4050 ms/op 0.43
Array push - length 1000000 15.992 ms/op 33.422 ms/op 0.48
Array.get 0.28103 ns/op 0.48273 ns/op 0.58
Uint8Array.get 0.46926 ns/op 0.47553 ns/op 0.99
phase0 beforeProcessEpoch - 250000 vs - 7PWei 16.911 ms/op 33.637 ms/op 0.50
altair processEpoch - mainnet_e81889 315.91 ms/op 363.36 ms/op 0.87
mainnet_e81889 - altair beforeProcessEpoch 20.537 ms/op 25.462 ms/op 0.81
mainnet_e81889 - altair processJustificationAndFinalization 6.6590 us/op 6.2650 us/op 1.06
mainnet_e81889 - altair processInactivityUpdates 4.8435 ms/op 6.1582 ms/op 0.79
mainnet_e81889 - altair processRewardsAndPenalties 47.919 ms/op 50.373 ms/op 0.95
mainnet_e81889 - altair processRegistryUpdates 827.00 ns/op 826.00 ns/op 1.00
mainnet_e81889 - altair processSlashings 215.00 ns/op 189.00 ns/op 1.14
mainnet_e81889 - altair processEth1DataReset 205.00 ns/op 188.00 ns/op 1.09
mainnet_e81889 - altair processEffectiveBalanceUpdates 1.3525 ms/op 1.3876 ms/op 0.97
mainnet_e81889 - altair processSlashingsReset 1.1460 us/op 1.1020 us/op 1.04
mainnet_e81889 - altair processRandaoMixesReset 1.5340 us/op 1.1520 us/op 1.33
mainnet_e81889 - altair processHistoricalRootsUpdate 205.00 ns/op 225.00 ns/op 0.91
mainnet_e81889 - altair processParticipationFlagUpdates 623.00 ns/op 607.00 ns/op 1.03
mainnet_e81889 - altair processSyncCommitteeUpdates 160.00 ns/op 149.00 ns/op 1.07
mainnet_e81889 - altair afterProcessEpoch 59.019 ms/op 56.001 ms/op 1.05
capella processEpoch - mainnet_e217614 1.0255 s/op 1.0455 s/op 0.98
mainnet_e217614 - capella beforeProcessEpoch 70.024 ms/op 108.01 ms/op 0.65
mainnet_e217614 - capella processJustificationAndFinalization 8.3370 us/op 7.0750 us/op 1.18
mainnet_e217614 - capella processInactivityUpdates 16.286 ms/op 21.268 ms/op 0.77
mainnet_e217614 - capella processRewardsAndPenalties 207.05 ms/op 202.11 ms/op 1.02
mainnet_e217614 - capella processRegistryUpdates 7.0150 us/op 7.0870 us/op 0.99
mainnet_e217614 - capella processSlashings 186.00 ns/op 188.00 ns/op 0.99
mainnet_e217614 - capella processEth1DataReset 181.00 ns/op 186.00 ns/op 0.97
mainnet_e217614 - capella processEffectiveBalanceUpdates 10.555 ms/op 15.901 ms/op 0.66
mainnet_e217614 - capella processSlashingsReset 1.1680 us/op 938.00 ns/op 1.25
mainnet_e217614 - capella processRandaoMixesReset 1.2170 us/op 1.2030 us/op 1.01
mainnet_e217614 - capella processHistoricalRootsUpdate 182.00 ns/op 190.00 ns/op 0.96
mainnet_e217614 - capella processParticipationFlagUpdates 534.00 ns/op 566.00 ns/op 0.94
mainnet_e217614 - capella afterProcessEpoch 128.11 ms/op 129.71 ms/op 0.99
phase0 processEpoch - mainnet_e58758 300.76 ms/op 306.69 ms/op 0.98
mainnet_e58758 - phase0 beforeProcessEpoch 87.645 ms/op 93.576 ms/op 0.94
mainnet_e58758 - phase0 processJustificationAndFinalization 8.6260 us/op 5.7170 us/op 1.51
mainnet_e58758 - phase0 processRewardsAndPenalties 39.237 ms/op 38.769 ms/op 1.01
mainnet_e58758 - phase0 processRegistryUpdates 3.5520 us/op 7.0610 us/op 0.50
mainnet_e58758 - phase0 processSlashings 190.00 ns/op 189.00 ns/op 1.01
mainnet_e58758 - phase0 processEth1DataReset 218.00 ns/op 181.00 ns/op 1.20
mainnet_e58758 - phase0 processEffectiveBalanceUpdates 1.1456 ms/op 1.0471 ms/op 1.09
mainnet_e58758 - phase0 processSlashingsReset 1.1410 us/op 1.3080 us/op 0.87
mainnet_e58758 - phase0 processRandaoMixesReset 1.3490 us/op 1.4820 us/op 0.91
mainnet_e58758 - phase0 processHistoricalRootsUpdate 191.00 ns/op 198.00 ns/op 0.96
mainnet_e58758 - phase0 processParticipationRecordUpdates 974.00 ns/op 912.00 ns/op 1.07
mainnet_e58758 - phase0 afterProcessEpoch 45.927 ms/op 44.711 ms/op 1.03
phase0 processEffectiveBalanceUpdates - 250000 normalcase 1.4305 ms/op 1.4321 ms/op 1.00
phase0 processEffectiveBalanceUpdates - 250000 worstcase 0.5 2.0229 ms/op 1.9666 ms/op 1.03
altair processInactivityUpdates - 250000 normalcase 19.474 ms/op 19.470 ms/op 1.00
altair processInactivityUpdates - 250000 worstcase 25.917 ms/op 18.148 ms/op 1.43
phase0 processRegistryUpdates - 250000 normalcase 10.515 us/op 6.1400 us/op 1.71
phase0 processRegistryUpdates - 250000 badcase_full_deposits 431.39 us/op 273.12 us/op 1.58
phase0 processRegistryUpdates - 250000 worstcase 0.5 123.62 ms/op 114.26 ms/op 1.08
altair processRewardsAndPenalties - 250000 normalcase 38.124 ms/op 46.340 ms/op 0.82
altair processRewardsAndPenalties - 250000 worstcase 57.432 ms/op 40.243 ms/op 1.43
phase0 getAttestationDeltas - 250000 normalcase 6.4806 ms/op 7.6236 ms/op 0.85
phase0 getAttestationDeltas - 250000 worstcase 7.4376 ms/op 8.0396 ms/op 0.93
phase0 processSlashings - 250000 worstcase 81.249 us/op 114.07 us/op 0.71
altair processSyncCommitteeUpdates - 250000 124.21 ms/op 132.43 ms/op 0.94
BeaconState.hashTreeRoot - No change 242.00 ns/op 218.00 ns/op 1.11
BeaconState.hashTreeRoot - 1 full validator 79.726 us/op 79.794 us/op 1.00
BeaconState.hashTreeRoot - 32 full validator 836.58 us/op 934.47 us/op 0.90
BeaconState.hashTreeRoot - 512 full validator 11.039 ms/op 11.094 ms/op 1.00
BeaconState.hashTreeRoot - 1 validator.effectiveBalance 92.561 us/op 97.368 us/op 0.95
BeaconState.hashTreeRoot - 32 validator.effectiveBalance 1.6596 ms/op 1.2627 ms/op 1.31
BeaconState.hashTreeRoot - 512 validator.effectiveBalance 19.676 ms/op 21.273 ms/op 0.92
BeaconState.hashTreeRoot - 1 balances 73.510 us/op 75.284 us/op 0.98
BeaconState.hashTreeRoot - 32 balances 707.22 us/op 763.26 us/op 0.93
BeaconState.hashTreeRoot - 512 balances 6.9899 ms/op 9.6454 ms/op 0.72
BeaconState.hashTreeRoot - 250000 balances 172.59 ms/op 169.98 ms/op 1.02
aggregationBits - 2048 els - zipIndexesInBitList 21.594 us/op 23.499 us/op 0.92
byteArrayEquals 32 54.304 ns/op 55.311 ns/op 0.98
Buffer.compare 32 17.308 ns/op 17.212 ns/op 1.01
byteArrayEquals 1024 1.6024 us/op 1.6166 us/op 0.99
Buffer.compare 1024 24.936 ns/op 24.844 ns/op 1.00
byteArrayEquals 16384 25.580 us/op 25.931 us/op 0.99
Buffer.compare 16384 211.45 ns/op 184.22 ns/op 1.15
byteArrayEquals 123687377 196.19 ms/op 206.04 ms/op 0.95
Buffer.compare 123687377 6.2312 ms/op 8.8195 ms/op 0.71
byteArrayEquals 32 - diff last byte 53.162 ns/op 54.051 ns/op 0.98
Buffer.compare 32 - diff last byte 17.260 ns/op 17.627 ns/op 0.98
byteArrayEquals 1024 - diff last byte 1.6029 us/op 1.6121 us/op 0.99
Buffer.compare 1024 - diff last byte 25.373 ns/op 26.148 ns/op 0.97
byteArrayEquals 16384 - diff last byte 25.537 us/op 25.598 us/op 1.00
Buffer.compare 16384 - diff last byte 194.79 ns/op 203.99 ns/op 0.95
byteArrayEquals 123687377 - diff last byte 193.15 ms/op 202.97 ms/op 0.95
Buffer.compare 123687377 - diff last byte 6.2364 ms/op 11.287 ms/op 0.55
byteArrayEquals 32 - random bytes 5.3800 ns/op 6.2250 ns/op 0.86
Buffer.compare 32 - random bytes 17.336 ns/op 20.684 ns/op 0.84
byteArrayEquals 1024 - random bytes 5.1720 ns/op 5.5090 ns/op 0.94
Buffer.compare 1024 - random bytes 17.199 ns/op 19.483 ns/op 0.88
byteArrayEquals 16384 - random bytes 5.1860 ns/op 7.8680 ns/op 0.66
Buffer.compare 16384 - random bytes 17.309 ns/op 18.903 ns/op 0.92
byteArrayEquals 123687377 - random bytes 8.1600 ns/op 6.7700 ns/op 1.21
Buffer.compare 123687377 - random bytes 18.760 ns/op 21.780 ns/op 0.86
regular array get 100000 times 32.716 us/op 42.670 us/op 0.77
wrappedArray get 100000 times 32.737 us/op 36.010 us/op 0.91
arrayWithProxy get 100000 times 13.339 ms/op 14.329 ms/op 0.93
ssz.Root.equals 46.922 ns/op 70.926 ns/op 0.66
byteArrayEquals 46.297 ns/op 50.612 ns/op 0.91
Buffer.compare 10.656 ns/op 11.719 ns/op 0.91
processSlot - 1 slots 10.621 us/op 17.382 us/op 0.61
processSlot - 32 slots 1.9710 ms/op 2.8196 ms/op 0.70
getEffectiveBalanceIncrementsZeroInactive - 250000 vs - 7PWei 61.125 ms/op 58.595 ms/op 1.04
getCommitteeAssignments - req 1 vs - 250000 vc 2.1476 ms/op 2.1315 ms/op 1.01
getCommitteeAssignments - req 100 vs - 250000 vc 4.2090 ms/op 4.2052 ms/op 1.00
getCommitteeAssignments - req 1000 vs - 250000 vc 4.4986 ms/op 4.4952 ms/op 1.00
findModifiedValidators - 10000 modified validators 807.64 ms/op 919.14 ms/op 0.88
findModifiedValidators - 1000 modified validators 786.24 ms/op 796.03 ms/op 0.99
findModifiedValidators - 100 modified validators 254.62 ms/op 280.73 ms/op 0.91
findModifiedValidators - 10 modified validators 154.72 ms/op 172.14 ms/op 0.90
findModifiedValidators - 1 modified validators 166.37 ms/op 251.63 ms/op 0.66
findModifiedValidators - no difference 219.84 ms/op 228.18 ms/op 0.96
compare ViewDUs 6.4080 s/op 6.3326 s/op 1.01
compare each validator Uint8Array 1.4875 s/op 1.5666 s/op 0.95
compare ViewDU to Uint8Array 1.0505 s/op 1.0515 s/op 1.00
migrate state 1000000 validators, 24 modified, 0 new 950.87 ms/op 882.93 ms/op 1.08
migrate state 1000000 validators, 1700 modified, 1000 new 1.3007 s/op 1.2166 s/op 1.07
migrate state 1000000 validators, 3400 modified, 2000 new 1.5808 s/op 1.7488 s/op 0.90
migrate state 1500000 validators, 24 modified, 0 new 1.0626 s/op 1.0521 s/op 1.01
migrate state 1500000 validators, 1700 modified, 1000 new 1.3191 s/op 1.1006 s/op 1.20
migrate state 1500000 validators, 3400 modified, 2000 new 1.5067 s/op 1.2702 s/op 1.19
RootCache.getBlockRootAtSlot - 250000 vs - 7PWei 4.9600 ns/op 4.2800 ns/op 1.16
state getBlockRootAtSlot - 250000 vs - 7PWei 571.52 ns/op 557.41 ns/op 1.03
naive computeProposerIndex 100000 validators 58.553 ms/op
computeProposerIndex 100000 validators 10.086 ms/op
naiveGetNextSyncCommitteeIndices 1000 validators 8.7471 s/op
getNextSyncCommitteeIndices 1000 validators 326.05 ms/op
naiveGetNextSyncCommitteeIndices 10000 validators 9.1964 s/op
getNextSyncCommitteeIndices 10000 validators 285.58 ms/op
naiveGetNextSyncCommitteeIndices 100000 validators 8.2897 s/op
getNextSyncCommitteeIndices 100000 validators 271.88 ms/op
naive computeShuffledIndex 100000 validators 26.569 s/op
cached computeShuffledIndex 100000 validators 548.03 ms/op
naive computeShuffledIndex 2000000 validators 566.07 s/op
cached computeShuffledIndex 2000000 validators 77.671 s/op
computeProposers - vc 250000 25.174 ms/op 8.5687 ms/op 2.94
computeEpochShuffling - vc 250000 63.779 ms/op 42.165 ms/op 1.51
getNextSyncCommittee - vc 250000 340.09 ms/op 149.30 ms/op 2.28
computeSigningRoot for AttestationData 59.942 us/op 21.638 us/op 2.77
hash AttestationData serialized data then Buffer.toString(base64) 2.1485 us/op 1.5852 us/op 1.36
toHexString serialized data 2.2902 us/op 1.1423 us/op 2.00
Buffer.toString(base64) 216.77 ns/op 167.98 ns/op 1.29
nodejs block root to RootHex using toHex 183.56 ns/op 145.00 ns/op 1.27
nodejs block root to RootHex using toRootHex 103.11 ns/op 87.170 ns/op 1.18
browser block root to RootHex using the deprecated toHexString 304.92 ns/op 233.88 ns/op 1.30
browser block root to RootHex using toHex 236.51 ns/op 179.95 ns/op 1.31
browser block root to RootHex using toRootHex 224.92 ns/op 159.96 ns/op 1.41

by benchmarkbot/action

@twoeths
Contributor Author

twoeths commented Feb 12, 2025

Posting benchmarks from my personal Ubuntu node, which is the same as the lg/mainnet nodes:

computeProposerIndex
    ✔ naive computeProposerIndex 100000 validators                        15.96156 ops/s    62.65051 ms/op        -          9 runs   62.6 s
    ✔ computeProposerIndex 100000 validators                              93.41964 ops/s    10.70439 ms/op        -         10 runs   11.8 s

  getNextSyncCommitteeIndices electra
    ✔ naiveGetNextSyncCommitteeIndices 1000 validators                   0.1083408 ops/s    9.230135  s/op        -          6 runs   64.5 s
    ✔ getNextSyncCommitteeIndices 1000 validators                         141.7483 ops/s    7.054757 ms/op        -         31 runs  0.726 s
    ✔ naiveGetNextSyncCommitteeIndices 10000 validators                  0.1075031 ops/s    9.302056  s/op        -          6 runs   65.1 s
    ✔ getNextSyncCommitteeIndices 10000 validators                        141.9954 ops/s    7.042481 ms/op        -         31 runs  0.723 s
    ✔ naiveGetNextSyncCommitteeIndices 100000 validators                 0.1068357 ops/s    9.360169  s/op        -          6 runs   65.5 s
    ✔ getNextSyncCommitteeIndices 100000 validators                       142.0709 ops/s    7.038738 ms/op        -         31 runs  0.722 s

  computeShuffledIndex
    ✔ naive computeShuffledIndex 100000 validators                      0.03294994 ops/s    30.34907  s/op        -          2 runs   90.1 s
    ✔ cached computeShuffledIndex 100000 validators                       1.535617 ops/s    651.2041 ms/op        -         10 runs   7.17 s
    ✔ naive computeShuffledIndex 2000000 validators                    0.001597753 ops/s    625.8789  s/op        -          1 runs 1.25e+3 s
    ✔ cached computeShuffledIndex 2000000 validators                    0.02168103 ops/s    46.12328  s/op        -          1 runs   92.5 s

It also shows a >1000x improvement for some tests, similar to my local environment.

wemeetagain
wemeetagain previously approved these changes Feb 12, 2025
Member

@wemeetagain wemeetagain left a comment

LGTM, appreciate the thorough analysis

wemeetagain
wemeetagain previously approved these changes Feb 12, 2025
Member

@wemeetagain wemeetagain left a comment

Added a few small things, lgtm


let i = 0;
let cachedHash: Uint8Array | null = null;
const cachedHashInput = Buffer.allocUnsafe(32 + 8);

added: reuse this buffer as the input for digest below
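
As a rough sketch of what reusing the hash input buffer and hashing once every 16 iterations can look like (the function name, the `node:crypto` SHA-256, and the 2-bytes-per-value layout are assumptions for illustration, not code taken from this PR): one 32-byte digest supplies sixteen 16-bit values, so the digest is recomputed only when `i` reaches a multiple of 16, and each value is decoded with plain number arithmetic rather than BigInt. The sketch uses `Buffer.alloc` rather than `Buffer.allocUnsafe` because it writes only the low 4 bytes of the 8-byte counter and relies on the rest being zero.

```typescript
import {createHash} from "node:crypto";

/**
 * Returns `count` 16-bit random values derived from a 32-byte `seed`.
 * A new SHA-256 digest is computed only when i is a multiple of 16.
 */
function randomUint16Values(seed: Uint8Array, count: number): number[] {
  // Reused hash input: 32-byte seed followed by an 8-byte little-endian counter.
  const hashInput = Buffer.alloc(32 + 8);
  hashInput.set(seed, 0);
  let cachedHash: Buffer = Buffer.alloc(32);
  const values: number[] = [];
  for (let i = 0; i < count; i++) {
    if (i % 16 === 0) {
      // Only the low 4 bytes of the counter are written; the high 4 stay zero.
      hashInput.writeUInt32LE(Math.floor(i / 16), 32);
      cachedHash = createHash("sha256").update(hashInput).digest();
    }
    const offset = (i % 16) * 2;
    // Manual little-endian decode of 2 bytes, with no BigInt involved.
    values.push(cachedHash.readUInt8(offset) | (cachedHash.readUInt8(offset + 1) << 8));
  }
  return values;
}
```

Because the input buffer is mutated in place, this only works if the hash function consumes its input synchronously and does not retain a reference to it.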

@@ -120,6 +120,8 @@ export function computeProposerIndex(
const shuffledResult = new Map<number, number>();

let i = 0;
const cachedHashInput = Buffer.allocUnsafe(32 + 8);

added: reuse this buffer as the input for digest below
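
The `shuffledResult` map in the hunk above suggests memoizing shuffled indices by input index. A minimal sketch of that idea, assuming a hypothetical `memoizeShuffledIndex` wrapper around any index-to-index function (this is not the PR's code):

```typescript
/** Wraps a shuffle function so each input index is computed at most once. */
function memoizeShuffledIndex(compute: (index: number) => number): (index: number) => number {
  const shuffledResult = new Map<number, number>();
  return (index: number): number => {
    let shuffled = shuffledResult.get(index);
    if (shuffled === undefined) {
      shuffled = compute(index);
      shuffledResult.set(index, shuffled);
    }
    return shuffled;
  };
}
```

This pays off when the same index is requested repeatedly, for example when a sampling loop wraps around with `i % activeCount` before it has accepted enough candidates.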

@@ -402,7 +408,7 @@ export function getComputeShuffledIndexFn(indexCount: number, seed: Bytes32): Co
// bytesToBigInt(digest(Buffer.concat([_seed, intToBytes(i, 1)])).slice(0, 8)) % BigInt(indexCount)
// );
pivotBuffer[32] = i % 256;
- pivot = Number(bytesToBigInt(digest(pivotBuffer).slice(0, 8)) % BigInt(indexCount));
+ pivot = Number(bytesToBigInt(digest(pivotBuffer).subarray(0, 8)) % BigInt(indexCount));

added: use subarray instead of slice
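
For context on why this matters: on a `Uint8Array`, `slice` allocates a new array and copies the bytes, while `subarray` returns a view over the same underlying buffer. The short sketch below demonstrates the difference, assuming the digest output is a plain `Uint8Array`; the variable names are illustrative only.

```typescript
// Stand-in for a digest output; a real digest would be 32 bytes.
const digestOutput = new Uint8Array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);

// slice() copies the first 8 bytes into a freshly allocated array.
const copied = digestOutput.slice(0, 8);

// subarray() creates a view over the same memory, with no copy.
const view = digestOutput.subarray(0, 8);

// Writing to the original is visible through the view but not through the copy.
digestOutput[0] = 99;
console.log(view[0]); // 99
console.log(copied[0]); // 1
console.log(view.buffer === digestOutput.buffer); // true
console.log(copied.buffer === digestOutput.buffer); // false
```

Because the 8 bytes are only read once to derive the pivot, the view is safe here and avoids one allocation per call.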

@wemeetagain wemeetagain marked this pull request as ready for review February 12, 2025 20:45
@wemeetagain wemeetagain requested a review from a team as a code owner February 12, 2025 20:46
@wemeetagain wemeetagain merged commit 85b13c1 into unstable Feb 12, 2025
20 checks passed
@wemeetagain wemeetagain deleted the te/improve_sync_committee_updates branch February 12, 2025 20:46
@wemeetagain
Member

🎉 This PR is included in v1.27.0 🎉

@dapplion
Contributor

GG 👏

Development

Successfully merging this pull request may close these issues.

Electra: performance issue computing sync committee indices
4 participants