fix: improve sync committee updates #7456

twoeths · 2025-02-12T08:57:56Z

Motivation

Since electra, processSyncCommitteeUpdates() can take more than 15s according to the devnet.

Description

The main fix is in computeShuffledIndex, where we can cache the pivot and source computations.
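A minimal sketch of the pivot and source caching idea, not the code from this PR. It assumes the swap-or-not round structure of the spec's compute_shuffled_index. The names `ShuffleHashes`, `pivotOf`, `sourceOf` and `createCachedShuffledIndexFn` are hypothetical; the two hash hooks stand in for the real digest calls so the sketch needs no crypto dependency.

```typescript
// Hypothetical hash hooks (assumptions, not Lodestar APIs):
// pivotOf(round) stands in for bytes_to_uint64(hash(seed ++ round)[0:8]) % indexCount,
// sourceOf(round, block) stands in for hash(seed ++ round ++ uint32(block)), 32 bytes.
interface ShuffleHashes {
  pivotOf(round: number): number;
  sourceOf(round: number, block: number): Uint8Array;
}

function createCachedShuffledIndexFn(
  indexCount: number,
  roundCount: number,
  hashes: ShuffleHashes
): (index: number) => number {
  // The pivot depends only on the round, so one value per round is enough.
  const pivotByRound: number[] = [];
  // The source hash depends on (round, floor(position / 256)).
  const sourceByRound: Map<number, Uint8Array>[] = [];

  return (index: number): number => {
    let permuted = index;
    for (let round = 0; round < roundCount; round++) {
      let pivot = pivotByRound[round];
      if (pivot === undefined) {
        pivot = hashes.pivotOf(round);
        pivotByRound[round] = pivot;
      }
      const flip = (pivot + indexCount - permuted) % indexCount;
      const position = Math.max(permuted, flip);
      const block = Math.floor(position / 256);

      let sources = sourceByRound[round];
      if (sources === undefined) {
        sources = new Map<number, Uint8Array>();
        sourceByRound[round] = sources;
      }
      let source = sources.get(block);
      if (source === undefined) {
        source = hashes.sourceOf(round, block);
        sources.set(block, source);
      }
      // Pick the bit for this position out of the 256-bit source hash.
      const byte = source[Math.floor((position % 256) / 8)];
      const bit = (byte >> (position % 8)) & 1;
      permuted = bit ? flip : permuted;
    }
    return permuted;
  };
}
```

With this shape, repeated calls for different indices against the same seed reuse at most one pivot hash per round and one source hash per (round, 256-position block), instead of hashing again on every call.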
Other optimizations:
- only compute the hash once every 16 iterations
- compute the int manually instead of using bytesToInt, to avoid BigInt
- cache the shuffled index
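The first two items can be sketched roughly as follows. This is an illustration only, assuming the electra sampling scheme where each iteration consumes a 2-byte little-endian value from a 32-byte hash, so one hash covers 16 iterations. `HashBlockFn` and `createRandomValueReader` are hypothetical names, and the hash itself is injected so the sketch stays self-contained.

```typescript
// Hypothetical hook: hashBlock(blockIndex) stands in for
// digest(seed ++ uint_to_bytes(blockIndex)) and must return 32 bytes.
type HashBlockFn = (blockIndex: number) => Uint8Array;

/**
 * Returns a reader giving the 16-bit random value for iteration i.
 * A 32-byte hash holds 16 two-byte values, so the hash is recomputed
 * only when i moves into a different block of 16 iterations.
 */
function createRandomValueReader(hashBlock: HashBlockFn): (i: number) => number {
  let cachedBlockIndex = -1;
  let cachedHash: Uint8Array | null = null;

  return (i: number): number => {
    const blockIndex = Math.floor(i / 16);
    if (cachedHash === null || blockIndex !== cachedBlockIndex) {
      cachedHash = hashBlock(blockIndex);
      cachedBlockIndex = blockIndex;
    }
    const offset = (i % 16) * 2;
    // Manual little-endian 2-byte read with plain numbers, no BigInt involved.
    return cachedHash[offset] | (cachedHash[offset + 1] << 8);
  };
}
```

The reader keeps only the most recent hash, which fits a loop where i increases by one per iteration; jumping back to an earlier block would trigger a recompute.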

I guess we could improve further by using hashtree, but the diff is already large and the main optimization is in computeShuffledIndex(), not the hash function. We can consider that in the future.

We could also improve pre-electra, but it has not been that bad for a long time, so this PR only focuses on electra.

Closes #7366

Tests

Added unit tests to compare the naive version with the optimized version.
Local benchmarks show a >1000x difference between naiveGetNextSyncCommitteeIndices() and the optimized getNextSyncCommitteeIndices(), the main function of concern, while CI only shows a >20x difference. These are my local results:

computeProposerIndex
    ✔ naive computeProposerIndex 100000 validators                        31.86491 ops/s    31.38248 ms/op        -         10 runs   34.5 s
    ✔ computeProposerIndex 100000 validators                              106.2267 ops/s    9.413833 ms/op        -         10 runs   10.4 s

  getNextSyncCommitteeIndices electra
    ✔ naiveGetNextSyncCommitteeIndices 1000 validators                   0.2121840 ops/s    4.712890  s/op        -         10 runs   51.7 s
    ✔ getNextSyncCommitteeIndices 1000 validators                         214.9251 ops/s    4.652783 ms/op        -         45 runs  0.714 s
    ✔ naiveGetNextSyncCommitteeIndices 10000 validators                  0.2122278 ops/s    4.711918  s/op        -         10 runs   51.8 s
    ✔ getNextSyncCommitteeIndices 10000 validators                        220.2337 ops/s    4.540632 ms/op        -         46 runs  0.710 s
    ✔ naiveGetNextSyncCommitteeIndices 100000 validators                 0.2117828 ops/s    4.721820  s/op        -         10 runs   52.2 s
    ✔ getNextSyncCommitteeIndices 100000 validators                       204.7383 ops/s    4.884283 ms/op        -         43 runs  0.714 s

  computeShuffledIndex
    ✔ naive computeShuffledIndex 100000 validators                      0.06638498 ops/s    15.06365  s/op        -          3 runs   60.3 s
    ✔ cached computeShuffledIndex 100000 validators                       1.932706 ops/s    517.4092 ms/op        -         10 runs   5.72 s

codecov · 2025-02-12T09:14:32Z

Codecov Report

Attention: Patch coverage is 94.67456% with 9 lines in your changes missing coverage. Please review.

Project coverage is 50.44%. Comparing base (2247c16) to head (8713832).
Report is 3 commits behind head on unstable.

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #7456      +/-   ##
============================================
+ Coverage     50.25%   50.44%   +0.19%     
============================================
  Files           602      602              
  Lines         40401    40583     +182     
  Branches       2204     2229      +25     
============================================
+ Hits          20305    20474     +169     
- Misses        20056    20069      +13     
  Partials         40       40

github-actions · 2025-02-12T09:36:42Z

Performance Report

🚀🚀 Significant benchmark improvement detected

Benchmark suite	Current: `5ce0389`	Previous: `e45e0eb`	Ratio
forkChoice updateHead vc 600000 bc 64 eq 300000	17.732 ms/op	56.830 ms/op	0.31

Full benchmark results

Benchmark suite	Current: `5ce0389`	Previous: `e45e0eb`	Ratio
getPubkeys - index2pubkey - req 1000 vs - 250000 vc	1.2612 ms/op	967.80 us/op	1.30
getPubkeys - validatorsArr - req 1000 vs - 250000 vc	41.295 us/op	35.981 us/op	1.15
BLS verify - blst	985.25 us/op	817.77 us/op	1.20
BLS verifyMultipleSignatures 3 - blst	1.3825 ms/op	1.2220 ms/op	1.13
BLS verifyMultipleSignatures 8 - blst	2.0204 ms/op	1.7056 ms/op	1.18
BLS verifyMultipleSignatures 32 - blst	6.0422 ms/op	4.9317 ms/op	1.23
BLS verifyMultipleSignatures 64 - blst	11.843 ms/op	9.1516 ms/op	1.29
BLS verifyMultipleSignatures 128 - blst	18.820 ms/op	18.359 ms/op	1.03
BLS deserializing 10000 signatures	788.90 ms/op	706.46 ms/op	1.12
BLS deserializing 100000 signatures	7.5122 s/op	7.0843 s/op	1.06
BLS verifyMultipleSignatures - same message - 3 - blst	1.1192 ms/op	1.1020 ms/op	1.02
BLS verifyMultipleSignatures - same message - 8 - blst	1.1281 ms/op	1.2250 ms/op	0.92
BLS verifyMultipleSignatures - same message - 32 - blst	1.8808 ms/op	2.0611 ms/op	0.91
BLS verifyMultipleSignatures - same message - 64 - blst	2.8436 ms/op	2.9255 ms/op	0.97
BLS verifyMultipleSignatures - same message - 128 - blst	5.4133 ms/op	4.8755 ms/op	1.11
BLS aggregatePubkeys 32 - blst	21.486 us/op	20.260 us/op	1.06
BLS aggregatePubkeys 128 - blst	88.746 us/op	73.051 us/op	1.21
notSeenSlots=1 numMissedVotes=1 numBadVotes=10	55.174 ms/op	69.307 ms/op	0.80
notSeenSlots=1 numMissedVotes=0 numBadVotes=4	54.259 ms/op	58.467 ms/op	0.93
notSeenSlots=2 numMissedVotes=1 numBadVotes=10	43.652 ms/op	43.737 ms/op	1.00
getSlashingsAndExits - default max	78.136 us/op	79.359 us/op	0.98
getSlashingsAndExits - 2k	349.31 us/op	436.31 us/op	0.80
proposeBlockBody type=full, size=empty	6.1992 ms/op	7.5658 ms/op	0.82
isKnown best case - 1 super set check	223.00 ns/op	205.00 ns/op	1.09
isKnown normal case - 2 super set checks	212.00 ns/op	202.00 ns/op	1.05
isKnown worse case - 16 super set checks	208.00 ns/op	195.00 ns/op	1.07
InMemoryCheckpointStateCache - add get delete	2.5580 us/op	2.5090 us/op	1.02
validate api signedAggregateAndProof - struct	1.4940 ms/op	1.5124 ms/op	0.99
validate gossip signedAggregateAndProof - struct	1.5562 ms/op	1.9862 ms/op	0.78
batch validate gossip attestation - vc 640000 - chunk 32	141.13 us/op	161.40 us/op	0.87
batch validate gossip attestation - vc 640000 - chunk 64	128.25 us/op	130.27 us/op	0.98
batch validate gossip attestation - vc 640000 - chunk 128	124.12 us/op	129.12 us/op	0.96
batch validate gossip attestation - vc 640000 - chunk 256	122.51 us/op	137.43 us/op	0.89
pickEth1Vote - no votes	1.1353 ms/op	2.0641 ms/op	0.55
pickEth1Vote - max votes	8.5509 ms/op	11.642 ms/op	0.73
pickEth1Vote - Eth1Data hashTreeRoot value x2048	17.268 ms/op	20.411 ms/op	0.85
pickEth1Vote - Eth1Data hashTreeRoot tree x2048	24.436 ms/op	31.101 ms/op	0.79
pickEth1Vote - Eth1Data fastSerialize value x2048	484.96 us/op	548.83 us/op	0.88
pickEth1Vote - Eth1Data fastSerialize tree x2048	2.4735 ms/op	5.0010 ms/op	0.49
bytes32 toHexString	384.00 ns/op	406.00 ns/op	0.95
bytes32 Buffer.toString(hex)	283.00 ns/op	243.00 ns/op	1.16
bytes32 Buffer.toString(hex) from Uint8Array	409.00 ns/op	447.00 ns/op	0.91
bytes32 Buffer.toString(hex) + 0x	252.00 ns/op	425.00 ns/op	0.59
Object access 1 prop	0.12200 ns/op	0.16200 ns/op	0.75
Map access 1 prop	0.12400 ns/op	0.19200 ns/op	0.65
Object get x1000	6.2420 ns/op	8.2330 ns/op	0.76
Map get x1000	6.7810 ns/op	6.7410 ns/op	1.01
Object set x1000	29.527 ns/op	54.004 ns/op	0.55
Map set x1000	20.647 ns/op	24.415 ns/op	0.85
Return object 10000 times	0.30210 ns/op	0.32000 ns/op	0.94
Throw Error 10000 times	4.5244 us/op	6.3540 us/op	0.71
toHex	150.45 ns/op	148.09 ns/op	1.02
Buffer.from	130.30 ns/op	179.05 ns/op	0.73
shared Buffer	87.409 ns/op	81.836 ns/op	1.07
fastMsgIdFn sha256 / 200 bytes	2.3540 us/op	2.2940 us/op	1.03
fastMsgIdFn h32 xxhash / 200 bytes	242.00 ns/op	210.00 ns/op	1.15
fastMsgIdFn h64 xxhash / 200 bytes	308.00 ns/op	282.00 ns/op	1.09
fastMsgIdFn sha256 / 1000 bytes	7.7080 us/op	7.6310 us/op	1.01
fastMsgIdFn h32 xxhash / 1000 bytes	345.00 ns/op	346.00 ns/op	1.00
fastMsgIdFn h64 xxhash / 1000 bytes	354.00 ns/op	363.00 ns/op	0.98
fastMsgIdFn sha256 / 10000 bytes	68.202 us/op	68.758 us/op	0.99
fastMsgIdFn h32 xxhash / 10000 bytes	1.9280 us/op	1.9000 us/op	1.01
fastMsgIdFn h64 xxhash / 10000 bytes	1.3060 us/op	1.2580 us/op	1.04
send data - 1000 256B messages	13.670 ms/op	18.147 ms/op	0.75
send data - 1000 512B messages	18.776 ms/op	23.192 ms/op	0.81
send data - 1000 1024B messages	25.909 ms/op	30.094 ms/op	0.86
send data - 1000 1200B messages	26.848 ms/op	26.176 ms/op	1.03
send data - 1000 2048B messages	29.220 ms/op	25.956 ms/op	1.13
send data - 1000 4096B messages	30.785 ms/op	29.951 ms/op	1.03
send data - 1000 16384B messages	64.273 ms/op	78.352 ms/op	0.82
send data - 1000 65536B messages	227.77 ms/op	231.64 ms/op	0.98
enrSubnets - fastDeserialize 64 bits	1.0520 us/op	948.00 ns/op	1.11
enrSubnets - ssz BitVector 64 bits	366.00 ns/op	327.00 ns/op	1.12
enrSubnets - fastDeserialize 4 bits	164.00 ns/op	135.00 ns/op	1.21
enrSubnets - ssz BitVector 4 bits	451.00 ns/op	329.00 ns/op	1.37
prioritizePeers score -10:0 att 32-0.1 sync 2-0	163.17 us/op	122.08 us/op	1.34
prioritizePeers score 0:0 att 32-0.25 sync 2-0.25	190.48 us/op	149.63 us/op	1.27
prioritizePeers score 0:0 att 32-0.5 sync 2-0.5	272.24 us/op	212.68 us/op	1.28
prioritizePeers score 0:0 att 64-0.75 sync 4-0.75	521.79 us/op	395.31 us/op	1.32
prioritizePeers score 0:0 att 64-1 sync 4-1	640.64 us/op	478.74 us/op	1.34
array of 16000 items push then shift	1.8231 us/op	1.6753 us/op	1.09
LinkedList of 16000 items push then shift	8.7280 ns/op	8.3880 ns/op	1.04
array of 16000 items push then pop	99.394 ns/op	83.222 ns/op	1.19
LinkedList of 16000 items push then pop	8.1980 ns/op	7.7230 ns/op	1.06
array of 24000 items push then shift	2.7253 us/op	2.4817 us/op	1.10
LinkedList of 24000 items push then shift	8.1550 ns/op	9.0650 ns/op	0.90
array of 24000 items push then pop	111.99 ns/op	129.97 ns/op	0.86
LinkedList of 24000 items push then pop	7.3920 ns/op	8.3300 ns/op	0.89
intersect bitArray bitLen 8	6.9620 ns/op	6.4060 ns/op	1.09
intersect array and set length 8	41.382 ns/op	48.086 ns/op	0.86
intersect bitArray bitLen 128	31.810 ns/op	30.320 ns/op	1.05
intersect array and set length 128	681.48 ns/op	635.75 ns/op	1.07
bitArray.getTrueBitIndexes() bitLen 128	1.2960 us/op	1.0900 us/op	1.19
bitArray.getTrueBitIndexes() bitLen 248	2.5890 us/op	2.2140 us/op	1.17
bitArray.getTrueBitIndexes() bitLen 512	4.9720 us/op	3.9620 us/op	1.25
Buffer.concat 32 items	748.00 ns/op	780.00 ns/op	0.96
Uint8Array.set 32 items	1.7720 us/op	1.9450 us/op	0.91
Buffer.copy	3.3500 us/op	2.5020 us/op	1.34
Uint8Array.set - with subarray	3.1830 us/op	1.6710 us/op	1.90
Uint8Array.set - without subarray	1.9640 us/op	1.2530 us/op	1.57
getUint32 - dataview	207.00 ns/op	194.00 ns/op	1.07
getUint32 - manual	130.00 ns/op	155.00 ns/op	0.84
Set add up to 64 items then delete first	2.8547 us/op	3.1324 us/op	0.91
OrderedSet add up to 64 items then delete first	3.9448 us/op	4.2082 us/op	0.94
Set add up to 64 items then delete last	3.2016 us/op	3.7277 us/op	0.86
OrderedSet add up to 64 items then delete last	5.3586 us/op	4.8845 us/op	1.10
Set add up to 64 items then delete middle	3.2149 us/op	3.2099 us/op	1.00
OrderedSet add up to 64 items then delete middle	7.6479 us/op	6.0673 us/op	1.26
Set add up to 128 items then delete first	6.4990 us/op	6.1757 us/op	1.05
OrderedSet add up to 128 items then delete first	9.7111 us/op	12.986 us/op	0.75
Set add up to 128 items then delete last	6.8958 us/op	6.8848 us/op	1.00
OrderedSet add up to 128 items then delete last	9.1439 us/op	8.2299 us/op	1.11
Set add up to 128 items then delete middle	6.4963 us/op	6.5891 us/op	0.99
OrderedSet add up to 128 items then delete middle	18.109 us/op	18.690 us/op	0.97
Set add up to 256 items then delete first	13.934 us/op	13.208 us/op	1.05
OrderedSet add up to 256 items then delete first	19.983 us/op	23.646 us/op	0.85
Set add up to 256 items then delete last	12.754 us/op	13.629 us/op	0.94
OrderedSet add up to 256 items then delete last	18.459 us/op	28.226 us/op	0.65
Set add up to 256 items then delete middle	13.751 us/op	15.621 us/op	0.88
OrderedSet add up to 256 items then delete middle	52.515 us/op	48.645 us/op	1.08
transfer serialized Status (84 B)	2.6000 us/op	2.3130 us/op	1.12
copy serialized Status (84 B)	1.7290 us/op	1.1980 us/op	1.44
transfer serialized SignedVoluntaryExit (112 B)	2.6180 us/op	3.0560 us/op	0.86
copy serialized SignedVoluntaryExit (112 B)	1.5060 us/op	2.2670 us/op	0.66
transfer serialized ProposerSlashing (416 B)	3.9270 us/op	4.0420 us/op	0.97
copy serialized ProposerSlashing (416 B)	2.8910 us/op	2.0060 us/op	1.44
transfer serialized Attestation (485 B)	3.2320 us/op	3.1420 us/op	1.03
copy serialized Attestation (485 B)	1.4470 us/op	1.3210 us/op	1.10
transfer serialized AttesterSlashing (33232 B)	2.6930 us/op	3.2760 us/op	0.82
copy serialized AttesterSlashing (33232 B)	3.7250 us/op	5.3970 us/op	0.69
transfer serialized Small SignedBeaconBlock (128000 B)	3.4030 us/op	5.0780 us/op	0.67
copy serialized Small SignedBeaconBlock (128000 B)	12.715 us/op	17.829 us/op	0.71
transfer serialized Avg SignedBeaconBlock (200000 B)	3.7040 us/op	6.0880 us/op	0.61
copy serialized Avg SignedBeaconBlock (200000 B)	14.821 us/op	24.069 us/op	0.62
transfer serialized BlobsSidecar (524380 B)	5.1720 us/op	8.2070 us/op	0.63
copy serialized BlobsSidecar (524380 B)	65.804 us/op	96.949 us/op	0.68
transfer serialized Big SignedBeaconBlock (1000000 B)	4.1550 us/op	7.7920 us/op	0.53
copy serialized Big SignedBeaconBlock (1000000 B)	175.63 us/op	362.84 us/op	0.48
pass gossip attestations to forkchoice per slot	3.3543 ms/op	5.0025 ms/op	0.67
forkChoice updateHead vc 100000 bc 64 eq 0	493.21 us/op	713.50 us/op	0.69
forkChoice updateHead vc 600000 bc 64 eq 0	2.9853 ms/op	6.3461 ms/op	0.47
forkChoice updateHead vc 1000000 bc 64 eq 0	5.2416 ms/op	10.787 ms/op	0.49
forkChoice updateHead vc 600000 bc 320 eq 0	3.2569 ms/op	6.0719 ms/op	0.54
forkChoice updateHead vc 600000 bc 1200 eq 0	3.4893 ms/op	5.6614 ms/op	0.62
forkChoice updateHead vc 600000 bc 7200 eq 0	3.9425 ms/op	6.0747 ms/op	0.65
forkChoice updateHead vc 600000 bc 64 eq 1000	12.113 ms/op	11.817 ms/op	1.02
forkChoice updateHead vc 600000 bc 64 eq 10000	12.375 ms/op	12.531 ms/op	0.99
forkChoice updateHead vc 600000 bc 64 eq 300000	17.732 ms/op	56.830 ms/op	0.31
computeDeltas 500000 validators 300 proto nodes	4.3263 ms/op	6.2421 ms/op	0.69
computeDeltas 500000 validators 1200 proto nodes	4.3824 ms/op	6.1386 ms/op	0.71
computeDeltas 500000 validators 7200 proto nodes	4.2442 ms/op	6.3498 ms/op	0.67
computeDeltas 750000 validators 300 proto nodes	6.4660 ms/op	8.4366 ms/op	0.77
computeDeltas 750000 validators 1200 proto nodes	6.6405 ms/op	7.8741 ms/op	0.84
computeDeltas 750000 validators 7200 proto nodes	6.8088 ms/op	6.7574 ms/op	1.01
computeDeltas 1400000 validators 300 proto nodes	12.611 ms/op	12.393 ms/op	1.02
computeDeltas 1400000 validators 1200 proto nodes	12.666 ms/op	17.566 ms/op	0.72
computeDeltas 1400000 validators 7200 proto nodes	12.680 ms/op	12.606 ms/op	1.01
computeDeltas 2100000 validators 300 proto nodes	19.141 ms/op	21.572 ms/op	0.89
computeDeltas 2100000 validators 1200 proto nodes	20.412 ms/op	19.246 ms/op	1.06
computeDeltas 2100000 validators 7200 proto nodes	17.953 ms/op	21.455 ms/op	0.84
altair processAttestation - 250000 vs - 7PWei normalcase	2.2323 ms/op	4.2598 ms/op	0.52
altair processAttestation - 250000 vs - 7PWei worstcase	3.1412 ms/op	4.8564 ms/op	0.65
altair processAttestation - setStatus - 1/6 committees join	130.19 us/op	141.35 us/op	0.92
altair processAttestation - setStatus - 1/3 committees join	253.57 us/op	296.83 us/op	0.85
altair processAttestation - setStatus - 1/2 committees join	358.53 us/op	398.72 us/op	0.90
altair processAttestation - setStatus - 2/3 committees join	467.47 us/op	807.52 us/op	0.58
altair processAttestation - setStatus - 4/5 committees join	632.01 us/op	644.38 us/op	0.98
altair processAttestation - setStatus - 100% committees join	792.66 us/op	1.2432 ms/op	0.64
altair processBlock - 250000 vs - 7PWei normalcase	5.2667 ms/op	8.5889 ms/op	0.61
altair processBlock - 250000 vs - 7PWei normalcase hashState	37.039 ms/op	48.711 ms/op	0.76
altair processBlock - 250000 vs - 7PWei worstcase	44.423 ms/op	58.829 ms/op	0.76
altair processBlock - 250000 vs - 7PWei worstcase hashState	89.856 ms/op	120.64 ms/op	0.74
phase0 processBlock - 250000 vs - 7PWei normalcase	2.1187 ms/op	3.8707 ms/op	0.55
phase0 processBlock - 250000 vs - 7PWei worstcase	25.910 ms/op	31.420 ms/op	0.82
altair processEth1Data - 250000 vs - 7PWei normalcase	378.09 us/op	428.43 us/op	0.88
getExpectedWithdrawals 250000 eb:1,eth1:1,we:0,wn:0,smpl:15	5.1140 us/op	9.3930 us/op	0.54
getExpectedWithdrawals 250000 eb:0.95,eth1:0.1,we:0.05,wn:0,smpl:219	31.246 us/op	47.874 us/op	0.65
getExpectedWithdrawals 250000 eb:0.95,eth1:0.3,we:0.05,wn:0,smpl:42	8.4400 us/op	16.374 us/op	0.52
getExpectedWithdrawals 250000 eb:0.95,eth1:0.7,we:0.05,wn:0,smpl:18	5.4240 us/op	8.7880 us/op	0.62
getExpectedWithdrawals 250000 eb:0.1,eth1:0.1,we:0,wn:0,smpl:1020	120.25 us/op	178.53 us/op	0.67
getExpectedWithdrawals 250000 eb:0.03,eth1:0.03,we:0,wn:0,smpl:11777	1.2658 ms/op	1.0913 ms/op	1.16
getExpectedWithdrawals 250000 eb:0.01,eth1:0.01,we:0,wn:0,smpl:16384	1.7424 ms/op	1.4398 ms/op	1.21
getExpectedWithdrawals 250000 eb:0,eth1:0,we:0,wn:0,smpl:16384	1.7457 ms/op	1.4339 ms/op	1.22
getExpectedWithdrawals 250000 eb:0,eth1:0,we:0,wn:0,nocache,smpl:16384	3.7548 ms/op	3.9726 ms/op	0.95
getExpectedWithdrawals 250000 eb:0,eth1:1,we:0,wn:0,smpl:16384	1.7566 ms/op	1.7316 ms/op	1.01
getExpectedWithdrawals 250000 eb:0,eth1:1,we:0,wn:0,nocache,smpl:16384	3.9238 ms/op	4.5066 ms/op	0.87
Tree 40 250000 create	452.92 ms/op	969.40 ms/op	0.47
Tree 40 250000 get(125000)	160.37 ns/op	149.24 ns/op	1.07
Tree 40 250000 set(125000)	1.5744 us/op	2.8848 us/op	0.55
Tree 40 250000 toArray()	20.236 ms/op	28.480 ms/op	0.71
Tree 40 250000 iterate all - toArray() + loop	23.147 ms/op	26.099 ms/op	0.89
Tree 40 250000 iterate all - get(i)	58.598 ms/op	66.952 ms/op	0.88
Array 250000 create	3.4392 ms/op	4.7325 ms/op	0.73
Array 250000 clone - spread	1.5172 ms/op	4.4526 ms/op	0.34
Array 250000 get(125000)	0.43100 ns/op	0.41100 ns/op	1.05
Array 250000 set(125000)	0.51000 ns/op	0.43500 ns/op	1.17
Array 250000 iterate all - loop	86.622 us/op	106.08 us/op	0.82
phase0 afterProcessEpoch - 250000 vs - 7PWei	54.222 ms/op	53.980 ms/op	1.00
Array.fill - length 1000000	4.0021 ms/op	9.4050 ms/op	0.43
Array push - length 1000000	15.992 ms/op	33.422 ms/op	0.48
Array.get	0.28103 ns/op	0.48273 ns/op	0.58
Uint8Array.get	0.46926 ns/op	0.47553 ns/op	0.99
phase0 beforeProcessEpoch - 250000 vs - 7PWei	16.911 ms/op	33.637 ms/op	0.50
altair processEpoch - mainnet_e81889	315.91 ms/op	363.36 ms/op	0.87
mainnet_e81889 - altair beforeProcessEpoch	20.537 ms/op	25.462 ms/op	0.81
mainnet_e81889 - altair processJustificationAndFinalization	6.6590 us/op	6.2650 us/op	1.06
mainnet_e81889 - altair processInactivityUpdates	4.8435 ms/op	6.1582 ms/op	0.79
mainnet_e81889 - altair processRewardsAndPenalties	47.919 ms/op	50.373 ms/op	0.95
mainnet_e81889 - altair processRegistryUpdates	827.00 ns/op	826.00 ns/op	1.00
mainnet_e81889 - altair processSlashings	215.00 ns/op	189.00 ns/op	1.14
mainnet_e81889 - altair processEth1DataReset	205.00 ns/op	188.00 ns/op	1.09
mainnet_e81889 - altair processEffectiveBalanceUpdates	1.3525 ms/op	1.3876 ms/op	0.97
mainnet_e81889 - altair processSlashingsReset	1.1460 us/op	1.1020 us/op	1.04
mainnet_e81889 - altair processRandaoMixesReset	1.5340 us/op	1.1520 us/op	1.33
mainnet_e81889 - altair processHistoricalRootsUpdate	205.00 ns/op	225.00 ns/op	0.91
mainnet_e81889 - altair processParticipationFlagUpdates	623.00 ns/op	607.00 ns/op	1.03
mainnet_e81889 - altair processSyncCommitteeUpdates	160.00 ns/op	149.00 ns/op	1.07
mainnet_e81889 - altair afterProcessEpoch	59.019 ms/op	56.001 ms/op	1.05
capella processEpoch - mainnet_e217614	1.0255 s/op	1.0455 s/op	0.98
mainnet_e217614 - capella beforeProcessEpoch	70.024 ms/op	108.01 ms/op	0.65
mainnet_e217614 - capella processJustificationAndFinalization	8.3370 us/op	7.0750 us/op	1.18
mainnet_e217614 - capella processInactivityUpdates	16.286 ms/op	21.268 ms/op	0.77
mainnet_e217614 - capella processRewardsAndPenalties	207.05 ms/op	202.11 ms/op	1.02
mainnet_e217614 - capella processRegistryUpdates	7.0150 us/op	7.0870 us/op	0.99
mainnet_e217614 - capella processSlashings	186.00 ns/op	188.00 ns/op	0.99
mainnet_e217614 - capella processEth1DataReset	181.00 ns/op	186.00 ns/op	0.97
mainnet_e217614 - capella processEffectiveBalanceUpdates	10.555 ms/op	15.901 ms/op	0.66
mainnet_e217614 - capella processSlashingsReset	1.1680 us/op	938.00 ns/op	1.25
mainnet_e217614 - capella processRandaoMixesReset	1.2170 us/op	1.2030 us/op	1.01
mainnet_e217614 - capella processHistoricalRootsUpdate	182.00 ns/op	190.00 ns/op	0.96
mainnet_e217614 - capella processParticipationFlagUpdates	534.00 ns/op	566.00 ns/op	0.94
mainnet_e217614 - capella afterProcessEpoch	128.11 ms/op	129.71 ms/op	0.99
phase0 processEpoch - mainnet_e58758	300.76 ms/op	306.69 ms/op	0.98
mainnet_e58758 - phase0 beforeProcessEpoch	87.645 ms/op	93.576 ms/op	0.94
mainnet_e58758 - phase0 processJustificationAndFinalization	8.6260 us/op	5.7170 us/op	1.51
mainnet_e58758 - phase0 processRewardsAndPenalties	39.237 ms/op	38.769 ms/op	1.01
mainnet_e58758 - phase0 processRegistryUpdates	3.5520 us/op	7.0610 us/op	0.50
mainnet_e58758 - phase0 processSlashings	190.00 ns/op	189.00 ns/op	1.01
mainnet_e58758 - phase0 processEth1DataReset	218.00 ns/op	181.00 ns/op	1.20
mainnet_e58758 - phase0 processEffectiveBalanceUpdates	1.1456 ms/op	1.0471 ms/op	1.09
mainnet_e58758 - phase0 processSlashingsReset	1.1410 us/op	1.3080 us/op	0.87
mainnet_e58758 - phase0 processRandaoMixesReset	1.3490 us/op	1.4820 us/op	0.91
mainnet_e58758 - phase0 processHistoricalRootsUpdate	191.00 ns/op	198.00 ns/op	0.96
mainnet_e58758 - phase0 processParticipationRecordUpdates	974.00 ns/op	912.00 ns/op	1.07
mainnet_e58758 - phase0 afterProcessEpoch	45.927 ms/op	44.711 ms/op	1.03
phase0 processEffectiveBalanceUpdates - 250000 normalcase	1.4305 ms/op	1.4321 ms/op	1.00
phase0 processEffectiveBalanceUpdates - 250000 worstcase 0.5	2.0229 ms/op	1.9666 ms/op	1.03
altair processInactivityUpdates - 250000 normalcase	19.474 ms/op	19.470 ms/op	1.00
altair processInactivityUpdates - 250000 worstcase	25.917 ms/op	18.148 ms/op	1.43
phase0 processRegistryUpdates - 250000 normalcase	10.515 us/op	6.1400 us/op	1.71
phase0 processRegistryUpdates - 250000 badcase_full_deposits	431.39 us/op	273.12 us/op	1.58
phase0 processRegistryUpdates - 250000 worstcase 0.5	123.62 ms/op	114.26 ms/op	1.08
altair processRewardsAndPenalties - 250000 normalcase	38.124 ms/op	46.340 ms/op	0.82
altair processRewardsAndPenalties - 250000 worstcase	57.432 ms/op	40.243 ms/op	1.43
phase0 getAttestationDeltas - 250000 normalcase	6.4806 ms/op	7.6236 ms/op	0.85
phase0 getAttestationDeltas - 250000 worstcase	7.4376 ms/op	8.0396 ms/op	0.93
phase0 processSlashings - 250000 worstcase	81.249 us/op	114.07 us/op	0.71
altair processSyncCommitteeUpdates - 250000	124.21 ms/op	132.43 ms/op	0.94
BeaconState.hashTreeRoot - No change	242.00 ns/op	218.00 ns/op	1.11
BeaconState.hashTreeRoot - 1 full validator	79.726 us/op	79.794 us/op	1.00
BeaconState.hashTreeRoot - 32 full validator	836.58 us/op	934.47 us/op	0.90
BeaconState.hashTreeRoot - 512 full validator	11.039 ms/op	11.094 ms/op	1.00
BeaconState.hashTreeRoot - 1 validator.effectiveBalance	92.561 us/op	97.368 us/op	0.95
BeaconState.hashTreeRoot - 32 validator.effectiveBalance	1.6596 ms/op	1.2627 ms/op	1.31
BeaconState.hashTreeRoot - 512 validator.effectiveBalance	19.676 ms/op	21.273 ms/op	0.92
BeaconState.hashTreeRoot - 1 balances	73.510 us/op	75.284 us/op	0.98
BeaconState.hashTreeRoot - 32 balances	707.22 us/op	763.26 us/op	0.93
BeaconState.hashTreeRoot - 512 balances	6.9899 ms/op	9.6454 ms/op	0.72
BeaconState.hashTreeRoot - 250000 balances	172.59 ms/op	169.98 ms/op	1.02
aggregationBits - 2048 els - zipIndexesInBitList	21.594 us/op	23.499 us/op	0.92
byteArrayEquals 32	54.304 ns/op	55.311 ns/op	0.98
Buffer.compare 32	17.308 ns/op	17.212 ns/op	1.01
byteArrayEquals 1024	1.6024 us/op	1.6166 us/op	0.99
Buffer.compare 1024	24.936 ns/op	24.844 ns/op	1.00
byteArrayEquals 16384	25.580 us/op	25.931 us/op	0.99
Buffer.compare 16384	211.45 ns/op	184.22 ns/op	1.15
byteArrayEquals 123687377	196.19 ms/op	206.04 ms/op	0.95
Buffer.compare 123687377	6.2312 ms/op	8.8195 ms/op	0.71
byteArrayEquals 32 - diff last byte	53.162 ns/op	54.051 ns/op	0.98
Buffer.compare 32 - diff last byte	17.260 ns/op	17.627 ns/op	0.98
byteArrayEquals 1024 - diff last byte	1.6029 us/op	1.6121 us/op	0.99
Buffer.compare 1024 - diff last byte	25.373 ns/op	26.148 ns/op	0.97
byteArrayEquals 16384 - diff last byte	25.537 us/op	25.598 us/op	1.00
Buffer.compare 16384 - diff last byte	194.79 ns/op	203.99 ns/op	0.95
byteArrayEquals 123687377 - diff last byte	193.15 ms/op	202.97 ms/op	0.95
Buffer.compare 123687377 - diff last byte	6.2364 ms/op	11.287 ms/op	0.55
byteArrayEquals 32 - random bytes	5.3800 ns/op	6.2250 ns/op	0.86
Buffer.compare 32 - random bytes	17.336 ns/op	20.684 ns/op	0.84
byteArrayEquals 1024 - random bytes	5.1720 ns/op	5.5090 ns/op	0.94
Buffer.compare 1024 - random bytes	17.199 ns/op	19.483 ns/op	0.88
byteArrayEquals 16384 - random bytes	5.1860 ns/op	7.8680 ns/op	0.66
Buffer.compare 16384 - random bytes	17.309 ns/op	18.903 ns/op	0.92
byteArrayEquals 123687377 - random bytes	8.1600 ns/op	6.7700 ns/op	1.21
Buffer.compare 123687377 - random bytes	18.760 ns/op	21.780 ns/op	0.86
regular array get 100000 times	32.716 us/op	42.670 us/op	0.77
wrappedArray get 100000 times	32.737 us/op	36.010 us/op	0.91
arrayWithProxy get 100000 times	13.339 ms/op	14.329 ms/op	0.93
ssz.Root.equals	46.922 ns/op	70.926 ns/op	0.66
byteArrayEquals	46.297 ns/op	50.612 ns/op	0.91
Buffer.compare	10.656 ns/op	11.719 ns/op	0.91
processSlot - 1 slots	10.621 us/op	17.382 us/op	0.61
processSlot - 32 slots	1.9710 ms/op	2.8196 ms/op	0.70
getEffectiveBalanceIncrementsZeroInactive - 250000 vs - 7PWei	61.125 ms/op	58.595 ms/op	1.04
getCommitteeAssignments - req 1 vs - 250000 vc	2.1476 ms/op	2.1315 ms/op	1.01
getCommitteeAssignments - req 100 vs - 250000 vc	4.2090 ms/op	4.2052 ms/op	1.00
getCommitteeAssignments - req 1000 vs - 250000 vc	4.4986 ms/op	4.4952 ms/op	1.00
findModifiedValidators - 10000 modified validators	807.64 ms/op	919.14 ms/op	0.88
findModifiedValidators - 1000 modified validators	786.24 ms/op	796.03 ms/op	0.99
findModifiedValidators - 100 modified validators	254.62 ms/op	280.73 ms/op	0.91
findModifiedValidators - 10 modified validators	154.72 ms/op	172.14 ms/op	0.90
findModifiedValidators - 1 modified validators	166.37 ms/op	251.63 ms/op	0.66
findModifiedValidators - no difference	219.84 ms/op	228.18 ms/op	0.96
compare ViewDUs	6.4080 s/op	6.3326 s/op	1.01
compare each validator Uint8Array	1.4875 s/op	1.5666 s/op	0.95
compare ViewDU to Uint8Array	1.0505 s/op	1.0515 s/op	1.00
migrate state 1000000 validators, 24 modified, 0 new	950.87 ms/op	882.93 ms/op	1.08
migrate state 1000000 validators, 1700 modified, 1000 new	1.3007 s/op	1.2166 s/op	1.07
migrate state 1000000 validators, 3400 modified, 2000 new	1.5808 s/op	1.7488 s/op	0.90
migrate state 1500000 validators, 24 modified, 0 new	1.0626 s/op	1.0521 s/op	1.01
migrate state 1500000 validators, 1700 modified, 1000 new	1.3191 s/op	1.1006 s/op	1.20
migrate state 1500000 validators, 3400 modified, 2000 new	1.5067 s/op	1.2702 s/op	1.19
RootCache.getBlockRootAtSlot - 250000 vs - 7PWei	4.9600 ns/op	4.2800 ns/op	1.16
state getBlockRootAtSlot - 250000 vs - 7PWei	571.52 ns/op	557.41 ns/op	1.03
naive computeProposerIndex 100000 validators	58.553 ms/op
computeProposerIndex 100000 validators	10.086 ms/op
naiveGetNextSyncCommitteeIndices 1000 validators	8.7471 s/op
getNextSyncCommitteeIndices 1000 validators	326.05 ms/op
naiveGetNextSyncCommitteeIndices 10000 validators	9.1964 s/op
getNextSyncCommitteeIndices 10000 validators	285.58 ms/op
naiveGetNextSyncCommitteeIndices 100000 validators	8.2897 s/op
getNextSyncCommitteeIndices 100000 validators	271.88 ms/op
naive computeShuffledIndex 100000 validators	26.569 s/op
cached computeShuffledIndex 100000 validators	548.03 ms/op
naive computeShuffledIndex 2000000 validators	566.07 s/op
cached computeShuffledIndex 2000000 validators	77.671 s/op
computeProposers - vc 250000	25.174 ms/op	8.5687 ms/op	2.94
computeEpochShuffling - vc 250000	63.779 ms/op	42.165 ms/op	1.51
getNextSyncCommittee - vc 250000	340.09 ms/op	149.30 ms/op	2.28
computeSigningRoot for AttestationData	59.942 us/op	21.638 us/op	2.77
hash AttestationData serialized data then Buffer.toString(base64)	2.1485 us/op	1.5852 us/op	1.36
toHexString serialized data	2.2902 us/op	1.1423 us/op	2.00
Buffer.toString(base64)	216.77 ns/op	167.98 ns/op	1.29
nodejs block root to RootHex using toHex	183.56 ns/op	145.00 ns/op	1.27
nodejs block root to RootHex using toRootHex	103.11 ns/op	87.170 ns/op	1.18
browser block root to RootHex using the deprecated toHexString	304.92 ns/op	233.88 ns/op	1.30
browser block root to RootHex using toHex	236.51 ns/op	179.95 ns/op	1.31
browser block root to RootHex using toRootHex	224.92 ns/op	159.96 ns/op	1.41

by benchmarkbot/action

twoeths · 2025-02-12T13:20:53Z

Posting benchmarks from my personal Ubuntu node, which is the same as the lg/mainnet nodes:

computeProposerIndex
    ✔ naive computeProposerIndex 100000 validators                        15.96156 ops/s    62.65051 ms/op        -          9 runs   62.6 s
    ✔ computeProposerIndex 100000 validators                              93.41964 ops/s    10.70439 ms/op        -         10 runs   11.8 s

  getNextSyncCommitteeIndices electra
    ✔ naiveGetNextSyncCommitteeIndices 1000 validators                   0.1083408 ops/s    9.230135  s/op        -          6 runs   64.5 s
    ✔ getNextSyncCommitteeIndices 1000 validators                         141.7483 ops/s    7.054757 ms/op        -         31 runs  0.726 s
    ✔ naiveGetNextSyncCommitteeIndices 10000 validators                  0.1075031 ops/s    9.302056  s/op        -          6 runs   65.1 s
    ✔ getNextSyncCommitteeIndices 10000 validators                        141.9954 ops/s    7.042481 ms/op        -         31 runs  0.723 s
    ✔ naiveGetNextSyncCommitteeIndices 100000 validators                 0.1068357 ops/s    9.360169  s/op        -          6 runs   65.5 s
    ✔ getNextSyncCommitteeIndices 100000 validators                       142.0709 ops/s    7.038738 ms/op        -         31 runs  0.722 s

  computeShuffledIndex
    ✔ naive computeShuffledIndex 100000 validators                      0.03294994 ops/s    30.34907  s/op        -          2 runs   90.1 s
    ✔ cached computeShuffledIndex 100000 validators                       1.535617 ops/s    651.2041 ms/op        -         10 runs   7.17 s
    ✔ naive computeShuffledIndex 2000000 validators                    0.001597753 ops/s    625.8789  s/op        -          1 runs 1.25e+3 s
    ✔ cached computeShuffledIndex 2000000 validators                    0.02168103 ops/s    46.12328  s/op        -          1 runs   92.5 s

It also shows a >1000x improvement for some tests, similar to my local environment.

wemeetagain

LGTM, appreciate the thorough analysis

wemeetagain

Added a few small things, lgtm

wemeetagain · 2025-02-12T18:45:45Z

packages/state-transition/src/util/seed.ts

+
+    let i = 0;
+    let cachedHash: Uint8Array | null = null;
+    const cachedHashInput = Buffer.allocUnsafe(32 + 8);


added: reuse this buffer as the input for digest below

wemeetagain · 2025-02-12T18:46:41Z

packages/state-transition/src/util/seed.ts

@@ -120,6 +120,8 @@ export function computeProposerIndex(
    const shuffledResult = new Map<number, number>();

    let i = 0;
+    const cachedHashInput = Buffer.allocUnsafe(32 + 8);


added: reuse this buffer as the input for digest below
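The buffer reuse suggested in this comment can be sketched as below. This is a sketch under assumptions rather than the PR's code: the 32 + 8 layout is taken to be a 32-byte seed followed by an 8-byte little-endian counter, and `createHashInputWriter` is a hypothetical name. It uses a plain Uint8Array, which is zero-filled on creation; with Buffer.allocUnsafe the upper 4 counter bytes would need explicit zeroing, as one of the later commits in this PR does.

```typescript
/**
 * Preallocates one 40-byte hash input (32-byte seed + 8-byte counter) and
 * returns a writer that only rewrites the counter bytes on each call.
 */
function createHashInputWriter(seed: Uint8Array): (counter: number) => Uint8Array {
  if (seed.length !== 32) {
    throw new Error("seed must be 32 bytes");
  }
  const input = new Uint8Array(32 + 8);
  input.set(seed, 0);
  const view = new DataView(input.buffer, input.byteOffset, input.byteLength);
  // Upper 4 bytes of the uint64 counter stay zero for any counter below 2^32.
  view.setUint32(36, 0, true);

  return (counter: number): Uint8Array => {
    // Overwrite only the low 4 counter bytes, little-endian.
    view.setUint32(32, counter, true);
    // Same object every time: callers must hash it before the next write.
    return input;
  };
}
```

Because the same array is returned on every call, the result has to be passed to the digest function before the counter is written again.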

wemeetagain · 2025-02-12T18:46:55Z

packages/state-transition/src/util/seed.ts

@@ -402,7 +408,7 @@ export function getComputeShuffledIndexFn(indexCount: number, seed: Bytes32): Co
        //   bytesToBigInt(digest(Buffer.concat([_seed, intToBytes(i, 1)])).slice(0, 8)) % BigInt(indexCount)
        // );
        pivotBuffer[32] = i % 256;
-        pivot = Number(bytesToBigInt(digest(pivotBuffer).slice(0, 8)) % BigInt(indexCount));
+        pivot = Number(bytesToBigInt(digest(pivotBuffer).subarray(0, 8)) % BigInt(indexCount));


added: use subarray instead of slice
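For context on this change: on a Uint8Array, slice copies the selected bytes into a new array, while subarray returns a view over the same memory, so it avoids an allocation. The small illustration below uses the hypothetical helper name `firstEightBytes` and is not taken from the PR.

```typescript
/**
 * Returns the first 8 bytes of a hash both ways, to show the difference:
 * `copied` owns fresh memory, `view` shares memory with `hash`.
 */
function firstEightBytes(hash: Uint8Array): {copied: Uint8Array; view: Uint8Array} {
  return {
    copied: hash.slice(0, 8), // allocates and copies 8 bytes
    view: hash.subarray(0, 8), // no copy, same underlying ArrayBuffer
  };
}
```

Sharing memory is safe in the diff above because the 8 bytes are consumed by bytesToBigInt straight away; a view that outlives later writes to the underlying buffer would see those writes.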

wemeetagain · 2025-02-13T15:47:19Z

🎉 This PR is included in v1.27.0 🎉

dapplion · 2025-02-16T16:36:19Z

GG 👏

twoeths added 2 commits February 12, 2025 15:11

fix: improve getNextSyncCommitteeIndices for electra

274f651

fix: also improve computeProposerIndex()

0f9097a

twoeths added 3 commits February 12, 2025 17:41

chore: rename variables and tweak comment

91bad18

chore: more benchmark for computeShuffledIndex

8884972

fix: do not allocate Buffer every time

fcbb118

wemeetagain previously approved these changes Feb 12, 2025

chore: more optimizations

6653985

wemeetagain dismissed their stale review via 6653985 February 12, 2025 18:42

wemeetagain previously approved these changes Feb 12, 2025

chore: zero upper 4 bytes in prealloc'd buffers

8713832

wemeetagain dismissed their stale review via 8713832 February 12, 2025 19:08

wemeetagain approved these changes Feb 12, 2025

wemeetagain marked this pull request as ready for review February 12, 2025 20:45

wemeetagain requested a review from a team as a code owner February 12, 2025 20:46

wemeetagain merged commit 85b13c1 into unstable Feb 12, 2025
20 checks passed

wemeetagain deleted the te/improve_sync_committee_updates branch February 12, 2025 20:46

wemeetagain mentioned this pull request Feb 12, 2025

feat: add compute shuffled index ChainSafe/swap-or-not-shuffle#5

Merged

2 tasks

wemeetagain mentioned this pull request Feb 25, 2025

chore: use native compute proposer/sync committee #7499

Open
