
Commit 85b13c1

twoeths and wemeetagain authored
fix: improve sync committee updates (#7456)
**Motivation**

- Since electra, `processSyncCommitteeUpdates()` could take >15s according to the devnet.

**Description**

- The main fix is in `computeShuffledIndex`, where we can cache the pivot and source computations.
- Some other optimizations besides that:
  - only compute the hash once every 16 iterations
  - compute the int manually instead of using `bytesToInt`, in order to avoid BigInt
  - cache the shuffled index

I guess if we use `hashtree` we can improve this further, but the diff is already large and the main optimization is in `computeShuffledIndex()`, not the hash function. We can consider that in the future. We could also improve pre-electra, but it has not been that bad for a long time, so this PR only focuses on electra.

Closes #7366

**Tests**

- Added unit tests to compare the naive version vs the optimized version.
- Benchmarks locally show a >1000x difference for the main function of concern, `naiveGetNextSyncCommitteeIndices()`, while CI only shows a >20x difference. This is my local run:

```
computeProposerIndex
  ✔ naive computeProposerIndex 100000 validators        31.86491 ops/s     31.38248 ms/op   -  10 runs   34.5 s
  ✔ computeProposerIndex 100000 validators              106.2267 ops/s     9.413833 ms/op   -  10 runs   10.4 s

getNextSyncCommitteeIndices electra
  ✔ naiveGetNextSyncCommitteeIndices 1000 validators    0.2121840 ops/s    4.712890 s/op    -  10 runs   51.7 s
  ✔ getNextSyncCommitteeIndices 1000 validators         214.9251 ops/s     4.652783 ms/op   -  45 runs   0.714 s
  ✔ naiveGetNextSyncCommitteeIndices 10000 validators   0.2122278 ops/s    4.711918 s/op    -  10 runs   51.8 s
  ✔ getNextSyncCommitteeIndices 10000 validators        220.2337 ops/s     4.540632 ms/op   -  46 runs   0.710 s
  ✔ naiveGetNextSyncCommitteeIndices 100000 validators  0.2117828 ops/s    4.721820 s/op    -  10 runs   52.2 s
  ✔ getNextSyncCommitteeIndices 100000 validators       204.7383 ops/s     4.884283 ms/op   -  43 runs   0.714 s

computeShuffledIndex
  ✔ naive computeShuffledIndex 100000 validators        0.06638498 ops/s   15.06365 s/op    -   3 runs   60.3 s
  ✔ cached computeShuffledIndex 100000 validators       1.932706 ops/s     517.4092 ms/op   -  10 runs   5.72 s
```

---------

Co-authored-by: Tuyen Nguyen <twoeths@users.noreply.github.com>
Co-authored-by: Cayman <caymannava@gmail.com>
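The hash-batching and BigInt-avoidance tricks called out above boil down to the sketch below. This is a minimal, self-contained illustration rather than the PR's code: `randomValues` is a hypothetical helper name, and `digest` is re-implemented with Node's `crypto` module as a stand-in for the repo's sha256 helper.

```ts
import {createHash} from "node:crypto";

// Stand-in for the repo's sha256 digest() helper, so the sketch is self-contained.
function digest(data: Uint8Array): Uint8Array {
  return createHash("sha256").update(data).digest();
}

// Hypothetical helper: hash once per 16 candidates (a 32-byte hash holds 16 two-byte
// values) and read each 16-bit random value with plain integer math instead of
// bytesToInt/BigInt.
function* randomValues(seed: Uint8Array): Generator<number> {
  // input = 32-byte seed ++ 8-byte little-endian counter (upper 4 bytes stay 0)
  const input = Buffer.allocUnsafe(32 + 8);
  input.set(seed, 0);
  input.writeUInt32LE(0, 32 + 4);

  let hash = new Uint8Array(32);
  for (let i = 0; ; i++) {
    if (i % 16 === 0) {
      // recompute the hash only once every 16 iterations
      input.writeUInt32LE(Math.floor(i / 16), 32);
      hash = digest(input);
    }
    const offset = (i % 16) * 2;
    // equivalent to bytesToInt(hash.subarray(offset, offset + 2)), without BigInt
    yield hash[offset] + hash[offset + 1] * 256;
  }
}

// usage: pull one 16-bit value per candidate index
const rand = randomValues(new Uint8Array(32));
console.log(rand.next().value, rand.next().value);
```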
1 parent e45e0eb commit 85b13c1

File tree

3 files changed: +438 -5 lines changed


packages/state-transition/src/util/seed.ts: +257 -3
@@ -35,6 +35,7 @@ export function computeProposers(
         fork,
         effectiveBalanceIncrements,
         shuffling.activeIndices,
+        // TODO: if we use hashTree, we can precompute the roots for the next n loops
         digest(Buffer.concat([epochSeed, intToBytes(slot, 8)]))
       )
     );
@@ -44,10 +45,11 @@
 
 /**
  * Return from ``indices`` a random index sampled by effective balance.
+ * This is just to make sure lodestar follows the spec, this is not for production.
  *
  * SLOW CODE - 🐢
  */
-export function computeProposerIndex(
+export function naiveComputeProposerIndex(
   fork: ForkSeq,
   effectiveBalanceIncrements: EffectiveBalanceIncrements,
   indices: ArrayLike<ValidatorIndex>,
@@ -95,7 +97,93 @@ export function computeProposerIndex(
 }
 
 /**
- * TODO: NAIVE
+ * Optimized version of `naiveComputeProposerIndex`.
+ * It shows > 3x speedup according to the perf test.
+ */
+export function computeProposerIndex(
+  fork: ForkSeq,
+  effectiveBalanceIncrements: EffectiveBalanceIncrements,
+  indices: ArrayLike<ValidatorIndex>,
+  seed: Uint8Array
+): ValidatorIndex {
+  if (indices.length === 0) {
+    throw Error("Validator indices must not be empty");
+  }
+
+  if (fork >= ForkSeq.electra) {
+    // electra, see inline comments for the optimization
+    const MAX_RANDOM_VALUE = 2 ** 16 - 1;
+    const MAX_EFFECTIVE_BALANCE_INCREMENT = MAX_EFFECTIVE_BALANCE_ELECTRA / EFFECTIVE_BALANCE_INCREMENT;
+
+    const shuffledIndexFn = getComputeShuffledIndexFn(indices.length, seed);
+    // this simple cache makes sure we don't have to recompute the shuffled index for the next round of activeValidatorCount
+    const shuffledResult = new Map<number, number>();
+
+    let i = 0;
+    const cachedHashInput = Buffer.allocUnsafe(32 + 8);
+    cachedHashInput.set(seed, 0);
+    cachedHashInput.writeUint32LE(0, 32 + 4);
+    let cachedHash: Uint8Array | null = null;
+    while (true) {
+      // an optimized version of the below naive code
+      // const candidateIndex = indices[computeShuffledIndex(i % indices.length, indices.length, seed)];
+      const index = i % indices.length;
+      let shuffledIndex = shuffledResult.get(index);
+      if (shuffledIndex == null) {
+        shuffledIndex = shuffledIndexFn(index);
+        shuffledResult.set(index, shuffledIndex);
+      }
+      const candidateIndex = indices[shuffledIndex];
+
+      // compute a new hash every 16 iterations
+      if (i % 16 === 0) {
+        cachedHashInput.writeUint32LE(Math.floor(i / 16), 32);
+        cachedHash = digest(cachedHashInput);
+      }
+
+      if (cachedHash == null) {
+        // there is always a cachedHash, handle this to make the compiler happy
+        throw new Error("cachedHash should not be null");
+      }
+
+      const randomBytes = cachedHash;
+      const offset = (i % 16) * 2;
+      // this is equivalent to bytesToInt(randomBytes.subarray(offset, offset + 2));
+      // but it does not go through BigInt
+      const lowByte = randomBytes[offset];
+      const highByte = randomBytes[offset + 1];
+      const randomValue = lowByte + highByte * 256;
+
+      const effectiveBalanceIncrement = effectiveBalanceIncrements[candidateIndex];
+      if (effectiveBalanceIncrement * MAX_RANDOM_VALUE >= MAX_EFFECTIVE_BALANCE_INCREMENT * randomValue) {
+        return candidateIndex;
+      }
+
+      i += 1;
+    }
+  } else {
+    // pre-electra, this function is the same as the naive version
+    const MAX_RANDOM_BYTE = 2 ** 8 - 1;
+    const MAX_EFFECTIVE_BALANCE_INCREMENT = MAX_EFFECTIVE_BALANCE / EFFECTIVE_BALANCE_INCREMENT;
+
+    let i = 0;
+    while (true) {
+      const candidateIndex = indices[computeShuffledIndex(i % indices.length, indices.length, seed)];
+      const randomByte = digest(Buffer.concat([seed, intToBytes(Math.floor(i / 32), 8, "le")]))[i % 32];
+
+      const effectiveBalanceIncrement = effectiveBalanceIncrements[candidateIndex];
+      if (effectiveBalanceIncrement * MAX_RANDOM_BYTE >= MAX_EFFECTIVE_BALANCE_INCREMENT * randomByte) {
+        return candidateIndex;
+      }
+
+      i += 1;
+    }
+  }
+}
+
+/**
+ * Naive version, this is not supposed to be used in production.
+ * See `computeProposerIndex` for the optimized version.
  *
  * Return the sync committee indices for a given state and epoch.
  * Aligns `epoch` to `baseEpoch` so the result is the same with any `epoch` within a sync period.
@@ -104,7 +192,7 @@ export function computeProposerIndex(
  *
  * SLOW CODE - 🐢
  */
-export function getNextSyncCommitteeIndices(
+export function naiveGetNextSyncCommitteeIndices(
   fork: ForkSeq,
   state: BeaconStateAllForks,
   activeValidatorIndices: ArrayLike<ValidatorIndex>,
@@ -161,13 +249,110 @@ export function getNextSyncCommitteeIndices(
   return syncCommitteeIndices;
 }
 
+/**
+ * Optimized version of `naiveGetNextSyncCommitteeIndices`.
+ *
+ * In the worst-case scenario, this could be a >1000x speedup according to the perf test.
+ */
+export function getNextSyncCommitteeIndices(
+  fork: ForkSeq,
+  state: BeaconStateAllForks,
+  activeValidatorIndices: ArrayLike<ValidatorIndex>,
+  effectiveBalanceIncrements: EffectiveBalanceIncrements
+): ValidatorIndex[] {
+  const syncCommitteeIndices = [];
+
+  if (fork >= ForkSeq.electra) {
+    // electra, see inline comments for the optimization
+    const MAX_RANDOM_VALUE = 2 ** 16 - 1;
+    const MAX_EFFECTIVE_BALANCE_INCREMENT = MAX_EFFECTIVE_BALANCE_ELECTRA / EFFECTIVE_BALANCE_INCREMENT;
+
+    const epoch = computeEpochAtSlot(state.slot) + 1;
+    const activeValidatorCount = activeValidatorIndices.length;
+    const seed = getSeed(state, epoch, DOMAIN_SYNC_COMMITTEE);
+    const shuffledIndexFn = getComputeShuffledIndexFn(activeValidatorCount, seed);
+
+    let i = 0;
+    let cachedHash: Uint8Array | null = null;
+    const cachedHashInput = Buffer.allocUnsafe(32 + 8);
+    cachedHashInput.set(seed, 0);
+    cachedHashInput.writeUInt32LE(0, 32 + 4);
+    // this simple cache makes sure we don't have to recompute the shuffled index for the next round of activeValidatorCount
+    const shuffledResult = new Map<number, number>();
+    while (syncCommitteeIndices.length < SYNC_COMMITTEE_SIZE) {
+      // optimized version of the below naive code
+      // const shuffledIndex = shuffledIndexFn(i % activeValidatorCount);
+      const index = i % activeValidatorCount;
+      let shuffledIndex = shuffledResult.get(index);
+      if (shuffledIndex == null) {
+        shuffledIndex = shuffledIndexFn(index);
+        shuffledResult.set(index, shuffledIndex);
+      }
+      const candidateIndex = activeValidatorIndices[shuffledIndex];
+
+      // compute a new hash every 16 iterations
+      if (i % 16 === 0) {
+        cachedHashInput.writeUint32LE(Math.floor(i / 16), 32);
+        cachedHash = digest(cachedHashInput);
+      }
+
+      if (cachedHash == null) {
+        // there is always a cachedHash, handle this to make the compiler happy
+        throw new Error("cachedHash should not be null");
+      }
+
+      const randomBytes = cachedHash;
+      const offset = (i % 16) * 2;
+
+      // this is equivalent to bytesToInt(randomBytes.subarray(offset, offset + 2));
+      // but it does not go through BigInt
+      const lowByte = randomBytes[offset];
+      const highByte = randomBytes[offset + 1];
+      const randomValue = lowByte + highByte * 256;
+
+      const effectiveBalanceIncrement = effectiveBalanceIncrements[candidateIndex];
+      if (effectiveBalanceIncrement * MAX_RANDOM_VALUE >= MAX_EFFECTIVE_BALANCE_INCREMENT * randomValue) {
+        syncCommitteeIndices.push(candidateIndex);
+      }
+
+      i += 1;
+    }
+  } else {
+    // pre-electra, keep the same naive version
+    const MAX_RANDOM_BYTE = 2 ** 8 - 1;
+    const MAX_EFFECTIVE_BALANCE_INCREMENT = MAX_EFFECTIVE_BALANCE / EFFECTIVE_BALANCE_INCREMENT;
+
+    const epoch = computeEpochAtSlot(state.slot) + 1;
+    const activeValidatorCount = activeValidatorIndices.length;
+    const seed = getSeed(state, epoch, DOMAIN_SYNC_COMMITTEE);
+
+    let i = 0;
+    while (syncCommitteeIndices.length < SYNC_COMMITTEE_SIZE) {
+      const shuffledIndex = computeShuffledIndex(i % activeValidatorCount, activeValidatorCount, seed);
+      const candidateIndex = activeValidatorIndices[shuffledIndex];
+      const randomByte = digest(Buffer.concat([seed, intToBytes(Math.floor(i / 32), 8, "le")]))[i % 32];
+
+      const effectiveBalanceIncrement = effectiveBalanceIncrements[candidateIndex];
+      if (effectiveBalanceIncrement * MAX_RANDOM_BYTE >= MAX_EFFECTIVE_BALANCE_INCREMENT * randomByte) {
+        syncCommitteeIndices.push(candidateIndex);
+      }
+
+      i += 1;
+    }
+  }
+
+  return syncCommitteeIndices;
+}
+
 /**
  * Return the shuffled validator index corresponding to ``seed`` (and ``index_count``).
  *
  * Swap or not
  * https://link.springer.com/content/pdf/10.1007%2F978-3-642-32009-5_1.pdf
  *
  * See the 'generalized domain' algorithm on page 3.
+ * This is the naive implementation just to make sure lodestar follows the spec, this is not for production.
+ * The optimized version is in `getComputeShuffledIndexFn`.
  */
 export function computeShuffledIndex(index: number, indexCount: number, seed: Bytes32): number {
   let permuted = index;
@@ -188,6 +373,75 @@ export function computeShuffledIndex(index: number, indexCount: number, seed: Bytes32): number {
   return permuted;
 }
 
+type ComputeShuffledIndexFn = (index: number) => number;
+
+/**
+ * An optimized version of `computeShuffledIndex`, this is for production.
+ */
+export function getComputeShuffledIndexFn(indexCount: number, seed: Bytes32): ComputeShuffledIndexFn {
+  // there are possibly SHUFFLE_ROUND_COUNT (90 for mainnet) values for this cache
+  // this cache will always hit after the 1st call
+  const pivotByIndex: Map<number, number> = new Map();
+  // given 2M active validators, there are 2M / 256 = 8k possible positionDiv
+  // it means there are at most 8k different sources for each round
+  const sourceByPositionDivByIndex: Map<number, Map<number, Uint8Array>> = new Map();
+  // 32 bytes seed + 1 byte i
+  const pivotBuffer = Buffer.alloc(32 + 1);
+  pivotBuffer.set(seed, 0);
+  // 32 bytes seed + 1 byte i + 4 bytes positionDiv
+  const sourceBuffer = Buffer.alloc(32 + 1 + 4);
+  sourceBuffer.set(seed, 0);
+
+  return (index): number => {
+    assert.lt(index, indexCount, "index must be less than indexCount");
+    assert.lte(indexCount, 2 ** 40, "indexCount too big");
+    let permuted = index;
+    const _seed = seed;
+    for (let i = 0; i < SHUFFLE_ROUND_COUNT; i++) {
+      // optimized version of the below naive code
+      // const pivot = Number(
+      //   bytesToBigInt(digest(Buffer.concat([_seed, intToBytes(i, 1)])).slice(0, 8)) % BigInt(indexCount)
+      // );
+
+      let pivot = pivotByIndex.get(i);
+      if (pivot == null) {
+        // naive version always creates a new buffer, we can reuse the buffer
+        // pivot = Number(
+        //   bytesToBigInt(digest(Buffer.concat([_seed, intToBytes(i, 1)])).slice(0, 8)) % BigInt(indexCount)
+        // );
+        pivotBuffer[32] = i % 256;
+        pivot = Number(bytesToBigInt(digest(pivotBuffer).subarray(0, 8)) % BigInt(indexCount));
+        pivotByIndex.set(i, pivot);
+      }
+
+      const flip = (pivot + indexCount - permuted) % indexCount;
+      const position = Math.max(permuted, flip);
+
+      // optimized version of the below naive code
+      // const source = digest(Buffer.concat([_seed, intToBytes(i, 1), intToBytes(Math.floor(position / 256), 4)]));
+      let sourceByPositionDiv = sourceByPositionDivByIndex.get(i);
+      if (sourceByPositionDiv == null) {
+        sourceByPositionDiv = new Map<number, Uint8Array>();
+        sourceByPositionDivByIndex.set(i, sourceByPositionDiv);
+      }
+      const positionDiv256 = Math.floor(position / 256);
+      let source = sourceByPositionDiv.get(positionDiv256);
+      if (source == null) {
+        // naive version always creates a new buffer, we can reuse the buffer
+        // don't want to go through intToBytes() to avoid BigInt
+        sourceBuffer[32] = i % 256;
+        sourceBuffer.writeUint32LE(positionDiv256, 33);
+        source = digest(sourceBuffer);
+        sourceByPositionDiv.set(positionDiv256, source);
+      }
+      const byte = source[Math.floor((position % 256) / 8)];
+      const bit = (byte >> (position % 8)) % 2;
+      permuted = bit ? flip : permuted;
+    }
+    return permuted;
+  };
+}
+
 /**
  * Return the randao mix at a recent [[epoch]].
  */
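For reference, the new factory is intended to be called once per `(indexCount, seed)` pair and then reused. The snippet below is an illustrative usage sketch, not code from this PR; the relative import path and the `indexCount`/`seed` values are assumptions.

```ts
// Hypothetical import path; both functions live in packages/state-transition/src/util/seed.ts
import {computeShuffledIndex, getComputeShuffledIndexFn} from "./seed.js";

declare const seed: Uint8Array; // a 32-byte seed, e.g. from getSeed(state, epoch, DOMAIN_SYNC_COMMITTEE)
const indexCount = 100_000;

// naive: every call re-derives the pivot and source hashes for all SHUFFLE_ROUND_COUNT rounds
const slow = computeShuffledIndex(42, indexCount, seed);

// optimized: build the function once so pivot/source hashes are cached and reused across calls
const shuffledIndexFn = getComputeShuffledIndexFn(indexCount, seed);
const fast = shuffledIndexFn(42); // same result as `slow`

// shuffling many positions with the same seed amortizes nearly all of the hashing
const shuffled = Array.from({length: 1_000}, (_, i) => shuffledIndexFn(i));
```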
