
Optimistic Block #10584


Closed
11 of 12 tasks
Tracked by #46
Longarithm opened this issue Feb 7, 2024 · 2 comments
Assignees
Labels
A-stateless-validation Area: stateless validation

Comments


Longarithm commented Feb 7, 2024

Goal

Remove 2x latency of chunk execution by introducing OptimisticBlock.

Plan

  • Create OptimisticBlock
  • Remove block_hash dependency from runtime
  • Change Chain::get_update_shard_job to accept not only Block but also OptimisticBlock together with Chunks
  • Implement OptimisticBlockPool
  • Validate OptimisticBlock
  • Distribute OptimisticBlock

Testing (previous steps also include testing)

  • Basic TestLoop test
  • Run forknet
  • Invalid optimistic blocks
  • Optimistic blocks not distributed to everyone
  • Optimistic blocks with missing chunks or wrong chunks included
  • Optimistic block and block arriving together
@Longarithm (Member Author)

Zulip thread: https://near.zulipchat.com/#narrow/channel/295558-core/topic/Optimistic.20Block.20Design/near/467293949

Original description from 8 Feb 2024

The stateless validation implementation introduces a 2x latency in chunk execution. This is because validation of chunk (N+1) is blocked on the execution of chunk N, and that validation itself repeats the execution of chunk N.

It can be avoided by optimistic execution:

  • Once the chunk producer (CP) produces chunk N, it can go ahead and distribute it to the CP for chunk (N+1). Because they have the state, they can execute it before its inclusion into a block and record the resulting state proof.
  • Once the block producer (BP) includes chunk N into a block, the CP for (N+1) will already have the state proof for N ready, so it can immediately produce chunk (N+1) and send the state witness to the chunk validators (CVs).

It is a clear win after the stateless validation release.
For now, I believe we don't need to change the config delays, because stateless validation improves performance on its own.

Original context: https://docs.google.com/document/d/1k0NRMcLsDZp6C9pCRjNu5l7irDyRHsZ3VtKAKno_tFY/edit#heading=h.7ae0b4dh7648

Another picture of the current workflow I came up with while trying to understand this:

[image: diagram of the current workflow]

VanBarbascu added a commit to VanBarbascu/nearcore that referenced this issue Jan 17, 2025
This PR introduces the shape of the optimistic block described in near#10584. Along with it, I added the functions to create and sign it. In the next PR, I will link this into the block production path.
VanBarbascu added a commit to VanBarbascu/nearcore that referenced this issue Jan 17, 2025
…ks` on apply chunk (near#12746)

### Context

We want to improve chunk processing efficiency by applying chunks
optimistically (near#10584), when all partial chunks for the next height and
block metadata (from an `OptimisticBlock`) are already available. This
allows the results of chunk application to be reused when the actual
block is received, enabling the next chunk to be produced immediately.

Currently, this work is a step towards supporting `OptimisticBlock`.
While `OptimisticBlock` is not introduced yet, this refactor prepares
the codebase for its implementation by reducing data dependency on the
current block.

### Change

I replaced the dependency on `Block` with `ApplyChunkBlockContext` and
`chunk_headers: &Chunks`. We'll just need to add a conversion from
`OptimisticBlock` to `ApplyChunkBlockContext` later. Chunk headers are
simply taken from the block; for an optimistic block, they must be
supplied by ShardsManager.

Some APIs are refactored to reflect that change. 

### Next steps

* Convert `OptimisticBlock` to `ApplyChunkBlockContext`.
* Then, call `get_update_shard_job` for `OptimisticBlock` and reuse the
result when it is later called for the actual `Block`.

---------

Co-authored-by: Razvan Barbascu <r.barbascu@gmail.com>
github-merge-queue bot pushed a commit that referenced this issue Jan 20, 2025
This PR introduces the shape of the optimistic block described in #10584. Along with it, I added the functions to create and sign it. In the next PR, I will link this into the block production path.
github-merge-queue bot pushed a commit that referenced this issue Jan 23, 2025
We continue the implementation of Optimistic Block (#10584) by adding the
logic to produce the optimistic block as soon as the previous block is done.

If available, the optimistic block will be used in the production of the
actual block so that both use the same timestamp.

---------

Co-authored-by: Aleksandr Logunov <the.alex.logunov@gmail.com>
github-merge-queue bot pushed a commit that referenced this issue Jan 24, 2025
…12777)

#10584

There is another unexpected dependency on the block hash during chunk
application: it is used in `shuffle_receipt_proofs` to shuffle new
receipts targeting our shard. As the block hash is unknown in an
optimistic block, I replace it with the prev block hash via a protocol upgrade.

Additionally, I use `Chunks` instead of `Block` in
`collect_incoming_receipts_from_chunks`; this will be useful for the
optimistic block execution flow later.

## Security

A block producer could brute-force hashes to get a salt that gives a more
desirable order. But the block hash is prone to that as well, so the prev
hash has equivalent safety.

## Upgrade

I use the `BlockHeightForReceiptId` feature because it has a similar goal
and is going to be released soon. Adding a separate feature would, I
think, make the code harder to read.

## Testing

IMO it only makes sense to check the consistency of the shuffling; I don't
see much value in checking that a specific salt is used. So I claim that
running the existing tests is enough to cover this change.
github-merge-queue bot pushed a commit that referenced this issue Feb 5, 2025
Introduce the pool for storing OptimisticBlocks, joining them with
chunks, executing them, and reusing the cached results. #10584

The change is pretty big; however, I think it's important to merge it at
once because it already gives a working example. I'll describe the two
major changes, which should be sufficient for review.

### OptimisticBlockChunksPool

It receives the optimistic block (OB), currently from the block producer
itself only, and receives chunks from ShardsManager. Once both the OB and
all its chunks on top of some prev block have been received, it allows
taking the ready OB.

Some primitive throttling and garbage collection are required to ensure
that OBs are not executed many times and that the pool doesn't OOM when
there are forks. For that purpose, we maintain a `minimal_base_height` for
chunks and a `block_height_threshold` for blocks. Note that we **don't
remove** chunks immediately, because if a block is skipped, the chunks
should be reused to process the next OB.

This feature is independent, so I also implemented simple unit tests for
it.

### Processing OB

As discussed before, the result of chunk execution on top of an OB doesn't
impact any part of block processing and doesn't persist anything. It is
simply put into a cache, which can be reused when the actual block is
received.

This cache, however, needs a unique key to store results under. For that,
I introduce `CachedShardUpdateKey`, which includes the necessary fields of
the Block or OB, all the chunks, and the shard id (the index could also
work). Note that we need the chunk hashes because they define the prev
outgoing receipts, which in turn are used to generate the incoming
receipts for our chunk.

For execution, `BlocksInProcessing` is extended a bit to also hold OBs, so
that the number of parallel chunk executions for blocks and OBs together
is limited. The population happens in `postprocess_optimistic_block`.

### Testing

Finally, we are also able to write `test_optimistic_block`. For now, I
just check that there is at least **one** cache hit; let's think about
more complex cases later.

I'll resolve merge conflicts later.
@Longarithm (Member Author)

Performance update: https://near.zulipchat.com/#narrow/channel/295558-core/topic/Optimistic.20Block.20Design/near/499464703

With a 300ms block delay plus mirror traffic plus multi-shard synth-bm native transfers traffic, the block rate goes from 1.66 to 2.21 blocks per second (2.7 bps with just mirror traffic).

shreyan-gupta pushed a commit to shreyan-gupta/nearcore that referenced this issue Mar 28, 2025
shreyan-gupta pushed a commit to shreyan-gupta/nearcore that referenced this issue Mar 28, 2025