
Fix runtime.history.History.Watchblocks to emit sequential blocks (no gaps) #6085

Open
martintomazic opened this issue Feb 27, 2025 · 2 comments

Comments

@martintomazic
Contributor

martintomazic commented Feb 27, 2025

Subscription to runtime.client.api.WatchBlocks may return a block sequence with missing rounds.

This can happen because runtime.history.History does not notify subscribers of blocks that are Commit-ted with notify=false (during reindex).**

Note this is only problematic if the node has no local storage (!hasLocalStorage); otherwise storage syncing (worker.storage.committee.Node) is responsible for notifying blocks via StorageSyncCheckpoint. That worker subscribes to roothash.ServiceClient.WatchBlocks, which may have gaps, but it fetches the missing rounds itself.

As discussed in private, roothash.ServiceClient should probably not be responsible for reindexing, nor should roothash.BlockHistory know anything about storage, node types, etc. This code likely needs a thorough refactor.

** We are guaranteed to have at least two reindexes, possibly more if there are errors during subsequent event processing.

@martintomazic martintomazic changed the title Fix runtime.History.History.Watchblocks to emit sequential blocks (no gaps) Fix runtime.history.History.Watchblocks to emit sequential blocks (no gaps) Feb 27, 2025
@martintomazic
Contributor Author

> As discussed in private, roothash.ServiceClient should probably not be responsible for reindexing, nor should roothash.BlockHistory know anything about storage, node types, etc. This code likely needs a thorough refactor.

Ideally, we move the reindexing part from roothash.ServiceClient to something like worker.runtime.xxx.
The worker should be initialized, like the others, after RuntimeRegistry has been initialized. It should first fetch runtime.history.History.LastConsensusHeight and reindex until it is n blocks away from consensus sync. At that point, it should subscribe to roothash.WatchBlocks, do another reindex to fill any remaining gap, and then proceed with regular processing. After that, blocks should always be notified without gaps.

What about problematic runtime.history.History?

  • runtime.history.History does two things:
    1. it implements roothash.BlockHistory, with a pruning policy,
    2. it exposes methods that take storage sync into account.
  • We could fix this in one of three ways:
    1. removing the runtime.history.History interface altogether and referencing the concrete implementation directly,
    2. splitting the existing block history abstraction into two, or
    3. moving the methods that take storage sync into account to the new worker.
  • The problem is that we implement api.RuntimeClient.WatchBlocks by subscribing to History.WatchBlocks.
    • Whichever solution we choose, we are still hiding from clients of api.RuntimeClient.WatchBlocks (7 right now) that the behavior depends on the underlying node's storage type.
    • I prefer explicit parameters and documentation over hidden flags/state, but that would mean exposing the node/storage type as part of our runtime client API :/

Alternatively (not ideal), we could push for something similar to #6079, or keep it even simpler and use Commit/(Batch) with notify=true during subsequent reindexes.

I think it would be nice to refactor this properly, but it also depends on our priorities.

@peternose
Contributor

> Ideally, we move the reindexing part from roothash.ServiceClient to something like worker.runtime.xxx.

Agree. I tried to do that in #6089.
