# ADR #003: March 2022 Testnet Celestia Node

<hr style="border:3px solid gray"> </hr>

## Authors

@renaynay @Wondertan

## Changelog

* 2021-11-25: initial draft

<hr style="border:2px solid gray"> </hr>

## Legend

### Celestia DA Network

Refers to the data availability "halo" network created around the Core network.
### **Bridge Node**

A **bridge** node is a **full** node that is connected to a Celestia Core node via RPC. It is either given the remote
address of an already-running Core node or runs a Core node as an embedded process. The critical difference is that
instead of reconstructing blocks by downloading enough shares from the network, a bridge node receives headers and
blocks directly from its trusted Core node, validates the blocks, erasure codes them, and produces `ExtendedHeader`s
to broadcast to the Celestia DA network.
### **Full Node**

A **full** node is the same as a **light** node, except that instead of performing `LightAvailability` (DASing to
verify that the data behind a header is available), it performs `FullAvailability`, which downloads enough shares from
the network to fully reconstruct the block and store it, serving shares to the rest of the network.
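The two availability schemes can be pictured as implementations of a single interface, roughly as sketched below (the interface shape and the `Root` fields are illustrative placeholders, not the repository's exact API):

```go
package share

import "context"

// Root stands in for the DataAvailabilityHeader (the row and column roots of
// the erasure-coded block) that an availability check runs against.
type Root struct {
	RowRoots    [][]byte
	ColumnRoots [][]byte
}

// Availability is the property a node verifies before accepting a header;
// only the verification strategy differs between node types.
type Availability interface {
	// SharesAvailable returns nil once the implementation is convinced the
	// data behind the given root is retrievable from the network.
	SharesAvailable(ctx context.Context, root *Root) error
}

// LightAvailability would implement SharesAvailable by sampling a small number
// of random shares (DAS); FullAvailability would implement it by downloading
// enough shares to reconstruct and store the entire block.
```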
### **Light Node**

A **light** node listens for `ExtendedHeader`s from the DA network and performs DAS on the received headers.

<hr style="border:2px solid gray"> </hr>

## Context

This ADR describes a design for the March 2022 Celestia Testnet that we decided on at the Berlin 2021 offsite. Now that
we have a basic scaffolding and structure for a Celestia node, the focus of the next engineering sprint is to continue
refactoring and improving this structure to include more features (defined later in this document).

<hr style="border:2px solid gray"> </hr>

## Decision

## New Features

### [New node type definitions](https://github.com/celestiaorg/celestia-node/issues/250)
* Introduce a standalone **full** node and rename the current full node implementation to **bridge** node.
* Remove **dev** as a node type and make it a flag available on every node type.
### Introduce bad encoding fraud proofs
Bad encoding fraud proofs will be generated by **full** nodes inside of `ShareService` upon reconstructing a block
via the sampling process.

If fraud is detected, the **full** node will generate the proof, broadcast it to the `FraudSub` gossip network, and
subsequently halt all operations. If no fraud is detected, the **full** node will continue operations without
propagating any messages to the network. Since **full** nodes reconstruct every block, they do not have to listen to
`FraudSub`, as they perform the necessary encoding checks on every block themselves.

**Light** nodes, however, will listen to `FraudSub` for bad encoding fraud proofs. **Light** nodes will verify the
fraud proofs against the relevant header hash to ensure that the fraud proof is valid.
If the fraud proof is valid, the node should immediately halt all operations. If it is invalid, the node continues
operating as usual.

Eventually, we may choose to use the reputation tracking system provided by
[gossipsub](https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/gossipsub-v1.1.md#peer-scoring) for peers that
broadcast invalid fraud proofs to the network, but that is not a requirement for this iteration.
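A rough sketch of the light-node side of this flow is below, assuming placeholder types for the fraud proof, the `FraudSub` subscription, and the local header store (none of these names are the repository's actual API):

```go
package fraud

import "context"

// The types below are placeholders standing in for celestia-node components;
// only the control flow of the light-node listener is the point here.

// ExtendedHeader is the DA-network header a proof is checked against.
type ExtendedHeader struct{ Hash []byte }

// BadEncodingProof accuses the block behind HeaderHash of incorrect erasure coding.
type BadEncodingProof interface {
	HeaderHash() []byte
	Validate(*ExtendedHeader) error
}

// Subscription yields fraud proofs gossiped on FraudSub.
type Subscription interface {
	Next(ctx context.Context) (BadEncodingProof, error)
}

// HeaderGetter resolves a header by hash from the node's local header store.
type HeaderGetter interface {
	Get(ctx context.Context, hash []byte) (*ExtendedHeader, error)
}

// listen verifies every incoming bad encoding fraud proof against the header it
// accuses and halts the node as soon as a proof checks out.
func listen(ctx context.Context, sub Subscription, headers HeaderGetter, halt func()) {
	for {
		proof, err := sub.Next(ctx)
		if err != nil {
			return // subscription closed or context cancelled
		}
		header, err := headers.Get(ctx, proof.HeaderHash())
		if err != nil {
			continue // accused header not known locally yet; skip
		}
		if err := proof.Validate(header); err != nil {
			// Invalid proof: ignore it and keep operating. Gossipsub peer
			// scoring could penalize the sender later.
			continue
		}
		// Valid proof: the block was erasure-coded incorrectly; stop all operations.
		halt()
		return
	}
}
```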
### [Introduce an RPC structure and some basic APIs](https://github.com/celestiaorg/celestia-node/issues/169)
Implement scaffolding for RPC on all node types, such that a user can access the following methods:

`HeaderAPI`

* `Header(_height_)` -> ExtendedHeader{}
* `Header(_hash_)` -> ExtendedHeader{}

`NodeAPI`

* `P2PInfo()` -> returns a blob of p2p info (can be broken into several subcommands, such as `net_info`)
* `Config()` -> returns the node's config
* `NodeType()` -> returns the node's type (e.g. **full** | **bridge** | **light**)
* `RPCInfo()` -> RPC port, version, available APIs, etc.

`UserAPI`

* `AccountBalance(_acct_)` -> returns the balance for the given account
* `SubmitTx(_txdata_)` -> submits a transaction to the network

*Note: it is likely that more methods will be added, but those listed above are the essential ones for this iteration.*
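Expressed as Go interfaces, the scaffolding could look roughly like the sketch below; since Go has no overloading, the two `Header` variants become two methods, and all parameter and result types are placeholders rather than the node's real definitions:

```go
package rpc

import "context"

// Placeholder result types; the real definitions live elsewhere in the node.
type (
	ExtendedHeader struct{}
	P2PInfo        struct{}
	Config         struct{}
	RPCInfo        struct{}
)

type HeaderAPI interface {
	// The Header(_height_) / Header(_hash_) pair from the list above.
	HeaderByHeight(ctx context.Context, height uint64) (*ExtendedHeader, error)
	HeaderByHash(ctx context.Context, hash []byte) (*ExtendedHeader, error)
}

type NodeAPI interface {
	P2PInfo(ctx context.Context) (*P2PInfo, error) // peer ID, addresses, connected peers
	Config(ctx context.Context) (*Config, error)   // the node's running config
	NodeType(ctx context.Context) (string, error)  // "bridge" | "full" | "light"
	RPCInfo(ctx context.Context) (*RPCInfo, error) // RPC port, version, available APIs
}

type UserAPI interface {
	AccountBalance(ctx context.Context, acct string) (uint64, error)
	SubmitTx(ctx context.Context, txdata []byte) error
}
```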
### Introduce `StateService`
`StateService` is responsible for fetching the state a user needs in order to submit a transaction (such as account
balance), preparing the transaction, and propagating it via `TxSub`. **Bridge** nodes will be responsible for listening
to `TxSub` and relaying the transactions into the Core mempool. **Light** and **full** nodes will be able to publish
transactions to `TxSub`, but do not need to listen for them.
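A minimal sketch of that split, with placeholder interfaces for the `TxSub` subscription and the Core mempool handle (all names are illustrative):

```go
package state

import "context"

// Service is the user-facing surface: fetch what is needed to build a valid
// transaction and publish the result to the TxSub gossip topic.
type Service interface {
	AccountBalance(ctx context.Context, acct string) (uint64, error)
	SubmitTx(ctx context.Context, txdata []byte) error
}

// TxSubscription yields transactions gossiped on TxSub.
type TxSubscription interface {
	Next(ctx context.Context) ([]byte, error)
}

// CoreMempool is the bridge node's RPC handle into the Core node's mempool.
type CoreMempool interface {
	BroadcastTx(ctx context.Context, tx []byte) error
}

// relayTxs is what only bridge nodes run: pull transactions off TxSub and push
// them into the Core mempool. Light and full nodes only publish to TxSub.
func relayTxs(ctx context.Context, sub TxSubscription, mempool CoreMempool) error {
	for {
		tx, err := sub.Next(ctx)
		if err != nil {
			return err // subscription closed or context cancelled
		}
		if err := mempool.BroadcastTx(ctx, tx); err != nil {
			continue // drop this tx; the sender can retry
		}
	}
}
```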
Celestia-node's state interaction will be detailed further in a subsequent ADR.

### [Data Availability Sampling during `HeaderSync`](https://github.com/celestiaorg/celestia-node/issues/181)

Currently, both **light** and **full** nodes are unable to perform data availability sampling (DAS) while syncing.
They only begin sampling once the node is synced up to the head of the chain.

`HeaderSync` and the `DASer` will be refactored such that the `DASer` will be able to perform sampling on past headers
as the node is syncing. A possible approach would be for the syncing algorithms in both the `DASer` and `HeaderSync`
to align such that headers received during sync are propagated to the `DASer` for sampling via an internal pubsub.

The `DASer` will maintain a checkpoint of the last sampled header so that it can resume sampling from that
checkpoint as new headers arrive.
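A minimal sketch of what that catch-up loop could look like, assuming a header subscription fed by `HeaderSync` and a persisted checkpoint (all type and method names are illustrative):

```go
package das

import "context"

// Placeholder header type standing in for the real ExtendedHeader.
type ExtendedHeader struct{ Height uint64 }

// HeaderSub is the internal pubsub fed by HeaderSync with both past (sync-time)
// and new (network) headers.
type HeaderSub interface {
	NextHeader(ctx context.Context) (*ExtendedHeader, error)
}

// Sampler performs DAS against a single header.
type Sampler interface {
	SampleHeader(ctx context.Context, h *ExtendedHeader) error
}

// Checkpoint persists the height of the last successfully sampled header so
// the DASer can resume after a restart.
type Checkpoint interface {
	Load(ctx context.Context) (uint64, error)
	Store(ctx context.Context, height uint64) error
}

// run consumes headers from the internal subscription, skips anything at or
// below the checkpoint, samples the rest, and advances the checkpoint.
func run(ctx context.Context, sub HeaderSub, s Sampler, cp Checkpoint) error {
	last, err := cp.Load(ctx)
	if err != nil {
		return err
	}
	for {
		h, err := sub.NextHeader(ctx)
		if err != nil {
			return err // subscription closed or context cancelled
		}
		if h.Height <= last {
			continue // already sampled before the restart
		}
		if err := s.SampleHeader(ctx, h); err != nil {
			return err // surface sampling failures to the caller
		}
		last = h.Height
		if err := cp.Store(ctx, last); err != nil {
			return err
		}
	}
}
```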
<hr style="border:1px solid gray"> </hr>

## Refactoring

### `HeaderService` becomes the main component around which most other services are focused
Initially, we started with `BlockService` as the more “important” component in the devnet architecture, but overlooked
some problems with regard to sync (we initially decided that a Celestia full node would have to be started at the same
time as a Core node).

This led to an issue where we eventually needed to connect to an already-running Core node and sync from it. We were
missing a component to do that, so we implemented `HeaderExchange` over the Core client (wrapping `BlockFetcher`, an
interface we had previously created for `BlockService`). This had to be done at the last minute because syncing would
not have worked otherwise, and it led to last-minute workarounds, such as having to hand both the Celestia **light**
and **full** node a “trusted” hash of a header from the already-running chain so that they could sync from that point
and start listening for new headers.
#### Proposed new architecture: [`BlockService` is only responsible for reconstructing the block from Shares handed to it by the `ShareService`](https://github.com/celestiaorg/celestia-node/issues/251)
Right now, the `BlockService` is in charge of fetching new blocks from the Core node, erasure coding them, generating
the DAH, generating the `ExtendedHeader`, broadcasting the `ExtendedHeader` to the `HeaderSub` network, and storing the
block data (after some validation checks).

Instead, a **full** node will rely on `ShareService` sampling to fetch *enough* shares to reconstruct the block
inside of `BlockService`. By contrast, a **bridge** node will not do block reconstruction via sampling, but will
instead rely on the `header.CoreSubscriber` implementation of `header.Subscriber` for blocks. `header.CoreSubscriber`
will handle listening for new block events from the Core node via RPC, erasure code the new block, generate the
`ExtendedHeader`, and pipe the erasure-coded block through to `BlockService` via an internal subscription.
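A condensed sketch of that bridge-node pipeline, with placeholder types for the Core RPC fetcher, the extension step, and the internal channels (none of these are the repository's real signatures):

```go
package header

import "context"

// Placeholder types; only the shape of the pipeline matters here.
type (
	RawBlock       struct{}
	ExtendedBlock  struct{} // erasure-coded block
	ExtendedHeader struct{}
)

// CoreFetcher streams new block events from the trusted Core node via RPC.
type CoreFetcher interface {
	NextBlock(ctx context.Context) (*RawBlock, error)
}

// coreSubscribe is roughly what header.CoreSubscriber would do on a bridge
// node: fetch the block from Core, erasure code it and build the
// ExtendedHeader, then pipe the extended block to BlockService and the header
// to the node's header subscribers.
func coreSubscribe(
	ctx context.Context,
	core CoreFetcher,
	extend func(*RawBlock) (*ExtendedBlock, *ExtendedHeader, error),
	toBlockService chan<- *ExtendedBlock,
	toHeaderSub chan<- *ExtendedHeader,
) error {
	for {
		raw, err := core.NextBlock(ctx)
		if err != nil {
			return err // Core connection lost or context cancelled
		}
		eb, eh, err := extend(raw) // erasure code + generate DAH/ExtendedHeader
		if err != nil {
			return err
		}
		select {
		case toBlockService <- eb:
		case <-ctx.Done():
			return ctx.Err()
		}
		select {
		case toHeaderSub <- eh:
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}
```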
### `HeaderSync` optimizations
* Implement disconnect toleration
### Unbonding period handling
**Light** and **full** nodes are currently prone to long-range attacks. To mitigate this, we should introduce an
additional `trustPeriod` variable (equal to the unbonding period) which applies to headers. Suppose a node starts with
the period between its subjective head and the objective head being longer than the unbonding period; in that case, the
**light** node must not trust the subjective head anymore, specifically its `ValidatorSet`. Therefore, instead of
syncing subsequent headers on top of the untrusted subjective head, the node should request a new objective head from
the `trustedPeer` and set it as the new trusted subjective head. This approach follows the Tendermint model for
[light client attack detection](https://github.com/tendermint/spec/blob/master/spec/light-client/detection/detection_003_reviewed.md#light-client-attack-detector).
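A minimal sketch of the intended check, assuming the header carries its timestamp and the syncer holds a handle to the `trustedPeer` (names and fields are illustrative):

```go
package headersync

import (
	"context"
	"time"
)

// Placeholder header type; only Time matters for this check.
type ExtendedHeader struct {
	Height uint64
	Time   time.Time
}

// TrustedPeer can serve a fresh objective head on request.
type TrustedPeer interface {
	Head(ctx context.Context) (*ExtendedHeader, error)
}

// trustedHead returns a subjective head that is safe to sync on top of. If the
// stored subjective head is older than trustPeriod (the unbonding period), its
// ValidatorSet can no longer be trusted, so a new objective head is requested
// from the trusted peer and adopted as the new subjective head.
func trustedHead(
	ctx context.Context,
	subjective *ExtendedHeader,
	trusted TrustedPeer,
	trustPeriod time.Duration,
) (*ExtendedHeader, error) {
	if time.Since(subjective.Time) <= trustPeriod {
		return subjective, nil // still within the trust period
	}
	// Subjective head is too old: fall back to the trusted peer.
	return trusted.Head(ctx)
}
```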
<hr style="border:1px solid gray"> </hr>

## Nice to have

### `ShareService` optimizations
* Implement parallelization for retrieving shares by namespace (see the sketch after this list). This
[issue](https://github.com/celestiaorg/celestia-node/issues/184) is already being worked on.
* NMT/Shares/Namespace storage optimizations:
  * Right now we prepend 17 additional bytes to each Share. Luckily, for each reason the prepended bytes were added,
  there is an alternative solution: the NMT node type can be determined indirectly, without serializing the type
  itself, by looking at the number of links. Likewise, to recover the namespace of the erasured data, we should not
  encode namespaces into the data itself; the namespace for each share can be obtained from the inner (non-leaf) nodes
  of the NMT tree.
* Pruning for shares.
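A sketch of what parallelized retrieval by namespace might look like, fanning out one goroutine per row root and stitching the results back in row order (the `RowFetcher` interface is a placeholder for the node's IPLD-backed retrieval):

```go
package share

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// Share is a single chunk of namespaced block data.
type Share = []byte

// RowFetcher retrieves the shares for a given namespace under one row root.
type RowFetcher interface {
	SharesByNamespace(ctx context.Context, rowRoot []byte, nID []byte) ([]Share, error)
}

// getSharesByNamespace fans out one goroutine per relevant row root instead of
// walking the rows sequentially.
func getSharesByNamespace(
	ctx context.Context,
	fetcher RowFetcher,
	rowRoots [][]byte,
	nID []byte,
) ([]Share, error) {
	results := make([][]Share, len(rowRoots))

	g, gCtx := errgroup.WithContext(ctx)
	for i, root := range rowRoots {
		i, root := i, root // capture loop variables
		g.Go(func() error {
			shares, err := fetcher.SharesByNamespace(gCtx, root, nID)
			if err != nil {
				return err
			}
			results[i] = shares
			return nil
		})
	}
	if err := g.Wait(); err != nil {
		return nil, err
	}

	// Preserve row order when flattening.
	var out []Share
	for _, shares := range results {
		out = append(out, shares...)
	}
	return out, nil
}
```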
### [Move IPLD from the celestia-node repo into its own repo](https://github.com/celestiaorg/celestia-node/issues/111)
Since the IPLD package is almost entirely separate from the celestia-node implementation, it makes sense to remove it
from the celestia-node repository and maintain it separately. The extraction of IPLD should also include a review and
refactoring, as there are still some legacy components that are no longer necessary, and the documentation also needs
updating.
### Implement additional light node verification logic similar to the Tendermint Light Client Model
At the moment, the syncing logic for a **light** node is simple in that it syncs each header from a single peer.
Instead, the **light** node should double-check headers against a randomly chosen
["witness"](https://github.com/tendermint/tendermint/blob/02d456b8b8274088e8d3c6e1714263a47ffe13ac/light/client.go#L154-L161)
peer other than the primary peer from which it received the header, as described in the
[light client attack detector](https://github.com/tendermint/spec/blob/master/spec/light-client/detection/detection_003_reviewed.md#light-client-attack-detector)
model from Tendermint.
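A sketch of the cross-check, assuming a header exchange that can query an arbitrary peer (all type and function names are illustrative, not the node's real API):

```go
package light

import (
	"bytes"
	"context"
	"errors"
	"math/rand"
)

// Placeholder header type; only the hash matters for the comparison.
type ExtendedHeader struct {
	Height uint64
	Hash   []byte
}

// Exchange can request a header at a given height from a specific peer.
type Exchange interface {
	GetByHeight(ctx context.Context, peer string, height uint64) (*ExtendedHeader, error)
}

var errConflictingHeader = errors.New("witness returned a conflicting header; possible light client attack")

// verifyWithWitness re-requests the header from a randomly chosen witness peer
// and compares it with what the primary peer delivered. A mismatch is treated
// as evidence of a possible attack rather than silently picking one side.
func verifyWithWitness(
	ctx context.Context,
	ex Exchange,
	fromPrimary *ExtendedHeader,
	witnesses []string,
) error {
	if len(witnesses) == 0 {
		return errors.New("no witness peers configured")
	}
	witness := witnesses[rand.Intn(len(witnesses))]

	got, err := ex.GetByHeight(ctx, witness, fromPrimary.Height)
	if err != nil {
		return err
	}
	if !bytes.Equal(got.Hash, fromPrimary.Hash) {
		return errConflictingHeader
	}
	return nil
}
```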