# ADR #003: March 2022 Testnet Celestia Node

<hr style="border:3px solid gray"> </hr>

## Authors

@renaynay @Wondertan

## Changelog

* 2021-11-25: initial draft

<hr style="border:2px solid gray"> </hr>

## Legend

### Celestia DA Network

Refers to the data availability "halo" network created around the Core network.

### **Bridge Node**

A **bridge** node is a **full** node that is connected to a Celestia Core node via RPC. It is either given the remote
address of a running Core node or runs a Core node as an embedded process. The critical difference is that, instead of
reconstructing blocks by sampling the network for shares, it receives headers and blocks directly from its trusted Core
node, validates the blocks, and produces `ExtendedHeader`s to broadcast to the Celestia DA network.

### **Full Node**

A **full** node is the same as a **light** node, except that instead of performing `LightAvailability` (DASing to
verify that a header is legitimate), it performs `FullAvailability`, which samples the network for enough shares to
fully reconstruct the block and store it, serving shares to the rest of the network.

### **Light Node**

A **light** node listens for `ExtendedHeader`s from the DA network and performs DAS on the received headers.
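
For illustration, the split between the two availability checks could be sketched in Go roughly as follows. The
`Availability` interface and both implementations below are hypothetical stand-ins, not the actual celestia-node API.

```go
package availability

import "context"

// Root is a stand-in for a block's DataAvailabilityHeader (illustrative only).
type Root struct {
	RowRoots    [][]byte
	ColumnRoots [][]byte
}

// Availability abstracts how a node convinces itself that the data behind a
// header is available (hypothetical interface).
type Availability interface {
	SharesAvailable(ctx context.Context, root *Root) error
}

// LightAvailability samples a small random subset of shares: enough to be
// probabilistically convinced that the block data is available.
type LightAvailability struct {
	// sample fetches and verifies a single share at (row, col); plumbing omitted.
	sample func(ctx context.Context, root *Root, row, col int) error
}

func (la *LightAvailability) SharesAvailable(ctx context.Context, root *Root) error {
	for i := 0; i < 16; i++ { // fixed sample count, for illustration only
		row, col := randomCoordinate(root)
		if err := la.sample(ctx, root, row, col); err != nil {
			return err
		}
	}
	return nil
}

// FullAvailability keeps fetching shares until the entire extended block can
// be reconstructed and stored, so it can be served to the rest of the network.
type FullAvailability struct {
	reconstruct func(ctx context.Context, root *Root) error
}

func (fa *FullAvailability) SharesAvailable(ctx context.Context, root *Root) error {
	return fa.reconstruct(ctx, root)
}

func randomCoordinate(root *Root) (row, col int) {
	// Placeholder: a real implementation draws from crypto/rand over the square size.
	return 0, 0
}
```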

<hr style="border:2px solid gray"> </hr>

## Context

This ADR describes the design for the March 2022 Celestia Testnet that we decided on at the Berlin 2021 offsite. Now
that we have basic scaffolding and structure for a Celestia node, the focus of the next engineering sprint is to
continue refactoring and improving this structure to include more features (defined later in this document).


<hr style="border:2px solid gray"> </hr>

## Decision

## New Features

### [New node type definitions](https://github.com/celestiaorg/celestia-node/issues/250)
* Introduce a standalone **full** node and rename current full node implementation to **bridge** node.
* Remove **dev** as a node type and make it a flag on every available node type.

### Introduce bad encoding fraud proofs
Bad encoding fraud proofs will be generated by **full** nodes inside `ShareService` upon reconstructing a block via
the sampling process.

If fraud is detected, the **full** node will generate the proof and broadcast it to the `FraudSub` gossip network and
will subsequently halt all operations. If no fraud is detected, the **full** node will continue operations without
propagating any messages to the network. Since **full** nodes reconstruct every block, they do not have to listen to
`FraudSub` as they perform the necessary encoding checks on every block.

**Light** nodes, however, will listen to `FraudSub` for bad encoding fraud proofs and verify each one against the
relevant header hash to ensure that the fraud proof is valid. If the fraud proof is valid, the node should immediately
halt all operations. If it is invalid, the node continues operating as usual.

Eventually, we may implement a reputation tracking system for nodes that broadcast invalid fraud proofs to the
network, but that is left for later iterations.
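
For illustration, a light node's handling of bad encoding fraud proofs could look roughly like the Go sketch below.
All types and parameters here (`BadEncodingProof`, `ExtendedHeader`, the `proofs` channel standing in for a `FraudSub`
subscription, `headerStore`, `halt`) are hypothetical and not the actual celestia-node API.

```go
package fraud

import (
	"bytes"
	"context"
	"errors"
	"log"
)

// ExtendedHeader and BadEncodingProof are hypothetical stand-ins for the real types.
type ExtendedHeader struct {
	Hash []byte
	DAH  []byte // root of the erasure-coded data square
}

type BadEncodingProof struct {
	HeaderHash []byte
	// The shares and Merkle proofs needed to re-run the encoding check
	// are omitted from this sketch.
}

// Verify checks the proof against the header it claims fraud for; the actual
// re-encoding check is elided here.
func (p *BadEncodingProof) Verify(h *ExtendedHeader) error {
	if !bytes.Equal(p.HeaderHash, h.Hash) {
		return errors.New("fraud proof references a different header")
	}
	// ... re-run the erasure-coding check over the included shares against h.DAH ...
	return nil
}

// ListenForFraud sketches a light node's FraudSub handler: it verifies each
// incoming proof against the locally stored header and halts the node if one
// is valid. `headerStore` stands in for the local header store and `halt`
// for node shutdown.
func ListenForFraud(
	ctx context.Context,
	proofs <-chan *BadEncodingProof,
	headerStore func(hash []byte) (*ExtendedHeader, error),
	halt func(),
) {
	for {
		select {
		case <-ctx.Done():
			return
		case proof := <-proofs:
			header, err := headerStore(proof.HeaderHash)
			if err != nil {
				continue // header not (yet) known; ignored in this sketch
			}
			if err := proof.Verify(header); err != nil {
				log.Printf("discarding invalid fraud proof: %v", err)
				continue // invalid proof: keep operating as usual
			}
			log.Println("valid bad encoding fraud proof received: halting")
			halt()
			return
		}
	}
}
```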

### [Introduce an RPC structure and some basic APIs](https://github.com/celestiaorg/celestia-node/issues/169)
Implement scaffolding for RPC on all node types, such that a user can access the following methods:

`HeaderAPI`

* `Header(_height_)` -> ExtendedHeader{}
* `Header(_hash_)` -> ExtendedHeader{}

`NodeAPI`

* `P2PInfo()` -> returns a blob of p2p info (can be broken into several subcommands, such as `net_info`)
* `Config()` -> returns the node's config
* `NodeType()` -> returns the node's type (e.g. **full** | **bridge** | **light** )
* `RPCInfo()` -> RPC port, version, available APIs, etc.

`UserAPI`

* `AccountBalance(_acct_)` -> returns balance for given account
* `SubmitTx(_txdata_)` -> submits a transaction to the network
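
A rough Go sketch of these APIs as interfaces is below. All names and signatures are illustrative rather than the
final API; in particular, the two `Header` lookups are split into `HeaderByHeight`/`HeaderByHash` here because Go has
no method overloading.

```go
package rpc

import "context"

// Placeholder types; the real definitions live elsewhere in the node.
type (
	ExtendedHeader struct{}
	P2PInfo        struct{}
	Config         struct{}
	RPCInfo        struct{}
	TxResponse     struct{}
)

// HeaderAPI exposes header lookups by height or hash.
type HeaderAPI interface {
	HeaderByHeight(ctx context.Context, height uint64) (*ExtendedHeader, error)
	HeaderByHash(ctx context.Context, hash []byte) (*ExtendedHeader, error)
}

// NodeAPI exposes information about the running node itself.
type NodeAPI interface {
	P2PInfo(ctx context.Context) (*P2PInfo, error)
	Config(ctx context.Context) (*Config, error)
	NodeType(ctx context.Context) (string, error)  // "bridge" | "full" | "light"
	RPCInfo(ctx context.Context) (*RPCInfo, error) // port, version, available APIs
}

// UserAPI exposes user-facing state queries and transaction submission.
type UserAPI interface {
	AccountBalance(ctx context.Context, acct string) (uint64, error)
	SubmitTx(ctx context.Context, txData []byte) (*TxResponse, error)
}
```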

### Introduce `StateService`
`StateService` is responsible for fetching the state a user needs in order to submit a transaction (such as the
account balance), preparing the transaction, and propagating it via `TxSub`. **Bridge** nodes will be responsible for
listening to `TxSub` and relaying the transactions into the Core mempool.
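
Since the details are deferred to a later ADR, the following is only a rough, hypothetical Go sketch of the two sides
described above: the user-facing `StateService` and the bridge-node relay from `TxSub` into the Core mempool. None of
these names are final.

```go
package state

import "context"

// StateService sketches the user-facing responsibilities; the real interface
// will be defined in the follow-up state ADR.
type StateService interface {
	// AccountBalance fetches the state needed to construct a transaction.
	AccountBalance(ctx context.Context, acct string) (uint64, error)
	// SubmitTx prepares a transaction and propagates it via TxSub.
	SubmitTx(ctx context.Context, txData []byte) error
}

// TxRelay sketches the bridge-node side: it reads transactions from TxSub and
// pushes each one into the Core mempool. Both function fields are placeholders
// for the actual gossip and Core RPC plumbing.
type TxRelay struct {
	NextTx          func(ctx context.Context) ([]byte, error)
	BroadcastToCore func(ctx context.Context, tx []byte) error
}

func (r *TxRelay) Run(ctx context.Context) error {
	for {
		tx, err := r.NextTx(ctx)
		if err != nil {
			return err
		}
		if err := r.BroadcastToCore(ctx, tx); err != nil {
			return err
		}
	}
}
```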

Celestia-node's state interaction will be detailed further in a subsequent ADR.

### [Data Availability Sampling during `HeaderSync`](https://github.com/celestiaorg/celestia-node/issues/181)

Currently, both **light** and **full** nodes are unable to perform data availability sampling (DAS) while syncing.
They only begin sampling once the node is synced up to the head of the chain.

`HeaderSync` and the `DASer` will be refactored such that the `DASer` will be able to perform sampling on past headers
as the node is syncing. To do this, the syncing algorithms in both the `DASer` and `HeaderSync` should align so that
headers received during sync will be propagated to the `DASer` for sampling via an internal pubsub.

The `DASer` will maintain a checkpoint of the last sampled header so that it can resume sampling from that
checkpoint whenever new headers arrive.
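
A minimal Go sketch of that flow: the channel stands in for the internal pubsub between `HeaderSync` and the `DASer`,
and all types and fields are illustrative, not the actual implementation.

```go
package das

import "context"

// Header is a placeholder for ExtendedHeader.
type Header struct {
	Height uint64
}

// DASer consumes headers handed over by HeaderSync and samples them,
// remembering the last sampled height as a checkpoint.
type DASer struct {
	headers    <-chan *Header                             // internal pubsub from HeaderSync
	sample     func(ctx context.Context, h *Header) error // DAS a single header
	checkpoint func(height uint64) error                  // persist the last sampled height
	lastHeight uint64
}

func (d *DASer) Run(ctx context.Context) error {
	for {
		select {
		case <-ctx.Done():
			// Persist the checkpoint so sampling can resume from here on restart.
			return d.checkpoint(d.lastHeight)
		case h := <-d.headers:
			if h.Height <= d.lastHeight {
				continue // at or below the checkpoint: already sampled
			}
			if err := d.sample(ctx, h); err != nil {
				return err
			}
			d.lastHeight = h.Height
		}
	}
}
```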


<hr style="border:1px solid gray"> </hr>

## Refactoring

### `HeaderService` becomes the main component around which most other services are focused
Initially, we treated `BlockService` as the more “important” component in the devnet architecture, but we overlooked
some problems with regard to sync (we initially decided that a Celestia full node would have to be started at the same
time as a Core node, which is the reason for embedding the Core node).

This led us to an issue where we eventually needed to connect to an already-running Core node and sync from it. We were
missing a component to do that, so we implemented `HeaderExchange` over the Core client (wrapping another interface we
had previously created for `BlockService` called `BlockFetcher`). We had to do this at the last minute because it would
not work otherwise, leading to a number of hacks and other issues (such as having to hand the Celestia full node a
“trusted” hash of a header from the already-running chain so that it can sync up to that point and start listening for
new headers).

**Proposed new architecture**:

### [`BlockService` is only responsible for reconstructing the block from Shares handed to it by the `ShareService`](https://github.com/celestiaorg/celestia-node/issues/251).
Right now, the `BlockService` is in charge of fetching new blocks from the Core node, erasure coding them, generating
the DAH, generating the `ExtendedHeader`, broadcasting the `ExtendedHeader` to the `HeaderSub` network, and storing the
block data (after some validation checks).

Instead, we should rely on `ShareService` sampling to fetch *enough* shares to reconstruct the block inside
`BlockService`.
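
A hypothetical Go sketch of the proposed split, in which `BlockService` depends only on a narrow slice of
`ShareService` and no longer talks to Core at all (all names here are illustrative):

```go
package block

import "context"

// Placeholder types for the sketch.
type (
	Share []byte
	Root  struct{} // DataAvailabilityHeader stand-in
	Block struct{ Data []Share }
)

// ShareGetter is the slice of ShareService that BlockService would depend on;
// it is expected to return enough shares to repair the extended square.
type ShareGetter interface {
	GetSharesByRoot(ctx context.Context, root *Root) ([]Share, error)
}

// BlockService only turns shares into a block under the proposed design.
type BlockService struct {
	shares ShareGetter
}

func (b *BlockService) GetBlock(ctx context.Context, root *Root) (*Block, error) {
	shares, err := b.shares.GetSharesByRoot(ctx, root)
	if err != nil {
		return nil, err
	}
	// In the real implementation the shares would be erasure-decoded and
	// validated against the DAH before the block is assembled and stored.
	return &Block{Data: shares}, nil
}
```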

### `ShareService` optimizations
* Implement parallelization for retrieving shares by namespace (see the sketch after this list). This
[issue](https://github.com/celestiaorg/celestia-node/issues/184) is already being worked on.
* NMT/Shares/Namespace storage optimizations (**TODO @WONDERTAN**)
* Pruning/GC for shares (**TODO @WONDERTAN**)
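
As referenced in the first bullet above, a minimal Go sketch of parallelizing retrieval by namespace, fetching each
relevant row concurrently; the `rowGetter` function type and the shape of `Root` are assumptions for illustration only.

```go
package share

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// Placeholder types for the sketch.
type (
	Share []byte
	Root  struct{ RowRoots [][]byte }
)

// rowGetter stands in for fetching the shares of one namespace within a single
// row of the extended square (NMT proof handling omitted).
type rowGetter func(ctx context.Context, root *Root, row int, nID []byte) ([]Share, error)

// GetSharesByNamespace fetches every row concurrently instead of sequentially.
func GetSharesByNamespace(ctx context.Context, root *Root, nID []byte, getRow rowGetter) ([][]Share, error) {
	out := make([][]Share, len(root.RowRoots))
	g, ctx := errgroup.WithContext(ctx)
	for i := range root.RowRoots {
		i := i // capture the loop variable for the goroutine
		g.Go(func() error {
			shares, err := getRow(ctx, root, i, nID)
			if err != nil {
				return err
			}
			out[i] = shares
			return nil
		})
	}
	if err := g.Wait(); err != nil {
		return nil, err
	}
	return out, nil
}
```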

### `HeaderSync` optimizations
* Implement disconnect tolerance

### Bonding period handling
(**TODO @WONDERTAN**)

<hr style="border:1px solid gray"> </hr>

## Nice to have

### [Move IPLD from celestia-node repo into its own repo](https://github.com/celestiaorg/celestia-node/issues/111)
Since the IPLD package is pretty much entirely separate from the celestia-node implementation, it makes sense to remove
it from the celestia-node repository and maintain it separately. The extraction of IPLD should also include a review
and refactoring, as there are still some legacy components that are no longer necessary, and the documentation also
needs updating.