-
Notifications
You must be signed in to change notification settings - Fork 974
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[optimization] Consider generating non-leaf nodes on the fly without storing #271
Comments
Another example for the approach drafted above is the go transparency log btw (not surprising as it is based on trillian): https://research.swtch.com/tlog#tiling_a_log |
A few thoughts:
I imagine validator nodes won't even have to serve historical transactions or shares, so they should do the latter. For storage nodes though, or infrastructure nodes that serve lots of requests, the former is probably better. |
Related to #2038 |
Shwap fixed it |
Here is the naive way:
Case 1) requesting all data for a height for instance:
just return all leaves. requester can recompute everything locally and just compare against root.
Case 2) requesting one leaf (e.g. per index):
Load block of that height (all leaves), recompute the inner nodes of corresponding Merkle tree and respond with leaf + corresponding inclusion proof.
case 3) request all leaves according of one namespace:
like in 2) but return a bunch of leaves and a with the corresponding (recomputed) proof.
A less naive approach would acknowledge the fact that always loading all leaves to respond might be undesirable as well. So what could be a middle ground? Only store some inner nodes, e.g. store the tree in packages of smaller subtrees with their roots (aka some inner nodes) and only recompute the missing inner nodes when necessary.
I think the latter is how roughly trillian and diem handle tree storage if I understand correctly. IIRC, diem treats every 4 level binary tree as one inner node and in trillian this is configurable. Both are approaches are probably not only there to decrease storage but also to optimize for IO (matters for large trees).
Tendermint also doesn't store inner nodes for tx data storage at all but still can return inner proof nodes for each tx (always simply recomputes all though). For the state tree though, I do think all inner nodes are stored in a nodedb there. IO is purely tried to be improved via a cache IIRC.
Our decision to store everything is only the fact that this is how the used libraries do things. Which might not necessarily be the best for our use-cases and access-patterns. We don't need to worry much about it right now. But we should keep in mind that this might either be an IO bottleneck in the future, or, cause node operators to complain about disk usage (as far as I remember we store inner nodes for both rows and columns this roughly corresponds to a 4x increase corresponding to the raw data, additionally there might be a lot of padding involved to make the squares a power of two ... so it's probably fair to say on average the increase is about 8-10x the actual block data/txs). I suspect, if we don't make this efficient long-term node operators will simply try to game it and not run the DA part / node at all.
Originally posted by @liamsi in #244 (comment)
The text was updated successfully, but these errors were encountered: