Finish TSMT #258
A sketch of a potential implementation. The idea is to iterate over each field element limb of the key. A new 256-bit sparse Merkle trie advice set would be added that extends the existing u64 advice set with extension nodes.
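Since keys are 4 field elements, the limb-by-limb iteration might look like the sketch below. The limb order and the bit order within a limb are my assumptions for illustration, not part of the proposal:

```rust
/// Hypothetical sketch: treat a 256-bit key as four 64-bit limbs and
/// iterate over the bits of each limb (most significant limb first,
/// most significant bit first within a limb -- an assumed convention).
fn key_bits(key: [u64; 4]) -> impl Iterator<Item = bool> {
    key.into_iter()
        .rev()
        .flat_map(|limb| (0..64).rev().map(move |i| (limb >> i) & 1 == 1))
}

fn main() {
    // Key with only the lowest bit of the lowest limb set:
    let key = [1u64, 0, 0, 0];
    let bits: Vec<bool> = key_bits(key).collect();
    assert_eq!(bits.len(), 256);
    assert!(bits[255]); // the last bit visited is the low bit of limb 0
    assert!(!bits[0]); // the first bit visited is the high bit of limb 3
    println!("ok");
}
```

Presumably each bit would then select the left or right branch at the corresponding depth of the 256-bit trie.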
Thank you for the explanation! I think I understand the overall construction. A couple of questions:
Overall, so far, we have a few different use cases for SMTs: account vaults and account/nullifier databases. The underlying assumptions for these are slightly different:
I wonder if the same construction will work equally well in all cases - or whether we should try to specialize.
Another potential approach could be to define a leaf node as: And if we added another node, it could be compressed like so: I'm probably missing a few details on how to make it work, but if the approach works, I think we'll never have to decode more than one node. I think the drawback would be that if two nodes happen to share a long common sub-path, the path length could be quite a bit longer.
One other thought regarding handling variability: maybe for things like account/nullifier databases we could have a layered approach:
With the above structure, compressed trees are unlikely to have more than 1K - 10K items in most cases. So, maybe we can use this number of items as the target under which our compacted SMT construction performs the best. The obvious drawback of the above approach is that even when the number of items in a tree is small, Merkle paths will be at least 32 nodes long. But that might not be a bad thing: it would mean that updating account/nullifier databases would always have a relatively consistent cost, regardless of which account is being updated.
By the way, I noticed that there's an error in my second and third example: the depth in the extension node (orange box) should be 2, not 3.
The prover will nondeterministically provide this information, but it should be impossible to lie about. We can keep track of the total depth when working through a Merkle proof, which in this example would be 1+2+1=4, and verify that it matches the tree's depth (which in this example is 4). The same can be done for the accumulated key and the claimed key. If in this example the prover claimed that ROOT2 was a leaf node value, then the accumulated depth and key would not match the provided ones. Note that we can't track only the remaining path, because then there would be no way to differentiate between a key of
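The accounting described above can be sketched as follows. The node steps mirror the 1+2+1=4 example, while the concrete key bits, struct, and function names are illustrative:

```rust
/// Hypothetical sketch of the depth/key accounting described above.
/// Each proof step nondeterministically claims how many key bits it consumes
/// (1 for a regular node, more for an extension node); at the end we check
/// the totals against the tree's depth and the claimed key.
struct ProofStep {
    consumed_bits: u64, // number of key bits consumed at this step
    key_bits: u64,      // the bits consumed at this step, low-aligned
}

fn verify_path_accounting(steps: &[ProofStep], tree_depth: u64, claimed_key: u64) -> bool {
    let mut total_depth = 0u64;
    let mut accumulated_key = 0u64;
    for step in steps {
        total_depth += step.consumed_bits;
        accumulated_key = (accumulated_key << step.consumed_bits) | step.key_bits;
    }
    total_depth == tree_depth && accumulated_key == claimed_key
}

fn main() {
    // The 1 + 2 + 1 = 4 example from above, with made-up key bits 0, 11, 0:
    let steps = [
        ProofStep { consumed_bits: 1, key_bits: 0b0 },
        ProofStep { consumed_bits: 2, key_bits: 0b11 },
        ProofStep { consumed_bits: 1, key_bits: 0b0 },
    ];
    assert!(verify_path_accounting(&steps, 4, 0b0110));
    // A claim with the wrong total depth is rejected:
    assert!(!verify_path_accounting(&steps, 3, 0b110));
    println!("ok");
}
```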
Yep, I agree that there should be some quantitative numbers here. I'll work on that. For reference in the meantime, Ethereum's MPT for accounts has an average of ~7 nodes (regular or extension) for a given path in a tree storing ~175 million accounts. Of course those keys are composed of nibbles, not bits (i.e. key length of 64 vs 256), so this is just a lower bound.
Agreed, there needs to be more thinking about this. One point to mention is that there will likely be many more Merkle proofs proving vault/code/storage contents than getting or setting an account hash (at least in the transaction kernel -- I suppose there will be many account proofs processed in the block kernel).
Yes, I had considered something like this. In the Jellyfish paper they suggest using only this type of compaction, and doing away with extension nodes. It feels like this is more appropriate when the tree is storing a very large number of items, and that there are still likely to be great savings from shared key prefixes for trees storing fewer items. I'll have more insight into this soon from the extension node analysis. A related point to mention is that such an approach will require working with a
The probability that, given an internal node exists in an SMT at the specified depth, only a single branch extends from it (as opposed to two branches) is plotted below for varying key counts: It appears that internal extension nodes would only be useful for compressing up to ~8 layers or so before a terminal extension node would likely be used. This suggests to me that the overhead of managing internal extension nodes is probably not worth it, and that we should only compact the final remaining path, but I'd be interested to hear what others have to say. To summarize the previous comments, assuming we make all internal limb nodes regular nodes, these are the ideas so far whose stack/hasher row tradeoffs need to be analyzed:
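For a quantitative handle on the single-branch probability mentioned above: assuming keys are uniformly random and an internal node exists whenever at least one key falls in its subtree, it has a closed form (my derivation, not taken from this thread). With `p = 2^-depth`, the probability is `2 * ((1 - p/2)^n - (1 - p)^n) / (1 - (1 - p)^n)`. A small sketch:

```rust
/// Single-branch probability for a node at the given depth in a tree holding
/// `n` uniformly random keys. Derivation (assumed model, not from the thread):
/// the number of keys K in the node's subtree is Binomial(n, p) with
/// p = 2^-depth; given K = k >= 1, all k keys fall in the same child with
/// probability 2^(1-k). Summing over k gives the closed form below.
fn single_branch_prob(n: u64, depth: u32) -> f64 {
    let p = 0.5f64.powi(depth as i32);
    let n = n as f64;
    2.0 * ((1.0 - p / 2.0).powf(n) - (1.0 - p).powf(n)) / (1.0 - (1.0 - p).powf(n))
}

fn main() {
    // A lone key always forms a single branch:
    assert!((single_branch_prob(1, 8) - 1.0).abs() < 1e-9);
    // Two keys, depth-1 node: 5/6 by direct case analysis.
    assert!((single_branch_prob(2, 1) - 5.0 / 6.0).abs() < 1e-9);
    // With 1M keys, nodes near the root almost surely have two branches:
    assert!(single_branch_prob(1_000_000, 4) < 1e-3);
    println!("ok");
}
```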
Thank you! Personally, I would favor simplicity of the algorithm over the amount of hashing that needs to be done by the hasher. Here is how I'm thinking about it: assume that our target use cases are data sets with between 1K and 1M items. On average, this means that Merkle path length should be somewhere between 10 and 20. Each hash takes 8 rows in the hasher table - so, verifying such paths translates to 80 - 160 rows. It would be good to have the number of cycles on the stack side be roughly half this number (this is just a rule of thumb). So, if we can decode all relevant nodes in 40 - 80 cycles, it would be ideal - and I'm not sure this is achievable with more complicated decoding logic. To summarize: in my mind, an approach where reading from the tree takes 200 rows in the hasher and 50 cycles on the stack is preferable to an approach which takes 100 rows in the hasher and 100 cycles on the stack.
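A quick sanity check of the arithmetic above (8 hasher rows per hash, path lengths of 10-20, stack budget roughly half the hasher rows):

```rust
// Back-of-the-envelope check of the numbers in the comment above.
fn main() {
    let rows_per_hash: u32 = 8;
    for path_len in [10u32, 20] {
        let hasher_rows = rows_per_hash * path_len;
        let stack_budget = hasher_rows / 2; // the ~half rule of thumb
        println!("path {path_len}: {hasher_rows} hasher rows, ~{stack_budget} stack cycles");
    }
    assert_eq!(8 * 10, 80);
    assert_eq!(8 * 20, 160);
}
```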
Assuming we go with the state model similar to the one described in #222, we need to be able to handle sparse Merkle trees (or variations thereof) in the VM. Specifically, we need this to represent account and nullifier databases.
These Merkle trees will act as maps which map ~32 byte keys to ~32 byte values (in our case, both keys and values would actually be 4 field elements). Supporting these trees requires the following:
A few considerations about each of these points.
Miden assembly procedures

We probably need the following procedures:

`crypto::smt::get` - this procedure should push the value contained at the specified key onto the stack. Stack transition for this procedure could look as follows (where `ROOT`, `KEY`, and `VALUE` are each represented by 4 field elements).

`crypto::smt::set` - this procedure should update the value at the specified key. Stack transition for this procedure could look as follows (similarly to the above, `ROOT`, `NEW_ROOT`, `KEY`, and `NEW_VALUE` are each represented by 4 field elements).

We have `MPVERIFY` and `MRUPDATE` operations in the VM which can be executed in a single cycle. I wonder if these can be used in these procedures.

Advice set
To support the above procedures, we need to implement a new advice set. Our current advice set interface expects the key to be a single field element - so, we may have to refactor the interface. For context, the advice set interface is used to provide Merkle paths for a given combination of root and node index.
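To make the refactoring concrete, here is a hypothetical sketch of what a key-addressed advice set interface could look like. The names, types, and the toy implementation are illustrative only, not the actual miden-vm API:

```rust
// Hypothetical sketch of a refactored advice set interface where leaves are
// addressed by a full 4-element key instead of a single-element index.
use std::collections::HashMap;

type Word = [u64; 4]; // stand-in for 4 field elements

trait AdviceSet {
    /// Root of the underlying Merkle tree.
    fn root(&self) -> Word;
    /// Merkle path (sibling hashes) for the leaf addressed by `key`.
    fn get_path(&self, key: Word) -> Option<Vec<Word>>;
    /// Update the leaf at `key`, returning the new root.
    fn update_leaf(&mut self, key: Word, value: Word) -> Word;
}

// A toy in-memory stand-in, just to show the shape of the interface; a real
// implementation would compute hashes and could be backed by an on-disk store.
struct DummySet {
    leaves: HashMap<Word, Word>,
    root: Word,
}

impl AdviceSet for DummySet {
    fn root(&self) -> Word {
        self.root
    }
    fn get_path(&self, key: Word) -> Option<Vec<Word>> {
        self.leaves.get(&key).map(|_| Vec::new()) // real sibling hashes omitted
    }
    fn update_leaf(&mut self, key: Word, value: Word) -> Word {
        self.leaves.insert(key, value);
        self.root // a real implementation would recompute the root here
    }
}

fn main() {
    let mut set = DummySet { leaves: HashMap::new(), root: [0; 4] };
    set.update_leaf([1, 2, 3, 4], [5, 6, 7, 8]);
    assert!(set.get_path([1, 2, 3, 4]).is_some());
    assert!(set.get_path([9, 9, 9, 9]).is_none());
    println!("ok");
}
```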
Another important consideration is that we should expect these Merkle trees to contain large amounts of data (e.g., many gigabytes). So, it may be a good idea to use some in-process database to store the actual data.