Redesign receipt/tx trie generation on block imports #565

gsalgado · 2018-04-17T15:53:08Z

We used to generate the tries for all block's transactions/receipts
after applying every transaction, but that is way too expensive. Now the
trie data is only generated once, after we've applied all transactions

pipermerriam · 2018-04-17T16:23:26Z

evm/vm/base.py

@@ -132,6 +133,7 @@ def execute_bytecode(self,
    # Mining
    #
    def import_block(self, block):
+        self.receipts = []


This stood out to me. What if instead of tracking this state we instead returned the receipt from apply_transaction and accumulated the receipts in the loop here

It looks like this change goes well with the changes within the VMState class. It looks like we could remove the need for that class to be passed the receipts since I think this changes things so that the only thing it needs to know is the gas_used value from the latest receipt.

Yeah, but should we then pass gas_used as an argument to __init__() and keep updating it in add_transaction()? Or did you have something else in mind?

Honestly not sure. Nothing specific in mind for exact implementation but starting with gas_used being passed into __init__ seems like a good start. It'd be nice if it didn't need to be mutated but i'm more ok with mutating that value than the receipt storage on the VM instance.

So, we need gas_used in BaseState so I'm passing it via __init__() and updating it on BaseState.add_transaction(), but in BaseVM we also need that value in the .state @Property, which is currently looking that up from BaseVM.receipts but we could instead have a BaseVM.gas_used and update that on BaseVM.apply_transaction(). Would you prefer that?

That sounds viable. We can continue to iterate on this to I'd recommend just picking something that seems good enough.

We actually need to keep the receipts somewhere in order to generate the receipt trie

pipermerriam · 2018-04-17T16:29:43Z

evm/vm/base.py

@@ -266,6 +275,8 @@ def validate_block(self, block):
                    )
                )

+        # FIXME: make_trie_root_and_nodes() is rather expensive, and we already run that once in
+        # import_block(), so should refactor some of this code to avoid re-generating it here.


This seems like we could fix this with some caching on the make_trie_root_and_nodes function (or maybe a wrapper around that function which implements a cache).

Yeah, I guess #573 might still be useful!

pipermerriam · 2018-04-17T17:52:44Z

evm/vm/base.py

@@ -459,7 +459,7 @@ def get_state_class(cls):
        return cls._state_class

    @classmethod
-    def get_state(cls, chaindb, block, prev_hashes, receipts=None):
+    def get_state(cls, chaindb, block, prev_hashes, gas_used=0):


It seems like gas_used shouldn't have a default value.

hwwhww · 2018-04-17T18:14:58Z

evm/vm/base.py

@@ -77,14 +80,12 @@ def apply_transaction(self, transaction):
        """
        Apply the transaction to the vm in the current block.
        """
-        computation, block, trie_data_dict = self.state.apply_transaction(
+        computation, block, receipt = self.state.apply_transaction(


Reminder: the reason why apply_transaction returned trie_data_dict is for the stateless client to generate witness, which was discussed here: #247 (comment)

But since the sharding roadmap was (largely) changed, now the stateless client implementation won't happen in couple months, so I don't have a strong opinion about the change. Hope that we will find a solution to solve in both stateless/non-stateless clients when we reimplement stateless client. :)

We could have the VM incrementally generate the trie data, I suppose? The problem was that it was re-generating it from scratch for every transaction applied

Except it wouldn't make the VM class stateless, so I guess that wouldn't really buy us anything

@hwwhww we'll continue to move towards stateless clients but for now going to prioritze full sync and then we'll figure out how to make it work with stateless clients also.

hwwhww · 2018-04-18T12:39:34Z

The second thoughts of this PR:

[Previously]

The transaction and receipt roots are calculated in State.add_transaction().
Each time VM.apply_transaction() is called, the vm.block.header is updated with the latest transaction_root and receipt_root.

[Now]

State.add_transaction() returns receipt.
The transaction and receipt roots are calculated in VM.import_block(self, block).
Each time VM.apply_transaction() is called, the vm.block.header isn't updated with the latest transaction_root and receipt_root.

I took a look at vm.mine_block() which hasn't been changed in this PR:

https://github.com/gsalgado/py-evm/blob/bceb324d7f33b4d41aa1cead97229414021906b7/evm/vm/base.py#L164-L176

I think now the vm.mine_block and chain.mine_block() don't work as what it used to be: return a mined block, because the current block (vm.block) hasn't been updated with correct transaction_root and receipt_root.

The fastest way to solve it may be calculating the roots in vm.mine_block() instead of in vm.import_block(), and we still lost the property of "having vm.block always has the latest roots info".

p.s. I think that's the reason why tests/core/vm/test_vm.py::test_get_cumulative_gas_used went wrong (#564).

gsalgado · 2018-04-18T13:16:09Z

#573 has an optimization to make_trie_root_and_nodes() that we could do instead of what's suggested here. I'd need to profile it more thoroughly to make sure it's good enough but I'd be happy to go with that if you guys think that's preferable

hwwhww · 2018-04-18T15:03:31Z

Oooooh, correct my own words: for the witness, we don't need trie_data_dict, only need the state trie branches, the main difference between is do we expect state.apply_transaction() returns the block with latest header or not.

If #573 doesn't work well, maybe we could redesign the APIs to be "for importing a block (only generating the root once per block)" and "for creating a block (generating the root per transaction?)". Maybe divide state.apply_transaction into multiple functions.

hwwhww · 2018-04-18T18:51:08Z

evm/vm/base.py

@@ -156,6 +159,13 @@ def mine_block(self, *args, **kwargs):
        Mine the current block. Proxies to self.pack_block method.
        """
        block = self.block
+        tx_root_hash, tx_kv_nodes = make_trie_root_and_nodes(block.transactions)
+        receipt_root_hash, receipt_kv_nodes = make_trie_root_and_nodes(self.receipts)
+        trie_data = merge(tx_kv_nodes, receipt_kv_nodes)


Probably unimportant: for performance optimization, this line could be removed, and just add them separately:

self.chaindb.persist_trie_data_dict(tx_kv_nodes) self.chaindb.persist_trie_data_dict(receipt_kv_nodes)

We used to generate the tries for all block's transactions/receipts after applying every transaction, but that is way too expensive. Now the trie data is only generated once, after we've applied all transactions

It was no longer used anywhere and it doesn't work because our VM is no longer stateless

gsalgado force-pushed the block-import-trie-generation-redesign branch from a43ff3e to 186d898 Compare April 17, 2018 16:04

pipermerriam reviewed Apr 17, 2018

View reviewed changes

hwwhww reviewed Apr 17, 2018

View reviewed changes

gsalgado force-pushed the block-import-trie-generation-redesign branch 3 times, most recently from 6d1a925 to 8da75f9 Compare April 18, 2018 08:59

gsalgado mentioned this pull request Apr 18, 2018

Add an LRU cache to make_trie_root_and_nodes() #573

Merged

gsalgado changed the title ~~wip: redesign receipt/tx trie generation on block imports~~ Redesign receipt/tx trie generation on block imports Apr 18, 2018

gsalgado force-pushed the block-import-trie-generation-redesign branch from bceb324 to 283c768 Compare April 18, 2018 17:00

pipermerriam approved these changes Apr 18, 2018

View reviewed changes

hwwhww reviewed Apr 18, 2018

View reviewed changes

gsalgado added 3 commits April 18, 2018 20:08

Redesign receipt/tx trie generation on block imports

eb9b34e

We used to generate the tries for all block's transactions/receipts after applying every transaction, but that is way too expensive. Now the trie data is only generated once, after we've applied all transactions

Get rid of BaseState.receipts

21fbcef

Drop Chain.apply_transaction()

acfd862

It was no longer used anywhere and it doesn't work because our VM is no longer stateless

gsalgado force-pushed the block-import-trie-generation-redesign branch from 283c768 to acfd862 Compare April 18, 2018 19:08

gsalgado mentioned this pull request Apr 18, 2018

Performance Optimizations #542

Closed

hwwhww approved these changes Apr 18, 2018

View reviewed changes

gsalgado merged commit d836699 into ethereum:master Apr 19, 2018

gsalgado mentioned this pull request Apr 19, 2018

Look into ways to reduce our DB size #523

Closed

pipermerriam mentioned this pull request Apr 19, 2018

Ability to incrementally build a block #579

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Redesign receipt/tx trie generation on block imports #565

Redesign receipt/tx trie generation on block imports #565

gsalgado commented Apr 17, 2018

pipermerriam Apr 17, 2018

gsalgado Apr 17, 2018

pipermerriam Apr 17, 2018

gsalgado Apr 18, 2018

pipermerriam Apr 18, 2018

gsalgado Apr 18, 2018

pipermerriam Apr 17, 2018

gsalgado Apr 18, 2018

pipermerriam Apr 17, 2018

hwwhww Apr 17, 2018 •

edited

Loading

gsalgado Apr 18, 2018

gsalgado Apr 18, 2018

pipermerriam Apr 18, 2018

hwwhww commented Apr 18, 2018 •

edited

Loading

gsalgado commented Apr 18, 2018

hwwhww commented Apr 18, 2018 •

edited

Loading

hwwhww Apr 18, 2018

Redesign receipt/tx trie generation on block imports #565

Redesign receipt/tx trie generation on block imports #565

Conversation

gsalgado commented Apr 17, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hwwhww Apr 17, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hwwhww commented Apr 18, 2018 • edited Loading

gsalgado commented Apr 18, 2018

hwwhww commented Apr 18, 2018 • edited Loading

Choose a reason for hiding this comment

hwwhww Apr 17, 2018 •

edited

Loading

hwwhww commented Apr 18, 2018 •

edited

Loading

hwwhww commented Apr 18, 2018 •

edited

Loading