StorageCache lock is slowing down RPC requests #9312

crystalin · 2021-07-08T22:39:16Z

Scenario

In frontier, we have the eth_getLogs with allows to query big range of blocks (we allow 2000 per request) for their ethereum logs.
In the case of Ethereum, it uses the client storage to retrieve the block 1 by 1 (see https://github.com/paritytech/frontier/blob/master/client/rpc/src/overrides/schema_v1_override.rs#L59)
When multiple request to eth_getLogs comes together, it uses up to 4 threads (or up to --rpc-http-threads).

The observation

(on a 32 cores computer)

No matter how many cpu and rpc-http-threads, we are always limited to a certain CPU load (2.5 in our case), so with 4 threads, it would use 4 CPUs @ 60% utilization each, but with 10 it would use 10 CPU @ 25% each.
Using parity-db boosted a bit the performance but there is still a bottleneck somewhere
Changing the --state-cache-size value has no impact

Those logs shows the behavior when requesting those blocks

The issue

When accessing the state_at for a given block, substrate relies on a StorageCache (https://github.com/paritytech/substrate/blob/master/client/db/src/lib.rs#L2311) which works using a RW lock (https://github.com/paritytech/substrate/blob/master/client/db/src/storage_cache.rs#L540).
This lock creates a bottleneck for all the threads trying to access the storage.

This is confirmed by the perf analysis of the request:

The solution

The possibility to remove the StorageCache for the node (in the case of RPC nodes it makes more sense), or to only have the StorageCache used for the last 10 blocks (the one mostly used) would work.
Disabling the StorageCache (by commenting the fn storage(... function logic to only do Ok(self.state.storage(key)?)) showed great improvement in performances, using up to 90% of the cpus for each of the 4 rpc threads used (there are still some other bottleneck preventing to scale to 100% but I'll investigate later)

If you have another good solution in mind, I'd be happy to discuss it

The text was updated successfully, but these errors were encountered:

crystalin · 2021-07-08T22:39:38Z

(cc @bkchr and @arkpar )

arkpar · 2021-07-09T13:33:15Z

@crystalin Could you check if this commit improves things?
It's a stop gap until we come up with a better solution,

crystalin · 2021-07-10T02:49:56Z

@arkpar , I did test it (cherry-pick over polkadot-v0.9.6) and it worked great too (similar results with when I disabled the cache logic). Splitting the read/write is a good strategy in that case

bkchr added the I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task. label Jul 8, 2021

This was referenced Jul 10, 2021

Lock-free state cache #9320

Closed

Reduce shared state cache contention #9321

Merged

ghost closed this as completed in #9321 Jul 14, 2021

tgmichel mentioned this issue Jul 15, 2021

Schema cache for log filtering polkadot-evm/frontier#425

Merged

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

StorageCache lock is slowing down RPC requests #9312

StorageCache lock is slowing down RPC requests #9312

crystalin commented Jul 8, 2021

crystalin commented Jul 8, 2021

arkpar commented Jul 9, 2021

crystalin commented Jul 10, 2021

StorageCache lock is slowing down RPC requests #9312

StorageCache lock is slowing down RPC requests #9312

Comments

crystalin commented Jul 8, 2021

Scenario

The observation

The issue

The solution

crystalin commented Jul 8, 2021

arkpar commented Jul 9, 2021

crystalin commented Jul 10, 2021