Skip to content
This repository was archived by the owner on Nov 15, 2023. It is now read-only.

StorageCache lock is slowing down RPC requests #9312

Closed
crystalin opened this issue Jul 8, 2021 · 3 comments · Fixed by #9321
Closed

StorageCache lock is slowing down RPC requests #9312

crystalin opened this issue Jul 8, 2021 · 3 comments · Fixed by #9321
Labels
I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task.

Comments

@crystalin
Copy link
Contributor

Scenario

In frontier, we have the eth_getLogs with allows to query big range of blocks (we allow 2000 per request) for their ethereum logs.
In the case of Ethereum, it uses the client storage to retrieve the block 1 by 1 (see https://github.com/paritytech/frontier/blob/master/client/rpc/src/overrides/schema_v1_override.rs#L59)
When multiple request to eth_getLogs comes together, it uses up to 4 threads (or up to --rpc-http-threads).

The observation

(on a 32 cores computer)

  • No matter how many cpu and rpc-http-threads, we are always limited to a certain CPU load (2.5 in our case), so with 4 threads, it would use 4 CPUs @ 60% utilization each, but with 10 it would use 10 CPU @ 25% each.
  • Using parity-db boosted a bit the performance but there is still a bottleneck somewhere
  • Changing the --state-cache-size value has no impact

Those logs shows the behavior when requesting those blocks
cache-logs

The issue

When accessing the state_at for a given block, substrate relies on a StorageCache (https://github.com/paritytech/substrate/blob/master/client/db/src/lib.rs#L2311) which works using a RW lock (https://github.com/paritytech/substrate/blob/master/client/db/src/storage_cache.rs#L540).
This lock creates a bottleneck for all the threads trying to access the storage.

This is confirmed by the perf analysis of the request:
perf-lock

The solution

  • The possibility to remove the StorageCache for the node (in the case of RPC nodes it makes more sense), or to only have the StorageCache used for the last 10 blocks (the one mostly used) would work.
    Disabling the StorageCache (by commenting the fn storage(... function logic to only do Ok(self.state.storage(key)?)) showed great improvement in performances, using up to 90% of the cpus for each of the 4 rpc threads used (there are still some other bottleneck preventing to scale to 100% but I'll investigate later)

If you have another good solution in mind, I'd be happy to discuss it

@crystalin
Copy link
Contributor Author

(cc @bkchr and @arkpar )

@bkchr bkchr added the I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task. label Jul 8, 2021
@arkpar
Copy link
Member

arkpar commented Jul 9, 2021

@crystalin Could you check if this commit improves things?
It's a stop gap until we come up with a better solution,

@crystalin
Copy link
Contributor Author

@arkpar , I did test it (cherry-pick over polkadot-v0.9.6) and it worked great too (similar results with when I disabled the cache logic). Splitting the read/write is a good strategy in that case

This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants