Make unified scheduler abort on tx execution errors #1211

Merged (11 commits) on May 31, 2024

Conversation


@ryoqun ryoqun commented May 7, 2024

Problem

As a quick background, the unified scheduler currently doesn't terminate at all. In other words, it leaks shamelessly. :p This means its memory usage is unbounded, and it indeed causes the process to be reaped by the out-of-memory killer after a while (like a week), even while the scheduler is pooled for reuse.

Memory issues aside, it needs a proper shutdown mechanism for other reasons as well. Specifically, there are several termination conditions, each of which it must handle:

  1. The unified scheduler encounters a transaction error while running, and it needs to abort immediately.
  2. One of the handler threads could panic in extreme situations due to newly discovered LoA-like bugs. In that case, the unified scheduler should propagate the panic! promptly to terminate the whole process. This is an edge-case variant of (1). (Note: I'm against resuming normal operation after a panic.)
  3. A unified scheduler can be idling in the scheduler pool for a very long time, and it needs to be disposed of.
  4. Because UsageQueueLoader never evicts unused entries, it can grow too large with many entries. Its design relies on some external mechanism to mitigate the unbounded growth. Thus, the unified scheduler itself needs to be dropped depending on the size of its UsageQueueLoader.
  5. There could be many active (= taken-out-of-the-pool) unified schedulers under very forky network conditions with an extended lack of rooting. In that case, many native OS threads will be created just to sit idling collectively, because each unified scheduler instance creates and manages its own set of threads (for perf reasons). So, idling schedulers should be returned to the pool. Then, those pooled-back schedulers will eventually be retired according to (3) after the forky situation is resolved.

As for (1), there's currently also no way for the unified scheduler to propagate errors back to the caller (the replay stage) before bank freezing. So, dead-block marking by the replay stage could be delayed by maliciously crafted blocks even if the unified scheduler immediately aborts internally.

Summary of Changes

This PR specifically addresses (1).

To that end, this PR makes the new-task code path return Results, so that an error from a previously scheduled transaction is abruptly propagated when unrelated new tasks are about to be submitted to the unified scheduler. This notifies the replay stage earlier than waiting for the block boundary.
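As a rough, simplified sketch of that shape (the type and method names here are illustrative stand-ins, not the actual types this PR touches), submitting a new task surfaces any earlier abort right away:

```rust
use std::sync::Mutex;

// Illustrative stand-ins only; not the real types changed by this PR.
#[derive(Debug, Clone)]
struct TransactionError(String);

struct Scheduler {
    // Set by a handler thread when a previously scheduled transaction fails.
    aborted: Mutex<Option<TransactionError>>,
}

impl Scheduler {
    // Returns Err as soon as any earlier transaction has aborted the scheduler,
    // instead of deferring the error to the block boundary (bank freezing).
    fn schedule_execution(&self, _new_task: &str) -> Result<(), TransactionError> {
        if let Some(err) = self.aborted.lock().unwrap().as_ref() {
            return Err(err.clone());
        }
        // ...otherwise hand the new task off to the handler threads as usual...
        Ok(())
    }
}
```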

After that, this PR introduces crossbeam-channel disconnection-based cross-thread coordination for graceful termination of the unified scheduler instance itself. In this way, there's almost no runtime overhead other than a bunch of additional ifs. Also, this means there's no new potential bottleneck or synchronization. On the other hand, it incurs some code complexity.
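A minimal, self-contained sketch of the disconnection idiom itself, assuming plain crossbeam-channel (this is not the PR's actual thread structure): a handler thread loops on recv() and exits once all senders have been dropped, so no extra shutdown flag or lock sits on the hot path.

```rust
use crossbeam_channel::unbounded;
use std::thread;

fn main() {
    let (task_sender, task_receiver) = unbounded::<String>();

    let handler = thread::spawn(move || {
        // recv() returns Err(RecvError) once every sender has been dropped
        // (i.e. the channel is disconnected); that disconnection doubles as
        // the termination signal, so no shutdown flag is checked per task.
        while let Ok(task) = task_receiver.recv() {
            println!("handling {task}");
        }
        // Channel disconnected: fall through and let the thread be joined.
    });

    task_sender.send("tx #1".to_string()).unwrap();
    drop(task_sender); // disconnect => the handler thread exits its loop
    handler.join().unwrap();
}
```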

Lastly, a pool-owned (i.e. singleton) auxiliary background thread is introduced to actually drop the aborted scheduler.
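A hedged sketch of that shape (the names here are illustrative, not the PR's): the pool spawns a single cleaner thread and merely sends the trashed scheduler to it, so the actual drop never happens on the caller's critical path.

```rust
use crossbeam_channel::{unbounded, Sender};
use std::thread;

// Stand-in for the real scheduler type whose Drop does the expensive teardown.
struct TrashedScheduler;

fn spawn_pool_cleaner() -> Sender<TrashedScheduler> {
    let (sender, receiver) = unbounded::<TrashedScheduler>();
    thread::spawn(move || {
        // Receiving and immediately discarding each value runs its Drop here,
        // off the sender's critical path; the loop ends once the pool (the
        // only sender) is dropped.
        for _trashed in receiver {}
    });
    sender
}
```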

The other termination conditions will be addressed in upcoming PRs.

context: extracted from #1122


codecov-commenter commented May 7, 2024

Codecov Report

Attention: Patch coverage is 91.11675%, with 70 lines in your changes missing coverage. Please review.

Project coverage is 81.6%. Comparing base (af6930d) to head (649bce3).
Report is 111 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff            @@
##           master    #1211    +/-   ##
========================================
  Coverage    81.6%    81.6%            
========================================
  Files         867      868     +1     
  Lines      368900   369593   +693     
========================================
+ Hits       301052   301656   +604     
- Misses      67848    67937    +89     

@ryoqun ryoqun marked this pull request as ready for review May 7, 2024 13:36
@ryoqun ryoqun requested a review from apfitzge May 7, 2024 13:37
@ryoqun ryoqun force-pushed the unified-scheduler-abort-on-error branch from ff13fc3 to a8d8736 on May 14, 2024 00:49

@apfitzge apfitzge left a comment


a few nits. I need to take another pass, specifically to review the trashed scheduler concept.

@ryoqun ryoqun requested a review from apfitzge May 15, 2024 14:32

ryoqun commented May 15, 2024

Thanks for the review! I've addressed all comments so far; re-requesting code review.

@apfitzge

I'm not sure I understand the concept of trashed schedulers.

When we return a scheduler to the pool, if all execution threads are not joined it is considered "trashed". We will clean it up later.
But afaict we always call wait_for_termination before we return to the pool, so I'm uncertain how a scheduler can ever become trashed.

Can you give an example of when a scheduler would become trashed? Think I'm still missing something here.

@ryoqun ryoqun force-pushed the unified-scheduler-abort-on-error branch 7 times, most recently from 7bf597a to 8212761 on May 21, 2024 06:30

ryoqun commented May 21, 2024

> I'm not sure I understand the concept of trashed schedulers.
>
> When we return a scheduler to the pool, if all execution threads are not joined it is considered "trashed". We will clean it up later. But afaict we always call wait_for_termination before we return to the pool, so I'm uncertain how a scheduler can ever become trashed.
>
> Can you give an example of when a scheduler would become trashed? Think I'm still missing something here.

Thanks for the question. Seems more explanation is needed... I intentionally didn't give such an example, to verify that others can decipher my code by themselves. So, I instead added comments rather extensively, as much as possible: 8212761

Could you give it another try to grasp the trashed schedulers?

@ryoqun ryoqun force-pushed the unified-scheduler-abort-on-error branch 2 times, most recently from 450aa8c to 649bce3 on May 26, 2024 14:06

ryoqun commented May 26, 2024

All of the test sleeps are gone: 649bce3


@ryoqun ryoqun May 26, 2024


Dunno how you like this test library, but I'm planning to publish it as a crate in the future. So, the mod name is fancy. ;)

@ryoqun ryoqun force-pushed the unified-scheduler-abort-on-error branch from 649bce3 to 66b9f32 on May 29, 2024 05:59

@apfitzge apfitzge left a comment


Largely happy with the scheduler changes themselves.
The additional sleepless testing utilities are a nice addition; had some nits on the implementation, but since you mentioned this may be split out into a separate crate/library I'm not going to block on it. (would love to be involved with the library if/when you split it out!)

Comment on lines +181 to +183
} else if current().name().unwrap_or_default().starts_with("test_") {
    panic!("seems setup() isn't called yet?");
}


Seems a bit hacky to me to have this rely on Rust's current test-thread naming scheme.
Why not just panic if this is called when the registry is not set up? Is there a reason why at() would be called w/o the thread being registered?

ryoqun (Collaborator Author):


yeah, this could be more strict.

pub(crate) fn at<T: Debug>(check_point: T) {
    let mut registry = THREAD_REGISTRY.lock().unwrap();
    if let Some(progress) = registry.get_mut(&current().id()).cloned() {
        drop(registry);

is this intentional? we drop the lock before the checkpoint is actually changed?

ryoqun (Collaborator Author):


yeah, I think a deadlock is possible otherwise, iirc

Comment on lines +121 to +124
lazy_static! {
    static ref THREAD_REGISTRY: Mutex<HashMap<ThreadId, Arc<Progress>>> =
        Mutex::new(HashMap::new());
}


this is a global. won't it cause contention between all tests using this?

ryoqun (Collaborator Author):


yes, I'm leaning towards switching to thread_local! impl soon.
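For illustration only, a very rough sketch of what such a thread_local! direction could look like, assuming the checkpoint is simply recorded per thread (this is not what this PR ships):

```rust
use std::cell::RefCell;

// A per-thread progress slot instead of a process-wide
// Mutex<HashMap<ThreadId, _>>, so unrelated tests don't contend on one lock.
#[derive(Default)]
struct Progress {
    current_check_point: Option<String>,
}

thread_local! {
    static PROGRESS: RefCell<Option<Progress>> = RefCell::new(None);
}

fn at(check_point: &str) {
    PROGRESS.with(|slot| {
        if let Some(progress) = slot.borrow_mut().as_mut() {
            progress.current_check_point = Some(check_point.to_string());
        }
        // else: setup() hasn't been called on this thread; treat as a no-op
        // (or panic, per the discussion above).
    });
}
```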

Comment on lines +54 to +55
#[derive(Debug)]
struct JustCreated;


From a library perspective, it is not obvious that the user cannot use "JustCreated" as one of their check-points.

ryoqun (Collaborator Author):


+1

Comment on lines +64 to +65
let check_points_set = check_points.iter().collect::<HashSet<_>>();
assert_eq!(check_points.len(), check_points_set.len());


Seems a bit odd that we cannot have duplicate checkpoint names. If I had some sort of state machine, it would seem reasonable to me that I could use at(NAME_OF_STATE) to test the state transitions.

ryoqun (Collaborator Author):


So true. The current impl is too primitive; this is another improvement which I'd like to have when creating a crate.

ryoqun (Collaborator Author):


This limitation will be lifted by this PR: #1797
