Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return back stale out-of-pool scheduler by timeout #1690

Merged
merged 5 commits into from
Jun 13, 2024
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 19 additions & 7 deletions runtime/src/installed_scheduler_pool.rs
Original file line number Diff line number Diff line change
Expand Up @@ -347,6 +347,13 @@ impl SchedulerStatus {
};
result_with_timings
}

fn active_scheduler(&self) -> &InstalledSchedulerBox {
let SchedulerStatus::Active(active_scheduler) = self else {
panic!("not active: {self:?}");
};
active_scheduler
}
}

/// Very thin wrapper around Arc<Bank>
Expand Down Expand Up @@ -519,15 +526,15 @@ impl BankWithSchedulerInner {
);
Err(SchedulerAborted)
}
SchedulerStatus::Stale(_pool, _result_with_timings) => {
SchedulerStatus::Stale(pool, _result_with_timings) => {
let pool = pool.clone();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to clone the pool?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because this pool is borrowed and we need to re-lock scheduler with the write access:

error[E0505]: cannot move out of `scheduler` because it is borrowed
   --> runtime/src/installed_scheduler_pool.rs:532:22
    |
517 |         let scheduler = self.scheduler.read().unwrap();
    |             --------- binding `scheduler` declared here
518 |         match &*scheduler {
    |                 --------- borrow of `scheduler` occurs here
...
532 |                 drop(scheduler);
    |                      ^^^^^^^^^ move out of `scheduler` occurs here
...
552 |                 pool.register_timeout_listener(self.do_create_timeout_listener());
    |                 ---- borrow later used here

For more information about this error, try `rustc --explain E0505`.
error: could not compile `solana-runtime` (lib) due to 1 previous error

drop(scheduler);

let context = SchedulingContext::new(self.bank.clone());
let mut scheduler = self.scheduler.write().unwrap();
trace!("with_active_scheduler: {:?}", scheduler);
scheduler.transition_from_stale_to_active(|scheduler_pool, result_with_timings| {
scheduler_pool.register_timeout_listener(self.do_create_timeout_listener());
let scheduler = scheduler_pool.take_resumed_scheduler(context, result_with_timings);
scheduler.transition_from_stale_to_active(|pool, result_with_timings| {
let scheduler = pool.take_resumed_scheduler(context, result_with_timings);
info!(
"with_active_scheduler: bank (slot: {}) got active, taking scheduler (id: {})",
self.bank.slot(),
Expand All @@ -537,15 +544,20 @@ impl BankWithSchedulerInner {
});
drop(scheduler);

self.with_active_scheduler(f)
let scheduler = self.scheduler.read().unwrap();
// Re-register a new timeout listener only after acquiring the read lock;
// Otherwise, the listener would again put scheduler into Stale before the read
// lock, causing panic below.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops forgot to mention this is extremely rare race condition...

pool.register_timeout_listener(self.do_create_timeout_listener());
f(scheduler.active_scheduler())
}
SchedulerStatus::Unavailable => unreachable!("no installed scheduler"),
}
}

fn do_create_timeout_listener(self: &Arc<Self>) -> TimeoutListener {
let weak_bank = Arc::downgrade(self);
TimeoutListener::new(move |scheduler_pool| {
TimeoutListener::new(move |pool| {
let Some(bank) = weak_bank.upgrade() else {
return;
};
Expand All @@ -568,7 +580,7 @@ impl BankWithSchedulerInner {
bank.bank.slot(),
id,
);
(scheduler_pool, result_with_timings)
(pool, result_with_timings)
});
trace!("timeout_listener: {:?}", scheduler);
})
Expand Down