-
Notifications
You must be signed in to change notification settings - Fork 4.7k
updates rewards at epoch boundary using cached accounts #24200
updates rewards at epoch boundary using cached accounts #24200
Conversation
@jstarry still doing more testings but you may want to take a look in case of comments. |
b5d9c53
to
d1ff7ac
Compare
Codecov Report
@@ Coverage Diff @@
## master #24200 +/- ##
===========================================
+ Coverage 70.0% 82.0% +11.9%
===========================================
Files 36 594 +558
Lines 2255 164444 +162189
Branches 322 0 -322
===========================================
+ Hits 1580 134891 +133311
- Misses 560 29553 +28993
+ Partials 115 0 -115 |
d1ff7ac
to
fa6e579
Compare
what did you learn? |
fa6e579
to
00a1173
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, the serialization compatibility in particular looks really good. I'm hoping that updating StakeAccount
a bit will help remove a lot of the fallible methods when converting from StakeAccount
to Delegation
.
@behzadnouri since this PR introduces a new feature gate, please create a new feature tracker issue and link it to this PR, thanks! Details here: https://github.com/solana-labs/solana/blob/master/CONTRIBUTING.md#the-pr--issue-labels |
done. |
@jeffwashington As mentioned in the description, the change shows ~30% improvement in Note that this change will save both on loading accounts (by bypassing accounts-db altogether) and deserializing state from accounts (because the state is already deserialized in these cached accounts). Also does not force evicting other accounts from readonly cache. |
4cafe3c
to
d83c161
Compare
@@ -196,6 +196,12 @@ impl From<AccountSharedData> for VoteAccount { | |||
} | |||
} | |||
|
|||
impl From<VoteAccount> for AccountSharedData { | |||
fn from(account: VoteAccount) -> Self { | |||
Self::from(account.0.account.clone()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wen stable unwrap_or_clone
🥺
runtime/src/stake_account.rs
Outdated
}; | ||
let stake_state = StakeState::Stake(Meta::example(), Stake::example()); | ||
let mut account = Account::example(); | ||
account.data.resize(256, 0u8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does the extra 56 bytes make a difference in how this thing works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no it does not. reduced the size to 196 bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size_of<StakeState>()
is 200, fwiw
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe because of struct paddings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, there's some padding, I added a constant to prevent this issue in the future: #24416
// vote_accounts_cache_miss_count is shown to be always zero. | ||
let account = self.get_account_with_fixed_root(vote_pubkey)?; | ||
if account.owner() == &solana_vote_program | ||
&& VoteState::deserialize(account.data()).is_ok() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we reuse this work to construct the VoteAccount
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code already has comments here that this is only for sanity check and this path should never return a valid vote-account. My intention is to remove this piece all together once it is confirmed vote_accounts_cache_miss_count
is always zero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@behzadnouri I missed this earlier but I think that the code here will produce false positives for the vote_accounts_cache_miss_count
counter. We only add vote accounts to the cache if they are initialized so it's possible that an uninitialized vote account has a stake delegation and deserializes correctly. If that happens, this counter would indicate that we had a cache miss but the cache was actually correct. This all stems from the fact that the vote state doesn't have an intuitive "uninitialized" state. Best we can do is check if the authorized voter is non-zero:
solana/programs/vote/src/vote_state/vote_state_versions.rs
Lines 54 to 62 in 7cd7173
pub fn is_uninitialized(&self) -> bool { | |
match self { | |
VoteStateVersions::V0_23_5(vote_state) => { | |
vote_state.authorized_voter == Pubkey::default() | |
} | |
VoteStateVersions::Current(vote_state) => vote_state.authorized_voters.is_empty(), | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So then shouldn't this code also check if the vote-account is initialized?
https://github.com/solana-labs/solana/blob/30d2b112e/runtime/src/bank.rs#L2848-L2888
Seems to me either we should skip uninitialized vote accounts there as well, or store them in the cache as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly. I brought this up because I think this code needs to check for initialization but it isn't doing that right now.
d83c161
to
abb347f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm! just a nit
Err(stake_account::Error::InvalidDelegation(_)) => { | ||
// This should not happen because StakeAccount<()> | ||
// does not check if stake state is a Delegation. | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log if it happens anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
runtime/src/bank.rs
Outdated
let (account, _slot) = bank_rc.accounts.load_with_fixed_root(&ancestors, pubkey)?; | ||
Some(account) | ||
}) | ||
.unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will panic if the stakes cache and accountsdb are inconsistent right? Seems like we should should rebuild the cache in this case, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well it is a major logic bug if these are inconsistent, and can indicate bugs in the cache or accounts-db or a corrupted snapshot.
I think having this panic here will actually help to detect such inconsistencies earlier.
Should this actually happen, I think the best would be to hold on going down this path until we can debug this inconsistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having this panic here will actually help to detect such inconsistencies earlier.
This is true, can you change the unwrap
to an expect
with a message describing that it "can indicate bugs in the cache or accounts-db or a corrupted snapshot"
Should this actually happen, I think the best would be to hold on going down this path until we can debug this inconsistency.
You mean try this on our nodes first to see if it's happening?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you change the unwrap to an expect with a message describing that it "can indicate bugs in the cache or accounts-db or a corrupted snapshot"
done.
Should this actually happen, I think the best would be to hold on going down this path until we can debug this inconsistency.
You mean try this on our nodes first to see if it's happening?
For all nodes which submit metrics we can monitor panics through the panic hook:
https://github.com/solana-labs/solana/blob/6c5a3ca4a/metrics/src/metrics.rs#L442-L454
So any node which runs the master branch, ours or not ours, will do. And then testnet nodes once this is backported to testnet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, sounds good
abb347f
to
7757c08
Compare
7757c08
to
d10c2f9
Compare
d10c2f9
to
20ab672
Compare
20ab672
to
d950b39
Compare
d950b39
to
56b9a41
Compare
The added type does sanity checks on the Account and stores deserialized StakeState. Following commits will use this type instead of Delegation in Stakes.
The commit makes values in stake_delegations map in Stakes struct generic. Stakes<Delegation> is equivalent to the old code and is used for backward compatibility in BankFieldsTo{Serialize,Deserialize}. But banks cache Stakes<StakeAccount> which includes the entire stake account and StakeState deserialized from account. Doing so, will remove the need to load stake account from accounts-db when working with stake-delegations.
For backward compatibility, we can only serialize and deserialize Stakes<Delegation>. However Bank caches Stakes<StakeAccount>. This type mismatch incurs a conversion cost at epoch boundary when updating EpochStakes. This commit adds StakesEnum which allows EpochStakes to include either a Stakes<StakeAccount> or Stakes<Delegation> and so bypass the conversion cost between the two at the epoch boundary.
Loading vote and stake accounts from accounts-db takes a significant portion of time updating rewards at epoch boundary. This commit bypasses accounts-db and instead uses vote and stake accounts cached in bank stakes: https://github.com/solana-labs/solana/blob/d2702201c/runtime/src/stakes.rs#L148-L152 These cached accounts are synchronized with accounts-db after each transaction, and so there should not be any change in the resulting computation: https://github.com/solana-labs/solana/blob/d2702201c/runtime/src/bank.rs#L4526 Nevertheless, to avoid any chances of introducing a consensus issue, the switch to cached account is feature gated.
StakeAccount<Delegation> can only wrap a stake-state which is a Delegation; whereas StakeAccount<()> wraps any account with stake state. As a result, StakeAccount::<Delegation>::delegation() will return Delegation instead of Option<Delegation>.
Tracking mismatches between cached vote/stake accounts and accounts-db in preparation of activating feature for updating rewards at epoch-boundary using cached vote/stake accounts.
56b9a41
to
634b353
Compare
Problem
Updating rewards at epoch boundary is slow; and Loading vote and stake accounts from accounts-db takes a significant portion of that computation time.
Summary of Changes
https://github.com/solana-labs/solana/blob/d2702201c/runtime/src/stakes.rs#L148-L152
https://github.com/solana-labs/solana/blob/d2702201c/runtime/src/bank.rs#L4526
Nevertheless, to avoid any chances of introducing a consensus issue, the switch to cached account is feature gated.
stake_delegations
map inStakes
struct are now generic.Stakes<Delegation>
is equivalent to the old code and is used for backward compatibility inBankFieldsTo{Serialize,Deserialize}
.Stakes<StakeAccount>
which includes the entire stake account andStakeState
deserialized from the account. Doing so, will remove the need to load the stake account from accounts-db when working with stake-delegations.update_rewards_with_thread_pool_us
metric.Feature Gate Issue: #24278