-
Notifications
You must be signed in to change notification settings - Fork 4.7k
Account path add run parent with old path cleanup #29942
Account path add run parent with old path cleanup #29942
Conversation
4c9ffb5
to
f8f58f2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future please also mark the PR as Ready For Review. Sometimes it can be ambiguous when you've been requested to review but the PR itself is still a draft.
@behzadnouri Here is the new version which cleans up the appendvecs in the old paths. f8f58f2 is the fix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work tracking down the issue, glad to see it was a relatively simple resolution.
runtime/src/snapshot_utils.rs
Outdated
// If the "run/" or "snapshot" sub directories do not exist, the directory may be from | ||
// an older version for which the appendvec files are at this directory. Clean up | ||
// them first. | ||
// This will be done only once when transitioning from an old image without run directory | ||
// to this new version using run and snapshot directories. | ||
// The run/ content cleanup will be done at a later point. The snapshot/ content persists | ||
// accross the process boot, and will be purged by the account_background_service. | ||
move_and_async_delete_path(account_dir.as_ref()); | ||
fs::remove_dir_all(account_dir.as_ref())?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fs::remove_dir_all(account_dir.as_ref())?; | |
fs::remove_dir_all(account_dir)?; |
do we need the .as_ref()
for some reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to &account_dir.
.collect(); | ||
}).map( | ||
|account_path| { | ||
// For all account_paths, set up the run/ and snapshot/ sub directories. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to update this comment to make it clear this will clean the accounts dir if they don't already exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the comment "// If the sub directories do not exist, the account_path will be cleaned because older version put account files there".
I see this code is also modifying ledger-tool; |
Yes, verified with |
what does this verify?! this does not create any ledger of comparable size to mainnet to verify or notice regressions. |
The ledger tool verify command reads the snapshot generated by the validator. I checked and see the accounts files are in run/ sub directory. What other tests should I run? |
Just like running a node mainnet. Obtain a ledger snapshot from mainnet, and run leger-tool on that |
I ran "./cargo run --bin solana-ledger-tool -- --ledger ~/ledger verify" which gets a snapshot from the validator running on mainnet. Is this good? |
@@ -74,7 +75,8 @@ struct SnapshotTestConfig { | |||
incremental_snapshot_archives_dir: TempDir, | |||
full_snapshot_archives_dir: TempDir, | |||
bank_snapshots_dir: TempDir, | |||
accounts_dir: TempDir, | |||
accounts_dir: PathBuf, | |||
_accounts_tmp_dir: TempDir, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here xref: #31319 (comment)
Problem
PR #29794 was reverted because of OOM. This is the resubmission with the fix.
The new image uses the run/ sub directory for the appendvec files. When transitioning from an old image to this new image, the appendvec files in the old account_path should be cleaned up to avoid taking the space. In the ramfs case, it was taking the memory, causing OOM.
All commits have passed the review in PR 29794 except the last one f8f58f2
Summary of Changes
The fix is to clean up the files in the old path.
Fixes #