-
Notifications
You must be signed in to change notification settings - Fork 4.7k
Conversation
d4e39a8
to
5904b62
Compare
Codecov Report
@@ Coverage Diff @@
## master #4250 +/- ##
========================================
- Coverage 78.7% 71.7% -7.1%
========================================
Files 192 193 +1
Lines 34262 37355 +3093
========================================
- Hits 26985 26799 -186
- Misses 7277 10556 +3279 |
79991ff
to
ade4a8d
Compare
ade4a8d
to
7bcc327
Compare
@@ -0,0 +1,42 @@ | |||
use std::sync::{Arc, Mutex}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
guess whos back!
7bcc327
to
c839a10
Compare
@garious @rob-solana can you guys think of any cleaner way to do this? |
I'm not able to look at this closely at the moment, but are you aware of this? https://doc.rust-lang.org/std/pin/index.html |
@garious Hmm. Haven't seen it before, but I don't see a straightforward way to apply it here. Presumably you could return a |
core/src/banking_stage.rs
Outdated
@@ -804,7 +804,9 @@ mod tests { | |||
#[test] | |||
fn test_banking_stage_entries_only() { | |||
solana_logger::setup(); | |||
let (genesis_block, mint_keypair) = create_genesis_block(10); | |||
let (mut genesis_block, mint_keypair) = create_genesis_block(10); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's this change for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The allocator takes longer, so the test needed more time to avoid the first slot running out of ticks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may be able to rearrange some stuff so that the tick start happens closer to the processing of transactions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we pre-allocate some amount in recycler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, I can try that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out I don't need it since I added a flag to only pin when necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did that work out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The flag works so I didn't need to modify the test. I didn't try pre-allocating in the recycler but I didn't see an easy way to do that either.
have we root-caused our prior OOM from recycler yet? I didn't think we had... |
@rob-solana No, but we have OOM issues now anyway from the regular allocator so I can't make it worse :) |
I think you might be able to make it worse. Remember OOM in 20-30min? |
@rob-solana last time I ran it that was where we are at with a single-node setup today. It's just that we switch leaders so we don't notice it as much now. |
Also the blob recycler was I think the biggest problem. This change only needs to use the recycler for Packets and these signature offsets and output buffers. Just anything that goes to the GPU for sigverify. |
as I recall, we removed the blob recycler and the leaks continued unabated |
Ok, well we have to test it out then. I thought of a clever way to add it to minimize change and also allow for easy one-line removal. |
c839a10
to
db5ffc8
Compare
2f38251
to
84f276a
Compare
84f276a
to
6890dda
Compare
d14d147
to
69e8821
Compare
Bring back recyclers and pin offset buffers
* Add test for recycler and reduce the gc lock critical section * Add comments/tests to cuda_runtime
Otherwise buffered becomes millions of packets which causes memory issues for nodes and problems where forwarding all those packets can take 10s of seconds.
69e8821
to
69b0ae3
Compare
@@ -804,6 +804,9 @@ impl BankingStage { | |||
packet_indexes: Vec<usize>, | |||
) { | |||
if !packet_indexes.is_empty() { | |||
if unprocessed_packets.len() > 400 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pgarg66 I found this has much better memory characteristics for fullnodes with capping the buffered packets size, even with the original code. Any concerns?
Revert "send-transactions: optimize retry pool (solana-labs#31)" This reverts commit 0abfe27.
Problem
cuda cannot overlap memory transfers and kernels if memory is not page-pinned.
Summary of Changes
use cudaHostRegister and cudaHostUnregister to page-pin memory.
Introduce
PinnedVec
wrapper class which has its memory pinned by the above cuda functions. It registers on creation, unregister on drop, also onpush
andresize
it checks the pointer and the capacity of the vec for any change and then does the unregister/register if changed.Add back the Recycler so that these
PinnedVec
's are re-used since we cannot call cudaHostRegister/cudHostUnregister in the middle of computation since it will cause a GPU sync.Fixes #4235