Page-pin packet memory for cuda #4250

sakridge · 2019-05-10T20:05:41Z

Problem

cuda cannot overlap memory transfers and kernels if memory is not page-pinned.

Summary of Changes

use cudaHostRegister and cudaHostUnregister to page-pin memory.

Introduce PinnedVec wrapper class which has its memory pinned by the above cuda functions. It registers on creation, unregister on drop, also on push and resize it checks the pointer and the capacity of the vec for any change and then does the unregister/register if changed.

Add back the Recycler so that these PinnedVec's are re-used since we cannot call cudaHostRegister/cudHostUnregister in the middle of computation since it will cause a GPU sync.

Fixes #4235

codecov · 2019-05-10T21:15:11Z

Codecov Report

Merging #4250 into master will decrease coverage by 7%.
The diff coverage is 62.9%.

@@           Coverage Diff            @@
##           master   #4250     +/-   ##
========================================
- Coverage    78.7%   71.7%   -7.1%     
========================================
  Files         192     193      +1     
  Lines       34262   37355   +3093     
========================================
- Hits        26985   26799    -186     
- Misses       7277   10556   +3279

aeyakovenko · 2019-05-14T07:02:14Z

core/src/recycler.rs

@@ -0,0 +1,42 @@
+use std::sync::{Arc, Mutex};


guess whos back!

sakridge · 2019-05-14T23:01:23Z

@garious @rob-solana can you guys think of any cleaner way to do this?

garious · 2019-05-14T23:12:35Z

I'm not able to look at this closely at the moment, but are you aware of this? https://doc.rust-lang.org/std/pin/index.html

sakridge · 2019-05-14T23:24:42Z

@garious Hmm. Haven't seen it before, but I don't see a straightforward way to apply it here. Presumably you could return a Pin<Vec<T>> but then you can't do a push or anything on that vec which is a bit annoying.

rob-solana · 2019-05-14T23:31:26Z

core/src/banking_stage.rs

@@ -804,7 +804,9 @@ mod tests {
    #[test]
    fn test_banking_stage_entries_only() {
        solana_logger::setup();
-        let (genesis_block, mint_keypair) = create_genesis_block(10);
+        let (mut genesis_block, mint_keypair) = create_genesis_block(10);


what's this change for?

The allocator takes longer, so the test needed more time to avoid the first slot running out of ticks.

We may be able to rearrange some stuff so that the tick start happens closer to the processing of transactions.

can we pre-allocate some amount in recycler?

yea, I can try that.

Turns out I don't need it since I added a flag to only pin when necessary.

did that work out?

The flag works so I didn't need to modify the test. I didn't try pre-allocating in the recycler but I didn't see an easy way to do that either.

rob-solana · 2019-05-14T23:38:14Z

have we root-caused our prior OOM from recycler yet? I didn't think we had...

sakridge · 2019-05-15T00:29:52Z

@rob-solana No, but we have OOM issues now anyway from the regular allocator so I can't make it worse :)

rob-solana · 2019-05-15T17:24:19Z

I think you might be able to make it worse. Remember OOM in 20-30min?

sakridge · 2019-05-15T17:27:23Z

@rob-solana last time I ran it that was where we are at with a single-node setup today. It's just that we switch leaders so we don't notice it as much now.

sakridge · 2019-05-15T18:36:14Z

Also the blob recycler was I think the biggest problem. This change only needs to use the recycler for Packets and these signature offsets and output buffers. Just anything that goes to the GPU for sigverify.

rob-solana · 2019-05-15T21:43:29Z

as I recall, we removed the blob recycler and the leaks continued unabated

sakridge · 2019-05-15T22:46:37Z

Ok, well we have to test it out then. I thought of a clever way to add it to minimize change and also allow for easy one-line removal.

Bring back recyclers and pin offset buffers

* Add test for recycler and reduce the gc lock critical section * Add comments/tests to cuda_runtime

Otherwise buffered becomes millions of packets which causes memory issues for nodes and problems where forwarding all those packets can take 10s of seconds.

sakridge · 2019-06-25T03:25:33Z

core/src/banking_stage.rs

@@ -804,6 +804,9 @@ impl BankingStage {
        packet_indexes: Vec<usize>,
    ) {
        if !packet_indexes.is_empty() {
+            if unprocessed_packets.len() > 400 {


@pgarg66 I found this has much better memory characteristics for fullnodes with capping the buffered packets size, even with the original code. Any concerns?

core/src/packet.rs

Revert "send-transactions: optimize retry pool (solana-labs#31)" This reverts commit 0abfe27.

sakridge force-pushed the pin-packet-memory branch 3 times, most recently from d4e39a8 to 5904b62 Compare May 10, 2019 21:15

sakridge force-pushed the pin-packet-memory branch 10 times, most recently from 79991ff to ade4a8d Compare May 14, 2019 03:57

sakridge added the work in progress This isn't quite right yet label May 14, 2019

sakridge force-pushed the pin-packet-memory branch from ade4a8d to 7bcc327 Compare May 14, 2019 04:11

aeyakovenko reviewed May 14, 2019

View reviewed changes

core/src/recycler.rs

@@ -0,0 +1,42 @@

use std::sync::{Arc, Mutex};

Copy link

Member

aeyakovenko May 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guess whos back!

sakridge force-pushed the pin-packet-memory branch from 7bcc327 to c839a10 Compare May 14, 2019 22:01

rob-solana reviewed May 14, 2019

View reviewed changes

sakridge force-pushed the pin-packet-memory branch from c839a10 to db5ffc8 Compare May 16, 2019 17:16

sakridge force-pushed the pin-packet-memory branch 3 times, most recently from 2f38251 to 84f276a Compare May 29, 2019 21:30

sakridge force-pushed the pin-packet-memory branch from 84f276a to 6890dda Compare May 31, 2019 20:39

sakridge force-pushed the pin-packet-memory branch 6 times, most recently from d14d147 to 69e8821 Compare June 24, 2019 18:08

sakridge removed the work in progress This isn't quite right yet label Jun 25, 2019

sakridge and others added 10 commits June 24, 2019 20:17

Page-pin packet memory for cuda

dab8681

Bring back recyclers and pin offset buffers

Add packet recycler to streamer

be3a8dd

Add set_pinnable to sigverify vecs to pin them

95ea817

Add packets reset test

bad566e

* Add test for recycler and reduce the gc lock critical section * Add comments/tests to cuda_runtime

Add recycler to recv_blobs path.

21eff90

Add trace/names for debug and PacketsRecycler to bench-streamer

46b311d

Predict realloc and unpin beforehand.

8e2799d

Add helper to reserve and pin

c68aa4a

Cap buffered packets length

5e364de

Otherwise buffered becomes millions of packets which causes memory issues for nodes and problems where forwarding all those packets can take 10s of seconds.

Call cuda wrapper functions

69b0ae3

sakridge force-pushed the pin-packet-memory branch from 69e8821 to 69b0ae3 Compare June 25, 2019 03:19

sakridge commented Jun 25, 2019

View reviewed changes

sakridge requested a review from rob-solana June 25, 2019 03:26

rob-solana reviewed Jun 25, 2019

View reviewed changes

core/src/packet.rs Show resolved Hide resolved

sakridge merged commit fbea9d8 into solana-labs:master Jun 27, 2019

sakridge deleted the pin-packet-memory branch June 27, 2019 07:32

Kashuhackerone mentioned this pull request May 13, 2024

[Snyk] Fix for 2 vulnerabilities Kashuhackerone/solana#96

Open

adamlaska mentioned this pull request May 14, 2024

[Snyk] Fix for 2 vulnerabilities adamlaska/solana#141

Open

brooksprumo pushed a commit to brooksprumo/solana that referenced this pull request Jan 8, 2025

Revert "send-transactions: optimize retry pool" (solana-labs#4250)

6242f0f

Revert "send-transactions: optimize retry pool (solana-labs#31)" This reverts commit 0abfe27.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Page-pin packet memory for cuda #4250

Page-pin packet memory for cuda #4250

sakridge commented May 10, 2019 •

edited

Loading

codecov bot commented May 10, 2019 •

edited

Loading

aeyakovenko May 14, 2019

sakridge commented May 14, 2019

garious commented May 14, 2019

sakridge commented May 14, 2019 •

edited

Loading

rob-solana May 14, 2019

sakridge May 14, 2019

sakridge May 14, 2019

rob-solana May 14, 2019

sakridge May 15, 2019

sakridge May 17, 2019

rob-solana May 18, 2019

sakridge May 18, 2019

rob-solana commented May 14, 2019

sakridge commented May 15, 2019

rob-solana commented May 15, 2019

sakridge commented May 15, 2019

sakridge commented May 15, 2019

rob-solana commented May 15, 2019

sakridge commented May 15, 2019

sakridge Jun 25, 2019

Page-pin packet memory for cuda #4250

Page-pin packet memory for cuda #4250

Conversation

sakridge commented May 10, 2019 • edited Loading

Problem

Summary of Changes

codecov bot commented May 10, 2019 • edited Loading

Codecov Report

Choose a reason for hiding this comment

sakridge commented May 14, 2019

garious commented May 14, 2019

sakridge commented May 14, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rob-solana commented May 14, 2019

sakridge commented May 15, 2019

rob-solana commented May 15, 2019

sakridge commented May 15, 2019

sakridge commented May 15, 2019

rob-solana commented May 15, 2019

sakridge commented May 15, 2019

Choose a reason for hiding this comment

sakridge commented May 10, 2019 •

edited

Loading

codecov bot commented May 10, 2019 •

edited

Loading

sakridge commented May 14, 2019 •

edited

Loading