Reduce num default tvu threads from 8 to 1 #5134
Refresh of #998
Problem
We currently create 8 threads that solely try to pull packets out of the sockets associated with the turbine port. Multiple threads were added to mitigate buffer receive errors. With software improvements, including the use of recvmmsg, 8 threads is overkill; a single thread can read from this port plenty fast.
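As a rough illustration of why one thread can keep up: a batched receive such as recvmmsg pulls many packets out of the socket per syscall, so the reader spends far less time in syscall overhead per packet. The sketch below approximates that behavior with portable std calls; `recv_batch`, the port number, and the buffer sizing are illustrative stand-ins, not agave's actual streamer code.

```rust
use std::io;
use std::net::UdpSocket;

// Hypothetical stand-in for a recvmmsg-style batched receive; agave's real
// implementation lives in its streamer crate and differs in detail.
fn recv_batch(socket: &UdpSocket, bufs: &mut [[u8; 1232]]) -> io::Result<usize> {
    // Block for the first packet, then drain whatever else is already queued
    // without blocking -- approximating recvmmsg's many-packets-per-syscall
    // behavior with portable std calls.
    socket.set_nonblocking(false)?;
    socket.recv_from(&mut bufs[0])?;
    socket.set_nonblocking(true)?;
    let mut count = 1;
    for buf in bufs[1..].iter_mut() {
        match socket.recv_from(buf) {
            Ok(_) => count += 1,
            Err(ref e) if e.kind() == io::ErrorKind::WouldBlock => break,
            Err(e) => return Err(e),
        }
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    let socket = UdpSocket::bind("0.0.0.0:8002")?; // illustrative port only
    let mut bufs = [[0u8; 1232]; 64]; // 1232 bytes = max UDP packet data size
    loop {
        // A single reader thread draining up to 64 packets per pass.
        let n = recv_batch(&socket, &mut bufs)?;
        let _batch = &bufs[..n]; // hand the batch to shred sigverify
    }
}
```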
Summary of Changes
The value was already configurable with a hidden CLI arg; simply decrease the default from 8 to 1:

agave/validator/src/cli/thread_args.rs, line 308 at 18b49da
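For context, a hypothetical sketch of what a hidden clap arg with the new default could look like; the arg name and structure here are illustrative, not the actual contents of `thread_args.rs`:

```rust
use clap::{Arg, Command};

fn main() {
    // Hypothetical sketch of a hidden thread-count arg; the real definition
    // in validator/src/cli/thread_args.rs differs in detail.
    let matches = Command::new("agave-validator")
        .arg(
            Arg::new("tvu_receive_threads")
                .long("tvu-receive-threads")
                .hide(true) // hidden: not shown in --help
                .value_parser(clap::value_parser!(usize))
                .default_value("1"), // previously "8"
        )
        .get_matches();
    let threads: usize = *matches.get_one("tvu_receive_threads").unwrap();
    println!("tvu receive threads: {threads}");
}
```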
Testing
For a basic sanity check, I ran `bench-streamer`. With the default settings of 4 producers / 1 receiver, I see that the receiver can pull > 900k packets / second.

With this known, I then set up my node to generate additional load against itself on the TVU port. Since we're only exercising the node's ability to pull packets out of the socket buffer, I crafted the packets such that the shred sigverify pipeline would throw them out before doing an actual sigverify. The graph (not reproduced here) shows the following:
- `shred_sigverify.num_packets` - I divided by two to get packets / second (2 second metric interval)
- `shred_sigverify.num_discards_pre` - divided by two again
- `net-stats-validator.rcvbuf_errors_delta` - I multiplied by 100k

So, my node is receiving ~375k packets per second at this port with 0 dropped packets. The max number of unique shreds per second can be derived from the max number of shreds per block; see the sketch below:
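A back-of-the-envelope version of that bound, under my own assumptions (32_768 max data shreds per slot, roughly as many coding shreds, 400 ms target slot time):

```rust
// Back-of-the-envelope bound; the constants are my assumptions, not quoted
// from this PR.
const MAX_DATA_SHREDS_PER_SLOT: u64 = 32_768;
const SLOTS_PER_SECOND: f64 = 2.5; // 400 ms target slot time

fn main() {
    let max_shreds_per_slot = 2 * MAX_DATA_SHREDS_PER_SLOT; // data + coding
    let max_shreds_per_second = max_shreds_per_slot as f64 * SLOTS_PER_SECOND;
    // ~163,840 unique shreds / second, so the measured ~375k packets / second
    // with zero drops clears this bound with plenty of headroom.
    println!("max unique shreds / second: {max_shreds_per_second}");
}
```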
My guess is that the node can handle more, too; I'll push it a bit harder tomorrow. Lastly, it should be noted that I'm doing the load gen on the same machine, so the load gen is "stealing resources" from the validator in some sense.
Performance Gains
Going from 8 threads to 1 is obviously a win if performance stays flat; improving perf on top of that is an added win.
Shred Sigverify
At the top of the funnel, I see a reduction in mean `shred_sigverify.elapsed_micros` (graph around 2025/03/04 06:00 not reproduced here).

The blue node was spending less time here before the purple node got this branch; now the purple node is comparable to blue. The blue node serves as a nice comparison point, but in raw numbers, this looks like a 15-20% improvement with this branch.
Shred Insertion
I see a drop in the total amount of time spent in shred insertion; this is because we're now calling `Blockstore::insert_shreds()` fewer times (with more shreds per call), so we pay the per-call overhead less often (a toy cost model of this is sketched below). My node is unstaked and thus not sending shreds to anyone, but there might be some gains there too. There are also some other minor residual gains, like in `WindowService`, but that is shrinking an already small number.
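As a toy illustration of the batching effect, assume each call pays a fixed overhead plus a per-shred cost; both numbers below are invented purely for illustration, not measured from `Blockstore`:

```rust
// Toy model: each insert_shreds() call pays a fixed overhead (locks, column
// writes, metadata bookkeeping) plus a per-shred cost. Illustrative numbers.
fn insertion_cost_us(total_shreds: u64, shreds_per_call: u64) -> u64 {
    const CALL_OVERHEAD_US: u64 = 50; // hypothetical fixed cost per call
    const PER_SHRED_US: u64 = 2; // hypothetical marginal cost per shred
    let calls = total_shreds.div_ceil(shreds_per_call);
    calls * CALL_OVERHEAD_US + total_shreds * PER_SHRED_US
}

fn main() {
    // Many receiver threads -> many small batches; one thread -> fewer,
    // larger batches, so the fixed overhead is paid far less often.
    println!("batches of 8:  {} us", insertion_cost_us(64_000, 8));
    println!("batches of 64: {} us", insertion_cost_us(64_000, 64));
}
```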