HTTP2: Performance issues when many concurrent requests #35184
Tagging subscribers to this area: @dotnet/ncl
Are you benchmarking against Kestrel? I suspect one large contributor to this is how we process frames sequentially: it does not max out multi-threaded CPUs very well. I've been thinking about a better buffering strategy to enable parallelism, but haven't had time to prototype it yet. I also think our HPACK decoder can be optimized. It takes up the bulk of the CPU usage for the basic protocol parsing.
Yes, it is against Kestrel. Out of curiosity I've just added a Grpc.Core server to see what changes. The source code is here if you're curious about any of the implementation: https://github.com/JamesNK/Http2Perf
HttpClient, 100 callers, 1 connection:
CCore client, 100 callers, 1 connection:
On the server side Kestrel is outperforming the CCore server. It doesn't make much difference when HttpClient is the client, an indication that HttpClient is the throughput bottleneck. When the CCore client is used you can see a 50% RPS increase when switching from the CCore server to Kestrel.
To expand a little on what I said previously: our windowing algorithm is also very latency-sensitive, but it will primarily exhibit issues only when downloading more data than fits into the window. I don't think this will affect perf of small requests.
@JamesNK how close does Stephen's PR get us to reasonable perf? What is the priority of pushing perf further? Are there any customers / benchmarks demonstrating we have significant deficiencies?
@karelz we have an Azure service that is affected by this perf degradation. I'll be happy to chat offline to describe our scenario as well as try to validate the fix.
I haven't had a chance to measure it yet. I am improving the gRPC benchmarks to capture more client data, and adding a golang gRPC client to compare against. That will give us two points of reference. I aim to provide new numbers next week. When I've seen Stephen's improvements flow through to nightly builds I'll ask the customers who have raised client perf issues (one is @stankovski) to retest. Right now they are considering going with Grpc.Core solely because of the perf difference. Priority-wise, I'd like the .NET gRPC client to be competitive with other gRPC client libraries. Client perf is important in microservice scenarios because it is common to have one caller rather than one thousand, and client perf can be the RPS bottleneck. We don't have to be the fastest client, but I don't want to be the slowest. Having a second point of reference will give us more information on where we stand.
Great, let's see where we stand with the latest fix and then we can decide how much more we need to invest in .NET 5.
I have some results from running on my computer with the latest nightly SDK. There is a large improvement in using HttpClientHandler directly, but something about how Grpc.Net.Client uses HttpClientHandler causes a significant performance drop. Note that the benchmark now references a nightly package of Grpc.Net.Client. The nightly package uses HttpHandlerInvoker rather than HttpClient.
Grpc.Core:
HttpClientHandler:
Grpc.Net.Client:
My guess is Grpc.Net.Client reads from a stream, while the raw HttpClientHandler scenario gets its response data as a
I have narrowed the difference in performance down to how Grpc.Net.Client sends the request message. The "raw" benchmark uses a
Request content:
Command line to test:
I think there are issues with custom
Did Grpc.Net.Client get worse compared to the previous build you were using, or did it just not improve like HttpClientHandler used directly?
The performance drop I referred to was the switch from HttpClientHandler+ByteArrayContent to Grpc.Net.Client. Your PR did improve Grpc.Net.Client perf:
Before:
After:
Phew.
What happens if you comment out the
Results on latest nightly.
Custom content with FlushAsync:
Custom content without FlushAsync:
ByteArrayContent:
I recall having to add the FlushAsync here during 3.0 so that request data was sent. Is it no longer necessary in 5.0, or did that behavior change before 3.0 went GA?
A flush is needed after an individual message on a duplex request, as otherwise the message may just sit in the buffer forever until something else triggers a flush. But there's always a flush issued during or soon after sending an EndStream HTTP/2 frame, so there's no explicit flush required at the end of content's SerializeToStreamAsync. Essentially, we always flush at the end of a request content; if you need flushes before then, you need to request them explicitly.
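To make the flush semantics above concrete, here is a hypothetical sketch of a duplex request content. The class name and the writer delegate are my own inventions, not anything from Grpc.Net.Client; the point is simply where an explicit flush is and isn't needed under the behavior described:

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Hypothetical sketch of a push-style duplex request content.
public class DuplexMessageContent : HttpContent
{
    private readonly Func<Stream, Task> _writeMessages;

    public DuplexMessageContent(Func<Stream, Task> writeMessages)
    {
        _writeMessages = writeMessages;
        Headers.ContentType = new MediaTypeHeaderValue("application/grpc");
    }

    protected override async Task SerializeToStreamAsync(Stream stream, TransportContext context)
    {
        // Inside _writeMessages, the caller should FlushAsync after each
        // individual message so it is not held in the buffer indefinitely.
        await _writeMessages(stream);
        // No trailing FlushAsync needed here: the handler flushes when it
        // sends the EndStream HTTP/2 frame at the end of the content.
    }

    protected override bool TryComputeLength(out long length)
    {
        length = -1;
        return false; // streamed/duplex content has no known length up front
    }
}
```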
Can we influence the APIs like CodedOutputStream? There are non-trivial inefficiencies stemming from having to use the APIs as designed. My guess is that's the root of the difference with ByteArrayContent, in particular that it ends up driving a need for two writes instead of one.
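To illustrate the two-writes-versus-one point, here is a hypothetical sketch. A gRPC message on the wire is a 5-byte prefix (a compression flag byte plus a big-endian length) followed by the payload; writing the prefix and payload separately costs two stream writes, while combining them into one rented buffer costs one. The method names are placeholders for illustration:

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;
using System.IO;
using System.Threading.Tasks;

public static class GrpcFraming
{
    // Two writes: the 5-byte prefix, then the payload.
    public static async Task WriteMessageTwoWritesAsync(Stream stream, byte[] payload)
    {
        var prefix = new byte[5];
        prefix[0] = 0; // not compressed
        BinaryPrimitives.WriteUInt32BigEndian(prefix.AsSpan(1), (uint)payload.Length);
        await stream.WriteAsync(prefix);
        await stream.WriteAsync(payload);
    }

    // One write: prefix and payload copied into a single rented buffer.
    public static async Task WriteMessageOneWriteAsync(Stream stream, byte[] payload)
    {
        byte[] buffer = ArrayPool<byte>.Shared.Rent(5 + payload.Length);
        try
        {
            buffer[0] = 0; // not compressed
            BinaryPrimitives.WriteUInt32BigEndian(buffer.AsSpan(1), (uint)payload.Length);
            payload.CopyTo(buffer.AsSpan(5));
            await stream.WriteAsync(buffer.AsMemory(0, 5 + payload.Length));
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```

With an API like CodedOutputStream that serializes straight to a stream, the single-buffer variant is only possible if the serializer can target a caller-supplied buffer, which is where the API design constrains the handler.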
Yes! In fact I'm working with the Protobuf team to add that. When both of these PRs are merged, and code generation is updated to use them, then on the server we'll be able to read Protobuf directly from ASP.NET Core's request pipe, and write directly to the response pipe. HttpClient has streams instead of pipes, so things won't be quite as efficient. However, the extra abstraction of reading from
Happy to see that change go in finally 😬
Grpc.Net.Client PR - grpc/grpc-dotnet#901
Grpc.Net.Client with PR:
Raw HttpClientHandler:
I believe the gap is now down to the extra features that Grpc.Net.Client adds (call status, cancellation, tracing, strongly typed API). It is much smaller, but there remains a gap between Grpc.Net.Client (53k RPS) and Grpc.Core (73k RPS). I will wait and see how the Go client performs.
Thanks, @JamesNK. And just to confirm, these numbers are with server GC enabled for the client app?
Yes, the client app is using server GC.
Hey @JamesNK! Each RPS number is an average of 3 minutes of runtime with 0 errors. Also, there is a warm-up before the actual test is run.
A bunch of improvements in this space have happened (some just recently) -- we think this is "it" for 5.0; moving to Future for potential further improvements.
HttpClient has performance issues when many concurrent calls are made on one connection. HttpClient is slower compared to Grpc.Core (a gRPC client that uses chttp2 native C library for HTTP/2).
Test app: https://github.com/JamesNK/Http2Perf
I have replicated a gRPC call being made using HttpClient to avoid complication from involving Grpc.Net.Client.
.NET version:
HttpClient results:
- dotnet run -c Release -p GrpcSampleClient r 100 false: 19k RPS, 1-6ms latency
- dotnet run -c Release -p GrpcSampleClient r 100 true: 31k RPS, 1-8ms latency

Grpc.Core results:
- dotnet run -c Release -p GrpcSampleClient c 100 false: 59k RPS, 1-2ms latency
- dotnet run -c Release -p GrpcSampleClient c 100 true: 45k RPS, 1-2ms latency

Interesting that a connection per caller increases performance with HttpClient but decreases it with Grpc.Core.
With one caller HttpClient is faster than Grpc.Core (about 3k RPS vs 3.5k RPS) but as you can see in a 100 concurrent caller scenario Grpc.Core is three times faster.
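For context, the scenario being measured can be sketched roughly as follows. This is a hypothetical reduction, not the actual benchmark code from the linked repo; the URL, caller count, and loop shape are placeholders:

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        // One HttpClient => one HTTP/2 connection shared by all callers.
        using var handler = new SocketsHttpHandler();
        using var client = new HttpClient(handler);

        async Task CallerLoop()
        {
            while (true)
            {
                using var request = new HttpRequestMessage(HttpMethod.Post, "https://localhost:5001/")
                {
                    Version = new Version(2, 0) // negotiate HTTP/2
                };
                using var response = await client.SendAsync(request);
                await response.Content.ReadAsByteArrayAsync();
            }
        }

        // 100 concurrent callers multiplexed over the single connection.
        await Task.WhenAll(Enumerable.Range(0, 100).Select(_ => CallerLoop()));
    }
}
```

All 100 callers contend for the same connection's frame-processing and HPACK work, which is where the sequential processing mentioned earlier becomes the bottleneck.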