Investigate tapir's performance overhead #2636
@thereisnospoon Thanks a lot for the detailed investigation, very helpful! It would definitely be great to shave off some of that latency - maybe there are some opportunities around request decoding, but that's just a guess, we'll have to profile first.
@thereisnospoon Ha, that's a nice surprise :) Doesn't mean we can't be even better ;) Thanks again for the use-case!
@adamw As we discussed today, we'd like to work some more on Tapir performance tests. Before specifying the concrete scenarios that matter most, let's make sure we agree on base goals. My proposition:
@kciesielski I think that's a very good high-level plan. One adjustment:
@adamw A proposition of scenarios/backends to start with:
- Simple GET latency: the current test
- Simple raw input latency (small input): dummy POST endpoints with String/ByteArray input (sketched below)
- Simple raw input latency (5MB input): similar to the previous test, but with a larger input, to see the overhead of Tapir putting chunks together in some servers
- Raw File input latency: dummy POST endpoints with raw File input
- Websockets latency: based on https://github.com/kamilkloch/websocket-benchmark/tree/master
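To make the raw-input scenarios concrete, here's a minimal sketch of what such dummy endpoints could look like in Tapir's DSL (the paths and logic are illustrative, not the actual perf-test code):

```scala
import scala.concurrent.Future
import sttp.tapir._

// In-memory raw inputs: the whole body is accumulated before the logic runs.
val echoString: PublicEndpoint[String, Unit, String, Any] =
  endpoint.post.in("string").in(stringBody).out(stringBody)

val echoBytes: PublicEndpoint[Array[Byte], Unit, String, Any] =
  endpoint.post.in("bytes").in(byteArrayBody).out(stringBody)

// Raw file input: the backend writes the request body to a temporary file.
val echoFile: PublicEndpoint[TapirFile, Unit, String, Any] =
  endpoint.post.in("file").in(fileBody).out(stringBody)

// Trivial logic, so measured latency is dominated by the decoding path.
val serverEndpoints = List(
  echoString.serverLogicSuccess[Future](s => Future.successful(s.length.toString)),
  echoBytes.serverLogicSuccess[Future](b => Future.successful(b.length.toString)),
  echoFile.serverLogicSuccess[Future](f => Future.successful(f.length.toString))
)
```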
Looks good :) A few questions:
- Simple GET latency -> would this also involve comparing to no-tapir setups? http4s and pekko are easy; netty, probably not so much?
- Simple raw input latency -> what would the endpoint definition be: reading the entire request into memory (as a string?), or using some kind of streaming (reactive / input stream)? (Both variants are sketched below.)
- Raw File input latency -> isn't the file too small to measure the overhead of reading & writing multiple chunks?
- Websockets latency -> 500ms sounds like a no-brainer; if we're slower than that, then we've got a serious problem ;). Maybe some kind of continuous transmission, or a skewed ping-pong, where our server sends e.g. 100 messages, waits for a reply, and we see how many roundtrips we manage to make?
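On the in-memory vs. streaming question, both variants are expressible in Tapir; a hedged sketch of the two shapes (paths and the helper are made up):

```scala
import java.io.InputStream
import sttp.tapir._

// Variant 1: the backend assembles all chunks into a single String
// before the server logic is invoked.
val inMemory = endpoint.post.in("mem").in(stringBody).out(stringBody)

// Variant 2: the server logic receives a (blocking) InputStream and
// consumes the body incrementally, without Tapir buffering it first.
val streaming = endpoint.post.in("stream").in(inputStreamBody).out(stringBody)

// Hypothetical helper: drain the stream, reporting how many bytes were read.
def drain(is: InputStream): Long = {
  val buf   = new Array[Byte](8192)
  var total = 0L
  var read  = is.read(buf)
  while (read != -1) { total += read; read = is.read(buf) }
  total
}
```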
One more test that would be good to have is including various interceptors (exception, logging, metrics) - that would help resolve #3272
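For the interceptor comparison, a hedged sketch of what one such variant could look like, assuming Tapir 1.x's `customiseInterceptors` builder on the netty backend (exact method names may differ between versions):

```scala
import sttp.tapir.server.netty.{NettyFutureServer, NettyFutureServerOptions}

// A server variant with the default logging interceptor explicitly enabled,
// so its cost can be measured against a bare-bones configuration.
val options: NettyFutureServerOptions =
  NettyFutureServerOptions.customiseInterceptors
    .serverLog(NettyFutureServerOptions.defaultServerLog)
    .options

val server = NettyFutureServer(options)
```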
Yes, the no-tapir setups are in the current perf test set, so we should compare them. It's indeed not easy to build a comprehensive no-tapir netty server. One more server we could add is zio-http, which is netty-based.
I thought about using a raw string or byte array, which, in the case of netty backends, will be built from a reactive stream that reads the input in chunks to create the full request body in memory.
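As an illustration of that chunk-assembly step (a sketch of the general approach with fs2, not Tapir's actual internals):

```scala
import cats.effect.IO
import fs2.{Chunk, Stream}

// Fold a chunked byte stream into one in-memory array - the kind of work
// a backend has to do for a byteArrayBody input.
def assemble(body: Stream[IO, Byte]): IO[Array[Byte]] =
  body.chunks
    .compile
    .fold(Chunk.empty[Byte])(_ ++ _)
    .map(_.toArray)
```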
The chunk size is 8192 by default as well, at least for our reactive streams integrations (fs2 and zio-streams), so 512kB sounds good enough.
Sounds good.
Yes, thanks for mentioning that, we definitely should add this as well.
There's probably not much sense in investing too much into a pure-netty server :) Not sure if it makes sense to compare with zio-http - it's in a lot of flux, so I'm not sure how polished it is. Maybe pekko-http will provide a good enough baseline? As a server, it's rather fast :) Otherwise, we might look at vertx - it's one of the fastest servers out there AFAIK.
Ah ok :) So we would test with small inputs (1 chunk) and large inputs (~60 chunks) - both for file & string/byte array? Still not sure if 60 chunks will exhibit any significant overhead that might be there.
Sounds good. Also, good to know that vertx is fast, I'll check it out then.
Maybe we should use much larger files, like hundreds of chunks, but with fewer requests and a shorter time, so we don't run out of disk space in seconds? For example: 640 chunks = 5MB, 20 concurrent users, 512 requests per user, which gives 50 GB. (The arithmetic is spelled out below.)
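Double-checking that estimate in plain Scala:

```scala
// Per-request body: 640 chunks at the default 8192-byte chunk size.
val bodyBytes  = 640L * 8192L           // 5,242,880 B = 5 MB
// Total data written over the run: 20 users x 512 requests each.
val totalBytes = bodyBytes * 20L * 512L // 53,687,091,200 B ≈ 50 GB
```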
Hm... it's also a question whether we want to test the server under load (many concurrent requests), or the latency of a single request. If we want to look at the overhead of our stream processing logic, looking at a single request would be more informative - you isolate the aspect you want to test. This might also be true for the other tests, as tapir code isn't really involved in concurrency (it only "adds overhead" to the sequential processing of a request into a response). One aspect that might have an impact under high load, and that we won't be able to measure looking at single requests, is increased memory pressure - how much garbage we produce, and how much additional latency collecting this garbage creates.
Let's see, we could achieve this by parameterizing scenarios with concurrent user count. Run with
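A rough sketch of that kind of parameterization, assuming a Gatling simulation (which the perf-tests are based on); the system property name and endpoint path are hypothetical:

```scala
import scala.concurrent.duration._
import io.gatling.core.Predef._
import io.gatling.http.Predef._

class SimpleGetSimulation extends Simulation {
  // Concurrency is read from a system property, so the same scenario can be
  // run at different user counts without code changes.
  private val users = Integer.getInteger("tapir.perf.users", 1).toInt

  private val httpProtocol = http.baseUrl("http://localhost:8080")

  private val scn = scenario("simple-get")
    .exec(http("get").get("/hello"))

  setUp(scn.inject(constantConcurrentUsers(users).during(30.seconds)))
    .protocols(httpProtocol)
}
```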
True, if we have a test harness where we can easily parametrize these tests, run them and look at the results, why not run both :)
First part of the investigation: https://softwaremill.com/benchmarking-tapir-part-1/
Second part: https://softwaremill.com/benchmarking-tapir-part-2/. I'm closing this issue, as the investigation part is pretty much done.
In addition to the throughput tests we've conducted using akka-http & http4s (in perf-tests), inspect how using tapir influences memory & CPU usage, as compared to using these servers directly. Also, take a look at latency under sustained load at a fixed req/s rate.

Source: https://twitter.com/aplokhotnyuk/status/1603342821594890247