Hooks API to track parameters and sizes #1601
Thanks for the suggestion @s-kanev, I think it's a great feature! Now, we have to be cautious about its implementation.
I'd be happy to add a hook like this. What kind of info do you log in your hooks? What have you found the most useful? At minimum I would want to log binary name, zstd version, and a stack trace.
I'm not too worried about this, since it is just profiling, not a necessary feature. Someone could write, or we could always provide, a callback implementation that tails the stream into a circular buffer and lets users call a function to read the buffer.
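The circular-buffer idea mentioned above could look something like the following minimal sketch. All names here (`trace_record`, `ring_push`, `ring_read`) are illustrative only and not part of zstd's API: a hook appends fixed-size records to a ring buffer, overwriting the oldest, and a reader drains the most recent entries on demand.

```c
#include <stddef.h>

#define RING_CAPACITY 64  /* power of two keeps the index math cheap */

/* Hypothetical per-block sample recorded by a tracing hook. */
typedef struct {
    size_t srcSize;  /* uncompressed size observed by the hook */
    size_t dstSize;  /* compressed size */
    int    level;    /* compression level in effect */
} trace_record;

static trace_record ring[RING_CAPACITY];
static size_t ring_head = 0;  /* total records ever pushed */

/* Called from the (hypothetical) hook: O(1), overwrites the oldest. */
static void ring_push(const trace_record* rec) {
    ring[ring_head % RING_CAPACITY] = *rec;
    ring_head++;
}

/* Copy out up to `max` of the most recent records; returns the count. */
static size_t ring_read(trace_record* out, size_t max) {
    size_t avail = ring_head < RING_CAPACITY ? ring_head : RING_CAPACITY;
    size_t n = avail < max ? avail : max;
    size_t i;
    for (i = 0; i < n; i++) {
        out[i] = ring[(ring_head - n + i) % RING_CAPACITY];
    }
    return n;
}
```

Because the buffer has a fixed footprint and push is constant-time, this kind of consumer keeps the hook itself cheap even when nobody is reading.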
Thanks for your thoughts, replies inline.
A stack trace is definitely the most useful bit for us. The sizes are a very close second. Usually the lowest hanging fruit to optimize is glaringly obvious cases, like: "we're compressing 5MB buffers (from the size) of already compressed (from the callstack) data".
I went back and forth on this a couple of times as well. On some non-x86 platforms, atomics are more expensive, so doing an atomic read on every block might add up. My end solution was to add the callback pointer to the context, and only do the atomic read to set it from the user-registered global on context init (which should be a bit rarer). Also, I'm not sure how important supporting legacy compilers is, but I recently learned that C11 added cross-platform atomics. It's already 8 years old now, so it might be OK to fall back to that?
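The scheme described above can be sketched with C11 atomics. This is a hypothetical illustration, not zstd's actual implementation, and the names (`zstd_trace_fn`, `demo_ctx`, etc.) are made up: the user-registered hook lives in an atomic global, each context caches a plain copy at init time, and the per-block hot path branches on an ordinary pointer instead of doing an atomic load.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef void (*zstd_trace_fn)(size_t srcSize, size_t dstSize);

/* The only atomically-accessed state: the user-registered global hook. */
static _Atomic zstd_trace_fn g_trace_hook = NULL;

/* Registration is a single atomic store, safe from any thread. */
void demo_register_hook(zstd_trace_fn fn) {
    atomic_store_explicit(&g_trace_hook, fn, memory_order_release);
}

typedef struct {
    zstd_trace_fn hook;  /* plain copy; read without atomics per block */
} demo_ctx;

/* Context init is comparatively rare, so one atomic load here is cheap. */
void demo_ctx_init(demo_ctx* ctx) {
    ctx->hook = atomic_load_explicit(&g_trace_hook, memory_order_acquire);
}

/* Per-block hot path: branch on a plain pointer, ~zero cost when unset. */
void demo_on_block(demo_ctx* ctx, size_t srcSize, size_t dstSize) {
    if (ctx->hook) ctx->hook(srcSize, dstSize);
}

/* Example hook used for demonstration: just counts invocations. */
static size_t g_calls = 0;
static void counting_hook(size_t s, size_t d) { (void)s; (void)d; g_calls++; }
```

The trade-off is that a context created before registration never sees the hook, which matches the "set at context init" behavior described above.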
Agree. While testing, I found that blindly calling snprintf is about a 20% regression for small sizes (~200 bytes). We can specialize a little, and with PGO, the compiler actually generates pretty efficient code (turns the short format string into just storing a constant) that I couldn't discern from the noise on that small test. But hiding behind a flag is much simpler.
Sounds good. I'll send that out once I get cleared to sign a CLA. Note that it's only the hook plumbing, not the particular hook implementation we have. That's too tied to our internal deployment model and infrastructure to be broadly useful. For the record, it just collects the hook parameters, captures a stacktrace, deduplicates all samples in a profile proto, and exposes that through RPC. I suspect for most users, consuming profiles will be similarly tied to their deployment models (writing to a file vs exposing through an RPC; JSON vs Thrift vs protobuf; etc). Maybe we can provide a very simple example in the library here without taking extra dependencies -- something that de-duplicates samples and just prints them out(?) -- and then let users customize to their needs?
Awesome!
Yeah, I agree. Adding an example to …
Thanks for following up. Yes, weak functions shouldn't be a problem. Have you had a chance to look at overheads? Nothing in the tracing hook code looked particularly expensive, but it's good to double-check, especially at small sizes.
@s-kanev I'm working on hooking it up right now. I'll let you know about any overhead I measure when I get things hooked up, but I don't expect any significant overhead. When it is disabled the overhead should be 2 function calls + 1 branch + the cost to determine whether or not to enable tracing, which should be negligible. We keep a thread-local …
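The disabled-path cost described above (a couple of calls, one branch, and a cached enable check) can be sketched roughly as follows. This is an illustrative guess at the thread-local pattern being referenced, not zstd's actual code; `demo_compute_tracing_enabled` stands in for whatever potentially expensive decision (env var, weak symbol, etc.) is made once per thread and then cached.

```c
#include <stddef.h>

static _Thread_local int tls_tracing_checked = 0;
static _Thread_local int tls_tracing_enabled = 0;

/* Potentially expensive decision, done at most once per thread. */
static int demo_compute_tracing_enabled(void) {
    return 0;  /* assume tracing is disabled for this sketch */
}

static void demo_trace_block(size_t srcSize, size_t dstSize) {
    if (!tls_tracing_checked) {
        tls_tracing_enabled = demo_compute_tracing_enabled();
        tls_tracing_checked = 1;
    }
    if (!tls_tracing_enabled) return;  /* the common, near-free path */
    (void)srcSize;
    (void)dstSize;  /* real hook work would go here */
}
```

After the first call on a thread, every subsequent disabled call is just the function-call overhead plus a branch on thread-local storage.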
I believe this is complete. Re-open if there's remaining desired functionality not covered by #2482.
At Google, with other compression algorithms, we've found it useful to be able to cheaply gather telemetry for compression usage -- things like compressed / decompressed sizes and algorithm settings (e.g. compression level, window size). Aggregating these fleetwide (in a way similar to CPU or heap profiles) lets us look for optimization opportunities like spurious compression.
I'm proposing adding a simple hooks API to zstd that lets one do that. It basically amounts to users registering a callback that gets called at the end of (de)compressing every block, perhaps with a signature similar to the one in snappy, where `algorithm` is a short string like "zstd:3,15" (compression level, log window). Of course, when no callback is registered, the overhead should be ~0, even for short compression calls.
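Reading from the description above, the proposed callback might look something like the sketch below. This is a hypothetical reconstruction, not snappy's or zstd's real API; the `compression_hook` typedef and `format_algorithm` helper are made-up names used only to illustrate the shape of the interface and the "zstd:level,windowLog" string convention.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-block callback: sizes plus a short algorithm tag. */
typedef void (*compression_hook)(
    const char* algorithm,      /* e.g. "zstd:3,15" = level 3, windowLog 15 */
    size_t      uncompressed_size,
    size_t      compressed_size,
    int         is_compression  /* 1 = compress path, 0 = decompress path */
);

/* Helper illustrating the algorithm-string convention; note that the
 * snprintf cost here is exactly the kind of overhead discussed in the
 * thread (and why specializing or gating it behind a flag matters). */
static void format_algorithm(char* buf, size_t bufSize,
                             int level, int windowLog) {
    snprintf(buf, bufSize, "zstd:%d,%d", level, windowLog);
}
```

A hook with this shape carries enough information (sizes, settings, direction) for the fleetwide aggregation use case described above, while leaving stack capture and transport to the callback implementation.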
Any objections to adding something like that?
If not, we can talk about what the actual API could look like, and how users would register a callback. (I have a simple prototype, but wouldn't pretend to know about all use cases of zstd that could be affected.)