
Hooks API to track parameters and sizes #1601

Closed
s-kanev opened this issue Apr 26, 2019 · 8 comments

Comments

@s-kanev

s-kanev commented Apr 26, 2019

At Google, with other compression algorithms, we've found it useful to be able to cheaply gather telemetry for compression usage -- things like compressed/decompressed sizes and algorithm settings (e.g. compression level, window size). Aggregating these fleetwide (in a way similar to CPU or heap profiles) lets us look for optimization opportunities like spurious compression.

I'm proposing adding a simple hooks API to zstd that lets one do that. It basically amounts to users registering a callback that gets called at the end of (de)compressing every block, perhaps with a signature similar to the one in snappy:

typedef void (*ZSTD_hook) (const char* algorithm, size_t uncompressedSize, size_t compressedSize);

where algorithm is a short string like "zstd:3,15" (compression level, log window).
Of course, when no callback is registered, the overhead should be ~0, even for short compression calls.

Any objections to adding something like that?

If not, we can talk about what the actual API could look like, and how users would register a callback. (I have a simple prototype, but wouldn't pretend to know about all the use cases of zstd that could be affected.)

@Cyan4973
Contributor

Thanks for the suggestion @s-kanev, I think it's a great feature!

Now, we have to be cautious about its implementation.
Specifically, using a callback might restrict this capability to the C and C++ worlds,
because there is no single guaranteed ABI contract for a function call.
I'm not sure yet if there is any similar or better way available,
but at least we should give it some thought.

@terrelln
Contributor

I'd be happy to add a hook like this. What kind of info do you log in your hooks? What have you found the most useful? At minimum I would want to log binary name, zstd version, and a stack trace.

  • I guess we want this to be global, and not per-context, since setting per-context values would be very hard for a profiler. That means we'll need cross-platform macros to do atomic reads/writes.
  • Constructing the algorithm name will be fast, but not free. So I'd like to hide the parameter construction behind an atomic bool that only gets set to true when profiling is enabled.
  • I'm okay starting by adding dedicated functions for this profiling in the experimental section for the first iteration. But I think the final iteration should look something like mallocctl.
  • Similarly I'm fine with the prototype you have for now, since it is simple. In a final iteration we may want to return something like JSON so we can add fields without breaking existing profiling.

Specifically, using a callback might restrict this capability to the C and C++ worlds,
because there is no single guaranteed ABI contract for a function call.

I'm not too worried about this, since it is just profiling, not a necessary feature. Someone could write, or we could always provide, a callback implementation that tails the stream into a circular buffer and lets users call a function to read the buffer.

@s-kanev
Author

s-kanev commented Apr 26, 2019

Thanks for your thoughts, replies inline.

What kind of info do you log in your hooks? What have you found the most useful?
At minimum I would want to log binary name, zstd version, and a stack trace.

A stack trace is definitely the most useful bit for us. The sizes are a very close second. Usually the lowest-hanging fruit to optimize is the glaringly obvious case, like: "we're compressing 5MB buffers (from the size) of already-compressed (from the call stack) data".
We have different channels to log things like binary IDs and versions, so we don't tie that to compression profiling specifically.

  • I guess we want this to be global, and not per-context, since setting per-context values would be very hard for a profiler. That means we'll need cross-platform macros to do atomic reads/writes.

I went back and forth on this a couple of times as well. On some non-x86 platforms, atomics are more expensive, so doing an atomic read on every block might add up. My end solution was to add the callback pointer to the context, and only do the atomic read to set it from the user-registered global on context init (which should be a bit rarer).

Also, I'm not sure how important supporting legacy compilers is, but I recently learned that C11 added cross-platform atomics. It's already 8 years old now, so it might be OK to fall back on that?

  • Constructing the algorithm name will be fast, but not free. So I'd like to hide the parameter construction behind an atomic bool that only gets set to true when profiling is enabled.

Agree. While testing, I found that blindly calling snprintf is about a 20% regression for small sizes (~200 bytes). We can specialize a little, and with PGO, the compiler actually generates pretty efficient code (turns the short format string into just storing a constant) that I couldn't discern from the noise on that small test. But hiding behind a flag is much simpler.

  • Similarly I'm fine with the prototype you have for now, since it is simple.

Sounds good. I'll send that out once I get cleared to sign a CLA.

Note that it's only the hook plumbing, not the particular hook implementation we have. That's too tied to our internal deployment model and infrastructure to be broadly useful. For the record, it just collects the hook parameters, captures a stacktrace, deduplicates all samples in a profile proto, and exposes that through RPC.

I suspect for most users, consuming profiles will be similarly tied to their deployment models (writing to a file vs exposing through an RPC; JSON vs Thrift vs protobuf; etc.). Maybe we can provide a very simple example in the library here without taking extra dependencies -- something that de-duplicates samples and just prints them out -- and then let users customize it to their needs?

@terrelln
Contributor

Sounds good. I'll send that out once I get cleared to sign a CLA.

Awesome!

I suspect for most users, consuming profiles will be similarly tied to their deployment models (writing to a file vs exposing through an RPC; JSON vs Thrift vs protobuf; etc.). Maybe we can provide a very simple example in the library here without taking extra dependencies -- something that de-duplicates samples and just prints them out -- and then let users customize it to their needs?

Yeah, I agree. Adding an example to examples/ that prints it out sounds great!

@terrelln
Contributor

terrelln commented Feb 5, 2021

@s-kanev FYI I'm adding some tracing functionality in PR #2482. It is hooked up via weak functions instead of function pointers. Please let me know if this works for you.

As I'm getting this set up and working the kinks out, I expect that the API will change between versions.

@s-kanev
Author

s-kanev commented Feb 11, 2021

Thanks for following up. Yes, weak functions shouldn't be a problem.

Have you had a chance to look at overheads? Nothing in the tracing hook code looked particularly expensive, but it's good to double-check, especially at small sizes.

@terrelln
Contributor

@s-kanev I'm working on hooking it up right now. I'll let you know about any overhead I measure when I get things hooked up, but I don't expect any significant overhead.

When it is disabled, the overhead should be 2 function calls + 1 branch + the cost to determine whether or not to enable tracing, which should be negligible.

We keep a thread-local nextSample counter which is initialized to a geometric distribution according to our sample rate, and enable tracing when it hits zero.

@felixhandte
Contributor

I believe this is complete. Re-open if there's remaining desired functionality not covered by #2482.
