
Record HTTP server metrics in instrumenter API. #2877


Merged (13 commits, May 12, 2021)

Conversation

anuraaga (Contributor)

Initial draft of recording metrics from Instrumenter. Let me know if you have any starter thoughts on this approach.

I found recording the metrics using the metrics API to be pretty annoying, especially having a different set of labels for different metrics. I'm hoping the metrics conventions are moving in a direction of recording all the attributes and letting the SDK deal with such details. That being said, it's still nice that all of this is hidden from instrumentation authors; they just have to enable the metrics when constructing the instrumenter.
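For illustration, a minimal sketch of what that opt-in could look like for an instrumentation author. This is hedged: MyRequest, MyResponse, spanNameExtractor, and headerGetter are placeholders, and HttpServerMetrics.get() is assumed as the RequestMetrics factory entry point; only addRequestMetrics itself comes from the diff discussed below.

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.instrumentation.api.instrumenter.Instrumenter;

// Hypothetical usage sketch: all label handling stays inside the metrics
// listener; the instrumentation author only opts in at construction time.
Instrumenter<MyRequest, MyResponse> instrumenter =
    Instrumenter.<MyRequest, MyResponse>newBuilder(
            GlobalOpenTelemetry.get(), "my-http-server", spanNameExtractor)
        .addRequestMetrics(HttpServerMetrics.get()) // opt in to HTTP server metrics
        .newServerInstrumenter(headerGetter);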

Anuraag Agrawal added 2 commits April 27, 2021 17:50

Context start(Context context, Attributes requestAttributes);

void end(Context context, Attributes responseAttributes);
Contributor Author

My understanding is metrics will only ever be recorded with "request attributes" since metric attributes (labels) must be constant throughout a request.

But some metrics will need response attributes to get the actual value, for example, a "response content length" metric. In other words, some span attributes (usually response attributes) are actually metric values.
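To make that concrete, here is a hedged sketch of a listener where request attributes would feed the labels and a response attribute supplies the value. The class, the metric name, and the use of the then-current alpha metrics API types (LongValueRecorder, Labels) are assumptions for illustration, not the PR's code.

import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.LongValueRecorder;
import io.opentelemetry.api.metrics.Meter;
import io.opentelemetry.api.metrics.common.Labels;
import io.opentelemetry.context.Context;

final class ResponseSizeMetrics implements RequestListener {

  private final LongValueRecorder responseSize;

  ResponseSizeMetrics(Meter meter) {
    responseSize =
        meter.longValueRecorderBuilder("http.server.response.size").setUnit("By").build();
  }

  @Override
  public Context start(Context context, Attributes requestAttributes) {
    // Only request attributes are stable for the whole request, so a real
    // listener would stash them in the context here for use as labels.
    return context;
  }

  @Override
  public void end(Context context, Attributes responseAttributes) {
    // The response attribute is the metric value, not a label.
    Long contentLength =
        responseAttributes.get(AttributeKey.longKey("http.response_content_length"));
    if (contentLength != null) {
      responseSize.record(contentLength, Labels.empty());
    }
  }
}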

import io.opentelemetry.api.metrics.Meter;

@FunctionalInterface
public interface RequestMetricsFactory {
Contributor Author

Possibly this class could be called RequestMetrics and the other RequestListener

Member

Note: if we ever want to implement this comment: we could encapsulate all span operations in a RequestListener

Contributor Author

Yeah that seems like a nice idea

activeRequests =
meter
.longUpDownCounterBuilder("http.server.active_requests")
.setUnit("requests")
Contributor

This makes me think the semantic conventions need to have units in them as well, so we won't have to hard-code this stuff. Also, even though this is currently the semantic convention for these units, I suspect this will not last, as we will probably be standardizing on units in the future. See this issue for tracking: open-telemetry/opentelemetry-specification#705

Contributor

I was thinking the generator should provide them as well. Are they in the yaml yet?

Contributor Author

Ah got it - nope there aren't any metric yaml files yet

Comment on lines 106 to 113
case "http.method":
case "http.host":
case "http.scheme":
case "http.status_code":
case "http.flavor":
case "http.server_name":
case "net.host.name":
case "net.host.port":
@richardstartin (Contributor), Apr 27, 2021

I'm curious about OTel's approach to high cardinality metric attributes, having implemented similar functionality at Datadog and paid a lot of attention to cardinality reduction prior to aggregation. I have a few questions which I hope are not disruptive to your review.

  • I'm unfamiliar with OTel attributes but which of these, if any, is the request path?
  • Since the unsanitised request path can have unbounded cardinality, would that rule it out as a metric attribute?
  • If this were a metric for a database query, would you sanitise the query here, or exclude it from the labels?

Contributor Author

Thanks for reaching out @richardstartin. I doubt I'll have good answers, as a lot about OTel's approach to metrics is still WIP, but let's see ;)

I'm unfamiliar with OTel attributes but which of these, if any, is the request path?

It's not in this list yet. There is a special note here

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/semantic_conventions/http-metrics.md#parameterized-labels

and I didn't want to deal with it yet :) In particular, those notes seem out of date: http.route is really the attribute to use (it's the parameterized path).

Since the unsanitised request path can have unbounded cardinality, would that rule it out as a metric attribute?

I think so; it would have to be a parameterized path of some sort. That being said, given the separation between API and SDK in OTel, I wonder if it's possible to pass raw paths through the API and have the SDK either detect parameters in the path using some clever logic, or detect the cardinality and drop the attribute. Without that, it's definitely not appropriate. I wonder if your "cardinality reduction prior to aggregation" is something we could incorporate in the SDK (see the sketch after these answers).

If this were a metric for a database query, would you sanitise the query here, or exclude it from the labels?

If there were conventions for DB metrics I'd just follow them

https://github.com/open-telemetry/opentelemetry-specification/tree/main/specification/metrics/semantic_conventions

Lacking them, and just going by my own sense: I've never added queries as metric labels. But I don't see a problem with adding prepared statements, since they generally have low cardinality (10-20 distinct query shapes in an app, each with the same prepared form across invocations).
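As a concrete illustration of the cardinality point above, a hypothetical allow-list in the spirit of the switch statement being reviewed, preferring the parameterized http.route over any raw path attribute:

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical: http.route ("/users/{id}") is bounded, while a raw path
// like http.target ("/users/12345?sort=asc") is not, so only the former
// would be eligible as a metric label.
static final Set<String> ALLOWED_LABEL_KEYS =
    new HashSet<>(
        Arrays.asList(
            "http.method", "http.host", "http.scheme", "http.status_code",
            "http.flavor", "http.server_name", "net.host.name", "net.host.port",
            "http.route"));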

Contributor

Thanks for your answers. We track metrics at a much more granular level than this - including the sanitised request path, sanitised database query, and so on - which we call the resource name at Datadog. This is because latency distributions tend to depend on attributes like the query or the endpoint: e.g. database updates and inserts tend to perform differently, and an application's business logic is always slower than its health check endpoint.

I will be interested to see the direction this API takes: will higher-cardinality attributes be sacrificed altogether to keep the number of points down, or will it find a way to share the sanitisation work with tracing, avoiding an explosion in metric points while reducing total overhead and keeping metrics and traces consistent?

return;
}
activeRequests.add(-1, state.activeRequestLabels());
duration.record(System.nanoTime() - state.startTimeNanos(), state.durationLabels());
Member

The metric's unit is milliseconds, we probably should convert nanos to millis
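A minimal sketch of the suggested fix (assuming java.util.concurrent.TimeUnit; state accessors as in the snippet above):

import java.util.concurrent.TimeUnit;

// Record in the metric's declared unit (ms) rather than raw nanos.
duration.record(
    TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - state.startTimeNanos()),
    state.durationLabels());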

LabelsBuilder labels = Labels.builder();
attributes.forEach(
(key, value) -> {
if (key.getType() != AttributeType.STRING) {
Member

http.status_code and net.host.port are longs, not strings.

Contributor Author

Hmm - the status code isn't available yet when I compute these labels at the beginning of the request. I guess, depending on the metric, some labels need to be computed at the beginning (active requests) and some at the end (duration). But since there's no text in the spec on that yet, I'll keep it as is for now.
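For reference, a hedged variant that would keep the long-typed attributes by stringifying every value instead of skipping non-string keys (illustrative only, not the PR's code):

LabelsBuilder labels = Labels.builder();
attributes.forEach(
    (key, value) -> {
      // http.status_code and net.host.port are LONG attributes; converting
      // every value to its string form keeps them instead of dropping them.
      labels.put(key.getKey(), String.valueOf(value));
    });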

Comment on lines 77 to 78
LabelsBuilder labels = Labels.builder();
attributes.forEach(
Member

It'd be great if the SDK included some easy Attributes<->Labels conversion. Or if we could use Attributes directly (well, a subset of them) in metrics API...

Contributor Author

Yeah that's what I'm hoping for.

Anuraag Agrawal added 2 commits May 7, 2021 13:13
@anuraaga (Contributor Author) left a comment

Started going through comments; will continue to clean this PR up.

@anuraaga marked this pull request as ready for review May 10, 2021 06:25

anuraaga (Contributor Author)

Thanks for waiting, I think this is ready for review.

Anuraag Agrawal added 2 commits May 11, 2021 12:53
@@ -74,6 +81,12 @@
return addAttributesExtractors(Arrays.asList(attributesExtractors));
}

/** Adds a {@link RequestMetrics} whose metrics will be recorded for request start and stop. */
public InstrumenterBuilder<REQUEST, RESPONSE> addRequestMetrics(RequestMetrics factory) {
Member

Shouldn't it be marked as unstable?

Suggested change:
- public InstrumenterBuilder<REQUEST, RESPONSE> addRequestMetrics(RequestMetrics factory) {
+ @UnstableApi
+ public InstrumenterBuilder<REQUEST, RESPONSE> addRequestMetrics(RequestMetrics factory) {

Contributor Author

Apparently I was happy enough having just written the docs on the annotation...

* <p>To use this class, you may need to add the {@code opentelemetry-api-metrics} artifact to your
* dependencies.
*/
public final class HttpServerMetrics implements RequestListener {
Member

Same here, shouldn't we mark this class as unstable?

Suggested change:
- public final class HttpServerMetrics implements RequestListener {
+ @UnstableApi
+ public final class HttpServerMetrics implements RequestListener {

@@ -14,4 +14,5 @@

public static final Interceptor TRACING_INTERCEPTOR =
OkHttpTracing.create(GlobalOpenTelemetry.get()).newInterceptor();
String a = Byte.toString((byte) 2);
Member

To be removed?

Contributor Author

😱

Comment on lines 24 to 27
Context start(Context context, Attributes attributes);

/** Listener method that is called at the end of a request. */
void end(Context context, Attributes attributes);
Member

Should we mention that end() only receives attributes extracted from the response, without the request ones? This is not immediately obvious when you look at this interface.

Suggested change:
- Context start(Context context, Attributes attributes);
- /** Listener method that is called at the end of a request. */
- void end(Context context, Attributes attributes);
+ Context start(Context context, Attributes requestAttributes);
+ /** Listener method that is called at the end of a request. */
+ void end(Context context, Attributes responseAttributes);

Anuraag Agrawal added 2 commits May 12, 2021 13:07
@anuraaga merged commit 057d886 into open-telemetry:main on May 12, 2021
@trask (Member) commented May 12, 2021

we have (some) metrics 🎉
