Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add Exemplar support for OpenTelemetry tracing #652

Merged
merged 2 commits into from
May 24, 2021
Merged

Conversation

fstab
Copy link
Member

@fstab fstab commented Apr 12, 2021

New Section in README:

Exemplars are a feature of the OpenMetrics format that allows applications to link metrics
to example traces. Starting with version 0.11.0, client_java automatically adds Exemplars for applications that
are instrumented with OpenTelemetry distributed tracing. No code change is required.

Running an Exemplars Example

If you want to see this in action, you can run the example from the ExemplarsClientJavaIT:

mvn package
cd integration_tests/exemplars_otel_agent/target/
curl -LO https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/download/v1.0.1/opentelemetry-javaagent-all.jar
java -Dotel.traces.exporter=logging -Dotel.metrics.exporter=none -javaagent:./opentelemetry-javaagent-all.jar -jar ./sample-rest-application.jar

Now you have a Spring REST service running on http://localhost:8080/hello that is instrumented with the OpenTelemetry Java agent.

In order to get metrics in OpenMetrics format, run

curl -H 'Accept: application/openmetrics-text; version=1.0.0; charset=utf-8' http://localhost:8080/metrics

You should see metrics with Exemplars, for example in the request_duration_histogram metric:

request_duration_histogram_bucket{path="/god-of-fire",le="0.004"} 4.0 # {trace_id="043cd631811e373e4180a678c06b128e",span_id="cd122e457d2ca5b0"} 0.0033 1618261159.027

Note that this is an example application for a unit test, so durations don't represent real durations, and some example metrics might not make sense in the real world.

Enabling And Disabling Exemplars

By default, Exemplars are enabled if OpenTelemetry tracing is detected. You can disable this globally with:

ExemplarConfig.disableByDefault();

If you want to enable Exemplars only for a single metric, use withExemplarSampler() in the metric builder.

// This also works with Gauges, Histograms, and Summaries.
Counter labelsCustomExemplar = Counter.build()
    .name("number_of_events_total")
    .help("help")
    .withExemplarSampler(new MyExemplarSampler())
    ...
    .register();

Likewise, you can disable Exemplars for a single metric like this:

// This also works with Gauges, Histograms, and Summaries.
Counter labelsCustomExemplar = Counter.build()
    .name("number_of_events_total")
    .help("help")
    .withoutExemplars()
    ...
    .register();

See ExemplarTest for more examples.

Implement Your Own Exemplar Sampling Algorithm

The DefaultExemplarSampler is very simple: It samples a new Exemplar if the current one is older than 7 seconds.
You might want to implement your own Exemplar sampler that provides more interesting Exemplars for your application.
In order to do so, just implement the following interfaces:

  • CounterExemplarSampler
  • GaugeExemplarSampler
  • HistogramExemplarSampler
  • SummaryExemplarSampler

You can set your implementations as default like this:

ExemplarConfig.setDefaultCounterExemplarSampler(mySampler);
ExemplarConfig.setDefaultGaugeExemplarSampler(mySampler);
ExemplarConfig.setDefaultHistogramExemplarSampler(mySampler);
ExemplarConfig.setDefaultSummaryExemplarSampler(mySampler);

Implement Support for Other Tracing Vendors

Version 0.11.0 implements support for OpenTelemetry. If you are a vendor for another distributed tracer,
please create a pull request adding support for your system. The idea is that copy-and-paste the
tracer_otel module and modify it for your needs. If this turns out to be too simplistic, please create
a GitHub issue and let us know what you need.

Implementation Notes

This PR also introduces integration tests with testcontainers, which is a framework for running tests against Docker containers. Benefits:

  • This allows tests like the ExemplarsClientJavaIT where we attach the OpenTelemetry Java agent to the system under test.
  • This allows tests with different Java versions, like JavaVersionsIT that can be used to ensure the library is compatible with Java 6 or IBM's OpenJ9.

That means docker must be installed on the system where you run mvn verify.

Copy link
Contributor

@brian-brazil brian-brazil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a few things here that don't jive with the OM spec.

README.md Outdated
In order to do so, just implement the following interfaces:

* `CounterExemplarSampler`
* `GaugeExemplarSampler`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gauges do not support exemplars.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

README.md Outdated
* `CounterExemplarSampler`
* `GaugeExemplarSampler`
* `HistogramExemplarSampler`
* `SummaryExemplarSampler`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summarys do not support exemplars.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

private final double value;
private final long timestampMs;

public Exemplar(String traceId, String spanId, double value, long timestampMs) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the data model of an examplar in OpenMetrics, they can have arbitrary labelsets as long as they're within the character limit. Timestamps are also optional.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

/**
* Lazy value holder for a double value.
* <p/>
* Counters and Gauges track their current value using a DoubleAdder, so getting the value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would you need the value of a metric when sampling exemplars and thus care that the implementation uses a DoubleAdder? Only the size of the observation itself should matter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the CounterExemplarSampler to this

Exemplar sample(double increment, Value newTotalValue, Exemplar previous);

The default implementation uses the increment and ignores the newTotalValue. However, I still kept the newTotalValue available in the API so that users can access it if needed (e.g. start sampling after the total value exceeded a threshold).

static class SystemClock implements Clock {
@Override
public long currentTimeMillis() {
return System.currentTimeMillis();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the latency impact of calling this on every observation? This tends to be a relatively slow call.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the implementation so that this is only called if OpenTelemetry tracing is available.

For the latency, I added a ExemplarsBenchmark. Result on my local Linux laptop:

Benchmark                                               Mode  Samples   Score   Error  Units
i.p.b.ExemplarsBenchmark.testCounter                    avgt      200  28.252 ± 0.312  ns/op
i.p.b.ExemplarsBenchmark.testCounterWithExemplars       avgt      200  46.042 ± 0.781  ns/op
i.p.b.ExemplarsBenchmark.testCounterWithoutExemplars    avgt      200  29.322 ± 0.475  ns/op

so about 17ns for the currentTimeMillis() call.


writeTimestamp(writer, sample.timestampMs);
}
if (sample.exemplar != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code does not cover all potential valid exemplars.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed with the new Exemplar data model

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not quite right, 0 is a valid timestamp.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, using Long now and null to indicate no timestamp.

/**
* Immutable data class holding an Exemplar.
*/
public class Exemplar {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest having the definition of Exemplar itself live within simpleclient, as that's where all of the rest of the data model is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ExemplarSampler implementations use the Exemplar data model, so if we move the model to simpleclient, we will need to move all ExemplarSampler implementations to simpleclient as well. Currently it's not much, but if in the future people provide more sophisticated implementations it might be worthwhile to have this in a separate module.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that the core data model should all live in one place, not spread across N modules that have to be pieced together by someone trying to work with this. It's kinda like having an entire module just to handle timestamps.

Personally I wouldn't have the core simpleclient depending on any other modules at all, rather make this something that the user has to add an extra module to enable, rather than silently creating the possibility for dependency hell on upgrade. simpleclient thus far has purposefully avoided any dependencies for that reason, it should always be safe to have simpleclient instrumentation in place.

This is also still not the data model of an OM exemplar, as timestamps are optional.

}
writer.write('\n');
}
}
writer.write("# EOF\n");
}

private static void writeTimestamp(Writer writer, long timestampMs) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

omWriteTimestamp would avoid confusion, as this is OM specific logic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

@fstab fstab force-pushed the otel-exemplars branch 2 times, most recently from 0244904 to ef6e57f Compare April 18, 2021 22:54
@fstab
Copy link
Member Author

fstab commented Apr 18, 2021

Thanks a lot @brian-brazil for reviewing this, I appreciate it a lot. I pushed an update and I'd be happy if you could review it once more.

Copy link
Contributor

@brian-brazil brian-brazil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All public methods should have javadoc.

@@ -49,7 +49,7 @@
<dependency>
<groupId>io.prometheus</groupId>
<artifactId>simpleclient</artifactId>
<version>0.10.1-SNAPSHOT</version>
<version>${project.version}</version>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will break some build systems, I forget which one but that's the reason all versions are explicit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted

@@ -0,0 +1,80 @@
package io.prometheus.benchmark;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The readme should be updated with the new numbers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd re-run all the benchmarks, so it's apples to apples.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment still stands.


/**
* @param increment the value added to the counter on this event
* @param newTotalValue the new total counter value after adding the increment.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would you care about this? Increments should be independent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this. Here's an example that might be a valid use case: A counter representing a buffer size. If the buffer can only grow, a counter should be a valid metric. A user might decide to sample exemplars only when the buffer has exceeded a certain size, thus the exemplar sampler needs the total value.

I'm not saying that this will be common, in most cases users won't need the total value. However, if we remove it from the API there will be no chance to use it even if there are rare cases where it makes sense.

Anyway, I don't have a strong opinion on this. If you still think removing it would be better I'm happy to remove it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interesting thing about a counter is how quickly it goes up, the absolute value doesn't matter.

A counter representing a buffer size

That's a gauge, as you care about the absolute value.

/**
* Immutable data class holding an Exemplar.
*/
public class Exemplar {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that the core data model should all live in one place, not spread across N modules that have to be pieced together by someone trying to work with this. It's kinda like having an entire module just to handle timestamps.

Personally I wouldn't have the core simpleclient depending on any other modules at all, rather make this something that the user has to add an extra module to enable, rather than silently creating the possibility for dependency hell on upgrade. simpleclient thus far has purposefully avoided any dependencies for that reason, it should always be safe to have simpleclient instrumentation in place.

This is also still not the data model of an OM exemplar, as timestamps are optional.

* @param labels name/value pairs. Expecting an even number of strings. The combined length of the label names and
* values must not exceed 128 UTF-8 characters. Neither a label name nor a label value may be null.
*/
public Exemplar(double value, String... labels) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing in labels should be consistent with how it's done in Info, so a map should also be supported.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the Exemplar and all classes that depend on it to simpleclient. Now only the bridges to the tracing libraries live in separate modules. I agree that it should be safe to have simpleclient instrumentation in place, and that dependencies should be avoided. I hope this is the case with the current implementation: simpleclient does not have any runtime dependencies. OpenTelemetry is available we will load the corresponding SpanContextSupplier, if not we won't.

We could disable exemplars by default and have the user explicitly enable them with a ExemplarConfig.enableExemplars() call. Apart from that I don't see how to improve this. We could load SpanContextSuppliers dynamically through a class loader, but that would yield to the same optional dependency.

I added an Exemplar constructor with a Map parameter to be consistent with Info.

instance.histogramExemplarSampler = instance.noopExemplarSampler;
}

// Methods for internal use
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they're internal, why are they public?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the comment. It makes sense to have these as public methods, because a user might want to keep a copy of the original default ExemplarSampler.

return timestampMs;
}

private void validateLabels(String... labels) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should also check that the label names are valid.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a check to make sure label names are unique.


writeTimestamp(writer, sample.timestampMs);
}
if (sample.exemplar != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not quite right, 0 is a valid timestamp.

@fstab
Copy link
Member Author

fstab commented Apr 20, 2021

I pushed another update to make sure that simpleclient works even if none of the dependencies (tracer_common, tracer_otel, tracer_otel_agent) is present.

Copy link
Contributor

@brian-brazil brian-brazil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may make sense for an exemplar to be provided directly at inc/observe time, as exemplars are not limited to spans/traces.

if (labels[i] == null) {
throw new IllegalArgumentException("labels[" + i + "] is null");
}
if (i % 2 == 0) { // label names should be unique
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't test that the label names are valid.

Copy link
Member Author

@fstab fstab May 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions:

  1. What else would you test apart from not null and uniqueness?
  2. Do you see any drawbacks if we sort the exemplar labels by name? That would give them a defined order when the labels are passed as a HashMap.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That the characters match the labelname regex.

Sorting is fine, but a HashMap doesn't have ordering. You'd want a TreeMap for that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added sorting and the check for the label name regex.

@@ -0,0 +1,80 @@
package io.prometheus.benchmark;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd re-run all the benchmarks, so it's apples to apples.

<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-api</artifactId>
<version>${otel.version}</version>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment applies to all versions, avoid variables. Of course it's possible the feature has been added in the meantime, as that request was some years back.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the maven wrapper script so that users have a defined version of maven (specified in .mvn/wrapper/maven-wrapper.properties) if they call ./mvnw instead of mvn.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment stands, you can't assume that users are using maven.

@@ -0,0 +1,18 @@
package io.prometheus.smoketest;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be under client?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved

pom.xml Outdated
@@ -43,6 +43,7 @@
</developers>

<modules>
<module>tracer_parent</module>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't follow the naming scheme of beginning with simpleclient.

Copy link
Member Author

@fstab fstab May 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to simpleclient_tracer


/**
* @param increment the value added to the counter on this event
* @param newTotalValue the new total counter value after adding the increment.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still here, it should be removed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I removed it

/**
* @param value the value to be observed.
* @param bucketFrom upper boundary of the previous bucket in the histogram.
* May be {@link Double#NEGATIVE_INFINITY} if there is no previous bucket.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be

There's no uncertainty here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

README.md Outdated

### Enabling And Disabling Exemplars

By default, Exemplars are enabled if OpenTelemetry tracing is detected. You can disable this globally with:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need updating?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-wrote the exemplars documentation.

private static final ExemplarConfig instance = new ExemplarConfig();

private final ExemplarSampler noopExemplarSampler;
private final ExemplarSampler defaultExemplarSampler;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't these be static? They aren't tied to an instance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made them static.

}

/**
* Disable exemplars by default. Exemplars can still be enabled for individual metrics in the metric's builder.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this actually work? By the time it is called, static initialisation of metrics will likely already be done.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the following to README.md to clarify this

Note that ExemplarConfig is used when a metric is created, so changing the default has no effect on metrics that have already been created.

If we want API to change the ExemplarSampler for existing metrics, we'd need to make access to the ExemplarSampler thread safe, which would create unexpected synchronization in multi threaded user applications. I don't see a good way of doing this apart from requiring the user to call ExemplarConfig before metrics are created.

@fstab fstab force-pushed the otel-exemplars branch 2 times, most recently from 650aa32 to 64c7c2b Compare May 9, 2021 21:10
@fstab
Copy link
Member Author

fstab commented May 9, 2021

Thanks again for your review.

Apart from addressing your comments, I added ...withExemplar() versions for the Counter's inc() and the Histogram's observe() and time() methods. Now users can now control Exemplars on three levels:

  • Globally by setting the default exemplar sampler
  • Per metric by setting an exemplar sampler in the metric builder
  • Per observation by using the ...withExemplar() methods

I described this in README.md. I also split the documentation so that generic Exemplar topics are covered in README.md and OpenTelemetry specific topics in OTEL_EXEMPLARS.md.

I did not find the time yet to re-run all benchmarks, I will do that before the merge.

I tested all combinations of available / unavailable dependencies and all available OpenTelemetry Java versions. I didn't find any combination that breaks anything.

If the OpenTelemetry tracer is detected, Exemplars for OpenTelemetry are enabled by default. I added a few options to disable this

  • In code by calling ExemplarConfig.disableExemplarSamplers()
  • At build time by excluding the tracer_otel and tracer_otel_agent dependency
  • At runtime with system properties or environment variables.

I documented this in OTEL_EXEMPLARS.md.

I like the idea that if users attach the OpenTelemetry Java agent to get traces, they will also see Exemplars by default. And I don't see why people would use distributed tracing with OpenTelemetry without having corresponding exemplars in their metrics. However, if you think that we should not enable exemplars by default if OpenTelemetry tracing is present, I am happy to turn this around and disable OpenTelemetry exemplars by default.

@fstab fstab force-pushed the otel-exemplars branch from 64c7c2b to d8fbf66 Compare May 9, 2021 21:39
README.md Outdated
The following call will increment a counter, and create an exemplar with the specified `span_id` and `trace_id` labels:

```java
myCounter.incWithExemplar(Exemplar.SPAN_ID, "abcdef", Exemplar.TRACE_ID, "123456");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd include the literal string here, rather than pointing off to an unspecified class that you have to import to get to a constant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the TRACE_ID String from Exemplar.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you forget to push? I'm not seeing a change here.

myHistogram.observeWithExemplar(0.12, Exemplar.SPAN_ID, "abcdef", Exemplar.TRACE_ID, "123456");
```

All methods for observing and incrementing values have `...withExemplar` equivalents. There are versions taking the exemplar labels as a `String...` as shown in the example, as well as versions taking the exemplar labels as a `Map<String, String>`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you thought about doing this as part of inc() call? This isn't Go, function overloading is possible and normal in Java.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot make it part of the inc() call if we want to use varargs for the labels: inc(3) adds 3 without explicitly specifying an exemplar, while incWithExemplar(3) adds 3 and creates an exemplar without labels.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone wanted an explicitly empty exemplar, they could pass an empty map.

@@ -0,0 +1,80 @@
package io.prometheus.benchmark;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment still stands.

<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-api</artifactId>
<version>${otel.version}</version>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment stands, you can't assume that users are using maven.

private final double value;
private final Long timestampMs;

private static final Pattern labelNameRegex = Pattern.compile("[a-zA-Z_]\\w*"); // \w is [a-zA-Z_0-9]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment only confuses matters, if you want to say [a-zA-Z_0-9] then say [a-zA-Z_0-9] directly.

Copy link
Member Author

@fstab fstab May 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Renamed io.prometheus.benchmark to io.prometheus.client.benchmark
  • Regarding Maven variables: I tested the current build with the oldest 3.0 and 3.1 versions available on the Apache mirror (3.0.5 from 2013-02-23 and 3.1.1 from 2013-10-04). I had to downgrade the failsafe plugin, but apart from that the build runs without issues with these versions. Does that address your concerns?
  • Changed the regex.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that address your concerns?

No, the problem was never maven. It was some other Java build tool that was reading the poms.

* or they can be provided explicitly for individual observations with the {@code ...withExemplar()} methods.
*/
public static void disableExemplarSamplers() {
enabled = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still racey, you don't know when this could be called relative to when Counters and Histograms are instantiated.

Copy link
Member Author

@fstab fstab May 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a line to the JavaDoc saying that this must be called before a Counter or Histogram is created.

I'm not sure what the best trade-off is. From a usability perspective it would be better if this disabled Exemplars also for previously created Counter or Histogram metrics. However, in order to implement this we would need to make access to the exemplar sampler thread safe, which might severely slow down multi-threaded use of the metrics. Therefore I think the best option is to require the user to set the default exemplar behavior before creating metrics. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a line to the JavaDoc saying that this must be called before a Counter or Histogram is created.

That's not possible for users to do, as metrics are usually defined statically and thus users have no control over it.

However, in order to implement this we would need to make access to the exemplar sampler thread safe, which might severely slow down multi-threaded use of the metrics.

If you only want to read the current state of the sampler, that should always be fast as you don't need mutexes. It's only when mixing reads and writes that things can get a tad slow.

}
}

private void assertOldFormat(String line) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Old and New here is a bit confusing, I'd suggest using their actual names.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed it.

metrics.contains("\n" + line));
}

private String timestampString() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you creating a copy of this function in the tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the copy and used the version from TextFormat.

@fstab fstab force-pushed the otel-exemplars branch from 3f389ae to 8ce5a53 Compare May 11, 2021 21:15
Copy link
Contributor

@brian-brazil brian-brazil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you forgot to push.

README.md Outdated
The following call will increment a counter, and create an exemplar with the specified `span_id` and `trace_id` labels:

```java
myCounter.incWithExemplar(Exemplar.SPAN_ID, "abcdef", Exemplar.TRACE_ID, "123456");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you forget to push? I'm not seeing a change here.

@fstab
Copy link
Member Author

fstab commented May 12, 2021

Sorry, I removed the reference to Exemplar.SPAN_ID from all code occurrences, but missed the one in README.md.

If you only want to read the current state of the sampler, that should always be fast as you don't need mutexes. It's only when mixing reads and writes that things can get a tad slow.

Ok, then I'll refactor this such that you can change the exemplar sampler for existing metrics.

@fstab
Copy link
Member Author

fstab commented May 13, 2021

I pushed another update. Now changes in ExemplarConfig will be picked up by existing Counters and Histograms.
I also ran all benchmarks again, as this was still on my todo list.

Regarding the variables in pom.xml files I think it's fair to require users to use Maven or the ./mvnw script to build the project from source. I'm not aware of an alternative build tool that uses the pom.xml files. However, if you think there are users out there using such a tool I am happy to replace the variables with the version numbers.

Copy link
Contributor

@brian-brazil brian-brazil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks roughly right.

However, if you think there are users out there using such a tool I am happy to replace the variables with the version numbers.

As I said it's a complaint I got years ago, for which I forget the exact details. It's your call.

README.md Outdated

### Per Observation Exemplars

You can explicitly create an exemplar for an individual observation. This takes precedence over the exemplar sampler configured with the metric.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/create/provide/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

@@ -0,0 +1,10 @@
FROM openjdk:11-jre
RUN mkdir -p /app/
ADD https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/download/v1.1.0/opentelemetry-javaagent-all.jar /app/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1.2.0?

Copy link
Member Author

@fstab fstab May 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that the agent release lags a few days behind the library release. Agent is still 1.1.0.

<artifactId>simpleclient</artifactId>
<version>${project.version}</version>
<exclusions>
<!-- uncomment the following to verify manually that simpleclient still works if the tracers are excluded -->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be another integration test?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I turned this into an integration test. Instead of excluding dependencies in the pom.xml, the new implementation of ExemplarsOtelIT copies the individual JARs into the Docker image so that they can be removed in the test without rebuilding the Maven project.

While I was at it, I also renamed the tracer_... sub-modules to simpleclient_tracer_....

/**
* Disable exemplars.
* <p>
* This will disable exemplars for this counter even if exemplars are enabled by default in {@link ExemplarConfig}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reads as though it prevents all use of exemplars, whereas it's only the automatic ones that it stops. This could be clearer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the wording.

assert004Format("no_labels_default_exemplar_total 3.0\n");
noLabelsDefaultExemplar.inc(2);
// The TestExemplarSampler always produces a new Exemplar.
// The Exemplar with value 3.0 should be replaced with an Exemplar with value 5.0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment isn't right, the exemplar value is 2. It's the counter value that is 5.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

i.p.b.CounterBenchmark.prometheusSimpleCounterNoLabelsIncBenchmark avgt 5 13.131 ± 1.661 ns/op
i.p.c.b.CounterBenchmark.codahaleCounterIncBenchmark avgt 5 8.063 ± 0.440 ns/op
i.p.c.b.CounterBenchmark.codahaleMeterMarkBenchmark avgt 5 38.625 ± 2.111 ns/op
i.p.c.b.CounterBenchmark.prometheusCounterChildIncBenchmark avgt 5 9.869 ± 0.252 ns/op
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incidentally, we can remove these old Java client benchmarks. They've not been relevant in many years.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean remove from README.md, or remove the implementations as well? Are there benchmarks that are still relevant apart from the ExemplarsBenchmark?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementations for the old java client. The benchmarks are still relevant, that client isn't however.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I removed the old client from the benchmarks.

@fstab
Copy link
Member Author

fstab commented May 15, 2021

I removed the obsolete benchmarks, and I bumped the OpenTelemetry agent version as 1.2.0 was released today. What do you think?

Copy link
Contributor

@brian-brazil brian-brazil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gave it a deeper review, and found a few nits.

request_duration_histogram_bucket{path="/god-of-fire",le="0.004"} 4.0 # {trace_id="043cd631811e373e4180a678c06b128e",span_id="cd122e457d2ca5b0"} 0.0033 1618261159.027
```

Note that this is an example application for a unit test, so durations don't represent real durations, and some example metrics might not make sense in the real world.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/unit test/demonstration/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed


## Disabling OpenTelemetry Exemplars

If you use OpenTelemetry tracing but don't want Exemplars, you can disable OpenTelemetries in multiple ways.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do not

Contractions are harder to deal with for non-native speakers apparently.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

### Disable OpenTelemetry Exemplars at Runtime

If your application uses OpenTelemetry tracing, but you want to disable OpenTelemetry at runtime without changing code,
start your application with the `otelExemplars` system property:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be namespaced a bit more, so that it's tied to this client. As-is I'd presume from the name that it was an OTel internal.

Copy link
Member Author

@fstab fstab May 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefixed with io.prometheus.

java -DotelExemplars=inactive -jar my-application.jar
```

Alternatively, you can set the environment variable `OTEL_EXEMPLARS=inactive`:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend sticking with one way of doing things to avoid confusion. System properties are the way to do this in Java.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the environment variable.

}
OpenTelemetrySpanContextSupplier test = new OpenTelemetrySpanContextSupplier();
test.getSpanId();
test.getSpanId();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be getTraceId?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good point. Fixed.

}
OpenTelemetryAgentSpanContextSupplier test = new OpenTelemetryAgentSpanContextSupplier();
test.getSpanId();
test.getSpanId();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@fstab
Copy link
Member Author

fstab commented May 17, 2021

Thanks a lot for reviewing again. I pushed another update. What do you think?

@brian-brazil
Copy link
Contributor

👍

@fstab fstab force-pushed the otel-exemplars branch 2 times, most recently from 602a5a0 to 603a9df Compare May 18, 2021 20:53
@fstab fstab force-pushed the otel-exemplars branch from 603a9df to c4f8b5c Compare May 24, 2021 19:48
@fstab fstab changed the base branch from master to next-release May 24, 2021 20:03
@fstab fstab merged commit d216b3f into next-release May 24, 2021
@fstab
Copy link
Member Author

fstab commented May 24, 2021

Thanks a lot for your great review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants