This repository was archived by the owner on Dec 6, 2024. It is now read-only.

Specify how to propagate consistent head sampling probability #168

Merged
merged 46 commits into from
Sep 29, 2021
14bd54e
Specify how to propagate head sampling probability
Jul 23, 2021
1d5d60a
edit
Jul 23, 2021
c741f7e
version
Jul 23, 2021
6adbd1a
links to OTEP 148 are TODOs
Jul 23, 2021
11206d7
rename
Jul 27, 2021
4085972
Add a tracestate variation
Jul 27, 2021
5cd3b9a
redraft using tracestate and two values
Jul 28, 2021
5aedc9c
edits
Jul 28, 2021
32544ea
Drop mention of inflationary
Jul 28, 2021
aa22609
detail about samplers
Jul 28, 2021
73f3b6f
edit
Jul 29, 2021
2fbcb30
change the format to otel=k1:v;k2:v; explain geometric distribution
Aug 10, 2021
695025c
followup from feedback and this week's SIG
Aug 20, 2021
fb75d9c
edits
Aug 20, 2021
8f7ad73
Let 2^61 be the min probability; leaves one unused value to represent…
Aug 23, 2021
765bd12
worked example (draft)
Sep 3, 2021
56910bd
corner cases
Sep 8, 2021
e06a7cf
corner case edits
Sep 8, 2021
0804649
corner case edits
Sep 8, 2021
cb068a2
edit
Sep 8, 2021
c9fa24f
from @oertl feedback especially
Sep 8, 2021
1b3ae23
clarify
Sep 8, 2021
d0c2697
Apply suggestions from code review
jmacd Sep 9, 2021
98f6403
rewrite explaination for r-value
Sep 9, 2021
16947f7
more
Sep 9, 2021
d9a4d59
example
Sep 9, 2021
34ec604
selection probability -> probabilty of r
Sep 9, 2021
f94c2d5
Merge branch 'main' of github.com:open-telemetry/oteps into jmacd/tra…
Sep 9, 2021
48123fe
typos
Sep 9, 2021
139f248
another example
Sep 10, 2021
2a37c4c
off-by-ones
Sep 10, 2021
a9c7500
discuss naming
Sep 10, 2021
b11f70e
Apply suggestions from code review
jmacd Sep 10, 2021
2a59cfc
off-by-zero
Sep 13, 2021
bb92360
Merge branch 'jmacd/traceprop' of github.com:jmacd/oteps into jmacd/t…
Sep 13, 2021
3097dcb
lint
Sep 15, 2021
0acc729
lint
Sep 15, 2021
fa2ded1
Remove log_head_adjusteed_count; remove the +1 bias for p-values; r n…
Sep 21, 2021
d119c57
Use 7/16
Sep 21, 2021
5ea047e
Use 7/16
Sep 21, 2021
28779fe
Use 7/16
Sep 21, 2021
04b37e4
Merge branch 'main' into jmacd/traceprop
jmacd Sep 21, 2021
32c384e
5%
Sep 28, 2021
efc4bb0
Merge branch 'jmacd/traceprop' of github.com:jmacd/oteps into jmacd/t…
Sep 28, 2021
f6ffd02
mention w3c trace context issue 467 (randomess bit); move issue 463 t…
Sep 28, 2021
0a296b5
whitespace
Sep 29, 2021
368 changes: 368 additions & 0 deletions text/trace/0168-sampling-propagation.md
@@ -0,0 +1,368 @@
# Propagate head trace sampling probability

Use the W3C trace context to convey consistent head trace sampling probability.

## Motivation

The head trace sampling probability is the probability associated with
the start of a trace context that was used to determine whether the
W3C `sampled` flag is set, which determines whether child contexts
will be sampled by a `ParentBased` Sampler. It is useful to know the
head trace sampling probability associated with a context in order to
build span-to-metrics pipelines when the built-in `ParentBased`
Sampler is used. Further motivation for supporting span-to-metrics
pipelines is presented in [OTEP
170](https://github.com/open-telemetry/oteps/pull/170).

A consistent trace sampling decision is one that can be carried out at
any node in a trace, which supports collecting partial traces.
OpenTelemetry specifies a built-in `TraceIDRatioBased` Sampler that
aims to accomplish this goal but was left incomplete (see a
[TODO](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#traceidratiobased)
in the v1.0 Trace specification).

We propose to propagate the necessary information alongside the [W3C
sampled flag](https://www.w3.org/TR/trace-context/#sampled-flag) using
`tracestate` with an `otel` vendor tag, which will require
(separately) [specifying how the OpenTelemetry project uses
`tracestate` itself](https://github.com/open-telemetry/opentelemetry-specification/pull/1852).

## Explanation

Two pieces of information are needed to convey consistent head trace
sampling probability:

1. The head trace sampling probability.
2. The source of consistent sampling decisions.

This proposal uses 6 bits of information for each of these and does
not depend on built-in TraceID randomness, which is not sufficiently
specified for probability sampling at this time. This proposal closely
follows [research by Otmar Ertl](https://arxiv.org/pdf/2107.07703.pdf).

### Probability value

To limit the cost of this extension and for statistical reasons
documented below, we propose to limit head trace sampling probability
to powers of two. This limits the available head trace sampling
probabilities to 1/2, 1/4, 1/8, and so on. We can compactly encode
these probabilities as small integer values using the base-2 logarithm
of the adjusted count.

Using six bits of information we can convey known and unknown sampling
rates as small as 2**-61. The value 63 is reserved to mean sampling
with probability 0, which conveys an adjusted count of 0 for the
associated context.

When propagated, the probability value will be interpreted as shown in
the following table, which uses an offset of +1 in order to place the
Unknown value at 0:

| Probability Value | Head Probability | Note |
| ----- | ----------- | ---- |
| 0 | Unknown | Reserved for span data |
| 1 | 1 | |
| 2 | 1/2 | |
| 3 | 1/4 | |
| ... | ... | |
| N | 2**(-N+1) | 1 in 2**(N-1) |
| ... | ... | |
| 61 | 2**-60 | |
| 62 | 2**-61 | |
| 63 | 0 | Maximum encoded value |

As described in [OTEP
170](https://github.com/open-telemetry/oteps/pull/170), span data
sampled by the `ParentBased` Sampler will encode the value that was
propagated by the parent span as its "probability value" `p`.

The value `p=0` SHOULD NOT be propagated using `tracestate`
explicitly, because the equivalent interpretation can be obtained by
omitting `p`.
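
The table above can be read as a small decoding function. The following is a minimal sketch, not part of the proposal; the names `headProbability` and `adjustedCount` are hypothetical and chosen only for illustration.

```golang
package main

import "math"

// headProbability decodes a propagated p value into a head sampling
// probability. known is false for p=0 (unknown) and for values outside
// the 6-bit range. The function name is an assumption of this sketch.
func headProbability(p int) (prob float64, known bool) {
	switch {
	case p >= 1 && p <= 62:
		// p = N encodes probability 2**(-N+1).
		return math.Ldexp(1, 1-p), true
	case p == 63:
		// Reserved value: zero probability.
		return 0, true
	default:
		return 0, false
	}
}

// adjustedCount is the reciprocal of the probability, or 0 when the
// probability is zero or unknown.
func adjustedCount(p int) float64 {
	prob, known := headProbability(p)
	if !known || prob == 0 {
		return 0
	}
	return 1 / prob
}
```

With this sketch, `headProbability(3)` yields 0.25 and `adjustedCount(4)` yields 8, matching the rows for `p=3` and `p=4`.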

### Randomness value

With head trace sampling probabilities limited to powers of two, the
amount of randomness needed per trace context is limited. A
consistent sampling decision is accomplished by propagating a specific
random variable denoted `r`. The random variable is described by a
geometric distribution having shape parameter `1/2`, listed below:

| `r` Value | Selection Probability |
| ---------------- | --------------------- |
| 0 | 1/2 |
| 1 | 1/4 |
| 2 | 1/8 |
| 3 | 1/16 |
| ... | ... |
| 0 <= `r` <= 60 | 1/(2**(`r`+1)) |
| ... | ... |
| 60 | 2**-61 |
| 61 | 2**-61 |

Such a random variable `r` can be generated using the following
pseudocode.

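```golang
import "math/rand"

// nextRandomBit returns one uniformly random bit. Using math/rand here
// is an assumption; any source of uniform random bits will do.
func nextRandomBit() bool {
	return rand.Intn(2) == 1
}

// nextRandomness counts zero bits before the first one bit, stopping at 61.
func nextRandomness() int {
	r := 0
	for r < 61 && !nextRandomBit() {
		r++
	}
	return r
}
```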

This can be computed from a stream of random bits as the number of
leading zeros using efficient instructions on modern computer
architectures.

For example, the value 3 means there were three leading zeros and
corresponds with being sampled at probabilities 1-in-1 through 1-in-8
but not at probabilities 1-in-16 and smaller.
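
As an illustration of the leading-zeros approach, the sketch below derives `r` from 64 uniformly random bits using the standard `math/bits` package. The function name `randomnessFromBits` and the choice of a 64-bit input are assumptions of this sketch, not part of the proposal.

```golang
package main

import "math/bits"

// maxRandomness is the largest r value the proposal allows.
const maxRandomness = 61

// randomnessFromBits derives r from 64 uniformly random bits by counting
// leading zeros and capping the result at 61.
func randomnessFromBits(x uint64) int {
	r := bits.LeadingZeros64(x)
	if r > maxRandomness {
		r = maxRandomness
	}
	return r
}
```

An input whose top three bits are zero and whose fourth bit is one, such as `1<<60`, yields `r=3`. Inputs with 61 or more leading zeros all map to `r=61`, which is why the last two rows of the table share the same probability.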

### Proposed `tracestate` syntax

The consistent sampling randomness value (`r`) and the head sampling
probability value (`p`) will be propagated using two bytes of base16 content
for each of the two fields, as follows:

```
tracestate: otel=p:PP;r:RR
```

where `PP` are two bytes of base16 probability value and `RR` are two
bytes of base16 random value. These values are omitted when they are
unknown.

This proposal should be taken as a recommendation and will be modified
to [match whatever format OpenTelemetry specifies for its
`tracestate`](https://github.com/open-telemetry/opentelemetry-specification/pull/1852).
The choice of base16 encoding is therefore just a recommendation,
chosen because `traceparent` uses base16 encoding.

### Examples

The following `tracestate` value:

```
tracestate: otel=r:0a;p:03
```

translates to

```
base16(probability) = 03 // 1-in-4 head probability
base16(randomness) = 0a // qualifies for 1-in-1024 sampling or greater
```

Any `TraceIDRatioBased` Sampler configured with probability 2**-10 or
greater will enable sampling this trace, whereas any
`TraceIDRatioBased` Sampler configured with probability 2**-11 or less
will stop sampling this trace.
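
To make the syntax concrete, here is a minimal sketch of parsing the value of the `otel` member (the part after `otel=`). The type `samplingState` and the function `parseOtelMember` are hypothetical names, and the error handling shown is one possible choice rather than specified behavior.

```golang
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// samplingState holds the p and r values found in an otel tracestate
// member. A negative field means the value was absent (unknown).
type samplingState struct {
	P, R int
}

// parseOtelMember parses the value of the otel member, e.g. "r:0a;p:03".
func parseOtelMember(value string) (samplingState, error) {
	st := samplingState{P: -1, R: -1}
	for _, field := range strings.Split(value, ";") {
		kv := strings.SplitN(field, ":", 2)
		if len(kv) != 2 {
			return st, fmt.Errorf("malformed field %q", field)
		}
		if kv[0] != "p" && kv[0] != "r" {
			continue // ignore keys this sketch does not know about
		}
		if len(kv[1]) != 2 {
			return st, fmt.Errorf("%s: want two base16 digits, got %q", kv[0], kv[1])
		}
		n, err := strconv.ParseUint(kv[1], 16, 8)
		if err != nil {
			return st, fmt.Errorf("%s: %w", kv[0], err)
		}
		switch {
		case kv[0] == "p" && n <= 63:
			st.P = int(n)
		case kv[0] == "r" && n <= 61:
			st.R = int(n)
		default:
			return st, fmt.Errorf("%s: value %d out of range", kv[0], n)
		}
	}
	return st, nil
}
```

Parsing `r:0a;p:03` with this sketch gives `P=3` and `R=10`. A value such as `r:3e` is rejected because 62 is outside the range allowed for `r`.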

## Internal details

The reasoning behind restricting the set of sampling rates is that it:

- Lowers the cost of propagating head sampling probability
- Limits the number of random bits required
- Avoids floating-point to integer rounding errors
- Makes math involving partial traces tractable.

[An algorithm for making statistical inference from partially-sampled
traces has been published](https://arxiv.org/pdf/2107.07703.pdf) that
explains how to work with a limited number of power-of-2 sampling rates.

### Behavior of the `TraceIDRatioBased` Sampler

The Sampler MUST be configured with a power-of-two probability
expressed as `2**-s` except for the special case of zero probability.

If the context is a new root, the initial `tracestate` must be created
with randomness value `r`, as described above, in the range [0, 61].
If the context is not a new root, output a new `tracestate` with the
same `r` value as the parent context.

When sampled, in both cases, the context's probability value `p` is
set to the value of `s+1` in the range [1, 62]. If the sampling
probability is zero (the special case where `s` is undefined), use
`p=63`, the specified value for zero probability.

In both cases, set the `sampled` bit if the outgoing `p` minus one is
less than the outgoing `r` plus one and `p` is less than 63 (i.e.,
`p-1 < r+1` and `p < 63` implies sampled).

If the context is not a new root and the incoming context's `r` value
is not set, the implementation SHOULD notify the user of an error
condition and follow the incoming context's `sampled` flag.

The span's `log_head_adjusted_count` field is set to the outgoing `p`
unless `r` is unknown, in which case it MUST be set to zero (unknown
probability).
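
The decision rule in this section can be sketched as a single function. This is an illustration under stated assumptions: the `decision` type, the `ratioDecision` name, and the `zero` flag used to select the zero-probability case are all hypothetical, and the caller is assumed to supply `r` (freshly generated for a root, copied from the parent otherwise).

```golang
package main

// zeroProb is the p value reserved for zero probability.
const zeroProb = 63

// decision is the outcome of a sampling decision.
type decision struct {
	P       int  // outgoing probability value
	R       int  // outgoing randomness value
	Sampled bool // outgoing W3C sampled flag
}

// ratioDecision applies the rule for a sampler configured with
// probability 2**-s. Pass zero=true for the zero-probability special
// case, in which s is ignored.
func ratioDecision(s int, zero bool, r int) decision {
	p := s + 1
	if zero {
		p = zeroProb
	}
	return decision{P: p, R: r, Sampled: p-1 < r+1 && p < zeroProb}
}
```

For a context with `r=10`, a sampler configured with `s=10` sets the `sampled` bit while a sampler configured with `s=11` does not, since the condition `p-1 < r+1` is equivalent to `s <= r`.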

### Behavior of the `ParentBased` sampler

The `ParentBased` sampler is modified by this proposal. It honors
the W3C `sampled` flag and copies the incoming `tracestate` keys to
the child context.

The span's `log_head_adjusted_count` field is set to the incoming
value of `p` when both `p` and `r` are defined. When `r` is not
defined, the span's `log_head_adjusted_count` MUST be set to 0
indicating unknown probability, because the decision cannot be made
consistently across the trace.
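
A minimal sketch of this behavior follows. The `parentContext` type and `parentBased` function are hypothetical names, and representing an absent `p` or `r` as a negative number is an assumption made only for this example.

```golang
package main

// parentContext is the part of the incoming context this sketch looks at.
// A negative P or R means the value was absent from tracestate.
type parentContext struct {
	P, R    int
	Sampled bool
}

// parentBased returns the child's sampled flag and the value recorded in
// the span's log_head_adjusted_count field.
func parentBased(parent parentContext) (sampled bool, logHeadAdjustedCount int) {
	if parent.P < 0 || parent.R < 0 {
		return parent.Sampled, 0 // 0 encodes unknown probability
	}
	return parent.Sampled, parent.P
}
```

The `sampled` flag is always passed through unchanged; only the recorded count depends on whether both values were present.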

### Behavior of the `AlwaysOn` Sampler

The `AlwaysOn` Sampler behaves the same as `TraceIDRatioBased` with
100% sampling probability (i.e., `s=0` yielding `p=1`).

### Behavior of the `AlwaysOff` Sampler

The `AlwaysOff` Sampler behaves the same as `TraceIDRatioBased` with
zero probability (i.e., `p=63`, `s` undefined).

## Worked 3-bit example

The behavior of these tables can be verified by hand using a smaller
example. The following table shows how these equations work where
`r`, `p`, and `s` are limited to 3 bits.

Values of `p`, which have the same encoded value and interpretation as
for the proposed `log_head_adjusted_count` field of OTEP 170, would be
interpreted as follows:

| `p` value | Adjusted count |
| ----- | ----- |
| 0 | Unknown |
| 1 | 1 |
| 2 | 2 |
| 3 | 4 |
| 4 | 8 |
| 5 | 16 |
| 6 | 32 |
| 7 | 0 |

Note there are only 6 non-zero, non-unknown values for the adjusted
count. Thus there are six defined values of `r` and `s`. The
following table shows `r` and the corresponding selection probability,
along with the calculated adjusted count for each `s`:

| `r` value | `r` selection probability | `s=0` | `s=1` | `s=2` | `s=3` | `s=4` | `s=5` |
| -- | -- | -- | -- | -- | -- | -- | -- |
| 0 | 1/2 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1/4 | 1 | 2 | 0 | 0 | 0 | 0 |
| 2 | 1/8 | 1 | 2 | 4 | 0 | 0 | 0 |
| 3 | 1/16 | 1 | 2 | 4 | 8 | 0 | 0 |
| 4 | 1/32 | 1 | 2 | 4 | 8 | 16 | 0 |
| 5 | 1/32 | 1 | 2 | 4 | 8 | 16 | 32 |

Notice that the sum of `r` selection probability times adjusted count
in each of the `s=*` columns equals 1. For example, in the `s=4`
column we have `0*1/2 + 0*1/4 + 0*1/8 + 0*1/16 + 16*1/32 + 16*1/32 =
16/32 + 16/32 = 1`. In the `s=2` column we have `0*1/2 + 0*1/4 +
4*1/8 + 4*1/16 + 4*1/32 + 4*1/32 = 4/8 + 4/16 + 4/32 + 4/32 = 1/2 +
1/4 + 1/8 + 1/8 = 1`. We conclude that when `r` is chosen with the
given probabilities, any choice of `s` produces one expected span.
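
The column sums can also be checked mechanically. The sketch below recomputes the expected adjusted count for each sampler setting in the 3-bit example; the function names are hypothetical. All values involved are powers of two, so the floating-point arithmetic is exact.

```golang
package main

// rProbability returns the selection probability of r when r is limited
// to the six values 0 through 5: 2**-(r+1), with the last value taking
// the same probability as r=4 so that the probabilities sum to 1.
func rProbability(r int) float64 {
	if r == 5 {
		return 1.0 / 32
	}
	return 1.0 / float64(uint64(1)<<uint(r+1))
}

// adjustedCount returns 2**s when a context with randomness r is sampled
// at probability 2**-s (that is, when s <= r), and 0 otherwise.
func adjustedCount(r, s int) float64 {
	if s <= r {
		return float64(uint64(1) << uint(s))
	}
	return 0
}

// expectedCount sums probability times adjusted count over all r for
// one sampler setting s.
func expectedCount(s int) float64 {
	sum := 0.0
	for r := 0; r <= 5; r++ {
		sum += rProbability(r) * adjustedCount(r, s)
	}
	return sum
}
```

Calling `expectedCount` for each sampler setting from 0 through 5 returns exactly 1, which is the property that makes the adjusted counts unbiased.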

## Summary

The following table summarizes how the three Sampler cases behave with
respect to the incoming and outgoing values for `p`, `r`, and
`sampled`:

| Sampler | Incoming `r` | Incoming `p` | Incoming `sampled` | Outgoing `r` | Outgoing `p` | Outgoing `sampled` |
| -- | -- | -- | -- | -- | -- | -- |
| Parent | unused | expected | respected | passed through | passed through | passed through |
| TraceIDRatio(Non-Root) | used | unused | ignored | passed through | set to `s+1` | set to `p-1 < r+1 && p < 63` |
| TraceIDRatio(Root) | n/a | n/a | n/a | random variable | set to `s+1` | set to `p-1 < r+1 && p < 63` |

There are several cases where the resulting span's
`log_head_adjusted_count` is unknown:

| Sampler | Unknown condition |
| -- | -- |
| Parent | no incoming `p` or `r` |
| TraceIDRatio(Root) | none |
| TraceIDRatio(Non-Root) | no incoming `r` |

There are combinations of `p`, `r`, and `sampled` that cannot be
generated by the built-in samplers. The case where `sampled` is true
with `p=63`, indicating 0% probability, may be used when recording
spans that were selected by a different sampler while a probability
sampler is also in use. These cases are known as "zero adjusted
count" contexts, which are sampled with 0% probability.

The case where sampled is false with `p=1` indicating 100% probability
is an illogical condition. See [Propagating `p` when
unsampled](#propagating-p-when-unsampled) below.

## Prototype

[This proposal has been prototyped in the OTel-Go
SDK.](https://github.com/open-telemetry/opentelemetry-go/pull/2177) No
changes in the OTel-Go Tracing SDK's `Sampler` or `tracestate` APIs
were needed.

## Trade-offs and mitigations

### Not using TraceID randomness

It would be possible, if TraceID were specified to have at least 62
uniform random bits, to compute the randomness value described above
as the number of leading zeros among those 62 random bits.

That approach would require modifying the W3C traceparent
specification, therefore we do not propose to use bits of the TraceID.

[This issue has been filed with the W3C trace context group.](https://github.com/w3c/trace-context/issues/463)

### Not using TraceID hashing

It would be possible to make a consistent sampling decision by hashing
the TraceID, but we feel such an approach is not sufficient for making
unbiased sampling decisions. It is seen as a relatively difficult
task to define and specify a good enough hashing function, much less
to have it implemented in multiple languages.

Hashing is also computationally expensive. This proposal uses extra
data to avoid the computational cost of hashing TraceIDs.

### Restriction to power-of-two

Restricting head sampling rates to powers of two does not limit tail
Samplers from using arbitrary probabilities. The companion [OTEP
170](https://github.com/open-telemetry/oteps/pull/170) has discussed
the use of a `sampler.adjusted_count` attribute that would not be
limited to power-of-two values. Discussion about how to represent the
effective adjusted count for tail-sampled Spans belongs in [OTEP
170](https://github.com/open-telemetry/oteps/pull/170), not this OTEP.

Restricting head sampling rates to powers of two does not limit
Samplers from using arbitrary effective probabilities over a period of
time. For example, choosing 1/2 sampling half of the time and 1/4
sampling half of the time leads to an effective sampling rate of 3/8.
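
The arithmetic behind that example is a time-weighted average. The following sketch, with hypothetical names `mode` and `effectiveProbability`, shows the calculation.

```golang
package main

// mode is one power-of-two head sampling probability and the fraction
// of time it is in effect.
type mode struct {
	Fraction    float64
	Probability float64
}

// effectiveProbability returns the time-weighted average probability.
func effectiveProbability(modes []mode) float64 {
	total := 0.0
	for _, m := range modes {
		total += m.Fraction * m.Probability
	}
	return total
}
```

For the two modes in the example, `effectiveProbability([]mode{{0.5, 0.5}, {0.5, 0.25}})` evaluates to 0.375, which is 3/8.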

### Propagating `p` when unsampled

Consistent trace sampling requires the `r` value to be propagated even
when the span itself is not sampled. It is not necessary, however, to
propagate the `p` value when the context is not sampled, since
`ParentBased` samplers will not change the decision. Although one
use case for doing so was documented in Google's early Dapper system
(known as "inflationary sampling", see
https://github.com/open-telemetry/oteps/pull/170), the same effect can
be achieved using a consistent sampling decision in this framework.

### Default behavior

In order for consistent trace sampling decisions to be made, the `r`
value MUST be set at the root of the trace. This behavior could be
opt-in or opt-out. If opt-in, users would have to enable the setting
of `r` and the setting and propagating of `p` in the tracestate. If
opt-out, users would have to disable these features to turn them off.
The cost and convenience of Sampling features depend on this choice.

This author's recommendation is that these behaviors be opt-out, i.e.,
on-by-default. This decision should not block this OTEP.