open-telemetry · jmacd · Sep 29, 2021 · Jul 23, 2021
diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
@@ -0,0 +1,368 @@
+# Propagate head trace sampling probability
+
+Use the W3C trace context to convey consistent head trace sampling probability.
+
+## Motivation
+
+The head trace sampling probability is the probability associated with
+the start of a trace context that was used to determine whether the
+W3C `sampled` flag is set, which determines whether child contexts
+will be sampled by a `ParentBased` Sampler.  It is useful to know the
+head trace sampling probability associated with a context in order to
+build span-to-metrics pipelines when the built-in `ParentBased`
+Sampler is used.  Further motivation for supporting span-to-metrics
+pipelines is presented in [OTEP
+170](https://github.com/open-telemetry/oteps/pull/170).
+
+A consistent trace sampling decision is one that can be carried out at
+any node in a trace, which supports collecting partial traces.
+OpenTelemetry specifies a built-in `TraceIDRatioBased` Sampler that
+aims to accomplish this goal but was left incomplete (see a
+[TODO](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#traceidratiobased) 
+in the v1.0 Trace specification).
+
+We propose to propagate the necessary information alongside the [W3C
+sampled flag](https://www.w3.org/TR/trace-context/#sampled-flag) using
+`tracestate` with an `otel` vendor tag, which will require
+(separately) [specifying how the OpenTelemetry project uses
+`tracestate` itself](https://github.com/open-telemetry/opentelemetry-specification/pull/1852).
+
+## Explanation
+
+Two pieces of information are needed to convey consistent head trace
+sampling probability:
+
+1. The head trace sampling probability
+2. The source of randomness for consistent sampling decisions
+
+This proposal uses 6 bits of information for each of these and does
+not depend on built-in TraceID randomness, which is not sufficiently
+specified for probability sampling at this time.  This proposal closely 
+follows [research by Otmar Ertl](https://arxiv.org/pdf/2107.07703.pdf).
+
+### Probability value
+
+To limit the cost of this extension and for statistical reasons
+documented below, we propose to limit head trace sampling probability
+to powers of two.  This limits the available head trace sampling
+probabilities to 1, 1/2, 1/4, 1/8, and so on.  We can compactly encode
+these probabilities as small integer values using the base-2 logarithm
+of the adjusted count.
+
+Using six bits of information we can convey known and unknown sampling
+rates as small as 2**-61.  The value 63 is reserved to mean sampling
+with probability 0, which conveys an adjusted count of 0 for the
+associated context.
+
+When propagated, the probability value will be interpreted as shown in
+the following table, which uses an offset of +1 in order to place the
+Unknown value at 0:
+
+| Probability Value | Head Probability | Note                   |
+| -----             | -----------      | ----                   |
+| 0                 | Unknown          | Reserved for span data |
+| 1                 | 1                |                        |
+| 2                 | 1/2              |                        |
+| 3                 | 1/4              |                        |
+| ...               | ...              |                        |
+| N                 | 2**(-N+1)        | 1 in 2**(N-1)          |
+| ...               | ...              |                        |
+| 61                | 2**-60           |                        |
+| 62                | 2**-61           |                        |
+| 63                | 0                | Maximum encoded value  |
+
+As described in [OTEP
+170](https://github.com/open-telemetry/oteps/pull/170), span data
+sampled by the `ParentBased` sampler will encode the value that was
+propagated by the parent span as its "probability value" `p`.
+
+The value `p=0` SHOULD NOT be propagated using `tracestate`
+explicitly, because the equivalent interpretation can be obtained by
+omitting `p`.
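The table above can be read as a small decoding function. The following is a minimal sketch in Go, assuming an illustrative function name (`headProbability`) that does not appear in the proposal or in any SDK:

```go
package main

import (
	"fmt"
	"math"
)

// headProbability decodes a probability value p (0..63) following the
// table above. The boolean is false when the probability is unknown
// (p == 0) or p is out of range. The name is illustrative only.
func headProbability(p int) (float64, bool) {
	switch {
	case p <= 0 || p > 63:
		return 0, false // unknown or not encodable
	case p == 63:
		return 0, true // reserved value: probability zero
	default:
		return math.Ldexp(1, -(p - 1)), true // 2**-(p-1)
	}
}

func main() {
	for _, p := range []int{0, 1, 2, 3, 62, 63} {
		prob, known := headProbability(p)
		fmt.Println(p, prob, known)
	}
}
```

Because every encodable probability is an exact power of two, the decoded value is exact in floating point, so no rounding occurs when converting between `p` and an adjusted count.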
+
+### Randomness value
+
+With head trace sampling probabilities limited to powers of two, the
+amount of randomness needed per trace context is limited.  A
+consistent sampling decision is accomplished by propagating a specific
+random variable denoted `r`.  The random variable is described by a
+geometric distribution having shape parameter `1/2`, listed below:
+
+| `r` Value | Selection Probability |
+| ---------------- | --------------------- |
+| 0 | 1/2 |
+| 1 | 1/4 |
+| 2 | 1/8 |
+| 3 | 1/16 |
+| ... | ... |
+| 0 <= `r` <= 60 | 2**-(`r`+1) |
+| ... | ... |
+| 60 | 2**-61 |
+| 61 | 2**-61 |
+
+Such a random variable `r` can be generated using the following
+pseudocode.
+
+```golang
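+// nextRandomness returns r in the range [0, 61]: the number of false
+// bits drawn before the first true bit, capped at 61.
+// nextRandomBit is assumed to return one uniformly random bit.
+func nextRandomness() int {
+  r := 0
+  for r < 61 && !nextRandomBit() {
+    r++
+  }
+  return r
+}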
+```
+
+This can be computed from a stream of random bits as the number of
+leading zeros using efficient instructions on modern computer
+architectures.
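As one way to realize the leading-zeros approach, the sketch below uses `bits.LeadingZeros64` from the Go standard library on 64 random bits and caps the result at 61. The function name `randomnessFromBits` and the use of `math/rand` as the random source are assumptions made for illustration:

```go
package main

import (
	"fmt"
	"math/bits"
	"math/rand"
)

// randomnessFromBits derives r from 64 uniformly random bits by counting
// leading zeros and capping the result at 61, so that r == 61 absorbs
// all remaining probability mass (2**-61).
func randomnessFromBits(x uint64) int {
	r := bits.LeadingZeros64(x)
	if r > 61 {
		r = 61
	}
	return r
}

func main() {
	// math/rand is used for brevity; the random source is an assumption.
	fmt.Println(randomnessFromBits(rand.Uint64()))
	fmt.Println(randomnessFromBits(1 << 60)) // three leading zeros
}
```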
+
+For example, the value 3 means there were three leading zeros and
+corresponds with being sampled at probabilities 1-in-1 through 1-in-8
+but not at probabilities 1-in-16 and smaller.
+
+### Proposed `tracestate` syntax
+
+The consistent sampling randomness value (`r`) and the head sampling
+probability value (`p`) will be propagated using two bytes of base16 content
+for each of the two fields, as follows:
+
+```
+tracestate: otel=p:PP;r:RR
+```
+
+where `PP` are two bytes of base16 probability value and `RR` are two
+bytes of base16 random value.  These values are omitted when they are
+unknown.
+
+This proposal should be taken as a recommendation and will be modified
+to [match whatever format OpenTelemetry specifies for its
+`tracestate`](https://github.com/open-telemetry/opentelemetry-specification/pull/1852).
+The choice of base16 encoding is therefore just a recommendation,
+chosen because `traceparent` uses base16 encoding.
+
+### Examples
+
+The following `tracestate` value:
+
+```
+tracestate: otel=r:0a;p:03
+```
+
+translates to
+
+```
+base16(probability) = 03 // 1-in-4 head probability
+base16(randomness) = 0a // qualifies for 1-in-1024 sampling or greater
+```
+
+Any `TraceIDRatioBased` Sampler configured with probability 2**-10 or
+greater will enable sampling this trace, whereas any
+`TraceIDRatioBased` Sampler configured with probability 2**-11 or less
+will stop sampling this trace.
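Reading these two fields could look roughly like the sketch below. It parses only the value of the `otel` entry and assumes the provisional `key:XX` syntax shown above, which the proposal says may still change. The function name `parseOtelValue` and the use of -1 for an absent field are illustrative choices, not part of the proposal:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseOtelValue parses the value of the proposed `otel` tracestate entry,
// for example "r:0a;p:03". Every field must look like key:XX with two
// base16 digits. A field that is absent is reported as -1.
// The function name and the -1 convention are illustrative assumptions.
func parseOtelValue(v string) (p, r int, err error) {
	p, r = -1, -1
	for _, field := range strings.Split(v, ";") {
		kv := strings.SplitN(field, ":", 2)
		if len(kv) != 2 || len(kv[1]) != 2 {
			return -1, -1, fmt.Errorf("malformed field %q", field)
		}
		n, perr := strconv.ParseUint(kv[1], 16, 8)
		if perr != nil {
			return -1, -1, fmt.Errorf("invalid base16 value in %q", field)
		}
		switch kv[0] {
		case "p":
			if n > 63 {
				return -1, -1, fmt.Errorf("p out of range: %d", n)
			}
			p = int(n)
		case "r":
			if n > 61 {
				return -1, -1, fmt.Errorf("r out of range: %d", n)
			}
			r = int(n)
		}
	}
	return p, r, nil
}

func main() {
	p, r, err := parseOtelValue("r:0a;p:03")
	fmt.Println(p, r, err) // 3 10 <nil>
}
```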
+
+## Internal details
+
+The reasoning behind restricting the set of sampling rates is that it:
+
+- Lowers the cost of propagating head sampling probability
+- Limits the number of random bits required
+- Avoids floating-point to integer rounding errors
+- Makes math involving partial traces tractable.
+
+[An algorithm for making statistical inference from partially-sampled
+traces has been published](https://arxiv.org/pdf/2107.07703.pdf) that
+explains how to work with a limited number of power-of-2 sampling rates.
+
+### Behavior of the `TraceIDRatioBased` Sampler
+
+The Sampler MUST be configured with a power-of-two probability
+expressed as `2**-s` except for the special case of zero probability.
+
+If the context is a new root, the initial `tracestate` must be created
+with randomness value `r`, as described above, in the range [0, 61].
+If the context is not a new root, output a new `tracestate` with the
+same `r` value as the parent context.
+
+When sampled, in both cases, the context's probability value `p` is
+set to the value of `s+1`, in the range [1, 62].  If the sampling
+probability is zero (the special case where `s` is undefined), use
+`p=63`, the specified value for zero probability.
+
+In both cases, set the `sampled` bit if the outgoing `p` minus one is
+less than the outgoing `r` plus one and `p` is less than 63 (i.e.,
+`p-1 < r+1` and `p < 63` implies sampled).
+
+If the context is not a new root and the incoming context's `r` value
+is not set, the implementation SHOULD notify the user of an error
+condition and follow the incoming context's `sampled` flag.
+
+The span's `log_head_adjusted_count` field is set to the outgoing `p`
+unless `r` is unknown, in which case it MUST be set to zero (unknown
+probability).
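The rules above for a context whose `r` value is known can be sketched as follows. This is not the prototype's code; the type `decision`, the function `ratioDecide`, and the `zeroProb` flag are hypothetical names introduced for illustration, and the error path for a missing incoming `r` is left out:

```go
package main

import "fmt"

const pZero = 63 // reserved p value meaning probability zero

// decision holds the outgoing values. Type and field names are illustrative.
type decision struct {
	P, R    int
	Sampled bool
}

// ratioDecide sketches the TraceIDRatioBased rules for a known r.
// s is the configured exponent (probability 2**-s); zeroProb selects the
// special zero-probability case, in which s is ignored.
// For a root context r is freshly generated; otherwise it is the parent's r.
func ratioDecide(s int, zeroProb bool, r int) decision {
	p := s + 1
	if zeroProb {
		p = pZero
	}
	// Sampled when p-1 < r+1 and p < 63, which is the same as s <= r.
	return decision{P: p, R: r, Sampled: p-1 < r+1 && p < pZero}
}

func main() {
	fmt.Println(ratioDecide(10, false, 10)) // sampled: 2**-10 with r=10
	fmt.Println(ratioDecide(11, false, 10)) // not sampled: 2**-11 with r=10
	fmt.Println(ratioDecide(0, true, 10))   // zero probability, never sampled
}
```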
+
+### Behavior of the `ParentBased` sampler
+
+The `ParentBased` sampler is modified by this proposal.  It honors
+the W3C `sampled` flag and copies the incoming `tracestate` keys to
+the child context.
+
+The span's `log_head_adjusted_count` field is set to the incoming
+value of `p` when both `p` and `r` are defined.  When `r` is not
+defined, the span's `log_head_adjusted_count` MUST be set to 0
+indicating unknown probability, because the decision cannot be made
+consistently across the trace.
+
+### Behavior of the `AlwaysOn` Sampler
+
+The `AlwaysOn` Sampler behaves the same as `TraceIDRatioBased` with
+100% sampling probability (i.e., `s=0` yielding `p=1`).
+
+### Behavior of the `AlwaysOff` Sampler
+
+The `AlwaysOff` Sampler behaves the same as `TraceIDRatioBased` with
+zero probability (i.e., `p=63`, `s` undefined).
+
+## Worked 3-bit example
+
+The behavior of these tables can be verified by hand using a smaller
+example.  The following table shows how these equations work where
+`r`, `p`, and `s` are limited to 3 bits.
+
+Values of `p`, which have the same encoded value and interpretation as
+for the proposed `log_head_adjusted_count` field of OTEP 170, would be
+interpreted as follows:
+
+| `p` value | Adjusted count |
+| -----     | -----          |
+| 0         | Unknown        |
+| 1         | 1              |
+| 2         | 2              |
+| 3         | 4              |
+| 4         | 8              |
+| 5         | 16             |
+| 6         | 32             |
+| 7         | 0              |
+
+Note there are only 6 non-zero, non-unknown values for the adjusted
+count. Thus there are six defined values of `r` and `s`.  The
+following table shows `r` and the corresponding selection probability,
+along with the calculated adjusted count for each `s`:
+
+| `r` value | `r` selection probability | `s=0` | `s=1` | `s=2` | `s=3` | `s=4` | `s=5` |
+| --        | --                        | --    | --    | --    | --    | --    | --    |
+| 0         | 1/2                       | 1     | 0     | 0     | 0     | 0     | 0     |
+| 1         | 1/4                       | 1     | 2     | 0     | 0     | 0     | 0     |
+| 2         | 1/8                       | 1     | 2     | 4     | 0     | 0     | 0     |
+| 3         | 1/16                      | 1     | 2     | 4     | 8     | 0     | 0     |
+| 4         | 1/32                      | 1     | 2     | 4     | 8     | 16    | 0     |
+| 5         | 1/32                      | 1     | 2     | 4     | 8     | 16    | 32    |
+
+Notice that the sum of `r` selection probability times adjusted count
+in each of the `s=*` columns equals 1.  For example, in the `s=4`
+column we have `0*1/2 + 0*1/4 + 0*1/8 + 0*1/16 + 16*1/32 + 16*1/32 =
+16/32 + 16/32 = 1`.  In the `s=2` column we have `0*1/2 + 0*1/4 +
+4*1/8 + 4*1/16 + 4*1/32 + 4*1/32 = 4/8 + 4/16 + 4/32 + 4/32 = 1/2 +
+1/4 + 1/8 + 1/8 = 1`.  We conclude that when `r` is chosen with the
+given probabilities, any choice of `s` produces one expected span.
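The same check can be done mechanically. The sketch below recomputes the expectation for each of the six values of `s` (0 through 5), using integer arithmetic scaled by 32 so that no rounding is involved. The helper names are illustrative:

```go
package main

import "fmt"

// Selection probabilities for r = 0..5, as numerators over 32.
var rProb32 = [6]int{16, 8, 4, 2, 1, 1}

// adjustedCount returns 2**s when a context with randomness r is sampled
// at exponent s (that is, when s <= r), and 0 otherwise.
func adjustedCount(s, r int) int {
	if s <= r {
		return 1 << s
	}
	return 0
}

// expectedTimes32 returns 32 times the expected adjusted count for s.
func expectedTimes32(s int) int {
	sum := 0
	for r, w := range rProb32 {
		sum += w * adjustedCount(s, r)
	}
	return sum
}

func main() {
	for s := 0; s <= 5; s++ {
		// Each line prints s followed by 32, i.e. an expectation of 1.
		fmt.Println(s, expectedTimes32(s))
	}
}
```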
+
+## Summary
+
+The following table summarizes how the three Sampler cases behave with
+respect to the incoming and outgoing values for `p`, `r`, and
+`sampled`:
+
+| Sampler                | Incoming `r` | Incoming `p` | Incoming `sampled` | Outgoing `r`    | Outgoing `p`   | Outgoing `sampled` |
+| --                     | --           | --           | --                 | --              | --             | --                 |
+| Parent                 | unused       | expected     | respected          | passed through  | passed through | passed through     |
+| TraceIDRatio(Non-Root) | used         | unused       | ignored            | passed through  | set to `s+1`   | set to `p-1 < r+1 && p < 63` |
+| TraceIDRatio(Root)     | n/a          | n/a          | n/a                | random variable | set to `s+1`   | set to `p-1 < r+1 && p < 63` |
+
+There are several cases where the resulting span's
+`log_head_adjusted_count` is unknown:
+
+| Sampler                | Unknown condition |
+| --                     | --                |
+| Parent                 | no incoming `p`   |
+| TraceIDRatio(Root)     | none              |
+| TraceIDRatio(Non-Root) | no incoming `r`   |
+
+There are combinations of `p`, `r`, and `sampled`
+that cannot be generated by the built-in samplers.  The case where
+sampled is true with `p=63` indicating 0% probability may be used when
+recording spans that were selected by a different sampler while a
+probability sampler is also in use.  These cases are known as "zero
+adjusted count" contexts which are sampled with 0% probability.
+
+The case where sampled is false with `p=1` indicating 100% probability
+is an illogical condition.  See [Propagating `p` when
+unsampled](#propagating-p-when-unsampled) below.
+
+## Prototype
+
+[This proposal has been prototyped in the OTel-Go
+SDK.](https://github.com/open-telemetry/opentelemetry-go/pull/2177) No
+changes in the OTel-Go Tracing SDK's `Sampler` or `tracestate` APIs
+were needed.
+
+## Trade-offs and mitigations
+
+### Not using TraceID randomness
+
+It would be possible, if TraceID were specified to have at least 62
+uniform random bits, to compute the randomness value described above
+as the number of leading zeros among those 62 random bits.
+
+Doing so would require modifying the W3C traceparent specification;
+therefore, we do not propose to use bits of the TraceID.
+
+[This issue has been filed with the W3C trace context group.](https://github.com/w3c/trace-context/issues/463)
+
+### Not using TraceID hashing
+
+It would be possible to make a consistent sampling decision by hashing
+the TraceID, but we feel such an approach is not sufficient for making
+unbiased sampling decisions.  It is seen as a relatively difficult
+task to define and specify a good enough hashing function, much less
+to have it implemented in multiple languages.
+
+Hashing is also computationally expensive. This proposal uses extra
+data to avoid the computational cost of hashing TraceIDs.
+
+### Restriction to power-of-two 
+
+Restricting head sampling rates to powers of two does not limit tail
+Samplers from using arbitrary probabilities.  The companion [OTEP
+170](https://github.com/open-telemetry/oteps/pull/170) has discussed
+the use of a `sampler.adjusted_count` attribute that would not be
+limited to power-of-two values.  Discussion about how to represent the
+effective adjusted count for tail-sampled Spans belongs in [OTEP
+170](https://github.com/open-telemetry/oteps/pull/170), not this OTEP.
+
+Restricting head sampling rates to powers of two does not limit
+Samplers from using arbitrary effective probabilities over a period of
+time.  For example, choosing 1/2 sampling half of the time and 1/4
+sampling half of the time leads to an effective sampling rate of 3/8.
+
+### Propagating `p` when unsampled
+
+Consistent trace sampling requires the `r` value to be propagated even
+when the span itself is not sampled.  It is not necessary, however, to
+propagate the `p` value when the context is not sampled, since
+`ParentBased` samplers will not change the decision.  Although one
+use-case was documented in Google's early Dapper system (known as
+"inflationary sampling", see
+https://github.com/open-telemetry/oteps/pull/170), the same effect can
+be achieved using a consistent sampling decision in this framework.
+
+### Default behavior
+
+In order for consistent trace sampling decisions to be made, the `r`
+value MUST be set at the root of the trace.  This behavior could be
+opt-in or opt-out.  If opt-in, users would have to enable the setting
+of `r` and the setting and propagating of `p` in the tracestate.  If
+opt-out, users would have to disable these features to turn them off.
+The cost and convenience of Sampling features depend on this choice.
+
+This author's recommendation is that these behaviors be opt-out, i.e.,
+on-by-default.  This decision should not block this OTEP.