separate score generation from sampling and other review comments

Liudmila · Liudmila · commit 234ddf0549c8 · 2020-09-02T13:00:05.000-07:00
diff --git a/text/trace/0107-sampling-score.md b/text/trace/0107-sampling-score.md
@@ -5,33 +5,38 @@ sampling rates and probability calculation algorithms.
 
 ## TL;DR
 
-**Score** is a floating point number associated with
-the trace. It's calculated when trace starts and flows in the `tracestate`,
-it's used by samplers to make consistent sampling decisions.
+**Score** is a floating point number associated with the trace.
+It's calculated when the trace starts and is propagated in the `tracestate`.
+
+*Score* is independent of the sampling *probability* (aka *rate*), which
+represents the sampler's configuration and is not specific to a trace.
+
+A sampler can compare the *score* with its configured *probability* to make
+a sampling decision.
 
 Service that starts the trace calculates the score and adds it to the
 `tracestate` so downstream services can re-use it to make their sampling
 decisions *instead of* re-calculating score as a function of trace-id
-(or trace-flags).
-
-*Score* is not related to sampling *rate* (aka *probability* which represents
-sampler's configuration not specific to trace).
+(or trace-flags). This makes it possible to configure the sampling algorithm
+on the first service only and avoids coordinating algorithms when multiple
+tracing tools are involved.
 
 ## Motivation
 
 The goal is to enable a mechanism for consistent (best effort) sampling
 between services with different sampling rates and different probability
 calculation algorithms (for interoperability with existing tracing tools).
 
-Consistent sampling decision made in each app of a distributed trace is
-important for better user experience of trace analysis. Consistency is achieved
-by following means:
+Today, consistency across multiple services is achieved by the following means:
 
-1. Same hashing algorithms used across all apps in a trace.
-   Coordination of sampling algorithms across multiple apps not always possible:
-   for example existing components in a system use vendor-specific
-   tracing tool (pre-OpenTelemetry and update is hard to justify) while there
-   is a desire to use OpenTelemetry for new components.
+1. The same hashing algorithm is applied to the trace-id for each span.
+   Problems:
+   - **the same sampling algorithm must be used across all apps**: this is
+     not always possible, e.g. when existing components use a vendor-specific
+     tracing tool (pre-OpenTelemetry, and a major upgrade is hard to justify)
+     while new components are instrumented with OpenTelemetry.
+   - **uniform distribution of trace-ids is not guaranteed**, therefore
+     sampling decisions could be biased.
 
 2. Sampling flag propagated from the head component/app is used by downstream
    apps to sample in a given trace.
@@ -40,16 +45,22 @@ by following means:
 
 ## Explanation
 
-Sampling propabaility is generated by the first service to make sampling
-decision. It's a random float number in [0, 1] range.
+The sampling score is generated by the first service to make a sampling
+decision. It's a random float number (6-9 digits of precision) in the [0, 1] range.
 Score is stamped on the span and also propagated further within `tracestate`.
 
 Next service reads score from `tracestate` (instead of calculating it from
 trace-id) and compares it with its sampling rate to make sampling decision.
 
-Score is also exposed through span attributes. Vendors can leverage it
+Score is exposed through span attributes. Vendors can leverage it
 to sort traces based on their completeness: the lower the value of score is,
-the higher the chance it was sampled it by each component.
+the higher the chance it was sampled in by each component.
+
+Vendors can enable sampling interoperability between legacy tools and
+OpenTelemetry: legacy libraries can be updated in a non-breaking way to
+support external score sampling. Updating the current vendor-specific library
+on an existing service in a backward-compatible way is much easier than
+migrating that service to OpenTelemetry.
 
 ### Example
 
@@ -91,7 +102,8 @@ Vendors can pick the most complete traces sorting them by score.
 - Service that starts a trace makes sampling decision.  It's configured to use
 `ExternalScoreSampler`(name TBD) is configured by user. Within `ShouldSample`
 callback sampler
-  - generates random float score (6-9 digits) in [0, 1] interval
+  - generates a score in the [0, 1] interval using a `SamplingScoreGenerator`,
+    which can run a random or a deterministic `hash(trace-id)` algorithm
   - makes sampling decision by comparing generated score to configured rate
   - if decision is `RECORD` (or `RECORD_AND_SAMPLED`), sampler adds
     `sampling.score` attribute to attributes collection of to-be-created span
@@ -105,17 +117,25 @@ callback sampler
     sampling rate
   - if span will be recorded: sampler adds `sampling.score` attribute to
     attributes collection of to-be-created span
+- If a downstream service does not find a score in the `tracestate`, it falls
+  back to the configured score generation algorithm and updates the
+  `tracestate` and span attributes accordingly.
 
 Here is a [proof of concept](https://github.com/lmolkova/opentelemetry-dotnet/pull/1)
 in .NET.
 
 ### Specification Delta
 
-1. Add `SamplingResult.Tracestate` field: sampler should be able to assign a
-   new tracestate for to-be-created span
-2. Add convention for `sampling.score` attribute on span (TBC). Check out
+1. Add `SamplingResult.Tracestate` field: the sampler should be able to [assign a
+   new tracestate to the to-be-created span](https://github.com/open-telemetry/opentelemetry-specification/issues/856)
+2. Add a convention for the `sampling.score` span attribute (TBD). Check out
    [open questions](open-questions) regarding attribute vs special field.
-3. Add `ExternalScoreSampler` implementation of `Sampler`
+3. Add the notion of a `SamplingScoreGenerator` with `TraceIdRatioGenerator`,
+   `RandomGenerator`, etc. implementations.
+   - Change the `TraceIdRatioBased` sampler to use the corresponding generator
+     and serve as a generic probability sampler with configurable score generation.
+4. Add an `ExternalScoreSampler` implementation of `Sampler`. It's created with
+   a probability value and a `SamplingScoreGenerator` implementation.
 
 ### Trade-offs and mitigations
 
@@ -147,37 +167,15 @@ as an implementation-specific hint for sampler to prioritize recording a span.
 [OpenTelemetry collector](https://github.com/open-telemetry/opentelemetry-collector/blob/60b03d0d2d503351501291b30836d2126487a741/processor/samplingprocessor/probabilisticsamplerprocessor/testdata/config.yaml#L10)
 uses `sampling.priority` to hint collector's sampler decision
 
-To avoid conflicts with existing implementations we fo not reuse priority term.
+To avoid conflicts with existing implementations, we do not reuse the *priority* term.
 
 ## Open questions
 
-### Score calculation: can we use ProbabilitySampler?
-
-This spec suggests to generate score randomly to achieve uniform
-distribution.
-
-Assuming trace-ids are uniformly distributed, `ProbabilitySampler` can generate
-score, so the flow could look like this:
-
-`ExternalScoreSampler.ShouldSample`:
-
-- checks if `sampling.score` is available in the tracestate
-- if it's not there, invokes `ProbabilitySampler`, which calculates score
-  and populates it on the attributes
-- updates tracestate
-
-#### Pros
-
-Fallback to `ProbabilitySampler` improves the case when `tracestate` is trimmed
-so there is a chance sampling could be consistent if same probability
-calculation algorithm was used.
-
-#### Cons
+### Should we separate sampling from score generation?
 
-- There is no requirement for trace-ids to be uniformly distributed
-- No clear boundary between `ProbabilitySampler` and `ExternalScoreSampler`.
-`ProbabilitySampler` needs to set score in attribute even if there is no
-`ExternalScoreSampler`.
+In this spec, rate-based sampling is separated from score generation: a sampler
+can be configured with any score generation algorithm that operates on the
+sampling parameters, and different samplers may reuse the same generators.
 
 ### Attribute vs field on the span to-be-created
 
@@ -189,8 +187,8 @@ Creating a new float field on `SamplingDecision` could be an alternative.
 It'd also require adding similar property on Span/SpanData.
 
 There are other scenarios when sampling information is useful for
-exporter: e.g. sampling rate (or inverse value: count of spans
-this span represent). Exporters can use it to estimate metrics.
+exporter: e.g. sampling rate (or its inverse value: the count of spans
+this span represents). Exporters can use it to estimate metrics.
 
 Populating all sampling information on all spans may be inefficient in terms of
 event payload size and storage while being useful for a subset of vendors.
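The following is a minimal sketch of the score flow that this change describes. The head service generates a score with a pluggable generator and propagates it. Downstream services re-use that score and compare it with their own probability. When the score is missing, a service falls back to its generator. The class names `SamplingScoreGenerator`, `RandomGenerator`, `TraceIdRatioGenerator` and `ExternalScoreSampler` come from the proposal, but their shapes here are assumptions. Several other details are hypothetical choices made only for illustration and are not defined by the spec or the .NET proof of concept:

- the tracestate key `score`
- truncation to 6 digits
- the `score < probability` rule
- mapping the lower 64 bits of the trace-id to [0, 1)

```python
import math
import random
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical tracestate key: the proposal does not define one yet.
SCORE_KEY = "score"
# The proposal suggests 6-9 digits of precision; 6 is assumed here.
SCORE_DIGITS = 6


def quantize(score: float) -> float:
    """Truncate the score so every service compares exactly the propagated value."""
    scale = 10 ** SCORE_DIGITS
    return math.floor(score * scale) / scale


class SamplingScoreGenerator(ABC):
    """Produces a score in [0, 1) for a trace that has no score yet."""

    @abstractmethod
    def generate(self, trace_id: int) -> float: ...


class RandomGenerator(SamplingScoreGenerator):
    def __init__(self, rng: Optional[random.Random] = None) -> None:
        self._rng = rng or random.Random()

    def generate(self, trace_id: int) -> float:
        return self._rng.random()  # uniform in [0.0, 1.0), ignores the trace-id


class TraceIdRatioGenerator(SamplingScoreGenerator):
    def generate(self, trace_id: int) -> float:
        # Hypothetical deterministic hash(trace-id): take the lower 64 bits and keep
        # their top 53 bits so the float result is exact and strictly below 1.0.
        return ((trace_id & 0xFFFF_FFFF_FFFF_FFFF) >> 11) / 2**53


@dataclass
class SamplingResult:
    sampled: bool
    tracestate: Dict[str, str]
    attributes: Dict[str, float] = field(default_factory=dict)


class ExternalScoreSampler:
    def __init__(self, probability: float, generator: SamplingScoreGenerator) -> None:
        self.probability = probability
        self.generator = generator

    def should_sample(self, trace_id: int, tracestate: Dict[str, str]) -> SamplingResult:
        new_state = dict(tracestate)
        score: Optional[float] = None
        try:
            score = float(tracestate[SCORE_KEY])  # re-use the upstream score
        except (KeyError, ValueError):
            pass
        if score is None:
            # Head service, trimmed tracestate or malformed value: generate a score
            # with the configured algorithm and propagate it downstream.
            score = quantize(self.generator.generate(trace_id))
            new_state[SCORE_KEY] = f"{score:.{SCORE_DIGITS}f}"
        # Assumed rule: lower scores are sampled in by more services.
        sampled = score < self.probability
        attributes = {"sampling.score": score} if sampled else {}
        return SamplingResult(sampled, new_state, attributes)
```

Here is how the sketch behaves for a trace-id of `1 << 62`:

- A head service configured with probability 0.5 gets the score 0.25, samples the span, and writes `"0.250000"` to the tracestate.
- A downstream service configured with 0.3 re-uses that value and also samples.
- A service configured with 0.1 drops the span.

Keeping the generator outside the sampler lets the same sampler run random or trace-id based scoring.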