Zipkin Receiver: default timestamp #1068
Changes from all commits: edc1c82, af7a2d5, 0da8861, c3f6252
```diff
@@ -19,6 +19,7 @@ import (
 	"fmt"
 	"math"
 	"strconv"
+	"time"
 
 	commonpb "github.com/census-instrumentation/opencensus-proto/gen-go/agent/common/v1"
 	tracepb "github.com/census-instrumentation/opencensus-proto/gen-go/trace/v1"
@@ -27,6 +28,7 @@ import (
 	"github.com/pkg/errors"
 
 	"go.opentelemetry.io/collector/consumer/consumerdata"
+	"go.opentelemetry.io/collector/internal"
 	tracetranslator "go.opentelemetry.io/collector/translator/trace"
 )
 
@@ -175,6 +177,7 @@ func zipkinV1ToOCSpan(zSpan *zipkinV1Span) (*tracepb.Span, *annotationParseResult
 	}
 
 	setSpanKind(ocSpan, parsedAnnotations.Kind, parsedAnnotations.ExtendedKind)
+	SetTimestampsIfUnset(ocSpan)
 
 	return ocSpan, parsedAnnotations, nil
 }
@@ -481,3 +484,25 @@ func (ep *endpoint) createAttributeMap() map[string]string {
 	}
 	return attributeMap
 }
+
+func SetTimestampsIfUnset(span *tracepb.Span) {
+	// zipkin allows timestamp to be unset, but opentelemetry-collector expects it to have a value.
+	// If this is unset, the conversion from open census to the internal trace format breaks
+	// what should be an identity transformation oc -> internal -> oc
+	if span.StartTime == nil {
+		now := internal.TimeToTimestamp(time.Now())
+		span.StartTime = now
+		span.EndTime = now
+
+		if span.Attributes == nil {
+			span.Attributes = &tracepb.Span_Attributes{}
+		}
+		if span.Attributes.AttributeMap == nil {
+			span.Attributes.AttributeMap = make(map[string]*tracepb.AttributeValue, 1)
+		}
+		span.Attributes.AttributeMap[StartTimeAbsent] = &tracepb.AttributeValue{
+			Value: &tracepb.AttributeValue_BoolValue{
+				BoolValue: true,
+			}}
+	}
+}
```
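To make the new helper's behavior concrete, here is a rough, example-style sketch (not part of the PR). It assumes it lives in the same package as the diff above, that the file already imports fmt, and that StartTimeAbsent is the marker-attribute constant this PR defines elsewhere; that constant is not visible in the hunk.

```go
// Illustrative sketch only; see assumptions above.
func ExampleSetTimestampsIfUnset() {
	// A span as it might arrive from a Zipkin v1 client that sent only
	// late annotations/tags and no timestamp: StartTime and EndTime are nil.
	span := &tracepb.Span{
		Name: &tracepb.TruncatableString{Value: "late-annotations-only"},
	}

	SetTimestampsIfUnset(span)

	_, marked := span.Attributes.AttributeMap[StartTimeAbsent]
	fmt.Println(span.StartTime != nil, span.EndTime != nil, marked)
	// Output: true true true
}
```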
I'm not sure about this, but I don't know what the right thing to do is. Adding/modifying data like this just to make serialization happy doesn't sound that great to me. This means backends will show misleading information to users. Alternatively, we could just drop such spans as a validation step in the Zipkin receiver and log messages, but we can't do that since the Zipkin spec allows for the possibility of not having timestamps. Is there a way we can serialize it differently without adding fake timestamps? If there is no other viable solution, we should at least add an attribute to the span that indicates the timestamps are fake and shouldn't be trusted. Also, can we maybe use a more obvious fake value for the timestamp, like the unix epoch or something?

Zipkin spec on this:

If this was the only edge case, we could just drop the span, as it didn't actually represent a real operation, but we have two more scenarios that can result in spans without timestamps, so we can't just drop. It looks like setting a static timestamp that is obviously fake, along with a well-known attribute saying the timestamps were injected by the OTel collector as a workaround, is probably our best bet. Does the Jaeger proto allow empty timestamps? If so, an OTLP > Jaeger/Zipkin translator should probably set the timestamp to nil again when it encounters this tag on a span. One concern with setting a very old timestamp is that it might affect some backends in an unexpected way. For example, a backend might drop the span very soon if it stores only recent data and goes by span operation timestamps instead of ingestion timestamps.

I'm not familiar with Jaeger, will need someone else to weigh in on that.

I agree with this approach. We can set a fake current timestamp along with an attribute identifying this workaround, and use it in the Jaeger/Zipkin exporters to reconstruct null timestamps.

That sounds like a Zipkin convention. We should probably have an OTel convention for such tags.

Why a current timestamp? Can we do zero time, like 0 seconds since the unix epoch? I'm afraid that if we use the current time and the span gets exported using something other than Zipkin, it might be very misleading to users when they see the span in a tracing backend. A "zero time" will at least raise some eyebrows.

@owais, 0 timestamps look cleaner, but if I understood your concern right, it doesn't look like the best way:

IMO still dropping spans with no timestamp looks like the best way to handle it in otel-collector. @chris-smith-zocdoc, could you please let us know whether the spans with no timestamps are actually visible in the backend that you use, or are they needed for aggregation purposes?

I disagree. Throwing away data seems suboptimal. Through coincidence they actually end up with a zero timestamp right now, but after the translation from OC -> internal -> OC for the Zipkin export, the timestamp becomes invalid and causes the serialization issue.

Yes, they're visible. We have two backends: one is Zipkin, the other is Honeycomb. The data in these spans is just events and annotations, sent after the original span. This appears to be common behavior in the Zipkin libraries we're using for JavaScript and Scala.

In Zipkin the data is merged into the original span. In Honeycomb they end up as a second event in their system. I've considered building a buffer/merge processor (similar to tail sampling) to do the merge in OTel before sending it to them.

Agree that throwing away the spans does not sound correct for people who might be using Zipkin end to end (or even some parts of it), since it is a valid Zipkin use case. I still have concerns about zeroed timestamps causing backends to drop spans too soon. Maybe backends can use the tag we inject to detect this edge case, but I'm not sure we can expect all backends to do this, especially since it is not part of the OTel spec. I guess adding the current timestamp for both start and end is the safest choice here, even though it's not optimal for backends other than Zipkin.
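If the exporter-side reconstruction suggested above were implemented, it might look roughly like the sketch below. The package, function name, and attribute key value are assumptions for illustration, not code from this PR; only the tracepb types match the diff.

```go
package zipkinexporter

import (
	tracepb "github.com/census-instrumentation/opencensus-proto/gen-go/trace/v1"
)

// startTimeAbsentKey mirrors the marker attribute set by the receiver in
// SetTimestampsIfUnset; the exact key is an assumption here.
const startTimeAbsentKey = "otel.zipkin.absentField.startTime"

// clearInjectedTimestamps reverses the receiver-side workaround: if the marker
// attribute is present, the start/end timestamps were injected by the
// collector, so drop them (and the marker itself) before encoding the span for
// a backend whose format tolerates absent timestamps.
func clearInjectedTimestamps(span *tracepb.Span) {
	if span == nil || span.Attributes == nil || span.Attributes.AttributeMap == nil {
		return
	}
	if _, ok := span.Attributes.AttributeMap[startTimeAbsentKey]; !ok {
		return
	}
	span.StartTime = nil
	span.EndTime = nil
	delete(span.Attributes.AttributeMap, startTimeAbsentKey)
}
```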
I changed the serialization failure to a permanent error because otherwise it caused an infinite loop in the retry processor. Is there any reason we'd want to retry on a serialization error?
I can't think of any reason to retry them
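For reference, this kind of change typically amounts to wrapping the serialization error with the collector's consumererror.Permanent helper so the queued-retry logic drops the batch instead of retrying it forever. A minimal sketch, assuming the consumererror package as it existed around the time of this PR (not the exact code from the PR):

```go
package main

import (
	"errors"
	"fmt"

	"go.opentelemetry.io/collector/consumer/consumererror"
)

func main() {
	// Pretend serializing a span batch failed (e.g. because of a bad timestamp).
	serErr := errors.New("failed to serialize spans")

	// Wrapping the error as permanent signals that retrying cannot help:
	// re-encoding the same payload will fail the same way every time.
	permanent := consumererror.Permanent(serErr)

	fmt.Println(consumererror.IsPermanent(permanent)) // true: not retried
	fmt.Println(consumererror.IsPermanent(serErr))    // false: would be retried
}
```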