
Proposal: Focused, Vendor-agnostic Collector #2460

Closed
bwplotka opened this issue Feb 11, 2021 · 8 comments
Labels: discussion-needed (Community discussion needed)

Comments

@bwplotka

bwplotka commented Feb 11, 2021

👋🏽 Hello and thanks for the amazing work so far!

I have a very far-fetched proposal, but... it's open source, so you can always ignore my idea. 🙃 I would be curious whether the following idea would improve project velocity and maintainers' lives, while still preserving the Collector's amazing goals 💪🏽

Is your feature request related to a problem? Please describe.

The collector is internally designed as a set of pipelines with internal export interfaces. The Collector repo hosts various exporter (10) and receiver (8) implementations that users can configure. There is also the https://github.com/open-telemetry/opentelemetry-collector-contrib repo that allows bringing your own “less-used” implementation as long as it fits the interface. This, however, means you need to compose everything into your own Go binary.
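For illustration, here is a minimal sketch of the kind of YAML configuration users write to wire such a pipeline together (component names and options are illustrative and depend on the collector version):

```yaml
# Minimal pipeline: one receiver -> one processor -> one exporter.
receivers:
  otlp:                # accept OTLP over gRPC/HTTP
    protocols:
      grpc:
      http:

processors:
  batch:               # batch telemetry before export

exporters:
  logging:             # stand-in for any of the ~10 bundled exporters

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging]
```

Anything not on this curated list has to come from contrib or be compiled into your own binary.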

Recently I joined a Collector/Agent meeting where @jpkrohling asked whether a particular exporter belongs in the core collector, in contrib, or neither. @tigrannajaryan mentioned that there is a problem with adding more implementations. One of the reasons is confusion on the user side about which APIs to choose. @alolita also mentioned performance and footprint.

Looking at the existing project velocity and project focus, I see a lot of toil and time spent on those plugins, with long-lasting consequences and complexity.

Unfortunately, those three reasons (user experience, footprint, complexity) are not the only problems with such an architecture. From experience maintaining open-source software: we tried this before... and learned our lessons. One example is the Prometheus discovery libraries; another is the Thanos object storage clients. In the end, plugins are a very limited approach.

Creating a healthy plugin-model ecosystem within a single binary is simply hard to achieve, not only because of the poor Go plugin system but mainly because it brings ambiguity about who is maintaining, supporting, and ensuring the quality/security of the final binary. That's not to mention major dependency issues and the hard-to-define criteria for deciding which plugins should be in core. Tools like the collector builder help, but overall such a model reduces project velocity enormously and is limited (only so many implementations can be in core).

The idea of having one thing that exposes all vendors' APIs is great, but given the constraints of a multi-plugin architecture, such a project will always fall behind highly specialized binaries for a single signal and API, in both project velocity and performance (which translates to cost).

Such an architecture also encourages/requires distributions, which add extra maintenance for each vendor and confusion for the user.

Describe the solution you'd like

What if we could provide a collector version that focuses on developing a vendor-agnostic pipeline that allows aggregations, etc., from single input APIs (OTLP/OpenMetrics) and a separate, well-defined output API (e.g. remote write for metrics)? This would allow all vendors to adopt a single output API, instead of being pressed to be part of the OpenTelemetry exporter list, which cannot grow indefinitely and thus forces the creation of distributions and further fragments the user space.

Such a simple, vendor-agnostic collector would allow collector development to focus on where the value is: pipelines, stability, security, efficiency, and scaling! It would also allow vendors to skip creating separate distributions and instead iterate faster and provide better SaaS quality, since they control the translation proxies they can run on their end. (Note: this is what works very well for Prometheus remote write, where most vendors adopted remote write as a stable metric stream API.)

Of course, there is an open question: how do you ensure vendors and clients expose the single APIs? If adoption is big enough, it will motivate vendors to adjust their SaaS APIs. To help with this we could provide interim translation proxies in separate repos that vendors or clients can use in the meantime.
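To make the idea concrete, a focused collector's configuration could look roughly like the sketch below; the `vendor-proxy.example.com` endpoints stand for a hypothetical vendor-run translation proxy, and component names are only illustrative:

```yaml
# Focused, vendor-agnostic pipeline: standard APIs in, standard APIs out.
receivers:
  otlp:                           # single input API for traces/metrics/logs
    protocols:
      grpc:
  prometheus:                     # Prometheus/OpenMetrics scrape input
    config:
      scrape_configs:
        - job_name: apps
          static_configs:
            - targets: ['localhost:9090']

exporters:
  prometheusremotewrite:          # single, well-defined metrics output API
    endpoint: https://vendor-proxy.example.com/api/v1/write
  otlp:                           # hypothetical OTLP-speaking proxy for traces
    endpoint: vendor-proxy.example.com:4317

service:
  pipelines:
    metrics:
      receivers: [otlp, prometheus]
      exporters: [prometheusremotewrite]
    traces:
      receivers: [otlp]
      exporters: [otlp]
```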

Describe alternatives you've considered
Current Collector with plugins 🙃

@jkowall

jkowall commented Feb 11, 2021

The downside of this approach is that it works great for metrics, but as we get into logging (transforms, parsing, and other elements) and tracing (sampling, which requires a more complex set of capabilities), things start getting too complex inside the collector pipelines. If we solve the input and output, there are still a lot of things in between that would need to be taken care of. We can add a lot of complex logic and capabilities into an exporter, for example, which wouldn't be handled by just having a wire protocol like OTLP.
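To illustrate: even with OTLP on both ends, quite a bit of signal-specific logic still lives in the middle of the pipeline, for example trace sampling or attribute scrubbing on logs. A rough sketch (processor names are illustrative, and some of this lives in contrib):

```yaml
receivers:
  otlp:
    protocols:
      grpc:

processors:
  memory_limiter:                  # protect the collector itself
    check_interval: 1s
    limit_mib: 512
  probabilistic_sampler:           # trace sampling happens in the pipeline, not on the wire
    sampling_percentage: 10
  attributes/scrub:                # e.g. drop a sensitive attribute from logs
    actions:
      - key: user.email
        action: delete
  batch:

exporters:
  otlp:
    endpoint: backend.example.com:4317   # hypothetical backend

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, probabilistic_sampler, batch]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, attributes/scrub, batch]
      exporters: [otlp]
```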

I agree that the problem is going to get worse and we need to figure out a solution, but fixing the input and output seems like it doesn't really solve it. I think over time other vendors will support OTLP; we already do so at my employer. Exporters may be deprecated over time, but as you know, things can linger for many years before that happens.

@bwplotka
Author

Totally makes sense. The processing part would indeed stay the same, but you are right - it has similar (plugin) bottlenecks and the potential need (again) for distributions, etc.

However, I see quite a battle right now, with tracing/logging/metrics vendors feeling left out, asking why their API is not part of OpenTelemetry, forcing the development of distributions, and forcing users to decide which distro to use, potentially causing more harm than benefit. Reducing the need for distributions to the minimum (only for processors) feels like some improvement.

Essentially, the current design has far-reaching, political consequences. The current thinking is that if you are not part of the OpenTelemetry collector, you are missing out. So you "adopt it" by creating your own fork of it, because that is faster and easier than changing your APIs to OTLP (which is still mostly in beta/alpha), or worse: you try to be part of the core (impossible due to the curated list of APIs). Then you write a blog post and you're done (: I feel the current design might push the problem onto the user, who cannot use core (as their vendor of choice is not part of core) and has to choose from many distros which are not guaranteed to be stable, secure, compatible, etc.

NOTE: We could implement the focused collector on the side as... an official distribution 🙃 if we want that long term (:

@jpkrohling
Member

Recently I joined a Collector/Agent meeting where @jpkrohling asked whether a particular exporter belongs in the core collector, in contrib, or neither

To clarify, the question was whether components exporting data directly to a storage mechanism would belong in contrib. That said, I'm a big fan of decentralization, and I think the contrib distribution could eventually disappear, giving way to an official registry of components and a supported way to pick and choose which ones to use, like this: https://code.quarkus.io/. @gramidt volunteered to give some thought to this if that's what the community wants, but apparently that's not where we should be focusing right now.

What if we could provide a collector version that focuses on developing a vendor-agnostic pipeline that allows aggregations, etc., from single input APIs (OTLP/OpenMetrics) and a separate, well-defined output API (e.g. remote write for metrics)? This would allow all vendors to adopt a single output API, instead of being pressed to be part of the OpenTelemetry exporter list, which cannot grow indefinitely and thus forces the creation of distributions and further fragments the user space.

The core distribution does not contain any commercial vendor components. Perhaps you are suggesting that Zipkin and Jaeger are vendor-specific components? While I would agree these components could be removed in the long term, not having them for GA would make the core distribution less useful, especially because OTLP itself isn't battle-tested yet, apart from a few brave souls using it before the official GA.

That said, it is possible to use the builder to generate an even slimmer version of the core, without any metrics components, for instance.
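For example, a builder manifest along those lines might look like the following sketch (module paths and versions are illustrative and depend on the builder release):

```yaml
# otelcol-builder manifest for a slim core without any metrics-specific components (illustrative)
dist:
  name: otelcol-slim
  description: Slim collector with only OTLP components
  output_path: ./otelcol-slim

receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.x.x

processors:
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.x.x

exporters:
  - gomod: go.opentelemetry.io/collector/exporter/otlpexporter v0.x.x
```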

However, I see quite a battle right now, with tracing/logging/metrics vendors feeling left out, asking why their API is not part of OpenTelemetry

Not sure I see this battle, as the rules are simple: the core contains receivers and exporters for formats used by open source projects, like OTLP, Jaeger, and Zipkin on the tracing side, and OTLP and Prometheus on the metrics side. Receivers and exporters related to commercial vendors belong in contrib, and every vendor is more than welcome to contribute their components.

We could implement the focused collector on the side as... an official distribution

I believe this is the purpose of the core distribution. The contrib is also an official distribution, but with a different degree of support.

@bwplotka
Author

bwplotka commented Feb 12, 2021

The core distribution does not contain any commercial vendor components. Perhaps you are suggesting that Zipkin and Jaeger are vendor-specific components? While I would agree these components could be removed in the long term, not having them for GA would make the core distribution less useful, especially because OTLP itself isn't battle-tested yet, apart from a few brave souls using it before the official GA.

Valid point. Vendor-agnostic to me means being focused on the API, not on whether the target system is Jaeger, AWS, or Zipkin.
This way we can iterate and improve much faster.

especially because OTLP itself isn't battle-tested yet, apart from a few brave souls using it before the official GA.

You touched on a very good topic. How do we want to ensure OTLP is battle-tested if we discourage everyone from using it by layering custom receivers/exporters and encouraging a distributions/semi-fork architecture? 🤔 It's obviously much easier for vendors and SaaS providers to ship their own distribution that might not even use any core OTel code and still claim the badge of having adopted OpenTelemetry.

@flands
Contributor

flands commented Feb 16, 2021

You touched on a very good topic. How do we want to ensure OTLP is battle-tested if we discourage everyone from using it by layering custom receivers/exporters and encouraging a distributions/semi-fork architecture? 🤔 It's obviously much easier for vendors and SaaS providers to ship their own distribution that might not even use any core OTel code and still claim the badge of having adopted OpenTelemetry.

Well, it is the default implementation for instrumentation libraries and the reference architecture for how to send OTel instrumentation data to the Collector :) In addition, with multiple vendors already supporting OTLP, battle-testing will naturally happen.

I am curious what you mean when you say we discourage everyone from using it. Unfortunately, the ecosystem is not based on a single standard, and without providing a path forward for existing, popular solutions, you end up with a fragmented market that end users are then forced to navigate.

The current architecture is similar to Envoy's plugins, and they follow a similar model for deciding when to include these plugins in core vs. contrib: envoyproxy/envoy#14078. I suspect OTel will adopt a similar stance in the future.

@tigrannajaryan
Member

@bwplotka thank you for the proposal. It is great to see an experienced person going over the project and making improvement suggestions.

Recently I joined a Collector/Agent meeting where @jpkrohling asked whether a particular exporter belongs in the core collector, in contrib, or neither. @tigrannajaryan mentioned that there is a problem with adding more implementations. One of the reasons is confusion on the user side about which APIs to choose.

I think you misunderstood what I said. The discussion was about whether to allow exporters that target data storage, and I said that it is perfectly fine, is not against any existing policy, and that I do not see any reason not to do it, though if such reasons come up in the future we should discuss them.

Looking at the existing project velocity and project focus, I see a lot of toil and time spent on those plugins, with long-lasting consequences and complexity.

Note that the primary toil and time spent on contributed components is the responsibility of the vendors who contributed those components. It is not the responsibility of the maintainers, and due to the way the Collector is architected, the components are completely decoupled and are written without touching a single line of code that the maintainers are responsible for.

That said, I do not want to downplay the costs of maintaining large codebases. We should be careful to avoid being overwhelmed with maintenance as the codebase grows.

From experience maintaining open-source software: we tried this before... and learned our lessons. One example is the Prometheus discovery libraries; another is the Thanos object storage clients. In the end, plugins are a very limited approach.

I appreciate the warning. We (Collector maintainers) should definitely keep this in mind to avoid making our lives complicated. We should keep our options open and be ready to change our approach if needed.

What if we could provide a collector version that focuses on developing a vendor-agnostic pipeline that allows aggregations, etc., from single input APIs (OTLP/OpenMetrics) and a separate, well-defined output API (e.g. remote write for metrics)? This would allow all vendors to adopt a single output API, instead of being pressed to be part of the OpenTelemetry exporter list, which cannot grow indefinitely and thus forces the creation of distributions and further fragments the user space.

The Collector is vendor-agnostic by welcoming all vendors to have their own exporters. I think this approach is liked by vendors, as evidenced by the dozens of exporters that implement proprietary protocols. By writing a single Collector exporter in Go, they ensure that any application instrumented with any OpenTelemetry language library can send telemetry to their backend, and they avoid the need to write multiple exporters, one per language library.

If we do what you suggest, we will exclude a large number of vendors who don't support OTLP ingest today. I do think OTLP will become the protocol of choice eventually, but adoption takes time.

You touched on a very good topic. How do we want to ensure OTLP is battle-tested if we discourage everyone from using it by layering custom receivers/exporters and encouraging a distributions/semi-fork architecture?

We do not discourage the usage of OTLP. If you saw this written anywhere in the OTel docs, please provide a link so that I can fix it. If you heard a person discouraging it, please let me know their name; I would love to speak with them to understand what they think can be improved in OTLP.

To reiterate: we highly encourage the usage and adoption of OTLP. OTLP is the default export protocol for OpenTelemetry SDKs. Also, Collector performance tests show that OTLP is the fastest protocol to receive or export.

I can totally see vendors starting to support OTLP and deprecating their proprietary protocols in the future. When that happens, I will be very glad to see many custom exporters deleted from the Collector, as they will no longer be necessary.

Full disclosure: I am the author of OTLP/gRPC and OTLP/HTTP transports, and OTLP Traces and Logs Protobuf and JSON data formats. If anyone has ideas on what to improve in OTLP, I would love to talk.

@alolita
Member

alolita commented Mar 9, 2021

@bwplotka Thanks for this proposal! I'd like to understand more details. Will reach out to you with my questions. Hope to see you join the Prometheus workgroup!

@jrcamp added the discussion-needed label on Mar 10, 2021
@bogdandrutu
Member

The Collector main module was reduced to only OTLP support and a limited number of components. Thanks @bwplotka
