Prometheus server running linkerd does not scrape all endpoints with EOF error #2067
Comments
An interesting pattern emerged as I tested endpoints: only Prometheus endpoints that are Java based are failing (Kafka, Cassandra, some of our Java-based internal apps). All other endpoints are OK (Node.js, Go). Not quite sure what that points to. Some unhandled difference in the Prometheus Java client code?
tcpdump from prometheus-server running linkerd to one of the failed Java endpoints:
[tcpdump capture attached in the original issue]
tcpdump from prometheus-server w/o linkerd to the same Java endpoint:
[tcpdump capture attached in the original issue]
@gamer22026 thanks for all the detail! Is it possible for you to provide us with a Docker image of something like your Java app that reproduces the issue?
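For anyone putting together such a reproducer: below is a minimal sketch of a Java app exposing a /metrics endpoint via the Prometheus simpleclient HTTPServer. This is an illustration, not the reporter's actual app; the class name and port are placeholders, and the assumption is that pinning a pre-0.6.0 io.prometheus:simpleclient_httpserver in the build exhibits the failing behavior.

```java
// Hypothetical reproducer (not the reporter's app). Pin a pre-0.6.0
// io.prometheus:simpleclient_httpserver in the build to exhibit the
// behavior; 0.6.0 is the version this thread reports as fixed.
import io.prometheus.client.Counter;
import io.prometheus.client.exporter.HTTPServer;
import io.prometheus.client.hotspot.DefaultExports;

public class MetricsRepro {
    public static void main(String[] args) throws Exception {
        // Export the default JVM metrics plus one custom counter.
        DefaultExports.initialize();
        Counter requests = Counter.build()
                .name("repro_requests_total")
                .help("Total requests handled by the reproducer.")
                .register();
        requests.inc();

        // Serve /metrics on 61623, mirroring the failing port below.
        // The default HTTPServer thread is non-daemon, so the JVM stays up.
        HTTPServer server = new HTTPServer(61623);
        System.out.println("Serving metrics on :61623");
    }
}
```

Packaging this into a Docker image (as requested above) would just be a standard JDK base image running the built jar.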
This was an issue with the Prometheus Java client: updating to the latest client version (0.6.0) fixes the issue.
Great!
Bug Report
What is the issue?
When running linkerd on a prometheus server, not all endpoints are being scraped.
How can it be reproduced?
Unsure
Logs, error output, etc.
linkerd check output
Environment
Possible solution
Additional context
The endpoints I am scraping are not running linkerd.
While I am seeing this issue on many endpoints in my existing prometheus server, I took one specific use case to test with. I have a cassandra server that has two scrapable endpoints: 9100 (node_exporter) and 61623 (cassandra metrics).
As you can see, one works and one does not. If I remove the linkerd proxy from the prometheus-server, then all endpoints work as they should. From the logs, the only difference is the
transfer-encoding and content-length both found, canceling
warning that shows up on the scrape to 61623.
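For context, a response that carries both Transfer-Encoding and Content-Length headers is invalid per RFC 7230, which is presumably why the proxy cancels the scrape. A quick way to confirm an endpoint is emitting both headers, sketched here in Java with a placeholder hostname (cassandra-host) and the port from this report:

```java
// Hypothetical diagnostic: fetch response headers from the metrics
// endpoint and flag the Transfer-Encoding + Content-Length conflict
// that matches the proxy warning quoted above. The hostname is a
// placeholder; adjust to the endpoint being tested.
import java.net.HttpURLConnection;
import java.net.URL;

public class HeaderCheck {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://cassandra-host:61623/metrics");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");

        String te = conn.getHeaderField("Transfer-Encoding");
        String cl = conn.getHeaderField("Content-Length");
        System.out.println("Transfer-Encoding: " + te);
        System.out.println("Content-Length:    " + cl);

        // Both headers on one response violates RFC 7230 framing rules,
        // so an intermediary proxy may legitimately cancel the response.
        if (te != null && cl != null) {
            System.out.println("Both headers present: invalid response framing.");
        }
        conn.disconnect();
    }
}
```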