Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

per-worker listener and watchdog stats #8263

Merged
merged 11 commits into from
Sep 20, 2019
Merged

per-worker listener and watchdog stats #8263

merged 11 commits into from
Sep 20, 2019

Conversation

mattklein123
Copy link
Member

This PR does a few things:

  1. Adds per-worker listener stats, useful for viewing worker
    connection imbalance.
  2. Adds per-worker watchdog miss stats, useful for viewing per
    worker event loop latency.
  3. Misc connection handling cleanups.

Part of #4602

Risk Level: Medium
Testing: Modified UTs and new integration test
Docs Changes: Added
Release Notes: Added

This PR does a few things:
1) Adds per-worker listener stats, useful for viewing worker
   connection imbalance.
2) Adds per-worker watchdog miss stats, useful for viewing per
   worker event loop latency.
3) Misc connection handling cleanups.

Part of #4602

Signed-off-by: Matt Klein <mklein@lyft.com>
@mattklein123
Copy link
Member Author

@danzh2010 @mergeconflict @jmarantz PTAL

Signed-off-by: Matt Klein <mklein@lyft.com>
@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/.

🐱

Caused by: #8263 was synchronize by mattklein123.

see: more, trace.

Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just did a quick skim of this and I have some high level questions mostly to educate me...

What is the rough cardinality (order of magnitude) of:

  • watched dogs (one per thread?)
  • active listeners of various flavors
  • connection handlers

I see some potential opportunities to factor out or front-load some name lookups but it may be immaterial of the cardinalities are low or they are all created at server-start as opposed to during requests.

I will dive in more today.

@mattklein123
Copy link
Member Author

watched dogs (one per thread?)

1 per worker

active listeners of various flavors

1 per listener (so per-worker stats is listeners X workers)

connection handlers

1 per worker and 1 on the main thread

I see some potential opportunities to factor out or front-load some name lookups but it may be immaterial of the cardinalities are low or they are all created at server-start as opposed to during requests.

Yeah I agree we could do better here, but given the low cardinality I'm not sure it matters. I think we could make these stats configurable if we think this is going to be an issue though, but I found these stats to be incredibly useful in debugging potential worker imbalance issues so they seem generally useful to me (but I wanted to get @mergeconflict @oschaaf opinion on that)

Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Matt Klein <mklein@lyft.com>
Copy link
Member

@mergeconflict mergeconflict left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good to me; my only nits are about stat names.

@mattklein123
Copy link
Member Author

/azp run envoy-linux

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Matt Klein <mklein@lyft.com>
@mattklein123
Copy link
Member Author

@danzh2010 @jmarantz @mergeconflict master merged and comments addressed. PTAL.

mergeconflict
mergeconflict previously approved these changes Sep 19, 2019
Copy link
Member

@mergeconflict mergeconflict left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

jmarantz
jmarantz previously approved these changes Sep 20, 2019
Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm generally modulo the test nit.

Signed-off-by: Matt Klein <mklein@lyft.com>
danzh2010
danzh2010 previously approved these changes Sep 20, 2019
Copy link
Contributor

@danzh2010 danzh2010 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than a few nits!

Signed-off-by: Matt Klein <mklein@lyft.com>
@mattklein123 mattklein123 dismissed stale reviews from danzh2010 and jmarantz via e1a0c99 September 20, 2019 16:36
@mattklein123
Copy link
Member Author

@danzh2010 updated per comments.

@mattklein123 mattklein123 merged commit 483aa09 into master Sep 20, 2019
@mattklein123 mattklein123 deleted the per_worker_stats branch September 20, 2019 20:05
danzh2010 pushed a commit to danzh2010/envoy that referenced this pull request Sep 24, 2019
This PR does a few things:
1) Adds per-worker listener stats, useful for viewing worker
   connection imbalance.
2) Adds per-worker watchdog miss stats, useful for viewing per
   worker event loop latency.
3) Misc connection handling cleanups.

Part of envoyproxy#4602

Signed-off-by: Matt Klein <mklein@lyft.com>
danzh2010 pushed a commit to danzh2010/envoy that referenced this pull request Oct 4, 2019
This PR does a few things:
1) Adds per-worker listener stats, useful for viewing worker
   connection imbalance.
2) Adds per-worker watchdog miss stats, useful for viewing per
   worker event loop latency.
3) Misc connection handling cleanups.

Part of envoyproxy#4602

Signed-off-by: Matt Klein <mklein@lyft.com>
danzh2010 pushed a commit to danzh2010/envoy that referenced this pull request Oct 4, 2019
This PR does a few things:
1) Adds per-worker listener stats, useful for viewing worker
   connection imbalance.
2) Adds per-worker watchdog miss stats, useful for viewing per
   worker event loop latency.
3) Misc connection handling cleanups.

Part of envoyproxy#4602

Signed-off-by: Matt Klein <mklein@lyft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants