-
Notifications
You must be signed in to change notification settings - Fork 10.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ASP.NET Core metrics #47536
Comments
Thank you for submitting this for API review. This will be reviewed by @dotnet/aspnet-api-review at the next meeting of the ASP.NET Core API Review group. Please ensure you take a look at the API review process documentation and ensure that:
|
@Tratcher @davidfowl @noahfalk @tarekgh @BrennanConroy @samsp-msft Metrics counters in ASP.NET Core. Covers hosting, Kestrel and SignalR. These are almost all the places counters are used today (there are also some counters in ConcurrencyLimiter, but it's probably being replaced by rate limiting). |
It is nice to see that we'd have ALOT less counters! How does one look at tls handshake failures? Are we going to do anything to put exception information into metric dimensions? |
Good question. I forgot Exception middleware can add an |
@JamesNK regarding the description "Measures the number of concurrent HTTP requests that are currently in-flight.", the words "measure/count/record" have different meaning in the metrics domain, this document gives some context https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/supplementary-guidelines.md#instrument-selection. |
You may see a performance penalty if the conditionally present attributes cause you to frequently alternate what tags are being provided to the same instrument. The attribute list is defined in the OpenTelemetry spec as an unordered list which requires them to be sorted into a canonical ordering before they can be aggregated. I don't recall if the OTel aggregator did something similar but the aggregator for MetricsEventSource (powering dotnet-counters/dotnet-monitor) has a fast path if the number and order of attributes exactly matches what was given last time. Adding or removing an attribute dynamically makes you miss that fast path and triggers a re-sort. I don't know if the cost rises to the level you will care about it but just something to be aware of if you are benchmarking. Is there a reason that some places have endpoint as a single tag and others have host+port as two separate tags? (I'm still poking around, but time for bed, I'll keep looking at it more tomorrow) |
An HTTP request always has a host and port. The HTTP request counters have them as separate tags. Meanwhile, a connection has a local endpoint. That could be a Now that you've brought it up, if we go ahead with |
Makes sense. I don't think any of this stuff below changes your plan - its just me thinking aloud because this is our first foray into using Meters and I'm trying to give it due diligence. I started pondering a little more what this may do to the back-compat story. For tools that explicitly use EventCounters or Meters there is no issue because the APIs for both work fine SxS. For dotnet-counters I tried to avoid having users specify it explicitly by listening to both EventCounters and Meters using the same name. If both respond back (as they would for Microsoft.AspNetCore.Hosting) then the tool is written to automatically prefer using the Meter data and ignoring the EventCounter data. This means users will see a change in behavior by default and I think that is probably OK, but we can give users a way to explicitly opt for the back-compatible EventCounter data if they want it. I expect dotnet-monitor may need something similar. dotnet/diagnostics#3805 We'll also need to figure out how we document our counters now that we are using both APIs. Ideally I'm hoping not to unnecessarily force users to be aware of which API was used to generate metrics data but I think in some cases it is going to be unavoidable. For example I'm hoping we can avoid having different top-level pages for well-known EventCounters vs. well-known Meters but we will probably need to annotate each provider so that users can get that info if they need it. For the histogram counters the MetricsEventSource currently has pretty low default limits on the number of histograms it will track because each one is quite memory hungry. In the short term people may hit those limits quickly in dotnet-counters/dotnet-monitor. For example the intersection of route+status_code attributes I could easily imagine producing 1000s of combinations in non-trivial apps. We can add options to either reduce memory usage and raise the default limits or to let users more precisely target which dimension values are interesting to them. I don't know whether that work would happen in .NET 8 or not but I think it makes more sense to assume this problem will be alleviated sooner or later rather than to restrict your design based on a point in time constraint. The metrics themselves seemed pretty reasonable to me. I noticed there might be some small gaps relative to what existed with EventCounters, for example no 'Total Connections Timed Out', but I don't think exact parity is a requirement or goal here. The duration histogram does let people get at how many connections timed-out during each measurement interval and I assume that is the value they would care about far more than the running total since the process started. Worst case adding new metrics in the future based on customer feedback shouldn't be hard and it is better than cluttering the list with things most users won't care about. |
I didn't add explicit counters in situations where the number can be figured out from another counter. For example "Total Connections Timed Out" can be calculated by using the number of items recorded by the connection duration counter that have a
By the way, |
The dashes thing is such a pain. I’m |
+1 |
I don't think it is an issue and I'm not suggesting you change it, but the number you would get from the histogram would be the number of timed out items during the last measurement interval whereas the original "Total Connections Timed Out" looks like it would have been the running total since the process started. If you have access to all measurement intervals since the app started you could always sum them up, but its possible you won't have access to all the historical measurement data or the added complexity of doing that summation discourages you from doing it. Still, I'm not worried because I doubt the running total was what users would have cared about. I assume it was the rate of change in the total that was important and that they can get readily from the histogram.
Yeah I saw that and I agree that changing it to dots is a good move. Historically it was always convention to name EventSources with dashes and then somewhere around .NET Core 3.0 Vance wanted to change the naming convention and start using dots instead. I assume Microsoft-AspNetCore-Server-Kestrel got named using the older naming convention rather than the newer one but I don't know any of the specifics about exactly why. Perhaps just confusion over which convention to follow when there is more than one.
Yep, I think that is the path we are on! We just need to continue adding Meter support to other portions of the stack. In .NET 8 dotnet-monitor should support Meters well so the only significant Microsoft tool I am aware of that still supports EventCounters but not Meters is the AppInsights SDK. As folks move to the newer OTel SDK that gap should diminish in importance. |
|
I'm not familiar with web sockets. @Tratcher @BrennanConroy?
QUIC still has the concept of a connection - https://http3-explained.haxx.se/en/quic/quic-connections. The connection counter tracks that. Multiplexed streams on a connection aren't counted. They're tracked as HTTP requests. |
@JamesNK upgraded connections should be a dimension on current-requests. |
Technically no, upgrades can be used for other things. In reality, websockets are the only real usage though. |
There is overhead in tags. Too many create a cardinality explosion and also adds to the traffic cost of sending/receiving them. I've tried to limit tags to the core metadata. Is that information covered in the |
Updated:
|
Typo: "hisogram" |
API Review Note:
API Approved! Final meters, counters, tags and their associated information is in the issue body. |
Added in #46834 |
Updated based on changes in #48536. |
Background and Motivation
ASP.NET Core has event counters. In .NET 8 we want to add metrics counters. These will sit side-by-side with event counters for backward compatibility.
Metrics counters add new features (histograms, tags) that allow data to be represented by fewer counters. For example, there are event counters in hosting for
total-requests
andfailed-requests
counters. One metrics counter can represent these with a tag to represent the status.Proposed API
Microsoft.AspNetCore.Hosting
Notes: HTTP counters and tags here follow OTel's lead: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/semantic_conventions/http-metrics.md#http-server
http-server-current-requests
http-server-current-requests
{request}
method
GET
;POST
;HEAD
scheme
http
;https
host
localhost
port
8080
http-server-request-duration
http-server-request-duration
s
scheme
http
;https
method
GET
;POST
;HEAD
status-code
200
protocol
HTTP/1.1
;HTTP/2
;HTTP/3
host
localhost
port
8080
route
{controller}/{action}/{id?}
exception-name
ExceptionHandlerMiddleware
orDeveloperExceptionPageMiddleware
.System.OperationCanceledException
IHttpMetricsTagsFeature
.organization
=contoso
Microsoft.AspNetCore.Server.Kestrel
Notes: All Kestrel counters include the endpoint as a tag.
kestrel-current-connections
kestrel-current-connections
{connection}
endpoint
localhost:8080
kestrel-connection-duration
kestrel-connection-duration
s
endpoint
localhost:8080
exception-name
IConnectionMetricsTagsFeature
.organization
=contoso
kestrel-rejected-connections
kestrel-rejected-connections
{connection}
endpoint
localhost:8080
kestrel-queued-connections
kestrel-queued-connections
{connection}
endpoint
localhost:8080
kestrel-queued-requests
kestrel-queued-requests
{request}
endpoint
localhost:8080
kestrel-current-upgraded-connections
kestrel-current-upgraded-connections
{request}
endpoint
localhost:8080
kestrel-tls-handshake-duration
kestrel-tls-handshake-duration
{s}
endpoint
localhost:8080
protocol
Tls10
;Tls11
;Tls12
;Tls13
exception-name
System.OperationCanceledException
kestrel-current-tls-handshakes
kestrel-current-tls-handshakes
{handshake}
endpoint
localhost:8080
Microsoft.AspNetCore.Http.Connections
Notes: Timed out connection counter is merged into connection-duration counter. I'm unaware of an official connection closed status, so I invented one with some values. @BrennanConroy It would be good if you could help with better end statuses.
signalr-http-transport-current-connections
signalr-http-transport-current-connections
{connection}
signalr-http-transport-current-transports
signalr-http-transport-current-transports
{transport}
transport
None
;WebSockets
;ServerSentEvents
;LongPolling
Update: REMOVED
signalr-http-transport-connection-duration
signalr-http-transport-connection-duration
s
status
NormalClosure
;Timeout
;AppShutdown
transport
None
;WebSockets
;ServerSentEvents
;LongPolling
Usage Examples
Alternative Designs
Risks
The text was updated successfully, but these errors were encountered: