Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add EventSource/EventCounter tracing and metrics for Authentication #4665

Closed
analogrelay opened this issue Oct 30, 2017 · 14 comments
Closed
Labels
affected-medium This issue impacts approximately half of our customers area-auth Includes: Authn, Authz, OAuth, OIDC, Bearer enhancement This issue represents an ask for new feature or an enhancement to an existing one Needs: Design This issue requires design work before implementating. severity-minor This label is used by an internal tool
Milestone

Comments

@analogrelay
Copy link
Contributor

This is part of the larger EventCounter/EventSource effort across the platform. Specific events and counters are suggested below. Let the discussions begin! :)

Start/Stop/Failure events will all have Duration event counters associated with them that will time the duration between Start and Stop. Stop events occur even in the case of failure (think finally blocks).

  • Microsoft-AspNetCore-Authentication Event Source

    • RemoteAuthenticationFailure traces calls to RemoteAuthenticationEvents.RemoteFailure
    • TicketReceived traces calls to RemoteAuthenticationEvents.TicketReceived
    • HandleRequestStart/Stop/Failure traces IAuthenticationRequestHandler.HandleRequestAsync but only starts after ShouldHandleRequestAsync succeeds. Thus not tracing requests that are not handled by the handler.
  • Microsoft-AspNetCore-Authentication-Cookies Event Source

    • CookieRefreshed traces when CookieAuthenticationHandler.RequestRefresh is called. Payload includes issue time, the old expiration and the new expiration?
    • CookieExpired traces when an expired cookie token is received. Obviously this can't catch expiry of tokens until the expired token is actually presented to the server.
  • Microsoft-AspNetCore-Authentication-JwtBearer Event Source

    • TokenValidationFailed traces tokens that fail to validate
    • InvalidTokensReceived counter incremented when token validation fails
  • Microsoft-AspNetCore-Authentication-OAuth Event Source (shared by OAuth providers)

    • CorrelationFailed traces occasions where the correlation ID cannot be verified.
    • CorrelationFailures counter incremented when token validation fails
    • ExchangeCodeStart/Stop/Failure traces the backchannel request to exchange the code for a token
  • Microsoft-AspNetCore-Authentication-OpenIDConnect Event Source (shared by OIDC providers)

    • RedeemAuthorizationCodeStart/Stop/Failure traces the backchannel request to redeem an authorization code
    • TokenValidationFailed traces tokens that fail to validate
    • InvalidTokensReceived counter incremented when token validation fails
    • GetUserInformationStart/Stop/Failure traces the backchannel request to get user information

I want to avoid the whole "the perfect is the enemy of the good" issue and focus on a relatively simple set of initial events. So I focused on tracing external dependencies and possible malicious actions (invalid tokens). Once in, it should be easy to iterate and add more events.

Also, there's a broad category of AuthN-related events which will live in HttpAbstractions with AuthorizationService

See https://gist.github.com/anurse/af1859663ac91c6cf69c820cebe92303 for some guidance on adding EventSources and EventCounters to ASP.NET projects.

/cc @PinpointTownes @Tratcher @davidfowl

@HaoK
Copy link
Member

HaoK commented Oct 30, 2017

What's the general idea/goal behind what we choose to expose as events/counters? Is this going to be a subset of everything we log/trace?

@analogrelay
Copy link
Contributor Author

@HaoK yes, it is likely a subset of the things we log. In theory, we could make every log message an Event as well, but I think in practice that's a lot of work. Authentication isn't the highest priority area for this work, but I wanted to log what I've found so far and start tracking it since I was exploring this area already to sketch out the patterns we'd use for EventSource/EventCounters. Right now I'd say my focus is around taking high-value events and exposing them. In many cases, there are log message here already, but where there aren't I don't see why we can't expose both logs and events. High-value in this first pass came down to a few things:

  1. Tracing the duration of authentication requests (HandleRequestAsync)
  2. Tracing failures in token and correlation validation (since that generally implies a significant problem or malicious action)
  3. Tracing the duration of external dependencies

@Eilon
Copy link
Member

Eilon commented Nov 2, 2017

Also adding @leastprivilege @brockallen and @PinpointTownes for any additional thoughts on this based on what you have seen in the community and problems that customers have run into.

@Eilon
Copy link
Member

Eilon commented Nov 2, 2017

Parking on @Tratcher 's plate to noodle on this and come up with a final list.

@kevinchalet
Copy link
Contributor

Also adding @leastprivilege @brockallen and @PinpointTownes for any additional thoughts on this based on what you have seen in the community and problems that customers have run into.

I'm all for adding ETW tracing/metrics support but it's not super clear how this would concretely help customers with the problems they run into when using the security stack (except maybe... perf issues?).

EventSource is one of those super low-level things that are incredibly hard to configure and use, unlike most of the sinks offered by the logging stack. If you want to make the security components easier for customers, I think rewording the existing traces and adding new ones where appropriate would be a better response to this problem.

@Tratcher
Copy link
Member

Tratcher commented Nov 3, 2017

@PinpointTownes this is about tracking trends in your application, not diagnosing individual issues. There is an open question on what data would be meaningful in that context, the counters listed above are only suggestions to start the discussion.

For example a single error log from a remote callback is only noise, but when you see that suddenly 20% of logins result in errors then that alerts you to investigate and/or roll back recent changes. These counters can also be left on where constant logging would be too much overhead.

@analogrelay
Copy link
Contributor Author

@PinpointTownes The main goal of this effort is EventCounters, which don't have an analog in logging right now. But EventCounters run on top of EventSources so they end up coming in together. Also EventSource != ETW and we've been looking at ways to make EventSource data more accessible. There are also APM tools that already use it and libraries like EventFlow. We need to add EventSources to get the counters so it won't hurt to beef up our tracing across the board at the same time (ILogger, EventSource, etc.).

Also, this work should include putting ILogger traces in along with the EventSources unless it has some other negative impact.

@kevinchalet
Copy link
Contributor

Also EventSource != ETW and we've been looking at ways to make EventSource data more accessible.

Ah, I guess it changed recently 'cause when it was released a few years ago, it was pretty much tied to ETW (tho' you could register your own listeners, the internal APIs made direct calls to ETW and PerfView was the only tool able to collect the traces).

@PinpointTownes this is about tracking trends in your application, not diagnosing individual issues.

Thanks for clarifying. I think I got confused when @Eilon mentioned customers problems but your remarks made clearer the fact it's mainly used for statistics (pretty much like what we had in ASP.NET 4.x with perf counters, but for much more events 😅).

I took a brief look at the list and it looks good (even if it's indeed mostly for things that didn't go well).

Two remarks tho':

  • The correlation events are currently listed under Microsoft-AspNetCore-Authentication-OAuth. Since the OAuth2 social providers are not the only handlers that rely on correlation (OIDC, Twitter and the aspnet-contrib OpenID 2.0 provider also use it), you may want to move them to the more generic Microsoft-AspNetCore-Authentication provider, as the antiforgery APIs are part of RemoteAuthenticationHandler.

  • You used different verbs for the "authentication code redeemed" events for OAuth2 and OIDC (ExchangeCode* vs RedeemAuthorizationCode*). Since it corresponds to the same operation, you should use consistent names.

@analogrelay
Copy link
Contributor Author

Ah, I guess it changed recently 'cause when it was released a few years ago, it was pretty much tied to ETW (tho' you could register your own listeners, the internal APIs made direct calls to ETW and PerfView was the only tool able to collect the traces).

Not much has changed here, but the in-proc stuff is the place we plan to do some work improving. The API still directly writes to ETW but also to the in-proc listeners and it works fine on Linux (just not calling ETW). The thought here is that we can provide a better in-proc story for gathering and shipping this data.

@analogrelay
Copy link
Contributor Author

analogrelay commented Nov 3, 2017

pretty much like what we had in ASP.NET 4.x with perf counters, but for much more events

Catching up with ASP.NET 4.x perf counters is indeed the primary driving force behind this. The rest is nice-to-have :)

@Tratcher
Copy link
Member

Tratcher commented Nov 8, 2017

  • CookieExpired is potentially very noisy, the client may keep sending the same expired cookie, we don't preemptively delete it for them. I'm not sure this provides value beyond a counter for anonymous requests.
  • HandleRequest will directly overlap with all events that happen during a callback such as the OAuth and OIDC events. It may be redundant.

@aspnet-hello aspnet-hello transferred this issue from aspnet/Security Dec 13, 2018
@aspnet-hello aspnet-hello added this to the Backlog milestone Dec 13, 2018
@aspnet-hello aspnet-hello added area-auth Includes: Authn, Authz, OAuth, OIDC, Bearer Needs: Design This issue requires design work before implementating. labels Dec 13, 2018
@Tratcher Tratcher added affected-medium This issue impacts approximately half of our customers enhancement This issue represents an ask for new feature or an enhancement to an existing one severity-minor This label is used by an internal tool labels Nov 10, 2020 — with ASP.NET Core Issue Ranking
@davidfowl
Copy link
Member

@JamesNK do we have a work item about adding metrics to the with stack or have we already done that ?

cc @halter73

@JamesNK
Copy link
Member

JamesNK commented Nov 27, 2024

It hasn't been done.

Issue here: #47603

@davidfowl
Copy link
Member

Closing this as dupe of #47603

@davidfowl davidfowl closed this as not planned Won't fix, can't repro, duplicate, stale Nov 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affected-medium This issue impacts approximately half of our customers area-auth Includes: Authn, Authz, OAuth, OIDC, Bearer enhancement This issue represents an ask for new feature or an enhancement to an existing one Needs: Design This issue requires design work before implementating. severity-minor This label is used by an internal tool
Projects
None yet
Development

No branches or pull requests

8 participants