Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve production error logging #906

Closed
ggwadera opened this issue Jun 24, 2021 · 11 comments
Closed

Improve production error logging #906

ggwadera opened this issue Jun 24, 2021 · 11 comments
Assignees
Labels
enhancement New feature or request help wanted Extra attention is needed td tech debt

Comments

@ggwadera
Copy link
Collaborator

ggwadera commented Jun 24, 2021

We currently use Sentry to capture errors on the production deployment but it seems it's not doing a very good job because we're mostly clueless whenever a bug happens to show up. I don't know if it's a configuration issue with Sentry or we're just not doing things correctly to send the logs.

Either way, it's worth to take some time to read the docs to better understand this and try to fix, so we can have more reliable logging for bug fixing. Alternatives to Sentry are also welcome if it's reasonable enough to warrant a migration.

Another thing is that logs from dev environments are also being sent, which leads to some confusion when going over the errors on Sentry. Might have to add a condition to only init Sentry when running on production.

c0d3-app/pages/_app.tsx

Lines 20 to 22 in d35f563

Sentry.init({
dsn: SENTRY_DSN
})

So basically, this is what is desired to be improved:

  • Disable error reporting outside of production environment
  • Fix all the logs that aren't being sent from the production deployment
  • Include the serverless function errors logs in the report, or setup a log drain on Vercel

Some useful resources:

@ggwadera ggwadera added enhancement New feature or request help wanted Extra attention is needed td tech debt labels Jun 24, 2021
@tomrule007
Copy link
Contributor

tomrule007 commented Jun 25, 2021

I would like to take a stab at the desired improvements.

  • Disable error reporting outside of production environment
    ✓ Created 3 seperate sentry project: vercel-production, vercel-preview & local-dev. Select project by setting NEXT_PUBLIC_SENTRY_DSN & SENTRY_PROJECT environment variables. Vercel variables are already set but must manually add to local setup if you wish to test local logging
  • Fix all the logs that aren't being sent from the production deployment
    ✓ Probably still missing some but I believe most error reporting is being sent correctly
  • Include the serverless function errors logs in the report, or setup a log drain on Vercel
    ✓ serverless functions are included with a temporary manual hack and log drain is setup with logflare

@ggwadera
Copy link
Collaborator Author

Great, can I assign you to this issue then? Just so we know that there's someone already working on it.

@tomrule007
Copy link
Contributor

Ya put me on this one

@tomrule007
Copy link
Contributor

tomrule007 commented Jun 27, 2021

So the client side error reporting in dev environment is coming from the next.config.js file.

c0d3-app/next.config.js

Lines 20 to 21 in 4e90693

process.env.SENTRY_DSN ||
'https://e95626afb0454145b569bc69116f838c@o385150.ingest.sentry.io/5221680',

It seems like a lot of these defaults are maybe stale? and should not be hidden in this config file either way. If the user needs them they should explicitly have to set them in there .env file. We can copy the defaults that should be there to our .env.example file.

Currently that sentry_dna key is pointing to a seperate dev_sandbox project which I think is nice to have for our engineering team to be able to test out sentry functions on there local machine (ie we should keep it and just move it to the private engineering key list)

Something like this:
https://github.com/tomrule007/c0d3-app/pull/1/files

Inspired from: https://12factor.net/config

@ggwadera
Copy link
Collaborator Author

Just as an additional information, adding the environment variables on next.config.js is no longer required for them to work, see the little note at the start here https://nextjs.org/docs/api-reference/next.config.js/environment-variables.

If you want you can remove the sentry values from there when you send a PR for this issue.

@tomrule007
Copy link
Contributor

I've got a working setup for my dev environment that is reporting both frontend and backend errors but I'm still working on getting the config setup that will also work on our vercel host.

It also appears there maybe some issues with the sentry package that will prevent it from working with our serverless setup on vercel.
getsentry/sentry-javascript#3643

@tomrule007
Copy link
Contributor

tomrule007 commented Jun 30, 2021

Issue Status Report

The main issue with the old setup is it only sent backend errors that we manual caught and forwarded with our logger.ts helper middleware utility.

This not only caused a lot of errors to go unnoticed it also seems to not be playing well with the vercel serverless functions and muddies up the stack trace with all the logging function calls.

This new setup uses the @sentry/nextjs package that injects an error catcher first thing into the frontend and the backend to catch all errors when they are created.

Though it still require the withSentry helper function to wrap all our request handlers but should work well with our serverless setup once there issues are handled.

The big bonus of this sentry.io setup is:

  • Vercel integration automatically tags each release with a version number that can be tracked to commits and logs when errors first show up.
  • Vercel integration uploads the source map files allowing sentry.io to display actual code that caused the error
  • Sentry io can also further be integrated with github to link directly to the commits that caused the problem

Test Branch

Created test PR #922 with a live vercel preview branch hosted here:
https://c0d3-app-8ha8m1w94-c0d3.vercel.app/

Test branch includes:

  • Throw error button in top left corner that will throw a test frontend error (can be viewed in console.log)
  • /api/error route to throw a test backend error ( server will respond with internal error)

Engineering team can login and view the error reports:

Screenshots for those that don't have login:

Sentry frontend error report
sentry_frontend_error

Sentry backend error report
sentry_backend_error

Vercel server function log
vercel_server_log

Changes

  • Replaced the general sentry.io sdk with specific nextjs sdk that handles frontend/backend splitting at compile time. @sentry/browser --> @sentry/nextjs.
  • used npx @sentry/wizard -i nextjs to initialize config files sentry.server.config.js / sentry.client.config.js and sentry.properties file which is moved into .env file (and auto handled by vercel integration)
  • Also temp disabled coverage report to stop complaints about new mock error throwers (will be removed before merged)

To-Do

  • Add separate preview branch environment variable to separate preview logs from production logs
    ✓ Created vercel-preview project on sentry.io and updated vercel preview branch environment variables SENTRY_PROJECT / NEXT_PUBLIC_SENTRY_DSN
  • Ensure backend is reporting all errors on serverless (@sentry/nextjs currently has a lot of issues open with serverless not functioning correctly)
    ✓ works with test error and will track this sentry/nextjs issue for any updates
  • Remove mock error throwers
  • Enable pages coverage reports
  • Add withSentry wrapper to all apis (lessons / graphql)
  • Change telemetry % gathering to reduce data consumptions and slow downs (or disable, currently logs all transactions and creates a performance report)
    ✓ Leaving at 100%
  • Make withSentry wrapper work with our custom LoggedRequest type
    (If Anyone knows how to solve this I would love some help on this one)

    ✓ Removed LoggedRequest from api/lessons as it was unneeded
  • Possibly remove current winston-transport-sentry-node * It's using an old sentry package and breaking the apps ability to setup preformance monitoring
    ✓ Removed package and and set winston to just log to console

@ggwadera
Copy link
Collaborator Author

Oh damn that's some awesome stuff. Didn't even know sentry could do all that. Nice work 💪

@tomrule007
Copy link
Contributor

tomrule007 commented Jul 1, 2021

So this issue has snowballed to be much bigger than I initially intended.

Since apollo has its own error handler that catches and process the error you have to make an Apollo Plugin that links into there error event handler. Luckily there was a blog post for this that I was basically able to copy paste.

Blog: https://blog.sentry.io/2020/07/22/handling-graphql-errors-using-sentry
plugin api info: https://www.apollographql.com/docs/apollo-server/integrations/plugins/

But there is also still an issue with Sentry.flush() function that sometimes fails (there are multiple on going active issues on this / serverless hosts not working properly)

Remaining Todo

  • Check functionality on staging server
  • Add Typescript types for apolloSentryErrorPlugin
  • Add Tests for apolloSentryErrorPlugin

@tomrule007
Copy link
Contributor

tomrule007 commented Jul 2, 2021

And flush is not working =/ I feel like I am in a bit of stand still until this sentry serverless fail to flush issue gets solved.
Preview Link

[POST] /api/graphql
17:09:39:56
Flush attempt: [
  Error: AuthenticationError: Password is invalid
      at login (/var/task/.next/server/chunks/6901.js:177:11) {
    locations: [ [Object] ],
    path: [ 'login' ]
  }
]
Flush failed

tomrule007 added a commit to tomrule007/c0d3-app that referenced this issue Jul 2, 2021
@ggwadera ggwadera pinned this issue Jul 2, 2021
@tomrule007
Copy link
Contributor

Ok so I have a working pr going now! 895bb39

Solution: Manually check if sentry client is running and manually call Sentry.init() if not

Concerns: The sentry packages should be doing this for us and maybe currently creating extra errors that they are suppressing. The manual initializing may lead to some performance issues? (IE getsentry/sentry-javascript#3051)

I did a quick (not thorough) investigation into performance issues. I did 10 bad password requests. To our live code app vs live preview branch

Results

Preview c0d3.com Diff
418 ms 200 ms +218 ms
260 ms 215 ms +45 ms
255 ms 204 ms +51 ms
410 ms 207 ms +203 ms
413 ms 201 ms +212 ms
270 ms 204 ms +66 ms
247 ms 192 ms +55 ms
408 ms 202 ms +206 ms
409 ms 214 ms +195 ms
254 ms 199 ms +55 ms
min 247 ms 192 ms +55 ms
max 418 ms 215 ms +203 ms
avg 334.4 ms 203.8 ms +130.6 ms

Extra for engineering team:

  • Vercel logs
    * Note it only shows current log so you have to create an error (invalid user or invalid password) while the log window is open
  • Sentry Logs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed td tech debt
Projects
None yet
Development

No branches or pull requests

2 participants