Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: detect chrome browser process and tab crashes to no longer hang in CI #24338

Merged
merged 8 commits into from
Oct 21, 2022

Conversation

mjhenkes
Copy link
Member

@mjhenkes mjhenkes commented Oct 21, 2022

User facing changelog

  • When a chromium based browser tab or process crashes, Cypress will no longer hang indefinitely but close fail the current test and move on to the next.

Additional details

We are detecting tab crashes when cdp tells us it crashed and process crashes when the process exits with specific code/signal combinations.

This affects run mode much more than open mode. In run mode we will move on to the next spec. In open mode we will re-launch the browser to the home screen if it crashes.

Steps to test

I've created to specs that use CDP to crash the browser process and tab in the system tests e2e project. These work for any chromium based browser. This fix 'may' work for firefox but we have no way to test to see what signal/code combination we should be handling.

How has the user experience changed?

Browser tab crash - before: process hangs indefinitely

tab.hang.-.before.mov

Browser tab crash - after: process fails current spec and moves on to the next.

tab.hang.mov

Browser process crash - before: process hangs indefinitely

process.crash.-.before.mov

Browser process crash - after: process fails current spec and moves on to the next.

process.crash.-.after.mov

PR Tasks

  • Have tests been added/updated?
  • Has the original issue (or this PR, if no issue exists) been tagged with a release in ZenHub? (user-facing changes only)
  • Has a PR for user-facing changes been opened in cypress-documentation?
  • Have API changes been updated in the type definitions?

@cypress-bot
Copy link
Contributor

cypress-bot bot commented Oct 21, 2022

Thanks for taking the time to open a PR!

@cypress
Copy link

cypress bot commented Oct 21, 2022



Test summary

45945 0 4498 0Flakiness 19


Run details

Project cypress
Status Passed
Commit dd3ec67
Started Oct 21, 2022 7:36 PM
Ended Oct 21, 2022 7:53 PM
Duration 17:40 💡
OS Linux Debian - 11.4
Browser Multiple

View run in Cypress Dashboard ➡️


Flakiness

cypress/e2e/e2e/origin/cookie_behavior.cy.ts Flakiness
1 ... > same site / cross origin > XMLHttpRequest > sets cookie on same-site request if withCredentials is true, and attaches to same-site request if withCredentials is true
2 ... > same site / cross origin > fetch > sets same-site cookies if "include" credentials option is specified from request, but does not attach same-site cookies to request by default (same-origin)
3 ... > same site / cross origin > XMLHttpRequest > sets cookie on same-site request if withCredentials is true, and attaches to same-site request if withCredentials is true
4 ... > same site / cross origin > fetch > sets same-site cookies if "include" credentials option is specified from request, but does not attach same-site cookies to request by default (same-origin)
5 ... > same site / cross origin > XMLHttpRequest > sets cookie on same-site request if withCredentials is true, and attaches to same-site request if withCredentials is true
This comment includes only the first 5 flaky tests. See all 19 flaky tests in the Cypress Dashboard.

This comment has been generated by cypress-bot as a result of this project's GitHub integration settings. You can manage this integration in this project's settings in the Cypress Dashboard

@mjhenkes mjhenkes self-assigned this Oct 21, 2022
@@ -570,7 +570,6 @@ export const AllCypressErrors = {
This can happen for a number of different reasons:

- You wrote an endless loop and you must fix your own code
- There is a memory leak in Cypress (unlikely but possible)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this to not first plant the idea that the user isn't at fault. They may not be at fault, but we'd like more investigation before they reach that conclusion.

options.onBrowserClose()
browserLauncher.clearInstanceState()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we clear the instance state here because we don't 'kill' the browser since it's already dead.

@@ -103,7 +103,9 @@ export class ProjectBase<TServer extends Server> extends EE {
this.options = {
report: false,
onFocusTests () {},
onError () {},
onError (error) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a change to ensure that we're logging errors in open mode and not losing them or logging them twice.

@@ -146,9 +146,10 @@ export = {
onCrashed () {
const err = errors.get('RENDERER_CRASHED')

errors.log(err)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by default 'onError' is defined in open mode, but it was defined as an empty function. this would lead to duplicate logging of errors in run mode. I modified the default onError to log errors and removed this call to remove the duplicate logs.

@@ -0,0 +1,4 @@
it('crashes the chrome process', () => {
Cypress.automation('remote:debugger:protocol', { command: 'Browser.crash', params: {} })
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this uses cdp to crash the browser process

@@ -0,0 +1,4 @@
it('crashes the chrome tab', () => {
Cypress.automation('remote:debugger:protocol', { command: 'Page.navigate', params: { url: 'chrome://crash', transitionType: 'typed' } })
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this uses cdp to crash the browser tab

@mjhenkes mjhenkes marked this pull request as ready for review October 21, 2022 15:04
@mjhenkes mjhenkes changed the title Matth/fix/detect chrome browser renderer crashes fix: detect chrome browser process and tab crashes to no longer hang in CI Oct 21, 2022
@ryanthemanuel ryanthemanuel self-requested a review October 21, 2022 15:06
@AtofStryker AtofStryker self-requested a review October 21, 2022 15:08
@@ -570,7 +570,6 @@ export const AllCypressErrors = {
This can happen for a number of different reasons:

- You wrote an endless loop and you must fix your own code
- There is a memory leak in Cypress (unlikely but possible)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this no longer a concern?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this to not first plant the idea that the user isn't at fault. They may not be at fault, but we'd like more investigation before they reach that conclusion.

ctx.browser.setBrowserStatus('closed')
// TODO: make this a required property
if (!options.onBrowserClose) throw new Error('onBrowserClose did not exist in interactive mode')

const browserDisplayName = instance?.browser?.displayName || 'unknown'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would Electron be the fallback here or when would we not know this value? User defied browser instance?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really not sure in what cases we don't have the display name, just following typescripts lead to be honest. If we don't have a browser or a display name available i honestly don't know what could be happening.

Co-authored-by: Emily Rohrbough <emilyrohrbough@users.noreply.github.com>

// We are being very narrow on when to restart the browser here. The only case we can reliably test the 'SIGTRAP' signal.
// We want to avoid adding signals in here that may intentionally be caused by a user.
// For example exiting firefox through either force quitting or quitting via cypress will fire a 'SIGTERM' event which
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quitting via cypress will fire a 'SIGTERM' event which

Is this quitting like a hard kill app quit? I notice an errors randomly this when we click the close button or ctrl + c in the terminal for Cypress unexpectedly killing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah sigterm is what you get when you hard kill the app. Specifically in this case we hard kill firefox. But we can't use the sigterm signal because when we force exit firefox via the app, we get cause the sigterm signal and you end up in a loop where you can never kill firefox because we restart it.

@@ -186,6 +186,8 @@ export class OpenProject {
return await browsers.connectToNewSpec(browser, { onInitializeNewBrowserTab, ...options }, automation)
Copy link
Member

@emilyrohrbough emilyrohrbough Oct 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to update connectToNewSpec as well to create a new browser instance if one doesn't exist? Currently the browser instance doesn't exist, browsers.connectToNewSpec will see this isn't defined and throw:

  async connectToNewSpec (browser: Browser, options: BrowserNewTabOpts, automation: Automation) {
    debug('connecting to new chrome tab in existing instance with url and debugging port', { url: options.url })

    const browserCriClient = this._getBrowserCriClient()

    if (!browserCriClient) throw new Error('Missing browserCriClient in connectToNewSpec')

Copy link
Member Author

@mjhenkes mjhenkes Oct 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

connectToNewSpec isn't run because 'options.shouldLaunchNewTab' is false, so we fall into the browser.open flow

Copy link
Member

@emilyrohrbough emilyrohrbough Oct 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When does this get reset to false? I thought this was set to true when it wasn't the first spec

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i assume in onError which end up calling 'onEarlyExit' in run.ts. Theres a lot of redirections.

Co-authored-by: Ryan Manuel <ryanm@cypress.io>
Copy link
Contributor

@AtofStryker AtofStryker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gave the system tests a run on Mac and Windows. looks like everything behaves the way I would expect!

// It should fail the chrome_tab_crash spec, but the simple spec should run and succeed
context('when the tab crashes in chrome', () => {
systemTests.it('fails', {
browser: 'chrome',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add one for edge too? Guessing no since we treat chrome/edge the same?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, edge and chrome behave the same, no harm in testing it though 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently edge isn't available in system tests? 🤷‍♂️

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mjhenkes Its likely because we don't run test on a window instance in CI

@mjhenkes mjhenkes merged commit 382e0b5 into develop Oct 21, 2022
@mjhenkes mjhenkes deleted the matth/fix/detect-chrome-browser-renderer-crashes branch October 21, 2022 20:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Detect browser/renderer crashes
4 participants