-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: detect chrome browser process and tab crashes to no longer hang in CI #24338
fix: detect chrome browser process and tab crashes to no longer hang in CI #24338
Conversation
Thanks for taking the time to open a PR!
|
@@ -570,7 +570,6 @@ export const AllCypressErrors = { | |||
This can happen for a number of different reasons: | |||
|
|||
- You wrote an endless loop and you must fix your own code | |||
- There is a memory leak in Cypress (unlikely but possible) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing this to not first plant the idea that the user isn't at fault. They may not be at fault, but we'd like more investigation before they reach that conclusion.
options.onBrowserClose() | ||
browserLauncher.clearInstanceState() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we clear the instance state here because we don't 'kill' the browser since it's already dead.
@@ -103,7 +103,9 @@ export class ProjectBase<TServer extends Server> extends EE { | |||
this.options = { | |||
report: false, | |||
onFocusTests () {}, | |||
onError () {}, | |||
onError (error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a change to ensure that we're logging errors in open mode and not losing them or logging them twice.
@@ -146,9 +146,10 @@ export = { | |||
onCrashed () { | |||
const err = errors.get('RENDERER_CRASHED') | |||
|
|||
errors.log(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by default 'onError' is defined in open mode, but it was defined as an empty function. this would lead to duplicate logging of errors in run mode. I modified the default onError to log errors and removed this call to remove the duplicate logs.
@@ -0,0 +1,4 @@ | |||
it('crashes the chrome process', () => { | |||
Cypress.automation('remote:debugger:protocol', { command: 'Browser.crash', params: {} }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this uses cdp to crash the browser process
@@ -0,0 +1,4 @@ | |||
it('crashes the chrome tab', () => { | |||
Cypress.automation('remote:debugger:protocol', { command: 'Page.navigate', params: { url: 'chrome://crash', transitionType: 'typed' } }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this uses cdp to crash the browser tab
@@ -570,7 +570,6 @@ export const AllCypressErrors = { | |||
This can happen for a number of different reasons: | |||
|
|||
- You wrote an endless loop and you must fix your own code | |||
- There is a memory leak in Cypress (unlikely but possible) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this no longer a concern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing this to not first plant the idea that the user isn't at fault. They may not be at fault, but we'd like more investigation before they reach that conclusion.
ctx.browser.setBrowserStatus('closed') | ||
// TODO: make this a required property | ||
if (!options.onBrowserClose) throw new Error('onBrowserClose did not exist in interactive mode') | ||
|
||
const browserDisplayName = instance?.browser?.displayName || 'unknown' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would Electron be the fallback here or when would we not know this value? User defied browser instance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really not sure in what cases we don't have the display name, just following typescripts lead to be honest. If we don't have a browser or a display name available i honestly don't know what could be happening.
Co-authored-by: Emily Rohrbough <emilyrohrbough@users.noreply.github.com>
|
||
// We are being very narrow on when to restart the browser here. The only case we can reliably test the 'SIGTRAP' signal. | ||
// We want to avoid adding signals in here that may intentionally be caused by a user. | ||
// For example exiting firefox through either force quitting or quitting via cypress will fire a 'SIGTERM' event which |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quitting via cypress will fire a 'SIGTERM' event which
Is this quitting like a hard kill app quit? I notice an errors randomly this when we click the close button or ctrl + c in the terminal for Cypress unexpectedly killing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sigterm is what you get when you hard kill the app. Specifically in this case we hard kill firefox. But we can't use the sigterm signal because when we force exit firefox via the app, we get cause the sigterm signal and you end up in a loop where you can never kill firefox because we restart it.
@@ -186,6 +186,8 @@ export class OpenProject { | |||
return await browsers.connectToNewSpec(browser, { onInitializeNewBrowserTab, ...options }, automation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to update connectToNewSpec
as well to create a new browser instance if one doesn't exist? Currently the browser instance doesn't exist, browsers.connectToNewSpec
will see this isn't defined and throw:
async connectToNewSpec (browser: Browser, options: BrowserNewTabOpts, automation: Automation) {
debug('connecting to new chrome tab in existing instance with url and debugging port', { url: options.url })
const browserCriClient = this._getBrowserCriClient()
if (!browserCriClient) throw new Error('Missing browserCriClient in connectToNewSpec')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
connectToNewSpec isn't run because 'options.shouldLaunchNewTab' is false, so we fall into the browser.open flow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When does this get reset to false? I thought this was set to true when it wasn't the first spec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i assume in onError which end up calling 'onEarlyExit' in run.ts. Theres a lot of redirections.
Co-authored-by: Ryan Manuel <ryanm@cypress.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gave the system tests a run on Mac and Windows. looks like everything behaves the way I would expect!
// It should fail the chrome_tab_crash spec, but the simple spec should run and succeed | ||
context('when the tab crashes in chrome', () => { | ||
systemTests.it('fails', { | ||
browser: 'chrome', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we add one for edge too? Guessing no since we treat chrome/edge the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, edge and chrome behave the same, no harm in testing it though 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently edge isn't available in system tests? 🤷♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mjhenkes Its likely because we don't run test on a window instance in CI
User facing changelog
Additional details
We are detecting tab crashes when cdp tells us it crashed and process crashes when the process exits with specific code/signal combinations.
This affects run mode much more than open mode. In run mode we will move on to the next spec. In open mode we will re-launch the browser to the home screen if it crashes.
Steps to test
I've created to specs that use CDP to crash the browser process and tab in the system tests e2e project. These work for any chromium based browser. This fix 'may' work for firefox but we have no way to test to see what signal/code combination we should be handling.
How has the user experience changed?
Browser tab crash - before: process hangs indefinitely
tab.hang.-.before.mov
Browser tab crash - after: process fails current spec and moves on to the next.
tab.hang.mov
Browser process crash - before: process hangs indefinitely
process.crash.-.before.mov
Browser process crash - after: process fails current spec and moves on to the next.
process.crash.-.after.mov
PR Tasks
cypress-documentation
?type definitions
?