Unable to start/stop drivers concurrently on Server #8525

Closed
titusfortner opened this issue Jul 15, 2020 · 20 comments

@titusfortner
Member

titusfortner commented Jul 15, 2020

💥 Regression Report

Getting errors when starting more than one driver session at a time from different threads. Extra browsers are started, so I'm not sure whether it is failing on starting the 3rd one or breaking when trying to quit the session. The server appears to lose track of one session after another has been started.

Last working Selenium version

3.141.59

Stopped working in version:

4.x

To Reproduce

This is the broken spec:
https://github.com/SeleniumHQ/selenium/blob/dd7090cab54beb2be2541a08746ddaeb8783c496/rb/spec/integration/selenium/webdriver/spec_support/shared_examples/concurrent_driver.rb

./go //rb:remote-chrome-test

It's the same failure/stack trace for both Firefox and Chrome - https://travis-ci.org/github/titusfortner/selenium/jobs/708131143#L718

Expected behavior

More than one browser starts, and each one is then closed correctly.

Actual behavior

Here's the log of requests for sessions and the server not recognizing the sessions:
https://gist.github.com/titusfortner/a4076e1f3dc8ef4115acc7a84fde9bce

It includes messages like:
invalid session id and NoSuchSessionException

Environment

OS: All
Browser: All
Language Bindings version: Ruby trunk
Selenium Grid version (if applicable): trunk

@AutomatedTester
Member

I think that you're going to need to help us investigate it more.

Bazel runs tests in parallel by default so wondering if there is something else going on here.

@titusfortner
Member Author

I should have at least included logs when I filed this, so I added a gist above.

The test is starting the sessions from different threads.

Note that it is Chromedriver complaining about the session id, so it's like the server is mixing up which Session ID is associated with which driver.

Interestingly, when I use the most recently created session first, that one seems to work properly, but then the next two fail with the same invalid session id: https://gist.github.com/titusfortner/d8896b8fbd1d3ca2ceb9d55411af8aa9

@barancev
Member

How can I reproduce the issue? I can't find the rb_server_toggle branch on GitHub, and I can't see it merged to trunk.

@titusfortner
Member Author

Yes, it got merged in; it can be replicated with: ./go //rb:remote-chrome-test
We've guarded the test and are tracking this bug for it in trunk, so you can see the results here: https://travis-ci.com/github/SeleniumHQ/selenium/jobs/361722656#L732 and here: https://travis-ci.com/github/SeleniumHQ/selenium/jobs/361722657#L729

Let me know if there is more info I can provide.

@shs96c shs96c self-assigned this Jul 21, 2020
@shs96c shs96c added this to the 4.0 milestone Jul 21, 2020
@shs96c
Member

shs96c commented Jul 21, 2020

This should be fixed by a3e0daf. @titusfortner can you please confirm, and reopen the issue if there's still a problem?

@shs96c shs96c closed this as completed Jul 21, 2020
@titusfortner
Member Author

@shs96c
Haven't been able to reproduce the problem with Firefox, and of course this doesn't work with Safari.
But Chrome, on both Linux & Mac, is having regular issues with this.
From Travis: https://travis-ci.com/github/SeleniumHQ/selenium/jobs/363687203#L741

The 2 errors I'm consistently getting locally (the first looks more helpful for diagnosing):
https://gist.github.com/titusfortner/5d290b31c8015b3ef96c5761f7510cb5

@titusfortner titusfortner reopened this Jul 22, 2020
@AutomatedTester
Member

Having a quick look at this, I think the Travis failure linked above (https://travis-ci.com/github/SeleniumHQ/selenium/jobs/363687203#L741) is unrelated.

I have also tried to run tests concurrently with a server and can't replicate this issue with python bindings.

I am going to remove the Se4 tag for now until we can get down to the nitty gritty of the issue (which I will leave to you @titusfortner )

@AutomatedTester AutomatedTester removed this from the 4.0 milestone Aug 24, 2020
@titusfortner
Member Author

@bjuric

bjuric commented Sep 12, 2020

I ran into the exact same issue when I tried upgrading Selenium from 3.141.59 to the latest 4.x (4.0.0-alpha-6) on my gwen-web project. Multiple parallel sessions and switching between windows is not working in 4.x. Happy to help test this again if you have discovered the cause and have a fix.

@titusfortner
Member Author

@bjuric what language are you using?

@bjuric

bjuric commented Sep 12, 2020 via email

@titusfortner
Member Author

@bjuric do you have reproducible code, so we have more to go off of than just my Ruby code?

@bjuric

bjuric commented Sep 13, 2020

@titusfortner Here is a gist which replicates the issue with the java impl: https://gist.github.com/bjuric/ccf7dbb5546d9c323c29d2b8de6ff6ff

If you remove the fluent wait line, the parallel execution works correctly. Likely an issue with fluent wait when there are multiple web driver instances running at the same time.

@bjuric

bjuric commented Sep 18, 2020

@titusfortner Turns out my issue was to do with fluent wait not using my custom thread pool (service executor). I've raised a PR here to fix that: #8713

@shs96c
Member

shs96c commented Sep 18, 2020

The java code appears to be working as intended....

@bjuric

bjuric commented Sep 19, 2020

Interesting, I was able to reproduce the problem using Java 8 (Oracle and OpenJDK) on two machines: a Mac with 4 threads and a custom PC with 12 threads. On the former it failed immediately; on the latter it worked the first time but then failed on all subsequent runs. Applying the #8713 PR fix above and passing the executor to fluent wait fixed all instances of the same tests, which then passed consistently on every launch and ran faster too.
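To illustrate what "passing the executor" changes, here is a minimal sketch using only the JDK. It does not use Selenium or reproduce the #8713 change; it only assumes, as the comments in this thread suggest, that the wait submits its work through CompletableFuture. The class name, method names, and the thread name "my-service-executor" are made up for this example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ExecutorChoiceSketch {

    // Runs a trivial async task on CompletableFuture's default executor
    // and returns the name of the thread that executed it.
    public static String runOnDefault() throws Exception {
        return CompletableFuture
                .supplyAsync(() -> Thread.currentThread().getName())
                .get();
    }

    // Same task, but submitted to an executor supplied by the caller.
    public static String runOn(ExecutorService executor) throws Exception {
        return CompletableFuture
                .supplyAsync(() -> Thread.currentThread().getName(), executor)
                .get();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical application-owned pool; every thread gets the same name
        // so it is easy to spot in the output.
        ExecutorService custom = Executors.newFixedThreadPool(
                2, r -> new Thread(r, "my-service-executor"));
        try {
            System.out.println("default executor ran on: " + runOnDefault());
            System.out.println("custom executor ran on:  " + runOn(custom));
        } finally {
            custom.shutdown();
        }
    }
}
```

Without the second argument, the task runs on whatever the JDK picks by default, which is shared with every other caller in the process; with it, the work stays on the pool the application controls.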

@bjuric

bjuric commented Sep 23, 2020

One way I've found to work around this (that feels a little dirty tbh) is to set the following system property to force the CompletableFuture in FluentWait to run on the calling thread.

java.util.concurrent.ForkJoinPool.common.parallelism=0

The asynchronous nature of waits etc. in Selenium 4 now comes at a cost to concurrency, it would seem.
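For anyone who wants to check what that property does on their own JDK, a small probe along these lines may help. It is only a sketch: the class and method names are invented here, it does not touch Selenium, and where the default async task ends up running can differ between JDK versions, so no output is claimed.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ForkJoinPool;

public class CommonPoolProbe {

    // Parallelism the JDK reports for the common pool after reading the
    // java.util.concurrent.ForkJoinPool.common.parallelism property.
    public static int reportedParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    // True if a default supplyAsync task was executed by the calling thread.
    public static boolean defaultAsyncRunsOnCaller() throws Exception {
        Thread caller = Thread.currentThread();
        Thread worker = CompletableFuture.supplyAsync(Thread::currentThread).get();
        return worker == caller;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("reported common pool parallelism: " + reportedParallelism());
        System.out.println("default async task ran on caller: " + defaultAsyncRunsOnCaller());
    }
}
```

Run it once normally and once with `-Djava.util.concurrent.ForkJoinPool.common.parallelism=0` on the `java` command line, then compare the two results.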

@titusfortner
Member Author

@shs96c this issue is what is causing the Ruby tests to fail. Since it is crashing the browser and breaking the rest of the test execution, I'm going to completely block this test from running. (just making a note of it on this issue).

To reiterate, I can't reproduce this on my Mac, and I don't have a local linux environment handy to check it against, but it was an issue on Travis & now GitHub. This same code works just fine with the 3.141.59 server, just not with the alpha.

@titusfortner
Member Author

So, I uncommented the guard in #9147 so we can look at what is going on right now. I can't reproduce it on my mac, and I don't have a Linux machine handy to investigate.

https://github.com/SeleniumHQ/selenium/runs/1836774444?check_suite_focus=true#step:8:263

It's properly starting the 1st session, getting title, closing it. The other 2 are requested but the code never hears back, and it throws a net read timeout error:
https://github.com/SeleniumHQ/selenium/runs/1836774444?check_suite_focus=true#step:8:603

Firefox has same behavior: https://github.com/SeleniumHQ/selenium/runs/1836774400?check_suite_focus=true

titusfortner added a commit to titusfortner/selenium that referenced this issue Feb 11, 2021
@titusfortner
Member Author

This issue came down to the fact that the CI machines only have 2 processors and the new Grid only allows one browser per processor. The way the tests were starting browsers concurrently required all 3 sessions to start before any of them could be used, so with only 2 slots the third session could never start.

I fixed it by hard coding only 2 sessions for use on CI tools, which isn't ideal but works.
fwiw, I don't think this is ideal behavior, so might need to open another issue to address root cause.
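The interaction described above can be modelled without Selenium. The sketch below is a simulation under stated assumptions, not Grid code: a Semaphore stands in for the one-browser-per-processor limit, a CountDownLatch stands in for the test waiting until every session has started, and all names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class SlotStarvationSketch {

    // Returns true if every requested session obtained a slot before the timeout.
    public static boolean allSessionsStart(int slots, int sessions, long timeoutMillis)
            throws InterruptedException {
        Semaphore gridSlots = new Semaphore(slots);              // one slot per processor
        CountDownLatch allStarted = new CountDownLatch(sessions);
        ExecutorService clients = Executors.newFixedThreadPool(sessions);

        for (int i = 0; i < sessions; i++) {
            clients.submit(() -> {
                try {
                    gridSlots.acquire();         // "new session" waits for a free slot
                    try {
                        allStarted.countDown();  // this session is up
                        allStarted.await();      // test uses no session until all are up
                    } finally {
                        gridSlots.release();     // "quit" frees the slot
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        boolean started = allStarted.await(timeoutMillis, TimeUnit.MILLISECONDS);
        clients.shutdownNow();                   // interrupt clients that are still blocked
        return started;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("2 slots, 3 sessions: " + allSessionsStart(2, 3, 500)); // false
        System.out.println("2 slots, 2 sessions: " + allSessionsStart(2, 2, 500)); // true
    }
}
```

With 2 slots and 3 sessions, two clients hold their slots while waiting for a third that can never get one, so the wait times out; this matches the read timeout seen on CI. Dropping to 2 sessions, as in the workaround, lets both start.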

@github-actions github-actions bot locked and limited conversation to collaborators Sep 5, 2021