random Segmentation fault arm32 Linux, OSX x64 without stress modes. #12769
I have an OSX box, let me see if I can repro...
There were 3 OSX failures reported this month:
No luck reproing any of these. Switching over to Arm32.
Reran the query and there are a spate of new Arm32 failures, all in threading tests. I am able to get the following test to fail, though it's not clear yet if this is the same failure seen in CI. This happens every 50 runs or so, with a concurrent
Will try and capture the core dump...
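For anyone trying a similar repeated-run repro, here is a minimal sketch of a loop that runs a test command many times and records which iterations fail. It is not the harness used in this thread: the function names `run_repeatedly` and `describe` are made up for illustration, and the command to run is whatever test launcher you have locally. On POSIX, Python's `subprocess` reports a process killed by a signal as a negative return code, which is how a segmentation fault would show up.

```python
import signal
import subprocess


def run_repeatedly(cmd, runs):
    """Run `cmd` (a list of arguments) `runs` times.

    Returns a list of (iteration, returncode) pairs for every failing run.
    Hypothetical helper, not part of any CoreCLR tooling.
    """
    failures = []
    for i in range(1, runs + 1):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures.append((i, result.returncode))
    return failures


def describe(returncode):
    """Turn a return code into a readable label.

    A negative code -N means the process was terminated by signal N (POSIX).
    """
    if returncode < 0:
        try:
            return signal.Signals(-returncode).name
        except ValueError:
            return f"signal {-returncode}"
    return f"exit code {returncode}"
```

Something like `run_repeatedly(["./run-test.sh"], 1000)` (the script name is a placeholder) would then give the failing iterations, and `describe` can be used to separate crashes from ordinary test failures.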
I can get @janvorli does this assert ring any bells?
@AndyAyersMS my guess is that it might be caused by the same thing as dotnet/coreclr#24879, which I investigated yesterday. I am guessing based on the fact that it asserts due to something in the sync block.
You can try checking out the sources at commit 2b08a11 and see if you can still repro it.
Just double-checked and I am still running the "newer" bits in the above. So this still could be the same issue as in dotnet/coreclr#24879 -- failing stack looks similar.
@VSadov if/when you have a fix for dotnet/coreclr#24879 we can also try validating on the following tests on arm32:
I'm going to consider all these failures to be instances of dotnet/coreclr#24879 and start looking at the other somewhat recent intermittent arm32 failures:
For this latest batch, no failures after 1000 runs each, under stress.
None of the one-off failures has reproed locally (with the exception of the arm32 threading cases listed above), or recurred in CI.
Two new failures overnight:
Will see if I can repro either one.
The fix for dotnet/coreclr#24879 has been merged. However, it is unlikely to be the cause of this. Here we have a mix of failures. Those that are asserts about alignpad are likely dotnet/coreclr#24879 and should be fixed now.
Right, this issue tracks a number of failures. The failures that seem related to dotnet/coreclr#24879 are the ones that hit the following assert:
These did not repro when I backed out your original change. I will check and see if they still repro now that you have a fix.
@VSadov I don't see those arm32 failures anymore with your fix, so I think it was the same issue. None of the other one-off failures here has recurred or repro'd locally. One more new failure on ubuntu arm32:
Will try to repro.
No repro on the above either. Things were quiet for the past 10 days or so, but we just got a new failure on OSX.
Will try and repro this one.
No luck on that one either...
No failures since 6/19, only one failure since 6/8, and only 5 not-yet-understood failures all month. None of the one-off failures has reproed, even with additional machine loading. I am going to close this; we can open new issues if any of these tests fail again.
Example: rngchkStress2, 437017.
We do not have dumps currently and it is not clear how to repro the failures, so this issue probably requires infra work to set up dump publishing.
The query to get similar failures:
Many of them are from baseservices/threading, but some of them are not.
PTAL @jashook, @echesakovMSFT, @dotnet/jit-contrib