QuaternionTests failures in release/5.0 on ARM64 CoreCLR Checked Alpine. #41108

Closed
danmoseley opened this issue Aug 20, 2020 · 17 comments
Labels
arch-arm64 area-System.Numerics blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms'

@danmoseley
Member

danmoseley commented Aug 20, 2020

release/5.0

https://dev.azure.com/dnceng/public/_build/results?buildId=779018&view=ms.vss-test-web.build-test-results-tab&runId=24416968&resultId=149262&paneView=debug

net5.0-Linux-Release-arm64-CoreCLR_checked-(Alpine.312.Arm64.Open)Ubuntu.1804.ArmArch.Open@mcr.microsoft.com/dotnet-buildtools/prereqs:alpine-3.12-helix-arm64v8-20200602002604-25f8a3e

Quaternion.Slerp did not return the expected value: expected {X:NaN Y:NaN Z:NaN W:0.9848077} actual {X:NaN Y:NaN Z:NaN W:NaN}
Expected: True
Actual:   False


Stack trace
   at System.Numerics.Tests.QuaternionTests.QuaternionSlerpTest() in /_/src/libraries/System.Numerics.Vectors/tests/QuaternionTests.cs:line 405
Quaternion.Lerp did not return the expected value: expected {X:NaN Y:NaN Z:NaN W:0.9848077} actual {X:NaN Y:NaN Z:NaN W:NaN}
Expected: True
Actual:   False


Stack trace
   at System.Numerics.Tests.QuaternionTests.QuaternionLerpTest() in /_/src/libraries/System.Numerics.Vectors/tests/QuaternionTests.cs:line 75
Matrix4x4.Matrix4x4(Quaternion) did not return the expected value.
Expected: True
Actual:   False


Stack trace
   at System.Numerics.Tests.Matrix4x4Tests.Matrix4x4FromQuaternionTest1() in /_/src/libraries/System.Numerics.Vectors/tests/Matrix4x4Tests.cs:line 1410
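All four components of the actual values above are NaN. A plausible mechanism is sketched here in Python, assuming that Lerp blends the two inputs and then renormalizes (the common normalized-lerp technique). The function `lerp_normalized` is hypothetical and is not the System.Numerics source. Python uses 64-bit doubles rather than .NET's 32-bit `float`, but NaN propagates the same way in both. Because normalization divides every component by the same length, a single NaN input is enough to make all four outputs NaN.

```python
import math


def lerp_normalized(q1, q2, t):
    """Normalized linear interpolation between quaternions (x, y, z, w).

    Hypothetical sketch of the blend-then-normalize technique; not the
    System.Numerics source.
    """
    t1 = 1.0 - t
    dot = sum(a * b for a, b in zip(q1, q2))
    # Take the shorter arc: negate q2's weight when the inputs point apart.
    # (If dot is NaN this comparison is False, but the result is NaN anyway.)
    sign = 1.0 if dot >= 0.0 else -1.0
    r = [t1 * a + sign * t * b for a, b in zip(q1, q2)]
    # Every component is divided by the same length, so one NaN input
    # makes the length NaN and poisons all four outputs.
    inv_norm = 1.0 / math.sqrt(sum(c * c for c in r))
    return tuple(c * inv_norm for c in r)


identity = (0.0, 0.0, 0.0, 1.0)
clean = lerp_normalized(identity, identity, 0.5)
poisoned = lerp_normalized((math.nan, 0.0, 0.0, 1.0), identity, 0.5)
```

In this sketch, `poisoned` has NaN in every component, including W, even though only X was NaN on input. That matches the all-NaN actual values in the log.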

Runfo Tracking Issue: Runtime QuaternionTests tests

Build | Definition | Kind | Run Name

Build Result Summary

Day Hit Count | Week Hit Count | Month Hit Count
0 | 0 | 0
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Numerics untriaged New issue has not been triaged by the area owner labels Aug 20, 2020
@ghost

ghost commented Aug 20, 2020

Tagging subscribers to this area: @tannergooding, @pgovind
See info in area-owners.md if you want to be subscribed.

@danmoseley
Member Author

I don't see relevant changes in Quaternion or the tests ...

Can't see the history here -- @dotnet/dnceng why does this query not bring up the results above? Is there a delay?

Jobs
| where Started > ago(5d)
| join kind=inner TestResults on JobId
| where Result == "Fail"
| where Type1 startswith "System.Numerics"
| where Message contains "NaN" or StackTrace contains "QuaternionTests"

@MattGal
Member

MattGal commented Aug 20, 2020

Any delay in Kusto results is only a few minutes; I think that join just finds jobs. Checking the TestResults table for that failure, it made it into Kusto at 2020-08-19 20:41:56.6987374, and the work item finished running 33 seconds before that.

@danmoseley
Member Author

danmoseley commented Aug 20, 2020

Maybe I need lunch, then, as this doesn't find them either:

TestResults
| where Type == "System.Numerics.Tests.QuaternionTests"
| where Result  == "Fail"
| where StackTrace contains "Slerp"

@MattGal
Member

MattGal commented Aug 20, 2020

TestResults
| where Result == "Fail"
| where Type == "System.Numerics.Tests.QuaternionTests"
| where StackTrace contains "slerp"

gives me 76 results. Perhaps you need to reconnect to the cluster? (Edit: to be clear, your version does too; I just adapted what I already had to match.)

@danmoseley
Member Author

Right, I get the 76, but they are all "EntryPointNotFoundException".

@MattGal
Member

MattGal commented Aug 20, 2020

Perhaps I've not eaten enough. What are you looking for? I'd generally start where it failed once in the way you're interested in (getting its work item id) and go from there to figure out the disconnect.

@Chrisboh
Member

I have also noticed that everything listed in that Kusto query happened on the same day. Could it be that the other pipelines aren't publishing their data to Kusto?

@danmoseley
Member Author

What I was trying to achieve was: find out when the failure shown above started, and whether it's in master as well. I'm trying to learn how to do these things so I don't have to ask people to figure it out for me 😺

@tannergooding
Member

The test wasn't disabled, was it? I don't see the failure in any recent runs.

@tannergooding tannergooding added this to the Future milestone Sep 14, 2020
@tannergooding
Member

Doesn't appear to be disabled.

@tannergooding tannergooding removed the untriaged New issue has not been triaged by the area owner label Sep 14, 2020
@am11
Member

am11 commented Sep 27, 2020

Failed again on the checked CoreCLR Linux_musl arm64 Release leg (logs), with the same values shown above:

  Starting:    System.Numerics.Vectors.Tests (parallel test collections = on, max threads = 4)
    System.Numerics.Tests.QuaternionTests.QuaternionSlerpTest [FAIL]
      Quaternion.Slerp did not return the expected value: expected {X:NaN Y:NaN Z:NaN W:0.9848077} actual {X:NaN Y:NaN Z:NaN W:NaN}
      Expected: True
      Actual:   False
      Stack Trace:
        /_/src/libraries/System.Numerics.Vectors/tests/QuaternionTests.cs(405,0): at System.Numerics.Tests.QuaternionTests.QuaternionSlerpTest()

@jkotas jkotas reopened this Sep 27, 2020
@jkotas jkotas added the blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' label Sep 27, 2020
@jkotas jkotas modified the milestones: Future, 6.0.0 Sep 27, 2020
@jaredpar
Member

The failure identified by @am11 is the only one I see in the last 30 days.

https://runfo.azurewebsites.net/search/tests/?bq=definition%3Aruntime+started%3A%7E30&tq=QuaternionSlerpTest

@tannergooding
Member

Looking at the test https://github.com/dotnet/runtime/blob/master/src/libraries/System.Numerics.Vectors/tests/QuaternionTests.cs#L393-L411, this shouldn't ever produce NaN (and MathHelper.Equal isn't even set up to handle NaN).
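Here is a minimal Python sketch of why a NaN-unaware helper always fails on these values. It assumes the helper compares `|a - b|` against a small epsilon. `approx_equal`, `quat_approx_equal`, `nan_aware_equal` and the epsilon value are hypothetical and are not the real MathHelper code. Under IEEE 754 any comparison involving NaN is false, so `abs(nan - nan) <= eps` is false. NaN components therefore never compare equal, not even to themselves. A NaN-aware variant would match the X/Y/Z components but would still flag W (0.9848077 vs NaN).

```python
import math

EPSILON = 1e-5  # hypothetical tolerance; the real test helper may differ


def approx_equal(a, b, eps=EPSILON):
    """NaN-unaware approximate equality: |a - b| <= eps (hypothetical)."""
    return abs(a - b) <= eps


def quat_approx_equal(q1, q2, eps=EPSILON):
    """Component-wise comparison of (x, y, z, w) tuples."""
    return all(approx_equal(a, b, eps) for a, b in zip(q1, q2))


def nan_aware_equal(a, b, eps=EPSILON):
    """Treat two NaNs as equal; otherwise fall back to the epsilon check."""
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    return abs(a - b) <= eps


# Values copied from the failure message.
expected = (math.nan, math.nan, math.nan, 0.9848077)
actual = (math.nan, math.nan, math.nan, math.nan)
result = quat_approx_equal(expected, actual)
```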

If someone can indicate which Linux distro we are using for the musl runs, I can try and see if I can get a repro or spot anything in the disassembly.
As it is now, I'd guess we are either using a bad register or loading an invalid value from memory.
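The expected value in the message is itself suspicious. W = 0.9848077 is cos(10°), which fits a 20° axis-angle rotation, yet X, Y and Z are NaN. An axis-angle quaternion has W = cos(angle/2), which does not depend on the axis, and X, Y, Z = axis * sin(angle/2). The Python sketch below rests on these assumptions:

- the test builds `expected` from a normalized axis and a 20° angle;
- Slerp interpolates between 10° and 30° rotations at t = 0.5.

The axis (1, 2, 3) and all helper names are hypothetical. With a valid axis, nothing produces NaN. A NaN axis, however, reproduces exactly the {NaN NaN NaN 0.9848077} pattern, which fits the bad-register or invalid-load theory.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def from_axis_angle(axis, angle):
    """Quaternion (x, y, z, w) rotating `angle` radians about a unit `axis`.

    W depends only on the angle, so a corrupted axis leaves W intact.
    """
    s = math.sin(angle * 0.5)
    return (axis[0] * s, axis[1] * s, axis[2] * s, math.cos(angle * 0.5))


def slerp(q1, q2, t):
    """Spherical linear interpolation between unit quaternions."""
    cos_omega = sum(a * b for a, b in zip(q1, q2))
    sign = 1.0
    if cos_omega < 0.0:
        # Take the shorter arc.
        cos_omega, sign = -cos_omega, -1.0
    if cos_omega > 1.0 - 1e-6:
        # Nearly identical inputs: fall back to linear weights.
        s1, s2 = 1.0 - t, sign * t
    else:
        omega = math.acos(cos_omega)
        inv_sin = 1.0 / math.sin(omega)
        s1 = math.sin((1.0 - t) * omega) * inv_sin
        s2 = sign * math.sin(t * omega) * inv_sin
    return tuple(s1 * a + s2 * b for a, b in zip(q1, q2))


# Hypothetical reconstruction of the test inputs (assumed, not copied).
axis = normalize((1.0, 2.0, 3.0))
a = from_axis_angle(axis, math.radians(10.0))
b = from_axis_angle(axis, math.radians(30.0))
expected = from_axis_angle(axis, math.radians(20.0))
actual = slerp(a, b, 0.5)

# A NaN axis (e.g. from a clobbered register) keeps W but loses X, Y, Z.
corrupted = from_axis_angle((math.nan,) * 3, math.radians(20.0))
```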

@jkotas
Member

jkotas commented Sep 30, 2020

Given that this is a very intermittent failure, it may be a bug in the thread suspension for GC corrupting registers.

@am11
Member

am11 commented Sep 30, 2020

It happened on Alpine Linux 3.12 aarch64. The Helix CI container instance can be run locally with:

$ docker run -it mcr.microsoft.com/dotnet-buildtools/prereqs:alpine-3.12-helix-arm64v8-20200602002604-25f8a3e

@directhex
Contributor

No failures since October. I'm closing, since it seems happy now. Yes, I know it was kinda intermittent, but the gap between the previous "it's fine, closing" and the next failure, and the gap between the last failure and today, are both pretty big.

I'm sure this comment will come back and bite me eventually

@ghost ghost locked as resolved and limited conversation to collaborators May 12, 2021