Assertion failures in merged tests don't get surfaced in AzDO #70306

Open
BruceForstall opened this issue Jun 6, 2022 · 7 comments
Comments

@BruceForstall
Member

In this (GCStress) run:

https://dev.azure.com/dnceng/public/_build/results?buildId=1807767&view=ms.vss-test-web.build-test-results-tab&runId=48108678&resultId=109890&paneView=debug

The "coreclr Linux arm Checked gcstress0xc" leg has failures in the merged Methodical tests, with VM asserts. Those tests don't appear in the AzDO UI (as individual tests), linked above.

For example, Methodical_do fails with:

15:21:19.717 Running test: JIT/Methodical/Arrays/lcs/lcsvalbox_do/lcsvalbox_do.cmd

Assert failure(PID 56 [0x00000038], Thread: 68 [0x0044]): !CREATE_CHECK_STRING(pMT && pMT->Validate())
    File: /__w/1/s/src/coreclr/vm/object.cpp Line: 522
    Image: /root/helix/work/correlation/corerun

[createdump] Gathering state for process 56 corerun
[createdump] Crashing thread 00000044 signal 00000006
[createdump] Writing minidump with heap to file /home/helixbot/dotnetbuild/dumps/coredump.56.dmp
[createdump] Written 122314752 bytes (29862 pages) to core file
[createdump] Dump successfully written
JIT/Methodical/Methodical_do/Methodical_do.sh: line 382:    56 Aborted                 (core dumped) $LAUNCHER $ExePath "${CLRTestExecutionArguments[@]}"
Expected: 100
Actual: 134
END EXECUTION - FAILED
+ export _commandExitCode=1

but "lcsvalbox_do" doesn't show up on the top-level UI (only "Methodical_do" shows up).

Is that expected?
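
For readers decoding the log above: shells conventionally report a process killed by signal N as exit code 128 + N, so "Actual: 134" corresponds to signal 6, which matches the "signal 00000006" and "Aborted" lines from createdump and the shell. A minimal sketch of that arithmetic follows; the helper name `describe_exit_code` is hypothetical and not part of the test infrastructure.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a shell-reported exit code.

    Codes above 128 conventionally mean "terminated by signal (code - 128)".
    """
    if code > 128:
        signum = code - 128
        try:
            # Signal names are platform-dependent, so fall back gracefully.
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by signal {signum} ({name})"
    return f"exited normally with code {code}"


print(describe_exit_code(134))  # the "Actual" value from the log
print(describe_exit_code(100))  # the "Expected" value from the log
```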

@trylek @jkoritzinsky

@ghost added the "untriaged" label (New issue has not been triaged by the area owner) Jun 6, 2022
@ghost

ghost commented Jun 6, 2022

Tagging subscribers to this area: @hoyosjs
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: BruceForstall
Assignees: -
Labels:

area-Infrastructure-coreclr

Milestone: -

@BruceForstall
Member Author

Also, once a test fails with an assert, what happens to the rest of the tests that would follow? Do they get executed? Do we skip them, and fail everything when we hit the first failure?

@trylek
Member

trylek commented Jun 6, 2022

I believe that today we're unable to recover from a native assertion failure within the single process, i.e., once a test crashes with an assertion failure in the runtime or JIT, the remaining tests in the merged group don't run. Perhaps in some cases it would be possible to ignore the assertion and continue execution, but I'm not sure what exact rules govern this.

@BruceForstall
Member Author

One thing that our PMI tool does to deal with this is:

  1. "PMI DriveAll" spawns a process to do "PMI RunAll" to run the tests.
  2. "PMI RunAll" writes a file every time it finishes a test, with the index of the finished test.
  3. If "PMI RunAll" crashes, "PMI DriveAll" notices, reads the "last tested" number, and reinvokes "PMI RunAll", telling it to start at the next one (so, skip the one that crashed).

Of course, there are issues like:

  1. Can you collect multiple dumps?
  2. Can you report multiple test failures / asserts?
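
The driver/runner scheme in the three steps above can be sketched roughly as follows. This is not the PMI tool's code: the runner is a hypothetical stand-in script, the progress-file name and the `drive_all` function are invented for illustration, and a crash is simulated with `os._exit(134)` rather than a real native assert. The driver collects every crashed index, which is one possible answer to the "multiple failures" question; dump collection is not modeled.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical runner: runs "tests" start..count-1, records the index of each
# finished test in a progress file, and dies on any index listed in crash_at.
RUNNER_SOURCE = r"""
import os, sys
progress_path, start, count = sys.argv[1], int(sys.argv[2]), int(sys.argv[3])
crash_at = {int(x) for x in sys.argv[4].split(",") if x}
for index in range(start, count):
    if index in crash_at:
        os._exit(134)  # stand-in for a native assert tearing down the process
    with open(progress_path, "w") as f:
        f.write(str(index))
"""


def drive_all(count, crash_at):
    """Re-invoke the runner after each crash, skipping the test that crashed.

    Returns the indices of the tests that crashed.
    """
    crashed = []
    crash_arg = ",".join(str(i) for i in crash_at)
    with tempfile.TemporaryDirectory() as tmp:
        progress_path = os.path.join(tmp, "last_finished.txt")
        start = 0
        while start < count:
            # Seed the file so a crash on the very first test is attributable.
            with open(progress_path, "w") as f:
                f.write(str(start - 1))
            result = subprocess.run(
                [sys.executable, "-c", RUNNER_SOURCE,
                 progress_path, str(start), str(count), crash_arg]
            )
            if result.returncode == 0:
                break  # runner reached the end without crashing
            with open(progress_path) as f:
                last_finished = int(f.read())
            crashed.append(last_finished + 1)  # the test after the last finished one
            start = last_finished + 2          # resume past the crashed test
    return crashed


if __name__ == "__main__":
    print(drive_all(6, [1, 4]))
```

Writing the progress file after every test keeps the driver's bookkeeping trivial, at the cost of one small file write per test.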

@trylek
Member

trylek commented Jun 6, 2022

I believe this is doable in the merged wrapper context: it could also be modified to run itself as a child process, monitor test progress, and restart the child process with a special form of filtering to exclude the previously run tests. @jkoritzinsky, is there any chance you could add this support in the next couple of weeks, or are you so swamped with other work that we need to find someone else to implement this logic?
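
One way a parent process might monitor test progress is by scanning the child's output for the "Running test:" lines shown in the log earlier in this issue, and blaming the most recently started test when an assert appears. The sketch below assumes tests run sequentially and that output is not reordered; the function names are hypothetical and this is not the merged wrapper's actual logic.

```python
import re

# Matches lines such as:
#   15:21:19.717 Running test: JIT/Methodical/Arrays/lcs/lcsvalbox_do/lcsvalbox_do.cmd
RUNNING_TEST = re.compile(r"Running test: (\S+)")


def last_started_test(log_text):
    """Return the path from the last 'Running test:' line, or None if absent."""
    matches = RUNNING_TEST.findall(log_text)
    return matches[-1] if matches else None


def crashed_test(log_text):
    """If the log contains an assert failure, return the test started just before it."""
    assert_pos = log_text.find("Assert failure(")
    if assert_pos == -1:
        return None
    return last_started_test(log_text[:assert_pos])


SAMPLE_LOG = """\
15:21:19.717 Running test: JIT/Methodical/Arrays/lcs/lcsvalbox_do/lcsvalbox_do.cmd

Assert failure(PID 56 [0x00000038], Thread: 68 [0x0044]): !CREATE_CHECK_STRING(pMT && pMT->Validate())
    File: /__w/1/s/src/coreclr/vm/object.cpp Line: 522
"""

print(crashed_test(SAMPLE_LOG))
```

The identified test name could then serve both as the individually reported failure and as the entry to exclude when the child process is restarted.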

@jkoritzinsky
Member

I'm very swamped with other work right now and won't have time. Sorry.

@trylek
Member

trylek commented Jun 6, 2022

Thanks Jeremy for your quick response. No worries, I'll discuss this in our team meeting and we'll find a way to close this gap.

@agocke removed the "untriaged" label (New issue has not been triaged by the area owner) Jul 11, 2022
@agocke added this to the Future milestone Jul 11, 2022