Assertion failures in merged tests don't get surfaced in AzDO #70306

BruceForstall · 2022-06-06T20:17:59Z

In this (GCStress) run:

https://dev.azure.com/dnceng/public/_build/results?buildId=1807767&view=ms.vss-test-web.build-test-results-tab&runId=48108678&resultId=109890&paneView=debug

The "coreclr Linux arm Checked gcstress0xc" leg has failures in the merged Methodical tests, with VM asserts. Those tests don't appear in the AzDO UI (as individual tests), linked above.

For example, Methodical_do fails with:

15:21:19.717 Running test: JIT/Methodical/Arrays/lcs/lcsvalbox_do/lcsvalbox_do.cmd

Assert failure(PID 56 [0x00000038], Thread: 68 [0x0044]): !CREATE_CHECK_STRING(pMT && pMT->Validate())
    File: /__w/1/s/src/coreclr/vm/object.cpp Line: 522
    Image: /root/helix/work/correlation/corerun

[createdump] Gathering state for process 56 corerun
[createdump] Crashing thread 00000044 signal 00000006
[createdump] Writing minidump with heap to file /home/helixbot/dotnetbuild/dumps/coredump.56.dmp
[createdump] Written 122314752 bytes (29862 pages) to core file
[createdump] Dump successfully written
JIT/Methodical/Methodical_do/Methodical_do.sh: line 382:    56 Aborted                 (core dumped) $LAUNCHER $ExePath "${CLRTestExecutionArguments[@]}"
Expected: 100
Actual: 134
END EXECUTION - FAILED
+ export _commandExitCode=1

but "lcsvalbox_do" doesn't show up on the top-level UI (only "Methodical_do" shows up).
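A note on the `Actual: 134` line in the log above: shells such as bash report a process killed by signal N as exit status 128 + N, so 134 corresponds to signal 6, matching the `signal 00000006` that createdump printed. A minimal sketch of that arithmetic follows; the helper name `signal_from_exit_status` is made up for illustration and is not part of the test infrastructure.

```python
import signal


def signal_from_exit_status(status):
    """Return the signal implied by a shell-style exit status, or None.

    Assumes the 128 + N convention used by bash for processes that were
    terminated by signal N. Statuses of 128 or below are ordinary exit codes.
    """
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128)
    except ValueError:
        # No signal with that number is known on this platform.
        return None


if __name__ == "__main__":
    # 134 is the "Actual" value from the log; 100 is the expected pass code.
    print(signal_from_exit_status(134))
    print(signal_from_exit_status(100))
```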

Is that expected?

@trylek @jkoritzinsky


ghost · 2022-06-06T20:18:04Z

Tagging subscribers to this area: @hoyosjs
See info in area-owners.md if you want to be subscribed.

Author:	BruceForstall
Assignees:	-
Labels:	`area-Infrastructure-coreclr`
Milestone:	-

BruceForstall · 2022-06-06T20:35:24Z

Also, once a test fails with an assert, what happens to the rest of the tests that would follow? Do they get executed? Do we skip them, and fail everything when we hit the first failure?

trylek · 2022-06-06T20:39:18Z

I believe that today we're unable to recover from a native assertion failure within the single process, i.e. once a test crashes with an assertion failure in the runtime or JIT, the remaining tests in the merged group don't run. Perhaps in some cases it would be possible to ignore the assertion and continue execution, but I'm not sure what exact rules govern this.

BruceForstall · 2022-06-06T20:50:18Z

One thing that our PMI tool does to deal with this is:

"PMI DriveAll" spawns a process to do "PMI RunAll" to run the tests.
"PMI RunAll" writes a file every time it finishes a test, with the index of the finished test.
If "PMI RunAll" crashes, "PMI DriveAll" notices, reads the "last tested" number, and reinvokes "PMI RunAll", telling it to start at the next one (so, skip the one that crashed).

Of course, there are issues like:

Can you collect multiple dumps?
Can you report multiple test failures / asserts?
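
The driver/runner scheme described above can be sketched roughly as follows. This is not PMI's code or the merged test wrapper: the child program, the progress-file format, the `crash_at` parameter, and the use of exit code 134 as a stand-in for an assert are all assumptions made for illustration.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical "RunAll" child: runs tests start..count-1 and records the index
# of each finished test. It exits with 134 on chosen indices to simulate a
# native assertion failure taking down the whole process.
RUNNER_SOURCE = r"""
import os, sys
start, count, progress_path = int(sys.argv[1]), int(sys.argv[2]), sys.argv[3]
crash_at = {int(x) for x in sys.argv[4].split(",") if x}
for index in range(start, count):
    if index in crash_at:
        os._exit(134)  # stand-in for an assert/abort in the runtime
    with open(progress_path, "w") as f:
        f.write(str(index))
"""


def read_last_finished(progress_path):
    """Return the index of the last finished test, or -1 if none is recorded."""
    try:
        with open(progress_path) as f:
            return int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return -1


def drive_all(count, progress_path, crash_at=()):
    """Hypothetical "DriveAll": restart the runner after each crash.

    Returns the indices of the tests that crashed the runner, so that each
    one could be reported as an individual failure.
    """
    if os.path.exists(progress_path):
        os.remove(progress_path)  # ignore progress from an earlier session
    crashed = []
    start = 0
    while start < count:
        result = subprocess.run(
            [sys.executable, "-c", RUNNER_SOURCE, str(start), str(count),
             progress_path, ",".join(str(i) for i in crash_at)])
        if result.returncode == 0:
            break  # the runner reached the end of the list
        # The crashed test is the one after the last finished test, unless
        # the runner died on the very first test of this launch.
        crashed_index = max(read_last_finished(progress_path) + 1, start)
        crashed.append(crashed_index)
        start = crashed_index + 1  # skip the test that crashed
    return crashed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "progress.txt")
        print("crashed tests:", drive_all(6, path, crash_at=[2, 3]))
```

The driver learns which test crashed only from the progress file, so it can attribute each crash to a single test index, but it does not by itself answer the dump-collection and reporting questions listed above.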

trylek · 2022-06-06T20:57:40Z

I believe this is doable in the merged wrapper context: it can also be modified to run itself as a child process, monitor test progress, and restart the child process with a special form of filtering to exclude the previously run tests. @jkoritzinsky, is there any chance you might be able to add this support in the next couple of weeks, or are you so swamped with other work that we need to find someone else to implement this logic?

jkoritzinsky · 2022-06-06T21:11:01Z

I'm very swamped with other work right now and won't have time. Sorry.

trylek · 2022-06-06T21:51:53Z

Thanks Jeremy for your quick response. No worries, I'll discuss this in our team meeting and we'll find a way to close this gap.

BruceForstall added the area-Infrastructure-coreclr label Jun 6, 2022

ghost added the untriaged label Jun 6, 2022

BruceForstall mentioned this issue Jun 6, 2022

Optimize multi-dimensional array access #70271

Merged

agocke added this to Runtime Infra Jun 16, 2022

agocke removed the untriaged label Jul 11, 2022

agocke added this to the Future milestone Jul 11, 2022
