[NativeAOT] Enable async runtime suspension and return hijacking on unix-arm64 #73216

VSadov · 2022-08-02T08:06:04Z

Contributes to: #67805

VSadov · 2022-08-04T23:43:27Z

/azp run runtime-extra-platforms

VSadov · 2022-08-04T23:49:25Z

I think this is ready to review/merge.

azure-pipelines · 2022-08-05T00:07:27Z

Azure Pipelines successfully started running 1 pipeline(s).

jkotas · 2022-08-05T01:08:06Z

src/coreclr/nativeaot/Runtime/EHHelpers.cpp

@@ -184,6 +184,7 @@ EXTERN_C void REDHAWK_CALLCONV RhpFailFastForPInvokeExceptionCoop(intptr_t PInvo
                                                                  void* pExceptionRecord, void* pContextRecord);
 int32_t __stdcall RhpVectoredExceptionHandler(PEXCEPTION_POINTERS pExPtrs);

+// REVIEW: this is no longer used by pInvokes and use in hijack seems bogus. Remove?


This was meant to protect against SEH exceptions entering the managed code and leaving it in an inconsistent state. We should reintroduce this protection. Could you please create an issue for it and link it from here?

If a PInvoke throws an SEH exception, it is going to unwind the managed portion of the stack, and currently the runtime is left in an inconsistent state.

Makes sense, I will log an issue.
But we do not need this for hijack probes, right? There is no scenario that I can think of where we expect exceptions from waiting for GC.

It may be there so that fatal crashes, like stack overflows or access violations caused by process state corruption, terminate the process immediately instead of propagating.

logged: #73429
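As a loose analogy only (C++ exceptions rather than SEH, and not the NativeAOT mechanism), a `noexcept` function behaves like a fail-fast boundary: when an exception tries to unwind through it, the C++ runtime calls `std::terminate`, which aborts by default, instead of letting the exception reach the caller's frames. The function names below are hypothetical, and the check assumes a POSIX system, since it uses `fork`/`waitpid` to observe the abort from a child process.

```cpp
#include <cassert>
#include <csignal>
#include <stdexcept>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for native code that fails unexpectedly.
void NativeCallThatThrows() {
    throw std::runtime_error("unexpected native failure");
}

// Hypothetical transition boundary. Because it is noexcept, an exception
// escaping NativeCallThatThrows() may not unwind past this frame:
// the C++ runtime calls std::terminate(), which calls std::abort() by default.
void GuardedTransition() noexcept {
    NativeCallThatThrows();
}

// Runs GuardedTransition() in a child process and reports whether the child
// was killed by SIGABRT, i.e. whether it failed fast instead of propagating.
bool FailsFast() {
    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        GuardedTransition();
        _exit(0);  // not reached when the boundary fails fast
    }
    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    assert(waited == pid);
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT;
}
```

Running the boundary in a child process is only a testing convenience here; a real fail-fast path simply takes the whole process down.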

jkotas · 2022-08-05T05:59:35Z

src/coreclr/nativeaot/Runtime/unix/UnixNativeCodeManager.cpp

+
+// ldp with pre/post/no offset
+// x010 100x x1xx xxxx xxxx xxxx xxxx xxxx
+#define LDP_BITS2 0x28400000


I will trust you that you got these magic numbers right :-)

It took quite some time to construct and proofread these.
The "BITS" parts for the individual instruction patterns ended up matching constants from instrsarm64.h.
And that is encouraging :-)

runtime/src/coreclr/jit/instrsarm64.h, line 613 (at 2201016):

INST2(ldp, "ldp", LD, IF_EN2E, 0x29400000, 0x28400000)
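A minimal sketch of how a pattern like this can be matched, assuming the usual mask/bits pairing: every non-`x` position in the comment's pattern becomes a mask bit, and an instruction matches when its masked value equals `LDP_BITS2`. Only `LDP_BITS2` comes from the diff; `LDP_MASK2` and `IsLdpPair` are hypothetical names derived from the comment. The sample values are standard A64 encodings.

```cpp
#include <cstdint>

// ldp with pre/post/no offset
// x010 100x x1xx xxxx xxxx xxxx xxxx xxxx
// Fixed bits: 30..25 = 0 101 0 0 (opc low bit, fixed 101, V = 0, bit 25)
// and 22 = 1 (L, i.e. a load). Bit 31 (32/64-bit) and bits 24:23 are free.
constexpr uint32_t LDP_MASK2 = 0x7E400000;  // hypothetical, derived from the pattern
constexpr uint32_t LDP_BITS2 = 0x28400000;  // value from the diff

// True if `instr` is an integer load-pair in any addressing mode left open
// by bits 24:23.
constexpr bool IsLdpPair(uint32_t instr) {
    return (instr & LDP_MASK2) == LDP_BITS2;
}

// Sample A64 encodings.
constexpr uint32_t kLdpPostIndex = 0xA8C17BFD;  // ldp x29, x30, [sp], #16
constexpr uint32_t kLdpOffset    = 0xA9417BFD;  // ldp x29, x30, [sp, #16]
constexpr uint32_t kStpPreIndex  = 0xA9BF7BFD;  // stp x29, x30, [sp, #-16]!
constexpr uint32_t kRet          = 0xD65F03C0;  // ret

static_assert(IsLdpPair(kLdpPostIndex), "post-indexed ldp matches");
static_assert(IsLdpPair(kLdpOffset), "signed-offset ldp matches");
static_assert(!IsLdpPair(kStpPreIndex), "stp has L = 0, so it does not match");
static_assert(!IsLdpPair(kRet), "ret does not match");
```

Because bits 24:23 are don't-care, every value of that field matches; by my reading of the A64 encoding tables, the 00 value is the non-temporal LDNP form, which would be accepted as well.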

jkotas

LGTM. Thank you!

janvorli

LGTM, thank you!

janvorli · 2022-08-05T00:28:18Z

src/coreclr/nativeaot/Runtime/threadstore.cpp

+                // we could be catching threads in restartable sequences such as LL/SC style interlocked on ARM64
+                // and forcing them to restart.
+                // if interrupt mechanism is fast, eagerness could be hurting our overall progress.
+                waitCycles += 10000;


Should we really keep this growing without any limit?

Ultimately the wait is unbounded, since suspension cannot gracefully fail: "it is done when it is done".
In practice, though, we can get very good timings here.

Waiting too long and not waiting long enough between re-hijacks can both make the whole thing take longer.
Here I observed that we would interrupt a thread running an LL/SC InterlockedSomething loop. The interrupt would invalidate its exclusive monitor, so the SC would fail and the loop would start over, just in time for us to interrupt it again. This led to very long hangs.
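For readers unfamiliar with the pattern, here is a minimal C++ sketch of such an interlocked retry loop. It is an illustration, not the runtime's code: when LSE atomics are not used, compilers typically lower `compare_exchange_weak` on ARM64 to an LDXR/STXR (LL/SC) pair, and the store-conditional fails if the exclusive monitor is lost in between, for example because the thread was interrupted. The `weak` variant explicitly allows such spurious failures, so the loop just retries. `InterlockedIncrementLoop` and `IncrementFromThreads` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Increments `value` with an explicit CAS retry loop and returns the new value.
int InterlockedIncrementLoop(std::atomic<int>& value) {
    int observed = value.load(std::memory_order_relaxed);
    while (!value.compare_exchange_weak(observed, observed + 1,
                                        std::memory_order_acq_rel,
                                        std::memory_order_relaxed)) {
        // Failed (contention or a spurious LL/SC failure): `observed` now
        // holds the current value, so try again from there.
    }
    return observed + 1;
}

// Runs `threads` threads that each perform `perThread` increments.
int IncrementFromThreads(int threads, int perThread) {
    assert(threads > 0 && perThread >= 0);
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, perThread] {
            for (int i = 0; i < perThread; ++i) InterlockedIncrementLoop(counter);
        });
    }
    for (auto& w : workers) w.join();
    return counter.load();
}
```

Each retry is cheap, but if something keeps breaking the reservation at the wrong moment, the loop can keep starting over for a very long time, which is the hang described above.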
Fixed spin counts are often a bad idea, since the estimate may be wrong on a different platform. For now, I just made the spin count adjust in a naive way.
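The difference between a fixed and a growing wait can be shown with a toy model. It is hypothetical and greatly simplified: a straggler stuck in an LL/SC loop is assumed to need some number of uninterrupted cycles before it reaches a suspendable point, and every re-hijack restarts that count. The `waitCycles += 10000` growth mirrors the diff above; `StragglerModel`, `RoundsUntilSuspended` and the cycle counts are made up for illustration.

```cpp
#include <cstdint>

// Hypothetical model of a thread that only makes progress if it is left alone
// for at least `quietCyclesNeeded` cycles between two interruptions.
struct StragglerModel {
    uint32_t quietCyclesNeeded;
};

// Returns the hijack round in which the straggler finally suspends,
// or -1 if it does not suspend within `maxRounds` rounds.
constexpr int RoundsUntilSuspended(StragglerModel t, uint32_t initialWait,
                                   uint32_t waitIncrement, int maxRounds) {
    uint32_t waitCycles = initialWait;
    for (int round = 1; round <= maxRounds; ++round) {
        // Interrupt the thread, then wait waitCycles before interrupting again.
        if (waitCycles >= t.quietCyclesNeeded) {
            return round;  // enough quiet time for the LL/SC sequence to finish
        }
        waitCycles += waitIncrement;  // an increment of 0 models a fixed spin count
    }
    return -1;
}

// With a fixed wait shorter than the quiet period, the straggler never suspends.
static_assert(RoundsUntilSuspended({25000}, 10000, 0, 100) == -1, "fixed wait livelocks");
// Growing the wait by 10000 per round lets it finish in the third round.
static_assert(RoundsUntilSuspended({25000}, 10000, 10000, 100) == 3, "growing wait converges");
```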

Dealing with this loop is the next and last part of the NativeAOT suspension work item (#67805).

I plan to make this similar to what CoreCLR does:

1. Interrupt threads and hijack/suspend them accordingly.
2. Spin-check without re-hijacking as long as we are making progress; the common case is that everything suspends quickly.
3. Otherwise, wait for progress with a 1 ms timeout and, if it times out, try hijacking again to deal with the remaining stragglers.
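A minimal sketch of that three-step plan, using a toy model rather than real hijacking: `SuspensionModel`, `SuspendAll` and `Worker` are hypothetical, a "hijack" is just a bump of a generation counter, and a worker can be told to ignore early hijacks so that the 1 ms timeout and re-hijack path gets exercised.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Toy model of the suspension loop; not NativeAOT code.
struct SuspensionModel {
    std::atomic<int> hijackGeneration{0};  // bumped on every (re)hijack
    std::atomic<int> remaining{0};         // threads not yet suspended
    std::mutex m;
    std::condition_variable progress;      // notified when a thread suspends

    void ThreadSuspended() {
        {
            std::lock_guard<std::mutex> lock(m);
            remaining.fetch_sub(1);
        }
        progress.notify_all();
    }

    // Returns how many hijack rounds were needed.
    int SuspendAll() {
        int rounds = 1;
        hijackGeneration.fetch_add(1);  // 1. interrupt and hijack all threads
        while (remaining.load() != 0) {
            // 2. spin-check without re-hijacking while threads keep suspending
            int before = remaining.load();
            for (int i = 0; i < 1000 && remaining.load() == before; ++i) {
                std::this_thread::yield();
            }
            if (remaining.load() != before) continue;

            // 3. no progress: wait up to 1 ms, then re-hijack the stragglers
            std::unique_lock<std::mutex> lock(m);
            bool progressed = progress.wait_for(lock, std::chrono::milliseconds(1),
                                                [&] { return remaining.load() != before; });
            if (!progressed) {
                hijackGeneration.fetch_add(1);
                ++rounds;
            }
        }
        assert(remaining.load() == 0);
        return rounds;
    }
};

// A worker "reaches a safe point" once it sees hijack generation >= respondsFrom.
void Worker(SuspensionModel& s, int respondsFrom) {
    while (s.hijackGeneration.load() < respondsFrom) {
        std::this_thread::yield();
    }
    s.ThreadSuspended();
}
```

In the common case the spin in step 2 sees every thread suspend and the timed wait never runs; only when progress stalls does the loop pay for the 1 ms timeout and a fresh hijack.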

Makes sense

VSadov · 2022-08-05T22:35:55Z

Thanks!!

dotnet-issue-labeler bot added the area-NativeAOT-coreclr label Aug 2, 2022

ghost assigned VSadov Aug 2, 2022

Enable async runtime suspension and return hijacking on unix-arm64

94bed19

VSadov force-pushed the arm64susp branch from 79b39e1 to 94bed19 Compare August 2, 2022 08:31

fix unix-x64 build

fbfbd43

runfoapp bot mentioned this pull request Aug 4, 2022

system.net.security.tests.negotiateauthenticationkerberostest.loopback_success #73343

Closed

new way of epilog detection

d6b307a

VSadov mentioned this pull request Aug 4, 2022

Implement full GC suspension #67805

Closed

9 tasks

dotnet deleted a comment from azure-pipelines bot Aug 4, 2022

VSadov marked this pull request as ready for review August 4, 2022 23:48

VSadov requested a review from MichalStrehovsky as a code owner August 4, 2022 23:48

VSadov requested review from janvorli and jkotas August 4, 2022 23:48

actually wait for GC

a804b14

dotnet deleted a comment from azure-pipelines bot Aug 5, 2022

jkotas reviewed Aug 5, 2022

VSadov mentioned this pull request Aug 5, 2022

[NativeAOT] reintroduce the use of personality routines in PInvoke stubs #73429

Open

jkotas reviewed Aug 5, 2022

jkotas approved these changes Aug 5, 2022

This was referenced Aug 5, 2022

Infra improvements for Helix #68176

Closed

GC/API/GC/GetGCMemoryInfo/GetGCMemoryInfo.sh test failing intermittently on CoreCLR Linux ARM32 #73247

Closed

Removed REVIEW comment as we now have a tracking issue

d7257e1

janvorli approved these changes Aug 5, 2022

VSadov merged commit 85c411e into dotnet:main Aug 5, 2022

VSadov deleted the arm64susp branch August 5, 2022 22:35

MichalStrehovsky added a commit that referenced this pull request Aug 7, 2022

[NativeAot] Try re-enabling S.C.Concurrent tests

5668bd5

On ARM64 Linux. See if #73216 helped with the hang.

MichalStrehovsky mentioned this pull request Aug 7, 2022

[NativeAot] Try re-enabling S.C.Concurrent tests #73526

Merged

MichalStrehovsky added a commit that referenced this pull request Aug 8, 2022

[NativeAot] Try re-enabling S.C.Concurrent tests (#73526)

0c5ca95

On ARM64 Linux. See if #73216 helped with the hang.

ghost locked as resolved and limited conversation to collaborators Sep 5, 2022
