Intrinsify Interlocked.And and Interlocked.Or on XARCH #96258
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Issue Details
Closes #32239
These were only implemented on ARM64 (and RISC-V). Unlike other Interlocked APIs, these return the original value, which makes them impossible to optimize on XARCH when that original value is used. It looks like all usages in the BCL ignore the return value, so we can optimize that case.
void Test(ref int x, int y) => Interlocked.Or(ref x, y);
Was:
; Method Program:Test(byref,int):this (FullOpts)
G_M55114_IG01: ;; offset=0x0000
push rax
;; size=1 bbWeight=1 PerfScore 1.00
G_M55114_IG02: ;; offset=0x0001
mov eax, dword ptr [rdx]
jmp SHORT G_M55114_IG03
align [0 bytes for IG03]
;; size=4 bbWeight=1 PerfScore 4.00
G_M55114_IG03: ;; offset=0x0005
mov ecx, eax
or ecx, r8d
mov dword ptr [rsp+0x04], eax
lock
cmpxchg dword ptr [rdx], ecx
mov ecx, dword ptr [rsp+0x04]
cmp eax, ecx
je SHORT G_M55114_IG05
;; size=21 bbWeight=8 PerfScore 174.00
G_M55114_IG04: ;; offset=0x001A
mov ecx, eax
mov eax, ecx
jmp SHORT G_M55114_IG03
;; size=6 bbWeight=4 PerfScore 10.00
G_M55114_IG05: ;; offset=0x0020
add rsp, 8
ret
;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 37
Now:
; Method Program:Test(byref,int):this (FullOpts)
G_M55114_IG01: ;; offset=0x0000
;; size=0 bbWeight=1 PerfScore 0.00
G_M55114_IG02: ;; offset=0x0000
lock
or dword ptr [rdx], r8d
;; size=4 bbWeight=1 PerfScore 16.00
G_M55114_IG03: ;; offset=0x0004
ret
;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code: 5
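For readers less used to JIT disassembly, the "Was" listing is essentially a compare-exchange retry loop. The C# below is a minimal sketch of that pattern, not code from this PR: `InterlockedOrSketch` and `OrViaCompareExchange` are hypothetical names introduced for illustration, and the comments about which call shape gets the single `lock or` follow the description above rather than anything verified here.

```csharp
using System;
using System.Threading;

static class InterlockedOrSketch
{
    // Hypothetical helper (not from the PR): atomic OR built on a
    // compare-exchange retry loop. Returns the value stored before the OR.
    public static int OrViaCompareExchange(ref int location, int value)
    {
        int current = Volatile.Read(ref location);
        while (true)
        {
            int desired = current | value;
            int observed = Interlocked.CompareExchange(ref location, desired, current);
            if (observed == current)
            {
                return observed; // swap succeeded; this is the original value
            }
            current = observed;  // another thread changed the value; retry with it
        }
    }

    public static void Main()
    {
        int flags = 0b0011;

        // Return value discarded: the shape the description says can become `lock or`.
        Interlocked.Or(ref flags, 0b0100);

        // Return value used: the original value is needed, so per the description
        // this shape cannot be reduced to a plain `lock or` on XARCH.
        int before = Interlocked.Or(ref flags, 0b1000);

        // Same semantics through the hand-written loop.
        int viaLoop = OrViaCompareExchange(ref flags, 0b1_0000);

        Console.WriteLine($"{before} {viaLoop} {flags}"); // prints "7 15 31"
    }
}
```

Both forms leave the same bits set in `flags`; they differ only in whether the caller observes the previous value, which is the distinction the optimization depends on.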
@dotnet/jit-contrib @BruceForstall PTAL, simple PR, optimizes A few size and TP regressions for the cases when we previously didn't inline
Looks good, just a comment in lsra.
Example of a BCL usage that does not need the return value:
runtime/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs
Line 756 in 0cf461b