-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add AggressiveInlining to a couple of Vector.Create methods #71889
Conversation
Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics Issue DetailsFixes #71885 static Vector128<byte> CreateWithROSpan() => Vector128.Create("0123456789ABCDEF"u8); Was: ; Assembly listing for method D:CreateWithROSpan():System.Runtime.Intrinsics.Vector128`1[System.Byte]
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; ReadyToRun compilation
; optimized code
G_M10759_IG01: ;; offset=0000H
56 push rsi
4883EC30 sub rsp, 48
33C0 xor eax, eax
4889442420 mov qword ptr [rsp+20H], rax
488BF1 mov rsi, rcx
;; size=15 bbWeight=1 PerfScore 2.75
G_M10759_IG02: ;; offset=000FH
488D0D00000000 lea rcx, [(reloc 0x4000000000420060)]
48894C2420 mov bword ptr [rsp+20H], rcx
C744242810000000 mov dword ptr [rsp+28H], 16
488BCE mov rcx, rsi
488D542420 lea rdx, [rsp+20H]
FF1500000000 call [System.Runtime.Intrinsics.Vector128:Create(System.ReadOnlySpan`1[System.Byte]):System.Runtime.Intrinsics.Vector128`1[System.Byte]]
488BC6 mov rax, rsi
;; size=37 bbWeight=1 PerfScore 6.50
G_M10759_IG03: ;; offset=0034H
4883C430 add rsp, 48
5E pop rsi
C3 ret
;; size=6 bbWeight=1 PerfScore 1.75
; Total bytes of code 58, prolog size 12, PerfScore 16.80, instruction count 15, allocated bytes for code 58 (MethodHash=107ed5f8) for method D:CreateWithROSpan():System.Runtime.Intrinsics.Vector128`1[System.Byte]
; ============================================================ Now:
|
Thanks for the quick fix! While at it, feel free to update HexConverter.cs#L99-L107 to use |
We may want to look at making them intrinsic in .NET 8. The "equivalents" are intrinsic for |
@tannergooding Are you sure those are intrinsics for private static ReadOnlySpan<byte> data => new []
{
(byte)'0', (byte)'1', (byte)'2', (byte)'3',
(byte)'4', (byte)'5', (byte)'6', (byte)'7',
(byte)'8', (byte)'9', (byte)'A', (byte)'B',
(byte)'C', (byte)'D', (byte)'E', (byte)'F',
(byte)'0', (byte)'1', (byte)'2', (byte)'3',
(byte)'4', (byte)'5', (byte)'6', (byte)'7',
(byte)'8', (byte)'9', (byte)'A', (byte)'B',
(byte)'C', (byte)'D', (byte)'E', (byte)'F',
};
static Vector<byte> CreateWithROSpan() => new Vector<byte>(data); dump: https://gist.github.com/EgorBo/564520f9abfa8bc6da5103dc3a6606c9
leaving it for you 🙂 |
Ah, probably the |
The optimization is certainly turned on for Release configuration:
VectorCreation/obj/Release/net7.0/linux-x64/native/VectorCreation.ilc.rsp from repro I posted)
|
Yes, but there is a difference between Same applies to crossgen2 |
Ah nice, I missed that. With Dump of assembler code for function VectorCreation_D__CreateWithROSpan:
0x0000000000148410 <+0>: push rbp
0x0000000000148411 <+1>: mov rbp,rsp
0x0000000000148414 <+4>: lea rax,[rip+0x1e8f15] # 0x331330
0x000000000014841b <+11>: movups xmm0,XMMWORD PTR [rax]
0x000000000014841e <+14>: movups XMMWORD PTR [rsi],xmm0
0x0000000000148421 <+17>: mov rax,rsi
0x0000000000148424 <+20>: pop rbp
0x0000000000148425 <+21>: ret
End of assembler dump. but the |
The difference is that |
Fixes #71885
These methods aren't intrinsics and when the more conservative inliner is used (for AOT in non --Ot mode, which is default) it fails to inline them.
Was:
Now:
cc @tannergooding