Move SearchValues scalar loops into IndexOfAnyAsciiSearcher #91937
Tagging subscribers to this area: @dotnet/area-system-buffers
Issue Details
This PR moves the scalar loops into the core worker methods. This reduces the amount of code on each call site and makes it easier for us to make further changes like adding Avx512 support.
The code that's inlined into `IndexOfAny` callers is now just a call to the worker method instead of `length < 8 ? call scalar : call vectorized`.
Example call site diff
; Assembly listing for method System.Buffers.Text.Base64+Base64CharValidatable:IndexOfAnyExcept(System.ReadOnlySpan`1[ushort]):int (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX512 - Unix
; FullOpts code
G_M48088_IG01:
push rbp
- sub rsp, 16
- lea rbp, [rsp+0x10]
- ;; size=10 bbWeight=1 PerfScore 1.75
+ mov rbp, rsp
+ ;; size=4 bbWeight=1 PerfScore 1.25
G_M48088_IG02:
mov rdx, 0xD1FFAB1E ; const ptr
- mov rax, gword ptr [rdx]
- mov rdx, rax
- mov bword ptr [rbp-0x10], rdi
- mov dword ptr [rbp-0x04], esi
- cmp esi, 8
- jge SHORT G_M48088_IG04
- ;; size=28 bbWeight=1 PerfScore 5.75
-G_M48088_IG03:
- mov rdi, rdx
- mov rsi, bword ptr [rbp-0x10]
- mov edx, dword ptr [rbp-0x04]
- mov rax, 0xD1FFAB1E ; code for System.Buffers.AsciiCharSearchValues`1[System.Buffers.IndexOfAnyAsciiSearcher+Default]:IndexOfAnyScalar[System.Buffers.IndexOfAnyAsciiSearcher+Negate](byref,int):int:this
- call [rax]System.Buffers.AsciiCharSearchValues`1[System.Buffers.IndexOfAnyAsciiSearcher+Default]:IndexOfAnyScalar[System.Buffers.IndexOfAnyAsciiSearcher+Negate](byref,int):int:this
- jmp SHORT G_M48088_IG05
- ;; size=24 bbWeight=0.50 PerfScore 3.75
-G_M48088_IG04:
+ mov rdx, gword ptr [rdx]
add rdx, 8
- mov rdi, bword ptr [rbp-0x10]
- mov esi, dword ptr [rbp-0x04]
mov rax, 0xD1FFAB1E ; code for System.Buffers.IndexOfAnyAsciiSearcher:IndexOfAnyVectorized[System.Buffers.IndexOfAnyAsciiSearcher+Negate,System.Buffers.IndexOfAnyAsciiSearcher+Default](byref,int,byref):int
call [rax]System.Buffers.IndexOfAnyAsciiSearcher:IndexOfAnyVectorized[System.Buffers.IndexOfAnyAsciiSearcher+Negate,System.Buffers.IndexOfAnyAsciiSearcher+Default](byref,int,byref):int
- ;; size=23 bbWeight=0.50 PerfScore 2.75
-G_M48088_IG05:
nop
- ;; size=1 bbWeight=1 PerfScore 0.25
-G_M48088_IG06:
- add rsp, 16
+ ;; size=30 bbWeight=1 PerfScore 6.00
+G_M48088_IG03:
pop rbp
ret
- ;; size=6 bbWeight=1 PerfScore 1.75
+ ;; size=2 bbWeight=1 PerfScore 1.50
-; Total bytes of code 92, prolog size 10, PerfScore 25.20, instruction count 25, allocated bytes for code 92 (MethodHash=586b4427) for method System.Buffers.Text.Base64+Base64CharValidatable:IndexOfAnyExcept(System.ReadOnlySpan`1[ushort]):int (FullOpts)
+; Total bytes of code 36, prolog size 4, PerfScore 12.35, instruction count 10, allocated bytes for code 36 (MethodHash=586b4427) for method System.Buffers.Text.Base64+Base64CharValidatable:IndexOfAnyExcept(System.ReadOnlySpan`1[ushort]):int (FullOpts)
Overall seems to be a slight improvement for Regex
Regex benchmark results
Perf_Regex_Industry_SliceSlice
Perf_Regex_Industry_RustLang_Sherlock
Perf_Regex_Industry_BoostDocs_Simple
Perf_Regex_Industry_Mariomkas
Perf_Regex_Industry_Leipzig
Do any of those benchmarks deal with really small inputs? That's where we'd expect to see a regression, right?
Seeing numbers along the lines of
This PR moves the scalar loops into the core worker methods. This reduces the amount of code on each call site and makes it easier for us to make further changes like adding Avx512 support.
The code that's inlined into `IndexOfAny` callers is now just a call to the worker method instead of `length < 8 ? call scalar : call vectorized`.
Example call site diff
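To make the shape of the change concrete, here is a minimal C# sketch of the two call-site patterns described above. It is an illustration only, not the code from this PR: `SearcherSketch`, `MinVectorLength`, the `bool[]` lookup table, and every method name are hypothetical, and the "vectorized" path is a plain loop standing in for the real SIMD worker.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class SearcherSketch
{
    // Hypothetical threshold, taken from the `length < 8` check mentioned in the description.
    private const int MinVectorLength = 8;

    // Before: the length check is inlined into every caller,
    // so each call site carries a branch and two calls.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int IndexOfAnyBefore(ReadOnlySpan<char> span, bool[] asciiLookup) =>
        span.Length < MinVectorLength
            ? IndexOfAnyScalar(span, asciiLookup)
            : IndexOfAnyVectorizedStandIn(span, asciiLookup);

    // After: the caller only contains a single call to the worker.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int IndexOfAnyAfter(ReadOnlySpan<char> span, bool[] asciiLookup) =>
        IndexOfAnyWorker(span, asciiLookup);

    // The worker now owns the short-input (scalar) loop.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static int IndexOfAnyWorker(ReadOnlySpan<char> span, bool[] asciiLookup)
    {
        if (span.Length < MinVectorLength)
        {
            for (int i = 0; i < span.Length; i++)
            {
                char c = span[i];
                if (c < asciiLookup.Length && asciiLookup[c])
                {
                    return i;
                }
            }
            return -1;
        }

        return IndexOfAnyVectorizedStandIn(span, asciiLookup);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static int IndexOfAnyScalar(ReadOnlySpan<char> span, bool[] asciiLookup)
    {
        for (int i = 0; i < span.Length; i++)
        {
            char c = span[i];
            if (c < asciiLookup.Length && asciiLookup[c])
            {
                return i;
            }
        }
        return -1;
    }

    // Assumption: stand-in for the SIMD loop; it reuses the scalar logic
    // so the sketch stays self-contained and runnable.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static int IndexOfAnyVectorizedStandIn(ReadOnlySpan<char> span, bool[] asciiLookup) =>
        IndexOfAnyScalar(span, asciiLookup);
}
```

Both entry points return the same results; the difference is only how much code the JIT has to emit at each inlined call site, which is what the diff in this PR measures.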
Overall seems to be a slight improvement for Regex
Regex benchmark results
Perf_Regex_Industry_SliceSlice
Perf_Regex_Industry_RustLang_Sherlock
Perf_Regex_Industry_BoostDocs_Simple
Perf_Regex_Industry_Mariomkas
Perf_Regex_Industry_Leipzig