-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve "vec == Vector128<>.Zero" #75999
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsFollow up to #75864 to address @TamarChristinaArm's suggestion in #75849 (comment) static bool IsZero1(Vector128<int> v) => v == Vector128<int>.Zero;
static bool IsZero2(Vector64<int> v) => v == Vector64<int>.Zero; Codegen diff: ; Method Tests:IsZero1
stp fp, lr, [sp, #-0x10]!
mov fp, sp
- umaxv s16, v0.4s
- umov w0, v16.s[0]
- cmp w0, #0
+ umaxp v16.4s, v0.4s, v0.4s
+ umov x0, v16.d[0]
+ cmp x0, #0
cset x0, eq
ldp fp, lr, [sp], #0x10
ret lr
; Method Tests:IsZero2
stp fp, lr, [sp, #-0x10]!
mov fp, sp
- umaxv h16, v0.4h
- umov w0, v16.s[0]
- cmp w0, #0
+ umov x0, v0.d[0]
+ cmp x0, #0
cset x0, eq
ldp fp, lr, [sp], #0x10
ret lr Should be a nice win for Vector64. For Vector128 I wasn't able to see noticeable improvements on my Apple M1 but we might see improvements on other (hopefully, on Ampere Altra?) However, even on M1 it seems to be better:
(1st column is Latency, according to https://dougallj.github.io/applecpu/firestorm-simd.html)
|
Oh, from what I see Latency on Ampere for |
The 6 cycles is I believe only for bytes, the zero special case I don't expect the same dramatic difference as it's reducing from |
@dotnet/jit-contrib @kunalspathak PTAL, previously the logic was enabled for Vector3 (SIMD12) too and that was wrong, I'll try to come up with a bug repro and backport a quick fix to net7.0 |
Ah, Vector3 is only used for floats so it won't pass the |
BlockRange().InsertBefore(node, cmp); | ||
LowerNode(cmp); | ||
GenTree* cmp = op; | ||
if (simdSize != 8) // we don't need compression for Vector64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you update the comment few lines above to have MaxPairWise
instead of MaxAcross
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
Follow up to #75864 to address @TamarChristinaArm's suggestion in #75849 (comment)
Btw, previous improvements seem to show nice benefits #75864 (comment)
Codegen diff:
Should be a nice win for Vector64. For Vector128 I wasn't able to see noticeable improvements on my Apple M1 but we might see improvements on other (hopefully, on Ampere Altra?)
However, even on M1 it seems to be better:
(1st column is Latency, according to https://dougallj.github.io/applecpu/firestorm-simd.html)