-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement ShuffleNative
methods and optimise Shuffle
for non-constant indices
#99596
base: main
Are you sure you want to change the base?
Conversation
Note regarding the
|
Benchmark results of my AVX2 code ( Yes, this is a very micro benchmark, but results are pretty reproducible on my machine (within ~%10 usually), and are probably pretty close to reality since it should be pretty quick (but obviously this doesn't measure the overhead with surrounding code due to more pipeline usage, etc.). (edit: there was likely an issue with this benchmark turning into a no-op) |
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector256.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for looking into this again.
We're already using the so-far-internal Vector128.ShuffleUnsafe
in a bunch of places. Should we be using Vector256.ShuffleUnsafe
somewhere?
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector256.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector256.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector256.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector256.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector256.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector64.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector64.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector256.cs
Outdated
Show resolved
Hide resolved
It seems to me that all the current uses of |
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector128.cs
Outdated
Show resolved
Hide resolved
Can someone please check I won't accidentally regress mono :) |
Mono changes look good to me. Thanks for your contribution. |
Re: 9868e73 |
I don't think there's any issue with the runtime relying on specific behaviour. For external libraries, I think one of the following approaches makes sense:
I think the approach needs to be consistent for all of them, so I removed the Another option, which I briefly mentioned in a comment somewhere, is to expose a variant like |
I'm fine with only documenting "anything above 15 is UB". |
Yes, I've been careful to not use the AVX-512 one for this method for this reason. I will add a comment at some point to explain this in the method (assuming I don't forget). |
- There's other places where they're not normalised. could be done, but not really worth the trouble probably
This comment was marked as resolved.
This comment was marked as resolved.
src/libraries/System.Private.CoreLib/src/System/SearchValues/SearchValues.cs
Outdated
Show resolved
Hide resolved
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, minus a couple small comments and the behavior of the indirect invocation for ShuffleNative which we can log a bug around to figure out as a separate work item.
CC. @EgorBo, @dotnet/jit-contrib for secondary review. |
[CompExactlyDependsOn(typeof(Ssse3))] | ||
internal static Vector128<byte> ShuffleNativeModified(Vector128<byte> vector, Vector128<byte> indices) | ||
{ | ||
if (Ssse3.IsSupported) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same concern here. no SSSE3 - different values/behaviour?
- Implement a bunch of feedback - Fix an oversight in `IsValidForShuffle` where we don't treat something as variable indices that gets emitted as such
Shuffle
with variable indices on coreclr (for all types)Shuffle
onVector256
(with signed/unsigned bytes and shorts)Vector256
shuffle withAvx2.Shuffle
(for signed/unsigned bytes and shorts)VectorXxx.Shuffle
with constant indices when not crossing lanes, when zeroing, etc.Todo tasks:
VectorXXX.ShuffleNative
for vectors of other element typesShuffle
inShuffleNative
fallbackUp-to-date codegen (note: the amount of cases I tested is perhaps a bit over the top, so check the c# code to see what cases you'd like to look at before looking through all the assembly - the main ones for non-constant indices are the
IndirectIndirect
ones; should include most improved scenarios): CodegenCodegen (outdated, some cases have gotten better, doesn't include codegen for all relevant platforms (e.g., AVX-512 and arm64), and doesn't include all improved scenarios):
Shuffle With AVX2
ShuffleUnsafe With AVX2
Shuffle With Sse4.2
ShuffleUnsafe With Sse4.2