-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT Optimization (Stack Usage?) #13814
Comments
Well, it does eliminate branches but this happens late enough that certain parts of the JIT end up seeing code like: HasFlag(int value, int flag) {
… *((long*)&value) | *((long*)&flag)
}
// or
HasFlag(int value, int flag) {
… *((byte*)&value) | *((byte*)&flag)
} Both result in variables that are address exposed/not enregistered. The first case is particularly bad as it's basically code with undefined behavior and I'm not sure what the JIT could do to handle it better. Define "better" in the context of "undefined behavior". For better or worse, the JIT is not a template metaprogramming engine. |
Perhaps there's an optimization pass that can be done after branch elimination to recognize patterns like this? |
@mikedn btw, the second case from your comment (the one without UB) even without the branch elimination seems could have a better codegen: static unsafe int HasFlag(int value, int flag)
{
return *((byte*) &value) | *((byte*) &flag);
} codegen: movzx rax, byte ptr [rsp+08H]
movzx rdx, byte ptr [rsp+10H]
or eax, edx could be just |
More likely the other way around - attempt to eliminate such branches as early as possible. BTW, storing the size in a variable is a bad idea and will likely make eliminating such branches early impossible. You should just write |
Yes, that's pretty easy to fix. The UB case is the main problem, that's why I mentioned it. I suppose since it's UB you could just convert |
Actually, it seems that you could just use public static bool Test<T>(T value, T flag) where T : unmanaged, Enum
{
if (sizeof(T) == 1)
return (Unsafe.As<T, byte>(ref value) | Unsafe.As<T, byte>(ref flag)) == Unsafe.As<T, byte>(ref value);
else if (sizeof(T) == 2)
return (Unsafe.As<T, short>(ref value) | Unsafe.As<T, short>(ref flag)) == Unsafe.As<T, short>(ref value);
else if (sizeof(T) == 4)
return (Unsafe.As<T, int>(ref value) | Unsafe.As<T, int>(ref flag)) == Unsafe.As<T, int>(ref value);
else // size == 8
return (Unsafe.As<T, long>(ref value) | Unsafe.As<T, long>(ref flag)) == Unsafe.As<T, long>(ref value);
} generates the expected code: G_M37086_IG02:
0BD1 or edx, ecx
3BD1 cmp edx, ecx
0F94C0 sete al
0FB6C0 movzx rax, al |
I only stored it intermittently because I didn't see any change in the output or performance and the code was cleaner that way, but noted. I was playing around with Lots of libraries and existing code that contains generics uses |
I think you have to add |
yes, but it doesn't make any sense :| there must be a feature request to Roslyn to get rid of this requirement. |
Yeah, I happened to dump that method in a test class that was already unsafe and didn't notice. Though it's not really unsafe, |
+1 for getting rid of restrictions on just using |
I think I can close this issue given the solution we've come to with
Thoughts? |
It looks like less restrictive In any case, these are issues for csharplang. As far as the JIT is concerned, I think it would be reasonable to expect at least better handling of trivial methods like And while |
Slighty related - this generates optimal ASM for the public static unsafe T SetFlags<T>(this T value, T flags) where T : unmanaged
{
if (sizeof(T) == 1) {
byte v = (byte)(Unsafe.As<T, byte>(ref value) | Unsafe.As<T, byte>(ref flags));
return Unsafe.As<byte, T>(ref v);
}
else if (sizeof(T) == 2) {
short v = (short)(Unsafe.As<T, short>(ref value) | Unsafe.As<T, short>(ref flags));
return Unsafe.As<short, T>(ref v);
}
else if (sizeof(T) == 4) {
int v = Unsafe.As<T, int>(ref value) | Unsafe.As<T, int>(ref flags);
return Unsafe.As<int, T>(ref v);
}
else { // size == 8
long v = Unsafe.As<T, long>(ref value) | Unsafe.As<T, long>(ref flags);
return Unsafe.As<long, T>(ref v);
}
}
public static byte SetFlags(byte value, byte flags)
{
return (byte)(value | flags);
} First method creates this for byte:
Versus this for the second method:
|
Is looks like you're still using |
@mikedn Fixed the code sample, just pasted in some stuff that I was playing with and forgot to change it back. |
Fixed the asm output as well, had the wrong runtime selected. |
Ah, of course. Small integer types are more likely to cause issues because local variables and method of parameters of such types tend to be treated as So it looks like even with better dead code elimination it may still be useful to handle better this kind of reinterpretation. Fortunately this is relatively easy to fix, I already have some code that handles |
Thanks, This looks nice :) and way shorter than the code I use in SpanJson for the same problem, I need to track the underlying type of the enum for the code below to work. public static bool HasFlag<TEnum, TEnumBase>(TEnum a, TEnum b) where TEnum: struct, Enum where TEnumBase : struct
{
if (typeof(TEnumBase) == typeof(sbyte))
{
var aConv = Unsafe.As<TEnum, sbyte>(ref a);
var bConv = Unsafe.As<TEnum, sbyte>(ref b);
return (aConv & bConv) == bConv;
}
// this repeats for every possible base type. Fwiw, Why are all the HasFlag/Test example source codes above written with OR instead of AND? |
@Tornhoof I'm not sure what you mean. |
I understand it's equivalent, I just haven't seen the OR version used before. That's why I asked. |
I don't know, that's just what my brain happened to produce, haha :) |
Using |
On the surface, it seems like the JIT shouldn't have a problem producing the optimal code here as it is doing the proper branch elimination but that is not what's happening. Take the following method:
With the branch elimination it is doing it should be able to produce the optimal ASM, i.e.:
It gets somewhat close but produces this instead (for size 4 types):
Which runs about half as fast and uses 16 additional bytes (26 instead of just 10). I'm not sure exactly what is going on here but I believe it has to do with the way the stack is used when there are multiple branches that use locals causing some optimizations not to be done because this produces the optimal ASM:
I imagine that improving this could reduce JITted code size and bump performance in many cases. One of the concerns about adding generic enum methods is the JIT bloat it will produce and this could potentially help with that given that most of the methods would be simple bitwise operations like the one above.
category:cq
theme:basic-cq
skill-level:intermediate
cost:medium
The text was updated successfully, but these errors were encountered: