Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT Optimization (Stack Usage?) #13814

Closed
mikernet opened this issue Nov 22, 2019 · 23 comments
Closed

JIT Optimization (Stack Usage?) #13814

mikernet opened this issue Nov 22, 2019 · 23 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI JitUntriaged CLR JIT issues needing additional triage optimization
Milestone

Comments

@mikernet
Copy link
Contributor

On the surface, it seems like the JIT shouldn't have a problem producing the optimal code here as it is doing the proper branch elimination but that is not what's happening. Take the following method:

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool HasFlag(T value, T flag) where T : struct, Enum
{			
    int size = Unsafe.SizeOf<T>();

    if (size == 1)
        return (Unsafe.As<T, byte>(ref value) | Unsafe.As<T, byte>(ref flag)) == Unsafe.As<T, byte>(ref value);
    else if (size == 2)
        return (Unsafe.As<T, short>(ref value) | Unsafe.As<T, short>(ref flag)) == Unsafe.As<T, short>(ref value);
    else if (size == 4)
        return (Unsafe.As<T, int>(ref value) | Unsafe.As<T, int>(ref flag)) == Unsafe.As<T, int>(ref value);
    else // size == 8
        return (Unsafe.As<T, long>(ref value) | Unsafe.As<T, long>(ref flag)) == Unsafe.As<T, long>(ref value);
}

With the branch elimination it is doing it should be able to produce the optimal ASM, i.e.:

    L0000: or edx, ecx
    L0002: cmp edx, ecx
    L0004: setz al
    L0007: movzx eax, al
    L000a: ret

It gets somewhat close but produces this instead (for size 4 types):

    L0000: mov [rsp+0x8], ecx
    L0004: mov [rsp+0x10], edx
    L0008: mov eax, [rsp+0x8]
    L000c: mov edx, [rsp+0x10]
    L0010: or edx, eax
    L0012: cmp edx, eax
    L0014: setz al
    L0017: movzx eax, al
    L001a: ret

Which runs about half as fast and uses 16 additional bytes (26 instead of just 10). I'm not sure exactly what is going on here but I believe it has to do with the way the stack is used when there are multiple branches that use locals causing some optimizations not to be done because this produces the optimal ASM:

public static bool HasFlag(T value, T flag) where T : struct, Enum
{			
    int size = Unsafe.SizeOf<T>();

    if (size == 4)
        return (Unsafe.As<T, int>(ref value) | Unsafe.As<T, int>(ref flag)) == Unsafe.As<T, int>(ref value);
    
    throw new NotSupportedException();    
}

I imagine that improving this could reduce JITted code size and bump performance in many cases. One of the concerns about adding generic enum methods is the JIT bloat it will produce and this could potentially help with that given that most of the methods would be simple bitwise operations like the one above.

category:cq
theme:basic-cq
skill-level:intermediate
cost:medium

@mikedn
Copy link
Contributor

mikedn commented Nov 22, 2019

as it is doing the proper branch elimination

Well, it does eliminate branches but this happens late enough that certain parts of the JIT end up seeing code like:

HasFlag(int value, int flag) {*((long*)&value) | *((long*)&flag)
}
// or 
HasFlag(int value, int flag) {*((byte*)&value) | *((byte*)&flag)
}

Both result in variables that are address exposed/not enregistered. The first case is particularly bad as it's basically code with undefined behavior and I'm not sure what the JIT could do to handle it better. Define "better" in the context of "undefined behavior".

For better or worse, the JIT is not a template metaprogramming engine.

@mikernet
Copy link
Contributor Author

Perhaps there's an optimization pass that can be done after branch elimination to recognize patterns like this?

@EgorBo
Copy link
Member

EgorBo commented Nov 22, 2019

@mikedn btw, the second case from your comment (the one without UB) even without the branch elimination seems could have a better codegen:

static unsafe int HasFlag(int value, int flag)
{
	return *((byte*) &value) | *((byte*) &flag);
}

codegen:

       movzx    rax, byte  ptr [rsp+08H]
       movzx    rdx, byte  ptr [rsp+10H]
       or       eax, edx

could be just or + movsx
If I understand it correctly.

@mikedn
Copy link
Contributor

mikedn commented Nov 22, 2019

Perhaps there's an optimization pass that can be done after branch elimination to recognize patterns like this?

More likely the other way around - attempt to eliminate such branches as early as possible. BTW, storing the size in a variable is a bad idea and will likely make eliminating such branches early impossible. You should just write else if (Unsafe.SizeOf<T>() == 4), though sadly that's not currently enough for the JIT to eliminate the branch.

@mikedn
Copy link
Contributor

mikedn commented Nov 22, 2019

btw, the second case from your comment (the one without UB) even without the branch elimination seems could have a better codegen:

Yes, that's pretty easy to fix. The UB case is the main problem, that's why I mentioned it. I suppose since it's UB you could just convert (*(long*)&i32) into (long)i32 and call it a day. Though attempts to make UB well defined may turn out to be counterproductive.

@mikedn
Copy link
Contributor

mikedn commented Nov 22, 2019

You should just write else if (Unsafe.SizeOf() == 4), though sadly that's not currently enough for the JIT to eliminate the branch.

Actually, it seems that you could just use sizeof:

public static bool Test<T>(T value, T flag) where T : unmanaged, Enum
{
    if (sizeof(T) == 1)
        return (Unsafe.As<T, byte>(ref value) | Unsafe.As<T, byte>(ref flag)) == Unsafe.As<T, byte>(ref value);
    else if (sizeof(T) == 2)
        return (Unsafe.As<T, short>(ref value) | Unsafe.As<T, short>(ref flag)) == Unsafe.As<T, short>(ref value);
    else if (sizeof(T) == 4)
        return (Unsafe.As<T, int>(ref value) | Unsafe.As<T, int>(ref flag)) == Unsafe.As<T, int>(ref value);
    else // size == 8
        return (Unsafe.As<T, long>(ref value) | Unsafe.As<T, long>(ref flag)) == Unsafe.As<T, long>(ref value);
}

generates the expected code:

G_M37086_IG02:
       0BD1                 or       edx, ecx
       3BD1                 cmp      edx, ecx
       0F94C0               sete     al
       0FB6C0               movzx    rax, al

@mikernet
Copy link
Contributor Author

mikernet commented Nov 22, 2019

I only stored it intermittently because I didn't see any change in the output or performance and the code was cleaner that way, but noted.

I was playing around with unmanaged and sizeof before as well and didn't notice a difference so I left it at struct to avoid problems with mismatched constraints because a lot of current enum code (both mine and library code) is only restricted to struct but you are right, changing the current version to unmanaged and using sizeof does the trick.

Lots of libraries and existing code that contains generics uses where T : struct, Enum to constrain the type which creates a bit of a problem with this approach. Given that the two are effectively equivalent it might be prudent to treat them as such and allow the constraints to be compatible.

@mikernet
Copy link
Contributor Author

I think you have to add unsafe to that method signature for sizeof to work, right?

@EgorBo
Copy link
Member

EgorBo commented Nov 22, 2019

I think you have to add unsafe to that method signature for sizeof to work, right?

yes, but it doesn't make any sense :| there must be a feature request to Roslyn to get rid of this requirement.

@mikedn
Copy link
Contributor

mikedn commented Nov 22, 2019

I think you have to add unsafe to that method signature for sizeof to work, right?

Yeah, I happened to dump that method in a test class that was already unsafe and didn't notice. Though it's not really unsafe, sizeof is unfortunately unnecessarily restrictive in C#. It also responsible for the unmanaged constraint that too should not be necessary.

@mikernet
Copy link
Contributor Author

mikernet commented Nov 22, 2019

+1 for getting rid of restrictions on just using sizeof wherever you want. I never really understood why it is so restricted :/ I don't see how it's somehow better to create and require the use of Unsafe.SizeOf, especially when it can create issues for JIT optimization like the one above. Even if there's a reason to treat its use as unsafe, you could just require the unsafe keyword, but I don't see why I can't just use sizeof anywhere I want. If I'm really going to do something unsafe with the size then I will need an unsafe context anyway, but the sizeof operation itself isn't actually unsafe.

@mikernet
Copy link
Contributor Author

mikernet commented Nov 22, 2019

I think I can close this issue given the solution we've come to with unmanaged and sizeof, but I think a number of couple related issues have been raised here which might warrant being opened if they haven't already:

  1. Treat where T : struct, Enum as compatible with where T : unmanaged, Enum
  2. Allow sizeof to be used anywhere, potentially requiring unsafe where the type is not constrained to unmanaged

Thoughts?

@mikedn
Copy link
Contributor

mikedn commented Nov 22, 2019

It looks like less restrictive sizeof was discussed recently but that went nowhere: https://github.com/dotnet/csharplang/blob/2118a41673443f6898ba3086471cc04bb44f1cfd/meetings/2019/LDM-2019-08-28.md#allow-sizeof-in-safe-code

In any case, these are issues for csharplang.

As far as the JIT is concerned, I think it would be reasonable to expect at least better handling of trivial methods like SizeOf during inlining. The current amount of ceremony around inlining is pretty large and while that's normal for more complex methods, for trivial methods it's just mind boggling.

And while SizeOf and this particular usage may be a bit of an odd ball, there are tons of trivial methods (think trivial property getters and setters) for which the current inlining approach is overkill.

@mikernet
Copy link
Contributor Author

mikernet commented Nov 22, 2019

Slighty related - this generates optimal ASM for the int and long case but does not for the byte and short case for some reason, reverting to similar output as before:

public static unsafe T SetFlags<T>(this T value, T flags) where T : unmanaged
{
    if (sizeof(T) == 1) {
        byte v = (byte)(Unsafe.As<T, byte>(ref value) | Unsafe.As<T, byte>(ref flags));
        return Unsafe.As<byte, T>(ref v);
    }
    else if (sizeof(T) == 2) {
        short v = (short)(Unsafe.As<T, short>(ref value) | Unsafe.As<T, short>(ref flags));
        return Unsafe.As<short, T>(ref v);
    }
    else if (sizeof(T) == 4) {
        int v = Unsafe.As<T, int>(ref value) | Unsafe.As<T, int>(ref flags);
        return Unsafe.As<int, T>(ref v);
    }
    else { // size == 8
        long v = Unsafe.As<T, long>(ref value) | Unsafe.As<T, long>(ref flags);
        return Unsafe.As<long, T>(ref v);
    }
}

public static byte SetFlags(byte value, byte flags)
{
    return (byte)(value | flags);
}

First method creates this for byte:

    L0000: mov [rsp+0x8], ecx
    L0004: mov [rsp+0x10], edx
    L0008: movzx eax, byte [rsp+0x8]
    L000d: movzx edx, byte [rsp+0x10]
    L0012: or eax, edx
    L0014: movzx eax, al
    L0017: ret

Versus this for the second method:

    L0000: movzx eax, cl
    L0003: movzx edx, dl
    L0006: or eax, edx
    L0008: movzx eax, al
    L000b: ret

@mikedn
Copy link
Contributor

mikedn commented Nov 22, 2019

Slighty related - this generates optimal ASM for the int and long case but does not for the byte and short case for some reason, reverting to similar output as before:

Is looks like you're still using SizeOf instead of sizeof in some cases.

@mikernet
Copy link
Contributor Author

@mikedn Fixed the code sample, just pasted in some stuff that I was playing with and forgot to change it back.

@mikernet
Copy link
Contributor Author

mikernet commented Nov 22, 2019

Fixed the asm output as well, had the wrong runtime selected.

@mikedn
Copy link
Contributor

mikedn commented Nov 22, 2019

Ah, of course. Small integer types are more likely to cause issues because local variables and method of parameters of such types tend to be treated as int internally. So even if it's not obvious from the C# code, you again end up with something like *(byte*)&value that prevents variable value from being enregistered.

So it looks like even with better dead code elimination it may still be useful to handle better this kind of reinterpretation. Fortunately this is relatively easy to fix, I already have some code that handles float/int and long/double reinterpretation and extending shouldn't be difficult.

@Tornhoof
Copy link
Contributor

Tornhoof commented Nov 22, 2019

You should just write else if (Unsafe.SizeOf() == 4), though sadly that's not currently enough for the JIT to eliminate the branch.

Actually, it seems that you could just use sizeof:

public static bool Test<T>(T value, T flag) where T : unmanaged, Enum
{
    if (sizeof(T) == 1)
        return (Unsafe.As<T, byte>(ref value) | Unsafe.As<T, byte>(ref flag)) == Unsafe.As<T, byte>(ref value);
    else if (sizeof(T) == 2)
        return (Unsafe.As<T, short>(ref value) | Unsafe.As<T, short>(ref flag)) == Unsafe.As<T, short>(ref value);
    else if (sizeof(T) == 4)
        return (Unsafe.As<T, int>(ref value) | Unsafe.As<T, int>(ref flag)) == Unsafe.As<T, int>(ref value);
    else // size == 8
        return (Unsafe.As<T, long>(ref value) | Unsafe.As<T, long>(ref flag)) == Unsafe.As<T, long>(ref value);
}

generates the expected code:

G_M37086_IG02:
       0BD1                 or       edx, ecx
       3BD1                 cmp      edx, ecx
       0F94C0               sete     al
       0FB6C0               movzx    rax, al

Thanks, This looks nice :) and way shorter than the code I use in SpanJson for the same problem, I need to track the underlying type of the enum for the code below to work.
Sharplab

           public static bool HasFlag<TEnum, TEnumBase>(TEnum a, TEnum b) where TEnum: struct, Enum where TEnumBase : struct
            {
                if (typeof(TEnumBase) == typeof(sbyte))
                {
                    var aConv = Unsafe.As<TEnum, sbyte>(ref a);
                    var bConv = Unsafe.As<TEnum, sbyte>(ref b);
                    return (aConv & bConv) == bConv;
                }
              // this repeats for every possible base type.

Fwiw, Why are all the HasFlag/Test example source codes above written with OR instead of AND?

@mikernet
Copy link
Contributor Author

@Tornhoof I'm not sure what you mean. (A | B) == A is equivalent to (A & B) == B so it doesn't really matter which way you write it.

@Tornhoof
Copy link
Contributor

I understand it's equivalent, I just haven't seen the OR version used before. That's why I asked.

@mikernet
Copy link
Contributor Author

I don't know, that's just what my brain happened to produce, haha :)

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@jakobbotsch
Copy link
Member

Using Unsafe.SizeOf<T>() in every branch gets the ideal codegen today, so I'm going to close this.

@github-actions github-actions bot locked and limited conversation to collaborators Jul 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI JitUntriaged CLR JIT issues needing additional triage optimization
Projects
None yet
Development

No branches or pull requests

7 participants