Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nullable<T> interface check / dispatch is comparatively very slow #50915

Open
stephentoub opened this issue Apr 8, 2021 · 15 comments · Fixed by #50997
Open

Nullable<T> interface check / dispatch is comparatively very slow #50915

stephentoub opened this issue Apr 8, 2021 · 15 comments · Fixed by #50997
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@stephentoub
Copy link
Member

stephentoub commented Apr 8, 2021

Repro:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Runtime.CompilerServices;

[MemoryDiagnoser]
[DisassemblyDiagnoser]
public class Program
{
    static void Main(string[] args) => BenchmarkSwitcher.FromAssemblies(new[] { typeof(Program).Assembly }).Run(args);

    private S _s = default(S);
    private C _c = new C();
    private S? _ns = (S?)default(S);

    [Benchmark] public int Struct() => CallM(_s);
    [Benchmark] public int Class() => CallM(_c);
    [Benchmark] public int Nullable() => CallM(_ns);
    [Benchmark] public int NullableSpecialized() => CallMSpecial(_ns);

    static int CallM<T>(T t)
    {
        if (t is IMyInterface)
        {
            return ((IMyInterface)t).M();
        }

        return 0;
    }

    static int CallMSpecial<T>(T? t) where T : struct
    {
        if (t.HasValue)
        {
            return CallM(t.GetValueOrDefault());
        }

        return 0;
    }
}

interface IMyInterface
{
    int M();
}

struct S : IMyInterface
{
    public int M() => 42;
}

class C : IMyInterface
{
    public int M() => 42;
}
Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated Code Size
Struct 0.4807 ns 0.0044 ns 0.0037 ns - - - - 16 B
Class 3.0842 ns 0.0045 ns 0.0038 ns - - - - 86 B
Nullable 88.6692 ns 0.5664 ns 0.4729 ns 0.0076 - - 48 B 132 B
NullableSpecialized 1.5149 ns 0.0087 ns 0.0077 ns - - - - 64 B

Is there anything we can do to make the nullable case faster and less inclined to box without needing a special code path to handle it (e.g. my CallMSpecial, which in my current actual code is another overload so that a nullable binds to it more tightly).

Here's what the codegen for the nullable case looks like:

; Program.CallM[[System.Nullable`1[[S, Benchmarks]], System.Private.CoreLib]](System.Nullable`1<S>)
       sub       rsp,28
       mov       [rsp+30],rcx
       lea       rdx,[rsp+30]
       mov       rcx,offset MT_System.Nullable`1[[S, Benchmarks]]
       call      CORINFO_HELP_BOX_NULLABLE
       mov       rdx,rax
       mov       rcx,offset MT_IMyInterface
       call      CORINFO_HELP_ISINSTANCEOFINTERFACE
       test      rax,rax
       je        short M01_L00
       lea       rdx,[rsp+30]
       mov       rcx,offset MT_System.Nullable`1[[S, Benchmarks]]
       call      CORINFO_HELP_BOX_NULLABLE
       mov       rdx,rax
       mov       rcx,offset MT_IMyInterface
       call      CORINFO_HELP_CHKCASTINTERFACE
       mov       rcx,rax
       mov       r11,7FFCDCEE05D0
       call      qword ptr [7FFCDD2705D0]
       nop
       add       rsp,28
       ret
M01_L00:
       xor       eax,eax
       add       rsp,28
       ret
; Total bytes of code 122

category:cq
theme:basic-cq
skill-level:intermediate
cost:medium
impact:small

@dotnet-issue-labeler dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Apr 8, 2021
@JulieLeeMSFT
Copy link
Member

cc @dotnet/jit-contrib.

@EgorBo
Copy link
Member

EgorBo commented Apr 9, 2021

I can take a look, so basically some of our optimizations are just not Nullable<> friendly, e.g. impBoxPatternMatch
We need to handle IL sequences like this:

box [Nullable<S>]
isinst [IMyInterface]
brtrue/brfalse

to be optimized to:

call get_HasValue()   // ldfld
brtrue/brfalse

(same for box+isinst+ldnull+cgt.un).

Another issue here is that we might want to optimize boxing for Nullables from this:

return CORINFO_HELP_BOX_NULLABLE(o)

to something like this:

if (!o.HasValue)
    return null;
else 
    return CORINFO_HELP_BOX_NULLABLE(o)

@EgorBo EgorBo self-assigned this Apr 9, 2021
@EgorBo EgorBo removed the untriaged New issue has not been triaged by the area owner label Apr 9, 2021
@EgorBo EgorBo added this to the 6.0.0 milestone Apr 9, 2021
@Joe4evr
Copy link
Contributor

Joe4evr commented Apr 9, 2021

Another issue here is that we might want to optimize boxing for Nullables from this:
[...]
to something like this:

Would that really make a difference? IIRC, boxing a no-value Nullable<T> already makes the reference point to actual null.

@EgorBo
Copy link
Member

EgorBo commented Apr 9, 2021

Another issue here is that we might want to optimize boxing for Nullables from this:
[...]
to something like this:

Would that really make a difference? IIRC, boxing a no-value Nullable<T> already makes the reference point to actual null.

It would allow us to avoid "call" overhead (by slightly regressing size and additional check for non-null case - so I agree that one is questionable and requires some analysis or how frequent we pass null there in a general case) branch: EgorBo@a0d8ee9

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Apr 9, 2021
@stephentoub
Copy link
Member Author

Thanks for looking at this, @EgorBo. Knowing whether these can and will (eventually) be improved will impact what overloads we choose to expose in #50635. This is a significant enough difference that if we don't think this can be improved meaningfully, we'll need to expose the T? overloads there. If we can make it work better, we can leave those out.

What's particularly interesting to me is how much worse the nullable case is than just boxing it from the start:

Method Mean Error StdDev Code Size Gen 0 Gen 1 Gen 2 Allocated
Struct 0.5162 ns 0.0226 ns 0.0211 ns 16 B - - - -
Class 3.3559 ns 0.0216 ns 0.0192 ns 86 B - - - -
Nullable 87.4553 ns 0.3305 ns 0.3091 ns 132 B 0.0076 - - 48 B
NullableBoxed 47.8314 ns 0.2854 ns 0.2670 ns 114 B 0.0038 - - 24 B
NullableSpecialized 1.7265 ns 0.0309 ns 0.0289 ns 64 B - - - -
    [Benchmark]
    public int NullableBoxed() => CallM((object)_ns);

@EgorBo
Copy link
Member

EgorBo commented Apr 9, 2021

What's particularly interesting to me is how much worse the nullable case is than just boxing it from the start:

Yeah that's funny, you box it once while original Nullable() benchmark actually boxes it twice (in isinst and then in cast) 🙂

@EgorBo
Copy link
Member

EgorBo commented Apr 9, 2021

Knowing whether these can and will (eventually) be improved will impact what overloads we choose to expose in #50635

I think this specific case definitely can be improved as part of 6.0, the first part is already ready: #50997
the second one (to basically handle this: sharplab.io) also looks doable.

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Apr 27, 2021
@stephentoub
Copy link
Member Author

@EgorBo, you closed this, but that one PR didn't fully address this, did it?

@stephentoub stephentoub reopened this Apr 28, 2021
@benaadams
Copy link
Member

Github's autolink doesn't notice the "Partially" qualifier before "fixes" :)

image

I normally go for "contributes to" in this situation to avoid the auto close

@EgorBo
Copy link
Member

EgorBo commented Apr 28, 2021

Oops, indeed, will use "contribute" next time 🙂

@EgorBo
Copy link
Member

EgorBo commented Jul 12, 2021

Moving to 7.0.0 for the second part where we need to optimize

CALLV MyMethod()
  \__ CALL COREINFO_NULLABLE_BOX_HELPER

It'd certainly doable but unlikely to make it into 6.0 😞

@EgorBo EgorBo modified the milestones: 6.0.0, 7.0.0 Jul 12, 2021
@EgorBo
Copy link
Member

EgorBo commented Jul 17, 2023

Ugh, forgot about this issue this milestone, I see locally improvements with the following patch:

        case CORINFO_HELP_BOX_NULLABLE:
        {
            GenTree*             typeArg     = call->gtArgs.GetUserArgByIndex(0)->GetNode();
            CORINFO_CLASS_HANDLE nullableCls = gtGetHelperArgClassHandle(typeArg);
            if (nullableCls != NO_CLASS_HANDLE)
            {
                assert(info.compCompHnd->getBoxHelper(nullableCls) == CorInfoHelpFunc::CORINFO_HELP_BOX_NULLABLE);
                objClass = info.compCompHnd->getTypeForBox(nullableCls);
                *pIsNonNull = false;
                *pIsExact = false;
            }
        }
        break;

(in gtGetHelperCallClassHandle) but needs more work.

@EgorBo
Copy link
Member

EgorBo commented May 2, 2024

Made an improvement for this issue this release (#95764) but it's still not fixed, moved to next

| Method              | Mean       | Error     | StdDev    | Gen0   | Code Size | Allocated |
|-------------------- |-----------:|----------:|----------:|-------:|----------:|----------:|
| Struct              |  0.1875 ns | 0.0043 ns | 0.0040 ns |      - |      16 B |         - |
| Class               |  0.3886 ns | 0.0041 ns | 0.0038 ns |      - |     134 B |         - |
| Nullable            | 24.3416 ns | 0.2646 ns | 0.2475 ns | 0.0014 |      67 B |      24 B |
| NullableSpecialized |  0.7473 ns | 0.0040 ns | 0.0031 ns |      - |      41 B |         - |

@EgorBo EgorBo modified the milestones: 9.0.0, Future May 2, 2024
@JulieLeeMSFT JulieLeeMSFT moved this to Optimizations in .NET Core CodeGen Jun 5, 2024
@EgorBot

This comment was marked as resolved.

@EgorBo
Copy link
Member

EgorBo commented Sep 3, 2024

@EgorBot -intel -arm64 --runtimes net6.0 net7.0 net8.0 net9.0

using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
[DisassemblyDiagnoser]
public class Bench
{
    private S _s = default(S);
    private C _c = new C();
    private S? _ns = (S?)default(S);

    [Benchmark] public int Struct() => CallM(_s);
    [Benchmark] public int Class() => CallM(_c);
    [Benchmark] public int Nullable() => CallM(_ns);
    [Benchmark] public int NullableSpecialized() => CallMSpecial(_ns);

    static int CallM<T>(T t)
    {
        if (t is IMyInterface)
        {
            return ((IMyInterface)t).M();
        }

        return 0;
    }

    static int CallMSpecial<T>(T? t) where T : struct
    {
        if (t.HasValue)
        {
            return CallM(t.GetValueOrDefault());
        }

        return 0;
    }
}

interface IMyInterface
{
    int M();
}

struct S : IMyInterface
{
    public int M() => 42;
}

class C : IMyInterface
{
    public int M() => 42;
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
Status: Optimizations
Development

Successfully merging a pull request may close this issue.

6 participants