Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can we improve ROS<byte>.ToArray from RVA static? #68103

Closed
stephentoub opened this issue Apr 16, 2022 · 11 comments
Closed

Can we improve ROS<byte>.ToArray from RVA static? #68103

stephentoub opened this issue Apr 16, 2022 · 11 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@stephentoub
Copy link
Member

stephentoub commented Apr 16, 2022

When doing new byte[] { const, const, const, ... }, the JIT generates code that calls the new array helper and then emits an unrolled loop to copy the data into the array. When doing ((ReadOnlySpan<byte>)new byte[] { const, const, const, ... }).ToArray(), the copy is handled via ToArray's call to Buffer.MemMove. Is it feasible to do better?

using System;
public class C
{
    public byte[] Implicit() => new byte[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 };
    public byte[] Explicit() => ((ReadOnlySpan<byte>)new byte[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 }).ToArray();
}

SharpLab

; Core CLR 6.0.222.6406 on amd64

C.Implicit()
    L0000: sub rsp, 0x28
    L0004: vzeroupper
    L0007: mov rcx, 0x7ffa6ca70a90
    L0011: mov edx, 0x10
    L0016: call 0x00007ffacc4fb1b0
    L001b: mov rdx, 0x221bb0a0894
    L0025: vmovupd xmm0, [rdx]
    L0029: vmovupd [rax+0x10], xmm0
    L002e: add rsp, 0x28
    L0032: ret

C.Explicit()
    L0000: push rsi
    L0001: sub rsp, 0x20
    L0005: mov edx, 0x10
    L000a: mov rcx, 0x7ffa6ca70a90
    L0014: call 0x00007ffacc4fb1b0
    L0019: mov rsi, rax
    L001c: lea rcx, [rsi+0x10]
    L0020: mov rdx, 0x221bb0a0894
    L002a: mov r8d, 0x10
    L0030: call 0x00007ffa6c97e808
    L0035: mov rax, rsi
    L0038: add rsp, 0x20
    L003c: pop rsi
    L003d: ret

category:cq
theme:optimization
skill-level:beginner
cost:small
impact:small

@dotnet-issue-labeler dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Apr 16, 2022
@ghost
Copy link

ghost commented Apr 16, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

When doing new byte[] { const, const, const, ... }, the JIT generates code that calls the new array helper and then emits an unrolled loop to copy the data into the array. When doing ((ReadOnlySpan<byte>)new byte[] { const, const, const, ... }).ToArray(), the copy is handled via ToArray's call to Buffer.MemMove. Is it feasible to do better?

using System;
public class C
{
    public byte[] Implicit() => new byte[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 };
    public byte[] Explicit() => ((ReadOnlySpan<byte>)new byte[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 }).ToArray();
}

SharpLab

; Core CLR 6.0.222.6406 on amd64

C.Implicit()
    L0000: sub rsp, 0x28
    L0004: vzeroupper
    L0007: mov rcx, 0x7ffa6ca70a90
    L0011: mov edx, 0x10
    L0016: call 0x00007ffacc4fb1b0
    L001b: mov rdx, 0x221bb0a0894
    L0025: vmovupd xmm0, [rdx]
    L0029: vmovupd [rax+0x10], xmm0
    L002e: add rsp, 0x28
    L0032: ret

C.Explicit()
    L0000: push rsi
    L0001: sub rsp, 0x20
    L0005: mov edx, 0x10
    L000a: mov rcx, 0x7ffa6ca70a90
    L0014: call 0x00007ffacc4fb1b0
    L0019: mov rsi, rax
    L001c: lea rcx, [rsi+0x10]
    L0020: mov rdx, 0x221bb0a0894
    L002a: mov r8d, 0x10
    L0030: call 0x00007ffa6c97e808
    L0035: mov rax, rsi
    L0038: add rsp, 0x20
    L003c: pop rsi
    L003d: ret
Author: stephentoub
Assignees: -
Labels:

area-CodeGen-coreclr, untriaged

Milestone: -

@EgorBo
Copy link
Member

EgorBo commented Apr 16, 2022

If understand this correctly you mean that:
If JIT sees that a span was constructed from an array (full length) it can replace Span.ToArray() to just return the initial array it was created from? JIT optimizations for spans are very hard to do - they're quite verbose in the IR 😞 E.g. for Explicit():

Trees after Morph - Global
*************** Finishing PHASE Morph - Global
Trees after Morph - Global

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight    lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1       [000..015)-> BB09 (always)                     i 
BB08 [0010]  0                             1       [00D..00E)-> BB18 (always)                     i hascall 
BB09 [0011]  1       BB01                  1       [00D..00E)                                     i hascall gcsafe new[] nullcheck 
BB18 [0012]  2       BB08,BB09             1       [???..???)        (return)                     internal 
-----------------------------------------------------------------------------------------------------------------------------------------

------------ BB01 [000..015) -> BB09 (always), preds={} succs={BB09}

***** BB01
STMT00007 ( INL01 @ 0x01F[E-] ... ??? ) <- INLRT @ 0x000[E-]
               [000030] -A---+------                        *  ASG       byref 
               [000029] D----+-N----                        +--*  LCL_VAR   byref  V19 tmp17        
               [000049] H----+------                        \--*  CNS_INT(h) byref  0xd1ffab1e static Fseq[5DFBABEEDF318BF33C0927C43D7630F51B82F351740301354FA3D7FC51F0132E]

***** BB01
STMT00008 ( INL01 @ 0x02B[--] ... ??? ) <- INLRT @ 0x000[E-]
               [000173] -A---+------                        *  ASG       byref 
               [000171] D------N----                        +--*  LCL_VAR   byref  V17 tmp15        
               [000172] ------------                        \--*  LCL_VAR   byref  V19 tmp17        

***** BB01
STMT00009 ( INL01 @ 0x030[E-] ... ??? ) <- INLRT @ 0x000[E-]
               [000040] -A---+------                        *  ASG       int   
               [000039] D----+-N----                        +--*  LCL_VAR   int    V18 tmp16        
               [000038] -----+------                        \--*  CNS_INT   int    16

***** BB01
STMT00001 ( 0x00C[--] ... ??? )
               [000180] -A---+------                        *  COMMA     void  
               [000176] -A----------                        +--*  ASG       byref 
               [000174] D------N----                        |  +--*  LCL_VAR   byref  V15 tmp13        
               [000175] ------------                        |  \--*  LCL_VAR   byref  V17 tmp15        
               [000179] -A----------                        \--*  ASG       int   
               [000177] D------N----                           +--*  LCL_VAR   int    V16 tmp14        
               [000178] ------------                           \--*  LCL_VAR   int    V18 tmp16        

------------ BB08 [00D..00E) -> BB18 (always), preds={} succs={BB18}

***** BB08
STMT00017 ( INL04 @ 0x008[E-] ... ??? ) <- INLRT @ 0x00D[E-]
               [000085] -ACXG+------                        *  ASG       ref   
               [000084] D----+-N----                        +--*  LCL_VAR   ref    V05 tmp3         
               [000093] --CXG+------                        \--*  COMMA     ref   
               [000092] H-CXG+------                           +--*  CALL help long   HELPER.CORINFO_HELP_CLASSINIT_SHARED_DYNAMICCLASS
               [000090] -----+------ arg0 in rcx               |  +--*  CNS_INT   long   0x7ff977404b30
               [000091] -----+------ arg1 in rdx               |  \--*  CNS_INT   int    11
               [000088] n---G+------                           \--*  IND       ref   
               [000183] I----+------                              \--*  CNS_INT(h) long   0xd1ffab1e static Fseq[Value]

------------ BB09 [00D..00E), preds={BB01} succs={BB18}

***** BB09
STMT00012 ( INL04 @ 0x00E[E-] ... ??? ) <- INLRT @ 0x00D[E-]
               [000064] -ACXG+------                        *  ASG       ref   
               [000063] D----+-N----                        +--*  LCL_VAR   ref    V06 tmp4         
               [000062] --CXG+------                        \--*  CALL help ref    HELPER.CORINFO_HELP_NEWARR_1_VC
               [000061] -----+------ arg1 in rdx               +--*  CAST      long <- int
               [000059] -----+------                           |  \--*  LCL_VAR   int    V16 tmp14        
               [000060] H----+------ arg0 in rcx               \--*  CNS_INT(h) long   0xd1ffab1e class

***** BB09
STMT00032 ( INL04 @ 0x01A[E-] ... ??? ) <- INLRT @ 0x00D[E-]
               [000140] -A-XG+------                        *  ASG       byref 
               [000139] D----+-N----                        +--*  LCL_VAR   byref  V07 tmp5         
               [000191] ---XG+-N----                        \--*  COMMA     byref 
               [000187] ---X-+-N----                           +--*  NULLCHECK byte  
               [000186] -----+------                           |  \--*  LCL_VAR   ref    V06 tmp4         
               [000190] -----+------                           \--*  ADD       byref 
               [000188] -----+------                              +--*  LCL_VAR   ref    V06 tmp4         
               [000189] -----+------                              \--*  CNS_INT   long   16 field offset Fseq[Data]

***** BB09
STMT00033 ( INL04 @ 0x01A[E-] ... ??? ) <- INLRT @ 0x00D[E-]
               [000142] -A--G+------                        *  ASG       byref 
               [000141] D----+-N----                        +--*  LCL_VAR   byref  V08 tmp6         
               [000070] -----+------                        \--*  LCL_VAR   byref  V15 tmp13        

***** BB09
STMT00034 ( INL04 @ 0x01A[E-] ... ??? ) <- INLRT @ 0x00D[E-]
               [000144] -A---+------                        *  ASG       long  
               [000143] D----+-N----                        +--*  LCL_VAR   long   V10 tmp8         
               [000076] -----+---U--                        \--*  CAST      long <- ulong <- uint
               [000075] -----+------                           \--*  LCL_VAR   int    V16 tmp14        

***** BB09
STMT00031 ( INL07 @ ??? ... ??? ) <- INL04 @ 0x01A[E-] <- INLRT @ 0x00D[E-]
               [000138] --CXG+------                        *  CALL      void   System.Buffer.Memmove
               [000121] -----+------ arg0 in rcx            +--*  LCL_VAR   byref  V07 tmp5         
               [000124] -----+------ arg1 in rdx            +--*  LCL_VAR   byref  V15 tmp13        
               [000130] -----+------ arg2 in r8             \--*  LCL_VAR   long   V10 tmp8         

***** BB09
STMT00015 ( INL04 @ 0x037[E-] ... ??? ) <- INLRT @ 0x00D[E-]
               [000080] -A---+------                        *  ASG       ref   
               [000079] D----+-N----                        +--*  LCL_VAR   ref    V05 tmp3         
               [000078] -----+------                        \--*  LCL_VAR   ref    V06 tmp4         

------------ BB18 [???..???) (return), preds={BB08,BB09} succs={}

***** BB18
STMT00003 ( 0x00D[E-] ... ??? )
               [000013] -----+------                        *  RETURN    ref   
               [000086] -----+------                        \--*  LCL_VAR   ref    V05 tmp3         

@stephentoub
Copy link
Member Author

stephentoub commented Apr 16, 2022

if span was constructed from an array

Yes, except its not actually an array, so not that it would return the original object but rather directly copy the rva static data as happens in the RuntimeHelpers.InitializeArray Implicit case.

I'm asking because if the natural type of the new u8 literal in C# becomes ReadOnlySpan<byte>, Explicit is the pattern you'll get for e.g. "abcd"u8.ToArray().

@EgorBo
Copy link
Member

EgorBo commented Apr 16, 2022

if span was constructed from an array

Yes, except its not actually an array, so not that it would return the original object but rather directly copy the rva static data as happens in the RuntimeHelpers.InitializeArray Implicit case.

I'm asking because if the natural type of the new u8 literal in C# becomes ReadOnlySpan<byte>, Explicit is the pattern you'll get for e.g. "abcd"u8.ToArray().

I see. Honestly speaking I'd rather optimize this pattern in Roslyn or ILLink substep 🙁 I don't see an easy way to do it in JIT, perhaps someone from @dotnet/jit-contrib does?

@stephentoub
Copy link
Member Author

I'd rather optimize this pattern in Roslyn

That's fair.

@am11
Copy link
Member

am11 commented Apr 16, 2022

if the natural type of the new u8 literal in C# becomes ReadOnlySpan<byte>

Roslyn previewed this feature with ReadOnlySpan<byte>. If it is still being finalized, isReadOnlySpan<System.Text.Rune> option in the cards for u8 suffix? 🙂

@stephentoub
Copy link
Member Author

stephentoub commented Apr 16, 2022

Roslyn previewed this feature with ReadOnlySpan<byte>.

In the prototype the natural type is actually byte[].

is ReadOnlySpan<System.Text.Rune> option in the cards for u8 suffix?

Each rune is 32 bits and would negate many of the intended uses of having cheap, direct access to UTF-8 bytes.

Or did you mean as an additional target type rather than instead of?

@am11
Copy link
Member

am11 commented Apr 16, 2022

Ah, u32. Yes there will be a lot of overhead. Way too expensive than multi-byte characters directly encoded to byte array. 😁

I was originally thinking ROS<char> to UTF16 string ("..") could be same as ROS<Rune> to UTF8 string (".."u8), i.e. both being strongly-typed. The compiler could emit new Rune(const).Utf8SequenceLength / new Rune(const).TryEncodeToUtf8() calls in the IL, the runtime could then extract UTF8 bytes (at least for literal strings) in the jitinterface and something similar in mono. (it would also improve codegen of constantRune.Utf8SequenceLength and constantRune.TryEncodeToUtf8() more generally)

@jakobbotsch
Copy link
Member

I see. Honestly speaking I'd rather optimize this pattern in Roslyn or ILLink substep 🙁 I don't see an easy way to do it in JIT, perhaps someone from @dotnet/jit-contrib does?

It does look nice and optimizable after assertion prop:

N007 ( 18, 20) [000141] --CXG-------                          CALL      void   System.Buffer.Memmove $VN.Void
N004 (  1,  1) [000124] ------------ arg0 in rcx            ├──▌  LCL_VAR   byref  V06 tmp5         u:2 (last use) $280
N005 (  2, 10) [000190] H----------- arg1 in rdx            ├──▌  CNS_INT(h) long   0x142d5bf20a0 static $140
N006 (  1,  1) [000191] ------------ arg2 in r8             └──▌  CNS_INT   long   16 $81

But doing a second pass of morph-like optimizations here would be new territory.

@EgorBo
Copy link
Member

EgorBo commented Apr 16, 2022

I see. Honestly speaking I'd rather optimize this pattern in Roslyn or ILLink substep 🙁 I don't see an easy way to do it in JIT, perhaps someone from @dotnet/jit-contrib does?

It does look nice and optimizable after assertion prop:

N007 ( 18, 20) [000141] --CXG-------                          CALL      void   System.Buffer.Memmove $VN.Void
N004 (  1,  1) [000124] ------------ arg0 in rcx            ├──▌  LCL_VAR   byref  V06 tmp5         u:2 (last use) $280
N005 (  2, 10) [000190] H----------- arg1 in rdx            ├──▌  CNS_INT(h) long   0x142d5bf20a0 static $140
N006 (  1,  1) [000191] ------------ arg2 in r8             └──▌  CNS_INT   long   16 $81

But doing a second pass of morph-like optimizations here would be new territory.

Ah, indeed, I didn't think of it this way, nice!

@EgorBo
Copy link
Member

EgorBo commented Mar 26, 2023

closed via #83638

@EgorBo EgorBo closed this as completed Mar 26, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Apr 26, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

No branches or pull requests

5 participants