[Arm64/Unix] Use native memcpy/memmove for ARM64 as glibc 2.36 added optimizations #93579

Closed
wants to merge 1 commit into from


Spacefish
Contributor

@Spacefish Spacefish commented Oct 16, 2023

This commit, bminor/glibc@9f298bf, added an optimized memcpy implementation for ARM64 that uses the SVE extensions; it ships in glibc 2.36 and newer.

It is now used by several Linux distributions, such as Debian 12 (bookworm), Ubuntu 23.04, and the upcoming Ubuntu 24.04 LTS (April 2024).

See #8897.

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Oct 16, 2023
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Oct 16, 2023
@Spacefish Spacefish changed the title Use native memcpy/memmove for ARM64 as glibc 2.36 is optimized now Use native memcpy/memmove for ARM64 as glibc 2.36 added optimizations Oct 16, 2023
@Spacefish Spacefish changed the title Use native memcpy/memmove for ARM64 as glibc 2.36 added optimizations [Arm64/Unix] Use native memcpy/memmove for ARM64 as glibc 2.36 added optimizations Oct 16, 2023
@EgorBo
Member

EgorBo commented Oct 17, 2023

A 512-byte threshold looks pretty low. Does SVE give any advantage over the current implementation, considering that, as far as I know, no hardware implements SVE with vectors wider than 128 bits under the hood? Currently, we align data to a 16-byte boundary and then copy in 64-byte blocks (using two stp SIMD stores), plus one final 64-byte block for the trailing elements.
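The copy strategy described above, together with a 512-byte cutoff for handing large copies to native memcpy, can be sketched in portable C. This is a minimal illustration under stated assumptions, not the runtime's actual code (which is managed code): the names `sketch_copy`, `sketch_block_copy` and `SKETCH_NATIVE_THRESHOLD` are hypothetical, alignment is assumed to apply to the destination, buffers are assumed not to overlap (memcpy semantics, not memmove), and 16-byte `memcpy` calls stand in for the ARM64 SIMD load/store instructions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutoff: the 512-byte value discussed above, not taken from runtime source. */
#define SKETCH_NATIVE_THRESHOLD 512

/* Copy one 16-byte chunk; on ARM64 compilers typically lower this to a q-register load/store. */
static inline void copy16(uint8_t *dst, const uint8_t *src) {
    memcpy(dst, src, 16);
}

/* Copy one 64-byte block as four 16-byte chunks (two stp q-register pairs on ARM64). */
static inline void copy64(uint8_t *dst, const uint8_t *src) {
    copy16(dst, src);
    copy16(dst + 16, src + 16);
    copy16(dst + 32, src + 32);
    copy16(dst + 48, src + 48);
}

/* Block copy for non-overlapping buffers with len >= 64. */
static void sketch_block_copy(uint8_t *dst, const uint8_t *src, size_t len) {
    uint8_t *dst_end = dst + len;
    const uint8_t *src_end = src + len;

    /* Copy the first 16 bytes unaligned, then advance so dst is 16-byte aligned.
       skip is in 1..16, so no byte in front of the aligned position is missed. */
    copy16(dst, src);
    size_t skip = 16 - ((uintptr_t)dst & 15);
    dst += skip;
    src += skip;
    len -= skip;

    /* Main loop: full 64-byte blocks while more than 64 bytes remain. */
    while (len > 64) {
        copy64(dst, src);
        dst += 64;
        src += 64;
        len -= 64;
    }

    /* Trailing elements: one 64-byte block ending exactly at the end of the buffer.
       It may rewrite bytes already copied, which is harmless without overlap. */
    copy64(dst_end - 64, src_end - 64);
}

/* Hypothetical dispatcher: block copy below the threshold, native memcpy at or above it. */
void sketch_copy(void *dst, const void *src, size_t len) {
    if (len < 64) {
        memcpy(dst, src, len);  /* small sizes are not modeled in this sketch */
    } else if (len < SKETCH_NATIVE_THRESHOLD) {
        sketch_block_copy((uint8_t *)dst, (const uint8_t *)src, len);
    } else {
        memcpy(dst, src, len);  /* native path, e.g. glibc's SVE memcpy where selected */
    }
}
```

The overlapping final block avoids a byte-by-byte tail loop; the trade-off discussed in the thread is where the threshold for the native call should sit.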

Pros:

  • We might benefit from 256-bit (or wider) SVE stores in the future (although, hopefully, by that time the JIT will be able to use SVE too)
  • Same for the upcoming hardware memcpy instructions

Cons:

  • We've seen regressions from the native memcpy compared to the current implementation (e.g. on Ubuntu 22.04)

Related: #93214

@MichalPetryka
Contributor

It is now used in some Linux Distributions like Debian 12 (bookworm), Ubuntu 23.04 or the upcoming Ubuntu 24.04 LTS in April 2024.

Is relying on something present only on such new distros a good idea?
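One hypothetical way to avoid assuming a particular distro would be to gate the native path at runtime. A minimal sketch, assuming glibc on ARM64 Linux: it checks `gnu_get_libc_version()` for 2.36 or newer and the `HWCAP_SVE` bit from `getauxval(AT_HWCAP)`. The helper names `version_at_least` and `prefer_native_memcpy` are hypothetical, and the check only approximates which memcpy glibc's own resolver actually selects on a given CPU.

```c
#include <stdbool.h>
#include <stdio.h>

#if defined(__GLIBC__)
#include <gnu/libc-version.h>   /* gnu_get_libc_version() */
#endif
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>            /* getauxval(), AT_HWCAP */
#ifndef HWCAP_SVE
#define HWCAP_SVE (1UL << 22)    /* arm64 Linux hwcap bit for SVE */
#endif
#endif

/* Hypothetical helper: true if a "major.minor" version string is >= want_major.want_minor. */
bool version_at_least(const char *ver, int want_major, int want_minor) {
    int major = 0, minor = 0;
    if (sscanf(ver, "%d.%d", &major, &minor) != 2)
        return false;  /* unparseable: be conservative */
    return major > want_major || (major == want_major && minor >= want_minor);
}

/* Hypothetical gate: consider the native memcpy path only when glibc is 2.36+
   and the kernel reports SVE; everything else keeps the current managed copy. */
bool prefer_native_memcpy(void) {
#if defined(__GLIBC__) && defined(__linux__) && defined(__aarch64__)
    if (!version_at_least(gnu_get_libc_version(), 2, 36))
        return false;
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return false;  /* non-glibc or non-ARM64: no SVE memcpy to rely on */
#endif
}
```

A caller would evaluate such a gate once at startup and cache the result. Note that it does not address the regressions mentioned above; it only limits the native path to systems that have the newer glibc.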

@Spacefish
Contributor Author

Spacefish commented Oct 17, 2023

Currently, we align data to 16 bytes boundary and then do copy using 64 bytes blocks (using two stp SIMD stores) + one 64-byte block for the trailing elements.

Oh, I didn't know that. Then my PR probably doesn't make much sense, as it would lead to performance regressions on most distros currently out there.

@EgorBo EgorBo added area-System.Buffers and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Oct 17, 2023
@ghost

ghost commented Oct 17, 2023

Tagging subscribers to this area: @dotnet/area-system-buffers
See info in area-owners.md if you want to be subscribed.


@Spacefish Spacefish closed this Oct 17, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Nov 16, 2023