Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: Interlocked.CompareExchange for byte/short/sbyte/ushort #64658

Closed
333fred opened this issue Feb 2, 2022 · 40 comments · Fixed by #92974
Closed

[API Proposal]: Interlocked.CompareExchange for byte/short/sbyte/ushort #64658

333fred opened this issue Feb 2, 2022 · 40 comments · Fixed by #92974
Labels
api-approved API was approved in API review, it can be implemented area-System.Threading
Milestone

Comments

@333fred
Copy link
Member

333fred commented Feb 2, 2022

Background and motivation

We have a few flags fields in the compiler that can be set by multiple threads, used as part of our symbol models. As we avoid locks whenever possible, we generally use a loop on Interlocked.CompareExchange, under the assumption it will succeed eventually or the value will have been computed and set by some other thread. However, because CompareExchange only supports int, long, uint, and ulong integer values, these fields must be at least 1 int wide. This doesn't sound like much, but as the compiler needs to load many, many symbols, those few extra bits can add up. I'd like to be able to use bytes where possible for these values. I only really want byte/ushort overloads, but added the signed variants for completeness.

API Proposal

namespace System.Threading
{
    public static class Interlocked
    {
        public static byte CompareExchange(ref byte location1, byte value, byte comparand);
        public static sbyte CompareExchange(ref sbyte location1, sbyte value, sbyte comparand);
        public static short CompareExchange(ref short location1, short value, short comparand);
        public static ushort CompareExchange(ref ushort location1, ushort value, ushort comparand);
    }
}

API Usage

        public static bool Set(ref byte flags, byte toSet)
        {
            byte oldState, newState;
            do
            {
                oldState = flags;
                newState = oldState | toSet;
                if (newState == oldState)
                {
                    return false;
                }
            }
            while (Interlocked.CompareExchange(byte flags, newState, oldState) != oldState);
            return true;
        }

Alternative Designs

No response

Risks

No response

@333fred 333fred added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Feb 2, 2022
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Feb 2, 2022
@ghost
Copy link

ghost commented Feb 2, 2022

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

We have a few flags fields in the compiler that can be set by multiple threads, used as part of our symbol models. As we avoid locks whenever possible, we generally use a loop on Interlocked.CompareExchange, under the assumption it will succeed eventually or the value will have been computed and set by some other thread. However, because CompareExchange only supports int, long, uint, and ulong integer values, these fields must be at least 1 int wide. This doesn't sound like much, but as the compiler needs to load many, many symbols, those few extra bits can add up. I'd like to be able to use bytes where possible for these values. I only really want byte/ushort overloads, but added the signed variants for completeness.

API Proposal

namespace System.Threading
{
    public class Interlocked
    {
        public byte CompareExchange(ref byte location, byte value, byte comparand);
        public sbyte CompareExchange(ref sbyte location, sbyte value, sbyte comparand);
        public short CompareExchange(ref short location, short value, short comparand);
        public ushort CompareExchange(ref ushort location, ushort value, ushort comparand);
    }
}

API Usage

        public static bool Set(ref byte flags, byte toSet)
        {
            byte oldState, newState;
            do
            {
                oldState = flags;
                newState = oldState | toSet;
                if (newState == oldState)
                {
                    return false;
                }
            }
            while (Interlocked.CompareExchange(byte flags, newState, oldState) != oldState);
            return true;
        }

Alternative Designs

No response

Risks

No response

Author: 333fred
Assignees: -
Labels:

api-suggestion, area-System.Threading, untriaged

Milestone: -

@tannergooding
Copy link
Member

Also CC. @stephentoub

This is something that both x86/x64 (cmpxchg) and Arm64 (cas, casb, and cash) support so I don't see anything that would block this.

  • I'm still of the general opinion that we should go ahead and just expose the full set of missing Interlocked functionality here to help ensure we have a good and usable baseline for users to write their multi-threaded algorithms. x86/x64 and Arm64 support these for Add, And, BitTestAndComplement, BitTestAndReset, BitTestAndSet, CompareExchange, Decrement, Increment, Negate, Not, Or, Subtract, and XOr. There are also some architecture specific cases that are possible good for Hardware Intrinsic options. Other major languages/runtimes (including Java, not just C/C++, Rust, etc) have a much more complete set of baseline APIs for atomic and interlocked operations than we do and I believe we are "behind" comparatively speaking.

@stephentoub
Copy link
Member

stephentoub commented Feb 2, 2022

I have no problem with us adding CompareExchange overloads for byte/short/etc. if every architecture we target and envision targeting supports them.

Same for overloads of other existing APIs we have, e.g. Increment, Decrement, etc.

I'm still of the general opinion that we should go ahead and just expose the full set of missing Interlocked functionality here to help ensure we have a good and usable baseline for users to write their multi-threaded algorithms. x86/x64 and Arm64 support these for Add, And, BitTestAndComplement, BitTestAndReset, BitTestAndSet, CompareExchange, Decrement, Increment, Negate, Not, Or, Subtract, and XOr. There are also some architecture specific cases that are possible good for Hardware Intrinsic options. Other major languages/runtimes (including Java, not just C/C++, Rust, etc) have a much more complete set of baseline APIs for atomic and interlocked operations than we do and I believe we are "behind" comparatively speaking.

We discussed this in:
#24694 (comment)
I've not seen or heard any further discussion that would change my opinion (namely that we shouldn't be adding APIs for which there isn't a demonstrated need). If you'd like to make a case for it, please open a separate issue.

@aleksandr-aleksashin
Copy link

aleksandr-aleksashin commented Feb 5, 2022

A slight digression from the topic of the issue, but related to it.
I have not found related issues about the below problem.
Apart from the overloads described in this issue, it would be a good idea to add overloads for the Enum type. As far as I know, it is based on integers and it's possible.

Some of the current workarounds are described here or here

@deeprobin
Copy link
Contributor

deeprobin commented Mar 15, 2022

@333fred Can you add a nuint overload to your proposal?

A nint overload is already covered by the existing IntPtr overload). But it might be useful to add CompareExchangeNInt (@tannergooding what do you think?)

@stephentoub
Copy link
Member

Can you add a nuint overload to your proposal?

That's #45103.

But it might be useful to add CompareExchangeNInt

I don't think we should do that.

@333fred
Copy link
Member Author

333fred commented Mar 15, 2022

@stephentoub is there anything blocking this suggestion? Can we tag it for review?

@stephentoub
Copy link
Member

That's up to @kouvel.

@kouvel kouvel added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed untriaged New issue has not been triaged by the area owner api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Mar 16, 2022
@kouvel
Copy link
Member

kouvel commented Mar 16, 2022

I don't see any issues with this either, marked as ready-for-review

@terrajobst
Copy link
Contributor

  • Looks good, but we should probably also do Exchange
namespace System.Threading
{
    public static class Interlocked
    {
        public static byte CompareExchange(ref byte location1, byte value, byte comparand);
        public static sbyte CompareExchange(ref sbyte location1, sbyte value, sbyte comparand);
        public static short CompareExchange(ref short location1, short value, short comparand);
        public static ushort CompareExchange(ref ushort location1, ushort value, ushort comparand);

        public static byte Exchange(ref byte location1, byte value);
        public static sbyte Exchange(ref sbyte location1, sbyte value);
        public static short Exchange(ref short location1, short value);
        public static ushort Exchange(ref ushort location1, ushort value);
    }
}

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Apr 14, 2022
@deeprobin
Copy link
Contributor

Does this API require JIT-changes?

@jkotas
Copy link
Member

jkotas commented Apr 14, 2022

There are multiple possible implementation strategies. It can either be implemented in the runtimes in C/C++ and/or it can be must-expand intrinsic in the JIT.

@timcassell
Copy link

timcassell commented May 1, 2022

Even if an architecture doesn't support these operations, couldn't they be implemented as CompareExchange loops?

public static short CompareExchange(ref short location, short value, short comparand)
{
    ref int locationAsInt = ref Unsafe.As<short, int>(ref location);
    int newValueAsInt = 0;
    ref short newValueLocation = ref Unsafe.As<int, short>(ref newValueAsInt);
    int initialValueAsInt;
    short initialValue;
    do
    {
        initialValueAsInt = Volatile.Read(ref locationAsInt);
        initialValue = Unsafe.As<int, short>(ref initialValueAsInt);
        if (initialValue != comparand)
        {
            return initialValue;
        }
        newValueAsInt = initialValueAsInt;
        Volatile.Write(ref newValueLocation, value);
    } while (Interlocked.CompareExchange(ref locationAsInt, newValueAsInt, initialValueAsInt) != initialValueAsInt);
    return initialValue;
}

Of course I realize this could be quite slow if the short's location isn't on a 4-byte address. The int location could be shifted to the floor of the nearby 4-byte address (using pointers instead of Unsafe.As), unless the short itself crosses a 4-byte boundary.

[Edit] Although, I would find it strange for a new architecture to support working with small values and not also support working with them in an Interlocked fashion.

@vladd
Copy link

vladd commented May 1, 2022

@timcassell
newValueAsInt = initialValue; should be perhaps newValueAsInt = initialValueAsInt;? And maybe if (initialValue == comparand) instead of if (initialValue != comparand).

@timcassell
Copy link

@timcassell newValueAsInt = initialValue; should be perhaps newValueAsInt = initialValueAsInt;?

Ah yes, good catch.

And maybe if (initialValue == comparand) instead of if (initialValue != comparand).

No, that would change the behavior to compare exchange if not equal instead of compare exchange if equal. Also, that check would be removed to implement Exchange.

@vladd
Copy link

vladd commented May 2, 2022

@timcassell

No, that would change the behavior to compare exchange if not equal instead of compare exchange if equal.

Indeed, you are right. Sorry for the false alarm!

@stephentoub
Copy link
Member

In your code, if the byte/short/etc. is just before memory that's not accessible, such that extending by a few more bytes steps over that boundary, this could AV / seg fault.

I realize this could be quite slow

Which is why we don't want to implement these without the right hardware support. Otherwise someone "optimizing" their code by updating it from, say, using an int field to instead using a byte field is actually deoptimizing it.

@timcassell
Copy link

In your code, if the byte/short/etc. is just before memory that's not accessible, such that extending by a few more bytes steps over that boundary, this could AV / seg fault.

Yes, I realized that, and shifting the pointer like I mentioned would solve that as well. But that's not the point. That was just a proof-of-concept to show that it could be implemented without hardware support, if necessary.

Which is why we don't want to implement these without the right hardware support. Otherwise someone "optimizing" their code by updating it from, say, using an int field to instead using a byte field is actually deoptimizing it.

It seems that all architectures that .Net currently supports have hardware support, so this is a non-issue. And I doubt future architecture wouldn't support it. But, hypothetically, if .Net does support a future architecture that doesn't have hardware support, would you prefer to throw a NotSupportedException, or have a slow implementation?

@stephentoub
Copy link
Member

It seems that all architectures that .Net currently supports have hardware support, so this is a non-issue.

My point was we're not going to add a temporary implementation based on such a loop.

@kouvel kouvel added this to the Future milestone Jul 8, 2022
@timcassell
Copy link

Could byte and char overloads be also considered? They'd be trivial to implement with this.

Perhaps you meant bool? byte is already part of the proposal.

@MichalPetryka
Copy link
Contributor

Could byte and char overloads be also considered? They'd be trivial to implement with this.

Perhaps you meant bool? byte is already part of the proposal.

Right, my bad.

@deeprobin
Copy link
Contributor

deeprobin commented Sep 3, 2022

Could bool and char overloads be also considered? They'd be trivial to implement with this.
In my opinion, a good idea.

What do you think? @terrajobst @stephentoub

@stephentoub
Copy link
Member

stephentoub commented Sep 3, 2022

What do you think?

It's trivial for someone to implement those themselves in terms of the proposed ones. That said, bool could be a reasonable addition, just because it's so common. The problem is it's a slippery slope with all the different methods and all the different possible types; line needs to be drawn somewhere.

Why are you asking @deeprobin?

@deeprobin
Copy link
Contributor

@stephentoub Oh I thought something like that always has to be strictly re-reviewed.
I ask because I plan to implement this API if no one objects.

@stephentoub
Copy link
Member

Oh I thought something like that always has to be strictly re-reviewed.

It does if it were to be included.

I ask because I plan to implement this API if no one objects.

I'm concerned this may be too complex. What is your plan?

@deeprobin
Copy link
Contributor

I'm concerned this may be too complex. What is your plan?

You are right. I took a look at the whole thing. I would probably have to include instructions like CMPXCHG8B & CMPXCHG16B, which I would still consider feasible, but the ARM implementation would worry me because I haven't dealt with it yet.

Then the issue is better for someone else :D

@TonyValenti
Copy link

I could really use an Interlocked.CompareExchange bool variant. Can that get added too?

@MichalPetryka
Copy link
Contributor

MichalPetryka commented Jan 5, 2023

I could really use an Interlocked.CompareExchange bool variant. Can that get added too?

The only issue with that is that a bool with internal representation of 2 would be considered different from one represented as 1.

@timcassell
Copy link

timcassell commented Jan 5, 2023

I could really use an Interlocked.CompareExchange bool variant. Can that get added too?

The only issue with that is that a bool with internal representation of 2 would be considered different from one represented as 1.

I don't think that's an issue. The bool can just be treated as byte, the same way uint is treated as int. The internal representation will be preserved.

[Edit] Also, you were the one who suggested adding bool yourself earlier. 😆

@tannergooding
Copy link
Member

I don't think that's an issue.

The issue is that it introduces subtle and "unexpected" behavior that many users have to deal with. This makes it a very hard to track down bug when the issue does crop up.

@tannergooding
Copy link
Member

Once byte does exist, then sufficiently empowered users can just use Unsafe.As<bool, byte> to make the "right thing" happen. With the added bonus that it will make the "unsafeness" of the API more visible.

@timcassell
Copy link

How is that any different than bool as it exists today? bool can have a "surprising" internal representation already, interlocked has nothing to do with it.

@stephentoub
Copy link
Member

stephentoub commented Jan 5, 2023

I don't particularly care whether we add a bool-based overload, but how is this concern different from normal comparisons? e.g.

using System.Runtime.CompilerServices;

byte one = 1;
byte two = 2;

bool b1 = Unsafe.As<byte, bool>(ref one);
bool b2 = Unsafe.As<byte, bool>(ref two);

Console.WriteLine(b1);
Console.WriteLine(b2);
Console.WriteLine(b1 == b2);

prints true, true, and false.

@tannergooding
Copy link
Member

In the case of interlocked, it can lead to deadlocks and other subtle problems with complex multithreaded code that make it infinitely harder to diagnose.

Having it be encountered as an edge case in scenarios like == is already painful enough ;)

@MichalPetryka
Copy link
Contributor

[Edit] Also, you were the one who suggested adding bool yourself earlier. 😆

Yeah I've realized this afterwards.

@stephentoub
Copy link
Member

I don't personally see Interlocked adding to that complexity here. You can easily deadlock without it. And it's the introduction of non-canonical Boolean values that's the unsafeness, not a compare, whether atomic with an exchange or not.

@mangod9 mangod9 modified the milestones: 8.0.0, 9.0.0 Jul 20, 2023
@Klotzi111
Copy link

What is the current status of this?
I could really use the proposed API.

I considered implementing my own implementation in C++ and do an external call before I found this issue.
But the native function call overhead of my custom implementation is what scares me a bit (performance overhead).

@terrajobst
Copy link
Contributor

It's scheduled for .NET 9 (which will ship next year). It won't be in .NET 8.

@mangod9 @kouvel should this be labelled with help wanted?

MichalPetryka added a commit to MichalPetryka/runtime that referenced this issue Oct 3, 2023
Implements Interlocked.CompareExchange/Exchange for
byte/sbyte/short/ushort.

Uses an intrinsic on CoreCLR x64 and ARM64 and fallbacks
for other platforms.

Fixes dotnet#64658.
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Oct 3, 2023
jkotas added a commit that referenced this issue Jan 25, 2024
* Implement Interlocked for small types.

Implements Interlocked.CompareExchange/Exchange for
byte/sbyte/short/ushort.

Uses an intrinsic on CoreCLR x64 and ARM64 and fallbacks
for other platforms.

Fixes #64658.

* Fix xarch assert

* Update PalRedhawkInline.h

* Update PalRedhawkInline.h

* Update threads.c

* Simplify NAOT handling

* Fix build

* Update atomic.h

* Update atomic.h

* Update atomic.h

* Fix windows intrinsics

* Update comutilnative.cpp

* Fix Windows builds

* Update icall-def.h

* Fix icall-def.h

* Improve tests

* Update InterlockedTests.cs

* Remove small type normalization

* Try to fix the intrinsics

* Fix ARM64 build

* Try to fix XArch 66 prefix and ARM64

* Fix typo

* Fix assert

* Code cleanup

* Update importercalls.cpp

* Update importercalls.cpp

* Extend small types properly

* Add RISC-V asserts

* c11 atomics interlocked ops for small types

* Simplify Windows atomics

* Restore NativeAOT fallbacks for other platforms

* Add more tests

* Format

* Update Interlocked.cs

* Fix tests and comment

* Add emitOutputCV handling

* Fix extension

* Use a slightly better fix

* More complete fix

* Fix the fix

* Fix more places

* CR feedback

* Fix corelib

* Fix tests

* Add using

* Match NativeAOT #ifs

* Update Interlocked.cs

* Fix test projects

* Fix tests, try linking with libatomic

* Improve build test

* Fix tests

* Fix tests again

* Remove libatomic

Co-authored-by: Jan Kotas <jkotas@microsoft.com>

* Cleanup, fix helper extension

* Fix test corelib

---------

Co-authored-by: Alexander Köplinger <alex.koeplinger@outlook.com>
Co-authored-by: Aleksey Kliger <alklig@microsoft.com>
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jan 25, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Feb 25, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Threading
Projects
None yet
Development

Successfully merging a pull request may close this issue.