-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add BFloat16 #98643
base: main
Are you sure you want to change the base?
Add BFloat16 #98643
Conversation
Note regarding the This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change. |
Tagging subscribers to this area: @dotnet/area-system-numerics Issue DetailsCloses #96295. Delegating all functional members to float, since the upcast is trivial. The only logic is rounding from double.
|
result = (actualValue >= new BFloat16(0x7F00)) ? MaxValue : | ||
(actualValue <= new BFloat16(0xFF00)) ? MinValue : (Int128)actualValue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: for Half
we're using == Half.PositiveInfinity
and == Half.NegativeInfinity
for clarity.
We should do the equivalent here for BFloat16
, either using an existing named constant or documenting what the boundary values used are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cases are actually different here. For Half
the max possible value is 65504
, which is in range of larger integer.
The value range of BFloat16
are about the same of float
which is larger than the integers. The value is actually a manual truncate of (float)Int128.MaxValue
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant conceptually the same. That is, we should use a well named member where it exists -or- document what value the bits represent.
0x7F00
says nothing to the typical reader. Adding a comment that explains it means (float)Int128.MaxValue
and corresponds roughly to 340282366920938463463374607431768211456
(adjusting for precision and the actual number stored) then means a lot
{ | ||
BFloat16 actualValue = (BFloat16)(object)value; | ||
result = (actualValue >= new BFloat16(0x4700)) ? MaxValue : | ||
(actualValue <= new BFloat16(0xB700)) ? MinValue : (short)actualValue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have and use an internal UInt16BitsToBFloat16
for parity with the other floating-point support.
@@ -1015,7 +1015,7 @@ public static explicit operator double(Half value) | |||
exp -= 1; | |||
} | |||
|
|||
return CreateDouble(sign, (ushort)(exp + 0x3F0), (ulong)sig << 42); | |||
return Math.CreateDouble(sign, (ushort)(exp + 0x3F0), (ulong)sig << 42); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Math
is basically "frozen" and not a place we want to find new methods, even internal ones
These would be "better" as internal double.Create
and float.Create
methods instead
Closes #96295.
Delegating all functional members to float, since the upcast is trivial. The only logic is rounding from double.
Tests are borrowed from Half.