-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CLANG] Full support of complex multiplication and division. #81514
Changes from 12 commits
13fd739
eb9a35c
d3c4f78
4aa0925
fcd5665
2ddba9a
e62c462
f635f94
1d61aa6
148b6ce
5aa2711
ba9a8da
9098908
52181c7
a9449de
e20741e
0d97b9b
bc3fa4f
ec6296f
3117dbd
0a08598
a558d31
dec045b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -1847,19 +1847,51 @@ floating point semantic models: precise (the default), strict, and fast. | |||||
* ``16`` - Forces ``_Float16`` operations to be emitted without using excess | ||||||
precision arithmetic. | ||||||
|
||||||
.. option:: -fcomplex-arithmetic=<value>: | ||||||
|
||||||
This option specifies the implementation for complex multiplication and division. | ||||||
|
||||||
Valid values are: ``basic``, ``improved``, ``full`` and ``promoted``. | ||||||
|
||||||
* ``basic`` Implementation of complex division and multiplication using | ||||||
algebraic formulas at source precision. No special handling to avoid | ||||||
overflow. NaN and infinite and values are not handled. | ||||||
* ``improved`` Implementation of complex division using the Smith algorithm | ||||||
at source precision. Smith's algorithm for complex division. | ||||||
See SMITH, R. L. Algorithm 116: Complex division. Commun. ACM 5, 8 (1962). | ||||||
This value offers improved handling for overflow in intermediate | ||||||
calculations, but overflow may occur. NaN and infinite and values are not | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
handled in some cases. | ||||||
* ``full`` Implementation of complex division and multiplication using a | ||||||
call to runtime library functions (generally the case, but the BE might | ||||||
sometimes replace the library call if it knows enough about the potential | ||||||
range of the inputs). Overflow and non-finite values are handled by the | ||||||
library implementation. For the case of multiplication overflow will occur in | ||||||
accordance with normal floating-point rules. | ||||||
* ``promoted`` Implementation of complex division using algebraic formulas at | ||||||
higher precision. Overflow is handled. Non-finite values are handled in some | ||||||
cases. If the target does not have native support for a higher precision | ||||||
data type, an implementation for the complex operation will be used to provide | ||||||
improved guards against intermediate overflow, but overflow and underflow may | ||||||
still occur in some cases. NaN and infinite and values are not handled. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
This is the default value. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think for many users this would make sense as the default value, but "full" is required for conformance to the C standard. Can we use this as the default if we're targeting pre-C99 but use "full" with C99 and later? I'm not sure what C++ expects, but probably "full" there too. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it would perhaps be surprising for C89 to use "promoted" and C99 and later to use "full", I think they should probably all use "full" consistently (same with C++, otherwise you get subtle differences with code that lives in header files depending on whether the header is consumed in C or C++ mode). CC @jcranmer-intel for other opinions There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. C89 doesn't have complex types, do we even support There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, we support it as a conforming extension: https://godbolt.org/z/PvdhrTeYx There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see that gcc uses a runtime library call with both c89 and c99. That seems like a reasonable way to go. |
||||||
|
||||||
.. option:: -fcx-limited-range: | ||||||
|
||||||
This option enables the naive mathematical formulas for complex division and | ||||||
multiplication with no NaN checking of results. The default is | ||||||
``-fno-cx-limited-range``, but this option is enabled by the ``-ffast-math`` | ||||||
This option is aliased to ``-fcomplex-arithmetic=basic``. It enables the | ||||||
naive mathematical formulas for complex division and multiplication with no | ||||||
NaN checking of results. The default is ``-fno-cx-limited-range`` aliased to | ||||||
``-fcomplex-arithmetic=full``. This option is enabled by the ``-ffast-math`` | ||||||
AaronBallman marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
option. | ||||||
|
||||||
.. option:: -fcx-fortran-rules: | ||||||
|
||||||
This option enables the naive mathematical formulas for complex | ||||||
multiplication and enables application of Smith's algorithm for complex | ||||||
division. See SMITH, R. L. Algorithm 116: Complex division. Commun. | ||||||
ACM 5, 8 (1962). The default is ``-fno-cx-fortran-rules``. | ||||||
This option is aliased to ``-fcomplex-arithmetic=improved``. It enables the | ||||||
naive mathematical formulas for complex multiplication and enables application | ||||||
of Smith's algorithm for complex division. See SMITH, R. L. Algorithm 116: | ||||||
Complex division. Commun. ACM 5, 8 (1962). | ||||||
The default is ``-fno-cx-fortran-rules`` aliased to | ||||||
``-fcomplex-arithmetic=full``. | ||||||
|
||||||
.. _floating-point-environment: | ||||||
|
||||||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -396,7 +396,38 @@ class LangOptionsBase { | |||||
IncompleteOnly = 3, | ||||||
}; | ||||||
|
||||||
enum ComplexRangeKind { CX_Full, CX_Limited, CX_Fortran, CX_None }; | ||||||
/// Controls the various implementations for complex multiplication and | ||||||
// division. | ||||||
enum ComplexRangeKind { | ||||||
/// Implementation of complex division and multiplication using a call to | ||||||
/// runtime library functions(generally the case, but the BE might | ||||||
/// sometimes replace the library call if it knows enough about the | ||||||
/// potential range of the inputs). Overflow and non -finite values are | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
/// handled by the library implementation. | ||||||
CX_Full, | ||||||
|
||||||
/// Implementation of complex division offering an improved handling | ||||||
/// for overflow in intermediate calculations with no special handling for | ||||||
/// NaN and infinite and values. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
CX_Improved, | ||||||
|
||||||
/// Implementation of complex division using algebraic formulas at | ||||||
/// higher precision. Overflow is handled. Non-finite values are handled in | ||||||
/// some cases. If the target hardware does not have native support for a | ||||||
/// higher precision data type, an implementation for the complex operation | ||||||
/// will be used to provide improved guards against intermediate overflow, | ||||||
/// but overflow and underflow may still occur in some cases. NaN and | ||||||
/// infinite and values are not handled. This is the default value. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Be sure to update this comment if switching the default to "full". |
||||||
CX_Promoted, | ||||||
|
||||||
/// Implementation of complex division and multiplication using | ||||||
/// algebraic formulas at source precision. No special handling to avoid | ||||||
/// overflow.NaN and infinite and values are not handled. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
CX_Basic, | ||||||
|
||||||
/// No range rule is enabled. | ||||||
CX_None | ||||||
}; | ||||||
|
||||||
// Define simple language options (with no accessors). | ||||||
#define LANGOPT(Name, Bits, Default, Description) unsigned Name : Bits; | ||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1039,30 +1039,35 @@ defm offload_uniform_block : BoolFOption<"offload-uniform-block", | |
NegFlag<SetFalse, [], [ClangOption, CC1Option], "Don't assume">, | ||
BothFlags<[], [ClangOption], " that kernels are launched with uniform block sizes (default true for CUDA/HIP and false otherwise)">>; | ||
|
||
def fcx_limited_range : Joined<["-"], "fcx-limited-range">, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I didn't realize these had made it into the 18.0 release when I suggested that we could remove them. We would need at least one release where they are marked as deprecated, but since they are standard gcc options, maybe it makes sense to just keep them and have them alias to the new option as: -fcx-limited-range --> -fcomplex-arithmetic=basic There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The problem with aliasing is that the user would be allowed to write something like this: This warning is a bit mis-leading and doesn't reflect the option used in the command line. Not sure this can be corrected. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry. I meant "aliasing" in the non-technical sense of "having the same meaning." How that gets implemented is another matter. I think the driver could translate them to the same cc1 option. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes there is a way of doing that:
That still produces the misleading warning for: -fcx-limited-range -fcomplex-arithmetic=improved There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What I meant to suggest is that you can leave the driver-level options as if they were independent, but when we process them in RenderFloatingPointOptions, -fcx-limited-range and -fcomplex-arithmetic=basic (for example), would add the same cc1 option. Since the warning is generated from the RenderFloatingPointOptions we should be able to make that report the expected output. |
||
def fcomplex_arithmetic_EQ : Joined<["-"], "fcomplex-arithmetic=">, Group<f_Group>, | ||
Visibility<[ClangOption, CC1Option]>, | ||
Values<"full,improved,promoted,basic">, NormalizedValuesScope<"LangOptions">, | ||
NormalizedValues<["CX_Full", "CX_Improved", "CX_Promoted", "CX_Basic"]>; | ||
|
||
def complex_range_EQ : Joined<["-"], "complex-range=">, Group<f_Group>, | ||
Visibility<[CC1Option]>, | ||
Values<"full,improved,promoted,basic">, NormalizedValuesScope<"LangOptions">, | ||
NormalizedValues<["CX_Full", "CX_Improved", "CX_Promoted", "CX_Basic"]>, | ||
MarshallingInfoEnum<LangOpts<"ComplexRange">, "CX_Full">; | ||
|
||
def fcx_limited_range : Flag<["-"], "fcx-limited-range">, | ||
Group<f_Group>, Visibility<[ClangOption, CC1Option]>, | ||
HelpText<"Basic algebraic expansions of complex arithmetic operations " | ||
"involving are enabled.">; | ||
|
||
def fno_cx_limited_range : Joined<["-"], "fno-cx-limited-range">, | ||
def fno_cx_limited_range : Flag<["-"], "fno-cx-limited-range">, | ||
Group<f_Group>, Visibility<[ClangOption, CC1Option]>, | ||
HelpText<"Basic algebraic expansions of complex arithmetic operations " | ||
"involving are disabled.">; | ||
|
||
def fcx_fortran_rules : Joined<["-"], "fcx-fortran-rules">, | ||
def fcx_fortran_rules : Flag<["-"], "fcx-fortran-rules">, | ||
Group<f_Group>, Visibility<[ClangOption, CC1Option]>, | ||
HelpText<"Range reduction is enabled for complex arithmetic operations.">; | ||
|
||
def fno_cx_fortran_rules : Joined<["-"], "fno-cx-fortran-rules">, | ||
def fno_cx_fortran_rules : Flag<["-"], "fno-cx-fortran-rules">, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, just set the needed |
||
Group<f_Group>, Visibility<[ClangOption, CC1Option]>, | ||
HelpText<"Range reduction is disabled for complex arithmetic operations.">; | ||
|
||
def complex_range_EQ : Joined<["-"], "complex-range=">, Group<f_Group>, | ||
Visibility<[CC1Option]>, | ||
Values<"full,limited,fortran">, NormalizedValuesScope<"LangOptions">, | ||
NormalizedValues<["CX_Full", "CX_Limited", "CX_Fortran"]>, | ||
MarshallingInfoEnum<LangOpts<"ComplexRange">, "CX_Full">; | ||
|
||
// OpenCL-only Options | ||
def cl_opt_disable : Flag<["-"], "cl-opt-disable">, Group<opencl_Group>, | ||
Visibility<[ClangOption, CC1Option]>, | ||
|
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
|
@@ -283,9 +283,47 @@ class ComplexExprEmitter | |||||||
ComplexPairTy EmitComplexBinOpLibCall(StringRef LibCallName, | ||||||||
const BinOpInfo &Op); | ||||||||
|
||||||||
QualType getPromotionType(QualType Ty) { | ||||||||
QualType HigherPrecisionTypeForComplexArithmetic(QualType ElementType, | ||||||||
bool IsDivOpCode) { | ||||||||
const TargetInfo &TI = CGF.getContext().getTargetInfo(); | ||||||||
const LangOptions Opts = CGF.getLangOpts(); | ||||||||
if (const auto *BT = dyn_cast<BuiltinType>(ElementType)) { | ||||||||
switch (BT->getKind()) { | ||||||||
case BuiltinType::Kind::Float16: { | ||||||||
if (TI.hasFloat16Type() && !TI.hasLegalHalfType()) | ||||||||
return CGF.getContext().getComplexType(CGF.getContext().FloatTy); | ||||||||
break; | ||||||||
} | ||||||||
case BuiltinType::Kind::BFloat16: { | ||||||||
if (TI.hasBFloat16Type() && !TI.hasFullBFloat16Type()) | ||||||||
return CGF.getContext().getComplexType(CGF.getContext().FloatTy); | ||||||||
break; | ||||||||
} | ||||||||
case BuiltinType::Kind::Float: | ||||||||
return CGF.getContext().getComplexType(CGF.getContext().DoubleTy); | ||||||||
break; | ||||||||
case BuiltinType::Kind::Double: { | ||||||||
if (TI.hasLongDoubleType()) | ||||||||
return CGF.getContext().getComplexType(CGF.getContext().LongDoubleTy); | ||||||||
else | ||||||||
return CGF.getContext().getComplexType(CGF.getContext().DoubleTy); | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
No |
||||||||
break; | ||||||||
} | ||||||||
default: | ||||||||
return QualType(); | ||||||||
} | ||||||||
} | ||||||||
return QualType(); | ||||||||
} | ||||||||
|
||||||||
QualType getPromotionType(QualType Ty, bool IsDivOpCode = false) { | ||||||||
if (auto *CT = Ty->getAs<ComplexType>()) { | ||||||||
QualType ElementType = CT->getElementType(); | ||||||||
if (IsDivOpCode && ElementType->isFloatingType() && | ||||||||
CGF.getLangOpts().getComplexRange() == | ||||||||
LangOptions::ComplexRangeKind::CX_Promoted) | ||||||||
return HigherPrecisionTypeForComplexArithmetic(ElementType, | ||||||||
IsDivOpCode); | ||||||||
if (ElementType.UseExcessPrecision(CGF.getContext())) | ||||||||
return CGF.getContext().getComplexType(CGF.getContext().FloatTy); | ||||||||
} | ||||||||
|
@@ -296,11 +334,12 @@ class ComplexExprEmitter | |||||||
|
||||||||
#define HANDLEBINOP(OP) \ | ||||||||
ComplexPairTy VisitBin##OP(const BinaryOperator *E) { \ | ||||||||
QualType promotionTy = getPromotionType(E->getType()); \ | ||||||||
QualType promotionTy = getPromotionType( \ | ||||||||
E->getType(), \ | ||||||||
(E->getOpcode() == BinaryOperatorKind::BO_Div) ? true : false); \ | ||||||||
ComplexPairTy result = EmitBin##OP(EmitBinOps(E, promotionTy)); \ | ||||||||
if (!promotionTy.isNull()) \ | ||||||||
result = \ | ||||||||
CGF.EmitUnPromotedValue(result, E->getType()); \ | ||||||||
result = CGF.EmitUnPromotedValue(result, E->getType()); \ | ||||||||
return result; \ | ||||||||
} | ||||||||
|
||||||||
|
@@ -790,8 +829,10 @@ ComplexPairTy ComplexExprEmitter::EmitBinMul(const BinOpInfo &Op) { | |||||||
ResR = Builder.CreateFSub(AC, BD, "mul_r"); | ||||||||
ResI = Builder.CreateFAdd(AD, BC, "mul_i"); | ||||||||
|
||||||||
if (Op.FPFeatures.getComplexRange() == LangOptions::CX_Limited || | ||||||||
Op.FPFeatures.getComplexRange() == LangOptions::CX_Fortran) | ||||||||
if (Op.FPFeatures.getComplexRange() == LangOptions::CX_Basic || | ||||||||
Op.FPFeatures.getComplexRange() == LangOptions::CX_Improved || | ||||||||
Op.FPFeatures.getComplexRange() == LangOptions::CX_Promoted || | ||||||||
CGF.getLangOpts().NoHonorInfs || CGF.getLangOpts().NoHonorNaNs) | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure NoHonorInfs or NoHonorNaNs should be checked here. Given that we have explicit control for complex arithmetic behavior, maybe that should take precedence. That seems to be the way gcc handles it: https://godbolt.org/z/1oGo7jznz There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This code is in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, sorry, for some reason I thought this was invoking the special handling for division. I think the for honor NaNs and infinities still isn't necessary because we'll emit the comparison with the fast-math flags set and the backend will optimize it away, which is what happens today: https://godbolt.org/z/e8EnKodj6 |
||||||||
return ComplexPairTy(ResR, ResI); | ||||||||
|
||||||||
// Emit the test for the real part becoming NaN and create a branch to | ||||||||
|
@@ -982,13 +1023,18 @@ ComplexPairTy ComplexExprEmitter::EmitBinDiv(const BinOpInfo &Op) { | |||||||
llvm::Value *OrigLHSi = LHSi; | ||||||||
if (!LHSi) | ||||||||
LHSi = llvm::Constant::getNullValue(RHSi->getType()); | ||||||||
if (Op.FPFeatures.getComplexRange() == LangOptions::CX_Fortran) | ||||||||
QualType ComplexElementTy = Op.Ty->castAs<ComplexType>()->getElementType(); | ||||||||
const BuiltinType *BT = ComplexElementTy->getAs<BuiltinType>(); | ||||||||
if (Op.FPFeatures.getComplexRange() == LangOptions::CX_Improved || | ||||||||
(Op.FPFeatures.getComplexRange() == | ||||||||
LangOptions::CX_Promoted && BT->getKind() == BuiltinType::Kind::LongDouble)) | ||||||||
return EmitRangeReductionDiv(LHSr, LHSi, RHSr, RHSi); | ||||||||
else if (Op.FPFeatures.getComplexRange() == LangOptions::CX_Limited) | ||||||||
else if (Op.FPFeatures.getComplexRange() == LangOptions::CX_Basic || | ||||||||
Op.FPFeatures.getComplexRange() == LangOptions::CX_Promoted) | ||||||||
return EmitAlgebraicDiv(LHSr, LHSi, RHSr, RHSi); | ||||||||
else if (!CGF.getLangOpts().FastMath || | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we should remove the fast-math check here. The driver handling of fast-math sets the complex arithmetic option. This check has always been problematic because disabling just one component of fast-math (such as enabling signed zeros) causes this to be false. |
||||||||
// '-ffast-math' is used in the command line but followed by an | ||||||||
// '-fno-cx-limited-range'. | ||||||||
// '-fno-cx-limited-range' or '-fcomplex-arithmetic=full'. | ||||||||
Op.FPFeatures.getComplexRange() == LangOptions::CX_Full) { | ||||||||
LHSi = OrigLHSi; | ||||||||
// If we have a complex operand on the RHS and FastMath is not allowed, we | ||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.