-
Notifications
You must be signed in to change notification settings - Fork 756
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] Add support for -foffload-fp32-prec-div/sqrt options. #15836
Conversation
to be applied to OpenMP too.
if (Args.getLastArg(options::OPT_fno_offload_fp32_prec_div)) | ||
CmdArgs.push_back("-fno-offload-fp32-prec-div"); | ||
else | ||
CmdArgs.push_back("-foffload-fp32-prec-div"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (Args.getLastArg(options::OPT_fno_offload_fp32_prec_div)) | |
CmdArgs.push_back("-fno-offload-fp32-prec-div"); | |
else | |
CmdArgs.push_back("-foffload-fp32-prec-div"); | |
if (!Args.hasFlag(option::OPT_foffload_fp32_prec_div, | |
option::OPT_fno_offload_fp32_prec_div, true)) | |
CmdArgs.push_back("-fno-offload-fp32-prec-div"); |
Since -foffload-fp32-prec-div
is default
if (Args.getLastArg(options::OPT_fno_offload_fp32_prec_sqrt)) | ||
CmdArgs.push_back("-fno-offload-fp32-prec-sqrt"); | ||
else | ||
CmdArgs.push_back("-foffload-fp32-prec-sqrt"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar comment to above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@premanandrao can you review this please?
function instead of adding a JobAction to handle it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM assuming that the code doesn't affect C stdlib's div
function, see the comment above.
// ROUNDED-SQRT-PREC-DIV: call reassoc nnan ninf nsz arcp afn float @llvm.fpbuiltin.sqrt.f32(float {{.*}}) #[[ATTR_SQRT:[0-9]+]] | ||
// ROUNDED-DIV-PREC-SQRT: call reassoc nnan ninf nsz arcp afn spir_func nofpclass(nan inf) float @sqrt(float noundef nofpclass(nan inf) {{.*}}) | ||
// ROUNDED-DIV-ROUNDED-SQRT-FAST: call reassoc nnan ninf nsz arcp afn float @llvm.fpbuiltin.sqrt.f32(float {{.*}}) #[[ATTR_SQRT:[0-9]+]] | ||
// LOW-PREC-DIV: call float @llvm.fpbuiltin.sqrt.f32(float {{.*}}) #[[ATTR_SQRT_LOW:[0-9]+]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to check if I understand this correctly. In this case we pass: -fno-offload-fp32-prec-div -ffp-builtin-accuracy=high flags. And 1.0 ULP fpbuiltin-error attribute for llvm.fpbuiltin.sqrt.f32 was generated in response of ffp-builtin-accuracy=high flag and we don't expect here precise calculations as high != precise, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct.
The FE work has been approved. But this is an attempt to fix the LIT test |
@MrSidims Your PR fixed the issue with |
|
||
// CHECK-NOT: "-foffload-fp32-prec-div" | ||
// CHECK-NOT: "-foffload-fp32-prec-sqrt" | ||
// FPACC: "-ffp-builtin-accuracy=high" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zahiraam @mdtoguchi hi, I'm revisiting some parts of the feature implementation, to ensure, that everything works right. Here I have a question: if I'm not mistaken -ffp-builtin-accuracy=high means, that all fp-builtins will be calculated with high precision. Shouldn't we restrict it to only sqrt/div fpbuiltins (see example in clang/test/Driver/offload-fp32-div-sqrt.cpp from above)?
@npmiller I see that long ago you have added a similar option "fsycl-fp32-prec-sqrt" that currently works for CUDA and HIP. I do believe the appropriate option added in this patch should be merged with your option (if you agree or don't agree with it please place your thoughts in #17033 ), but I've also noted, that way of handling those options are very different. In your patch you pass to the compiler an alias to -prec-sqrt=true to compiler resulting in metadata (that, I guess), is handled by CUDA backend. This patch instead makes clang to generate llvm.fpbuiltin.sqrt
with max-error=0.5 to be generated in the response of this option, which is being lowered to llvm.sqrt
by code added here , which seem to be also a correct approach: https://godbolt.org/z/5TKxbzcKE . If we decide to merge the options, guess we would either need to remove one of the mechanisms for CUDA or duplicate them, what would be your preference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MrSidims The options -f[no]offload-fp32-prec-div/sqrt
are only available when using -fsycl
. In a cuda environment the options have no effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would make sense to merge the options.
However, we need to keep the existing way the flag is implemented, because CUDA ships with a bitcode library that we link against our kernels, which provides implementations for some of the math built-ins.
That library makes use of the metadata setup by the current way the flag is handled to pick out between precise and approx versions of sqrt in different math built-ins. So we could maybe use llvm.fpbuiltin.sqrt
for direct calls to sqrt
from the kernel, but we'll still need to setup the metadata for the bitcode library, so it's probably easier to keep it as-is.
Note that there's a similar mechanism for HIP, although I'm not sure we use it as much, but we probably should keep the current setup there to make sure the metadata is set properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@npmiller thanks!
@zahiraam may be I'm not appropriately putting '=' between CUDA and NVPTX environments for SYCL or missing something else. But we had observed a regression in one of CUDA math tests with this very patch, hence the options have affect on it, aren't they (at least -fno-offload-fp32-prec)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MrSidims If the expected behavior for the failing test is to generate llvm-fpbuiltin.sqrt.f32
and llvm-fpbuiltin.fdiv.f32
, then we should have fno-offload-fp32-prec-div/sqrt
expand to cc1
option fno-offload-fp32-prec-div/sqrt
when compiling cuda
code. Currently that is the case only when compiling sycl
code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zahiraam I mean that in this very PR in this pre-commit run the following test: SYCL :: sycl-in-tree/DeviceLib/~cmath_test.cpp in CUDA environment has failed due to llvm.fpbuiltin.fdiv.f32 not having lowering in AltMathLib. Thus I make a conclusion, that these options like they are defined now has affect on CUDA, aren't they? What I was trying to understand if such affect is limited only to intrinsics generation or "// FPACC: "-ffp-builtin-accuracy=high"" (from this test) is also part of this effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline. f[no]-offload-fp32-prec-div/sqrt is not responsible for any further options propagation for CUDA and HIP. It only affects whether llvm.fpbuiltin.div/sqrt with max-error=2.5/3.0 is generated or 'standard' LLVM instructions/intrinsics/builtins will be used.
Add support for options
-f[no]-offload-fp32-prec-div
and-f[no-]-offload-fp32-prec-sqrt
.These options are added to allow users to control whether
fdiv
andsqrt
operations in offload device code are required to return correctly rounded results. In order to communicate this to the device code, we need the front end to generate IR that reflects the choice.When the correctly rounded setting is used, we can just generate the
fdiv
instruction andllvm.sqrt
intrinsic, because these operations are required to be correctly rounded by default in LLVM IR.When the result is not required to be correctly rounded, the front end should generate a call to the
llvm.fpbuiltin.fdiv
orllvm.fpbuiltin.sqrt
intrinsic with thefpbuiltin-max-error
attribute set. For single precisionfdiv
, the setting should be2.5
. For single-precision sqrt, the setting should be3.0
.If the -ffp-accuracy option is used, we should issue warnings if the settings conflict with an explicitly set
-foffload-fp32-prec-div
or-foffload-fp32-prec-sqrt
option.