Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Add support for -foffload-fp32-prec-div/sqrt options. #15836

Merged
merged 28 commits into from
Jan 31, 2025

Conversation

zahiraam
Copy link
Contributor

@zahiraam zahiraam commented Oct 23, 2024

Add support for options -f[no]-offload-fp32-prec-div and -f[no-]-offload-fp32-prec-sqrt.
These options are added to allow users to control whether fdiv and sqrt operations in offload device code are required to return correctly rounded results. In order to communicate this to the device code, we need the front end to generate IR that reflects the choice.

When the correctly rounded setting is used, we can just generate the fdiv instruction and llvm.sqrt intrinsic, because these operations are required to be correctly rounded by default in LLVM IR.

When the result is not required to be correctly rounded, the front end should generate a call to the llvm.fpbuiltin.fdiv or llvm.fpbuiltin.sqrt intrinsic with the fpbuiltin-max-error attribute set. For single precision fdiv, the setting should be 2.5. For single-precision sqrt, the setting should be 3.0.

If the -ffp-accuracy option is used, we should issue warnings if the settings conflict with an explicitly set -foffload-fp32-prec-div or -foffload-fp32-prec-sqrt option.

@zahiraam zahiraam changed the title Add support for -ftarget-prec-div/sqrt options. Add support for -foffload-fp32-prec-div/sqrt options. Oct 24, 2024
@zahiraam zahiraam marked this pull request as ready for review October 28, 2024 17:25
@zahiraam zahiraam requested review from a team as code owners October 28, 2024 17:25
@zahiraam zahiraam changed the title Add support for -foffload-fp32-prec-div/sqrt options. [SYCL] Add support for -foffload-fp32-prec-div/sqrt options. Oct 29, 2024
Comment on lines 1747 to 1750
if (Args.getLastArg(options::OPT_fno_offload_fp32_prec_div))
CmdArgs.push_back("-fno-offload-fp32-prec-div");
else
CmdArgs.push_back("-foffload-fp32-prec-div");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (Args.getLastArg(options::OPT_fno_offload_fp32_prec_div))
CmdArgs.push_back("-fno-offload-fp32-prec-div");
else
CmdArgs.push_back("-foffload-fp32-prec-div");
if (!Args.hasFlag(option::OPT_foffload_fp32_prec_div,
option::OPT_fno_offload_fp32_prec_div, true))
CmdArgs.push_back("-fno-offload-fp32-prec-div");

Since -foffload-fp32-prec-div is default

if (Args.getLastArg(options::OPT_fno_offload_fp32_prec_sqrt))
CmdArgs.push_back("-fno-offload-fp32-prec-sqrt");
else
CmdArgs.push_back("-foffload-fp32-prec-sqrt");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar comment to above.

Copy link
Contributor

@elizabethandrews elizabethandrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@premanandrao can you review this please?

function instead of adding a JobAction to handle it.
Copy link
Contributor

@MrSidims MrSidims left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM assuming that the code doesn't affect C stdlib's div function, see the comment above.

// ROUNDED-SQRT-PREC-DIV: call reassoc nnan ninf nsz arcp afn float @llvm.fpbuiltin.sqrt.f32(float {{.*}}) #[[ATTR_SQRT:[0-9]+]]
// ROUNDED-DIV-PREC-SQRT: call reassoc nnan ninf nsz arcp afn spir_func nofpclass(nan inf) float @sqrt(float noundef nofpclass(nan inf) {{.*}})
// ROUNDED-DIV-ROUNDED-SQRT-FAST: call reassoc nnan ninf nsz arcp afn float @llvm.fpbuiltin.sqrt.f32(float {{.*}}) #[[ATTR_SQRT:[0-9]+]]
// LOW-PREC-DIV: call float @llvm.fpbuiltin.sqrt.f32(float {{.*}}) #[[ATTR_SQRT_LOW:[0-9]+]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to check if I understand this correctly. In this case we pass: -fno-offload-fp32-prec-div -ffp-builtin-accuracy=high flags. And 1.0 ULP fpbuiltin-error attribute for llvm.fpbuiltin.sqrt.f32 was generated in response of ffp-builtin-accuracy=high flag and we don't expect here precise calculations as high != precise, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct.

@zahiraam zahiraam requested a review from a team as a code owner November 25, 2024 16:10
@zahiraam
Copy link
Contributor Author

The FE work has been approved. But this is an attempt to fix the LIT test DeviceLib/cmath_test.cpp. I am not really sure the change is correct. Will revert it LIT fail persists.

@zahiraam
Copy link
Contributor Author

@MrSidims Your PR fixed the issue with cmath_test.cp. Thanks!
@intel/llvm-gatekeepers Can this be merged in please? Thanks.

@martygrant martygrant merged commit 5823125 into intel:sycl Jan 31, 2025
16 checks passed

// CHECK-NOT: "-foffload-fp32-prec-div"
// CHECK-NOT: "-foffload-fp32-prec-sqrt"
// FPACC: "-ffp-builtin-accuracy=high"
Copy link
Contributor

@MrSidims MrSidims Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zahiraam @mdtoguchi hi, I'm revisiting some parts of the feature implementation, to ensure, that everything works right. Here I have a question: if I'm not mistaken -ffp-builtin-accuracy=high means, that all fp-builtins will be calculated with high precision. Shouldn't we restrict it to only sqrt/div fpbuiltins (see example in clang/test/Driver/offload-fp32-div-sqrt.cpp from above)?

@npmiller I see that long ago you have added a similar option "fsycl-fp32-prec-sqrt" that currently works for CUDA and HIP. I do believe the appropriate option added in this patch should be merged with your option (if you agree or don't agree with it please place your thoughts in #17033 ), but I've also noted, that way of handling those options are very different. In your patch you pass to the compiler an alias to -prec-sqrt=true to compiler resulting in metadata (that, I guess), is handled by CUDA backend. This patch instead makes clang to generate llvm.fpbuiltin.sqrt with max-error=0.5 to be generated in the response of this option, which is being lowered to llvm.sqrt by code added here , which seem to be also a correct approach: https://godbolt.org/z/5TKxbzcKE . If we decide to merge the options, guess we would either need to remove one of the mechanisms for CUDA or duplicate them, what would be your preference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MrSidims The options -f[no]offload-fp32-prec-div/sqrt are only available when using -fsycl. In a cuda environment the options have no effect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make sense to merge the options.

However, we need to keep the existing way the flag is implemented, because CUDA ships with a bitcode library that we link against our kernels, which provides implementations for some of the math built-ins.

That library makes use of the metadata setup by the current way the flag is handled to pick out between precise and approx versions of sqrt in different math built-ins. So we could maybe use llvm.fpbuiltin.sqrt for direct calls to sqrt from the kernel, but we'll still need to setup the metadata for the bitcode library, so it's probably easier to keep it as-is.

Note that there's a similar mechanism for HIP, although I'm not sure we use it as much, but we probably should keep the current setup there to make sure the metadata is set properly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@npmiller thanks!

@zahiraam may be I'm not appropriately putting '=' between CUDA and NVPTX environments for SYCL or missing something else. But we had observed a regression in one of CUDA math tests with this very patch, hence the options have affect on it, aren't they (at least -fno-offload-fp32-prec)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MrSidims If the expected behavior for the failing test is to generate llvm-fpbuiltin.sqrt.f32 and llvm-fpbuiltin.fdiv.f32, then we should have fno-offload-fp32-prec-div/sqrt expand to cc1 option fno-offload-fp32-prec-div/sqrt when compiling cuda code. Currently that is the case only when compiling sycl code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zahiraam I mean that in this very PR in this pre-commit run the following test: SYCL :: sycl-in-tree/DeviceLib/~cmath_test.cpp in CUDA environment has failed due to llvm.fpbuiltin.fdiv.f32 not having lowering in AltMathLib. Thus I make a conclusion, that these options like they are defined now has affect on CUDA, aren't they? What I was trying to understand if such affect is limited only to intrinsics generation or "// FPACC: "-ffp-builtin-accuracy=high"" (from this test) is also part of this effect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline. f[no]-offload-fp32-prec-div/sqrt is not responsible for any further options propagation for CUDA and HIP. It only affects whether llvm.fpbuiltin.div/sqrt with max-error=2.5/3.0 is generated or 'standard' LLVM instructions/intrinsics/builtins will be used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants