-
Notifications
You must be signed in to change notification settings - Fork 756
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] Diagnose __float128 type usage in device code, fixes and cleanups #971
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see no issues and approve of the direction. The two TODOs likely need to be fixed ASAP though.
clang/lib/Sema/SemaSYCL.cpp
Outdated
@@ -1405,6 +1405,8 @@ Sema::DeviceDiagBuilder Sema::SYCLDiagIfDeviceCode(SourceLocation Loc, | |||
"Should only be called during SYCL compilation"); | |||
FunctionDecl *CurFunction = dyn_cast<FunctionDecl>(getCurLexicalContext()); | |||
DeviceDiagBuilder::Kind DiagKind = [this, CurFunction] { | |||
if (!CurFunction) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking... I wonder if this case needs some sort of lexical context associated with it. For example, I think the curlexicalcontext can also be the containing type (in the case of a member variable initializer). Perhaps this needs to be more generic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmm, hope I understand your comment correctly.
Maybe once we teach deferred diagnostics system to apply diagnostics not only to some functions, we will need to analyze which type of lexical context do we have now...
But for now it seems enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the title of the pull-request it seems that this should be within the scope of this pull request!
In reality, there are 3 different cases I can think of:
1- CurLexicalContext is TU/global/etc (maybe null?): Should diagnose immediately.
2- CurLexicalContext is a FunctionDecl: Should do what it does now, enqueuing a diagnostic if that function isn't currently being emitted, else diagnosing.
2- CurLexicalContext is a type (typically a record type): This likely means we're in a member initializer. If we're a static member, diagnose immediately. Else, diagnose only once the type is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the title of the pull-request it seems that this should be within the scope of this pull request!
Hmm, not sure about it. It says "fix for SYCL" not "redesign common parts". Because DeviceDiagBuilder
knows how to attach diagnostics only to a function, I assume it needs some redesign to make deferred diagnostics for other types of Lexical context.
I think to make it work we probably need to make a separate patch to llorg clang which will teach DeviceDiagBuilder
how to attach deferred diagnostics to something like DeclContext
, and not only to a FunctionDecl
, and I'm ready to work on this. Maybe we can even do something like this here in our code base but it will need to be upstreamed ASAP.
1- CurLexicalContext is TU/global/etc (maybe null?): Should diagnose immediately.
Also not sure. We should diagnose something only if it will get into the device code. Otherwise it will prevent from writing valid host code. If some global uses something illegal, but it doesn't get into device code (this likely means that this global is not used inside the device code), it should be ok.
CurLexicalContext is a type (typically a record type): This likely means we're in a member initializer. If we're a static member, diagnose immediately. Else, diagnose only once the type is used.
I also think if this type is not used in device code, it shouldn't produce immediate diagnostics. I experimented with it, and saw that if this type is not used in device code, static member also doesn't get into device code and shouldn't be a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, not sure about it. It says "fix for SYCL" not "redesign common parts".
It says to 'fix deferred diagnostics for SYCL'. This is a case where they are broken for SYCL :)
Also not sure. We should diagnose something only if it will get into the device code. Otherwise it will prevent from writing valid host code. If some global uses something illegal, but it doesn't get into device code (this likely means that this global is not used inside the device code), it should be ok.
Global initializers need to run on program launch, so they should ALWAYS get into the device code.
I also think if this type is not used in device code, it shouldn't produce immediate diagnostics. I experimented with it, and saw that if this type is not used in device code, static member also doesn't get into device code and shouldn't be a problem.
Interesting... Static initializers are supposed to run same as global initializers, immediately on launch. That is likely a different bug! That said, the deferred diagnostics likely need to accept RecordDecl as well as FunctionDecl (for member initializers), and emit when that type is emitted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It says to 'fix deferred diagnostics for SYCL'. This is a case where they are broken for SYCL :)
But they also broken for any programming model which can use in-class member initializers for example.
Interesting... Static initializers are supposed to run same as global initializers, immediately on launch. That is likely a different bug! That said, the deferred diagnostics likely need to accept RecordDecl as well as FunctionDecl (for member initializers), and emit when that type is emitted.
I can double check If I missed something (likely in the new year).
But, I'd say it's SYCL specific behavior, not a bug. Otherwise the rule "use regular c++" in host code won't work...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But they also broken for any programming model which can use in-class member initializers for example.
And yet... Still broken for SYCL. I'm mostly teasing that your patch topic promises a 'fix all the things (for SYCL)', yet you're attempting to put things out of the scope of that. A valid response to this would be to limit the scope of the patch.
But, I'd say it's SYCL specific behavior, not a bug. Otherwise the rule "use regular c++" in host code won't work...
Its a C++ language rule. Static initializers are supposed to run before main. The language treats them as not part of the type (except for how they are named), but as global variables that should be accessible anywhere. MANY programing models depend on static initialization to 'just work' that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And yet... Still broken for SYCL. I'm mostly teasing that your patch topic promises a 'fix all the things (for SYCL)', yet you're attempting to put things out of the scope of that. A valid response to this would be to limit the scope of the patch.
I just didn't mean that the patch fixes all the things, I'm sorry if I confused you by this. I tried to say that this fixes related code added for SYCL but maybe you are right and it doesn't look so.
I see that you made the patch with similar fixes, but I'd like to keep some changes from this patch too, so I suppose the new topic for this patch will depend on how we decide to merge these patches.
Its a C++ language rule. Static initializers are supposed to run before main. The language treats them as not part of the type (except for how they are named), but as global variables that should be accessible anywhere. MANY programing models depend on static initialization to 'just work' that way.
But SYCL is not regular C++ for device :), I'm even not sure that it should be called C++ at all... There is no main, and only kernels and functions called by kernels are supposed to run on device.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SYCL might be normal C++ on some special devices sharing a lot of features with the host processor, but this is clearly outside of plain SYCL 1.2.1.
Of course, for the host device, this is already regular C++.
Intel Xeon Phi seems like a good candidate for a device that could support normal C++ once you have fine-grain system SVM.
Otherwise, I have some old presentations at Khronos F2F about some compromise adding the concept of SYCL program storage and later the generalization with scopes.
While this is not normal C++, it provides some way to create some objects with some persistent storage relative to some scope.
94cf441
to
b30af0b
Compare
Okay then, hope now the title doesn't promise fix everything :) |
b30af0b
to
c99c076
Compare
Hm, it looks like after the patch from @erichkeane we don't emit deferred diagnostics from functions marked with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, it looks like after the patch from @erichkeane we don't emit deferred diagnostics from functions marked with
sycl_kernel
attribute. I don't understand yet why this is happened, but user doesn't have possibility to write any code in such functions because they are defined in SYCL headers, so I'm not sure that it's a problem. But I discovered it because we made a lot of tests with code inside the function withsycl_kernel
attribute,
@erichkeane, what about SYCL library/header developers? They might write invalid code and clang won't complain. Is that intentional?
clang/lib/Sema/SemaType.cpp
Outdated
if (!S.Context.getTargetInfo().hasFloat128Type() && | ||
!S.getLangOpts().SYCLIsDevice && | ||
!(S.getLangOpts().OpenMP && S.getLangOpts().OpenMPIsDevice)) | ||
if (S.getLangOpts().SYCLIsDevice) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (S.getLangOpts().SYCLIsDevice) | |
if (!S.Context.getTargetInfo().hasFloat128Type() && S.getLangOpts().SYCLIsDevice) |
This seems to be a more accurate check. There might be targets supporting __float128
in AOT mode and using it should be okay in this case. Agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, agree
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
This is absolutely not my intent... I didn't have any lit tests fail in my patch so I thought I had covered all cases. The only exception is that uninstantiated templates won't diagnose, but they WILL once instantiated. Could that be it? Can you give me a reproducer, and I'll look at it today. |
I think this is happened because lack of test cases...
I can't see a diagnostic from this code:
|
Thanks! I'll look into that this morning. |
Hmm... SO, I'm still trying to think through this, but I've got a decent idea of the problem at least at the moment. The problem is that in SOME cases, we actually visit a diagnostic up to 3x (in the case of kernels). 1: When doing Phase 1 of templates. 2: when doing Phase 2. This is only supposed to happen when something has 'changed' via tree-transform. GCCAsm seems to skip the rebuild in tree-transform if nothing has changed, but apparently CallExpr does NOT. The result is something that has already diagnosed, diagnoses again. 3: When constructing the OpenCL kernel. We likely shouldn't do ANY diagnostics here, since we should have done it in either 1 or 2 above. Right now, we intentionally skip 3 (thanks to my last change). We are also (perhaps improperly) skipping 1. Due to a bug (IMO) in Tree-transform for CallExpr , the variadic function call is getting diagnosed in all 3. GCC ASM is only getting diagnosed in 1 and 3. I believe the fix is to repair the CallExpr bug in TreeTransform if I can, and remove the skip of 1. |
See #1018 for the bug report to capture the bigger problem, and #1019 for the fix to the immediate problem. |
Additional changes: Cleaned up unnecessary calls of CheckSYCLCall function. Fixed diagnosing of storage allocation through deferred diagnostics system. Renamed CheckSYCLCall with checkSYCLDeviceFunction since we don't really check calls as CUDA/OpenMP does. Removed unnecessary emitting of diagnostics from SemaSYCL. Signed-off-by: Mariya Podchishchaeva <mariya.podchishchaeva@intel.com>
Signed-off-by: Mariya Podchishchaeva <mariya.podchishchaeva@intel.com>
Signed-off-by: Mariya Podchishchaeva <mariya.podchishchaeva@intel.com>
Signed-off-by: Mariya Podchishchaeva <mariya.podchishchaeva@intel.com>
8d857dc
to
ed3e013
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, take a look at failing lit test.
Signed-off-by: Mariya Podchishchaeva <mariya.podchishchaeva@intel.com>
This failed because we started checking target information to emit a diagnostic about |
Does someone know why test is failing? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just about perfectly OK with this. 1 nit, plus wanting to know our plans for global/static inits/etc.
@@ -1407,16 +1401,16 @@ Sema::DeviceDiagBuilder Sema::SYCLDiagIfDeviceCode(SourceLocation Loc, | |||
"Should only be called during SYCL compilation"); | |||
FunctionDecl *FD = dyn_cast<FunctionDecl>(getCurLexicalContext()); | |||
DeviceDiagBuilder::Kind DiagKind = [this, FD] { | |||
if (ConstructingOpenCLKernel) | |||
if (ConstructingOpenCLKernel || !FD) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FD can be NULL in the cases where getCurLexicalContext is TU/namespace/record/etc. Don't we want to diagnose always in these cases, since they are likely static initializers? Something like:
static auto Global = throw 3;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't want to diagnose things like static initializers always (if they are not completely invalid for both host and device, but this case is diagnosed out of the box, I suppose).
Yes, I understand that static initializes run before main in regular C++, but there is no main in SYCL device code, only kernels and everything that used by kernels (functions/variables).
We want to diagnose only things that can get into device code. Things like static initializers shouldn't get into device code at all due to SYCL specific. SYCL spec even says: every variable with static storage duration that are odr-used inside a kernel must be const or constexpr and zero-initialized or constant-initialized.
I think all these rules tell that static initializers aren't executed on device and variables initialized in such way cannot be used inside the device code.
In addition, if we will diagnose such cases always, it will prevent from writing regular c++ code for host...
I think we want to diagnose case with in-class initializers, and in this case getCurLexicalContext is record, but we also want to diagnose it with deferred diagnostic because diagnostic is only needed if corresponding type is used in device code, but it requires changes in common parts of deferred diagnostics system, I'm looking into it #1032 ...
@@ -259,12 +259,11 @@ StmtResult Sema::ActOnGCCAsmStmt(SourceLocation AsmLoc, bool IsSimple, | |||
// Skip all the checks if we are compiling SYCL device code, but the function | |||
// is not marked to be used on device, this code won't be codegen'ed anyway. | |||
if (getLangOpts().SYCLIsDevice) { | |||
SYCLDiagIfDeviceCode(AsmLoc, diag::err_sycl_restrict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has anything changed in this file, or is this simply a formatting change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just formatting.
The problem is that CheckSYCLCall used to get the caller incorrectly.
getCurFunctionDecl gets the wrong thing. It will get the function
containing the lambda at definition time, rather than the lambda operator().
Additional changes:
Cleaned up unnecessary calls of CheckSYCLCall function.
Fixed diagnosing of storage allocation through deferred diagnostics
system.
Renamed CheckSYCLCall with checkSYCLDeviceFunction since we don't really
check calls as CUDA/OpenMP does.
Used deferred diagnostics to diagnose __float128 type usage.
Removed unnecessary emitting of diagnostics from SemaSYCL.
Updated tests.
Signed-off-by: Mariya Podchishchaeva mariya.podchishchaeva@intel.com