Section mismatch warnings around numa_nodes_parsed #1302
As in #1301, the problem here is not that LLVM decides against inlining the function, but that it creates a specialized version of the function that hardcodes a reference to a global variable in an incompatible section.
Synthesized functions should retain the section attribute of the function they were synthesized from; we can fix that in LLVM. A smaller test case would be helpful.
Reduced test case: https://godbolt.org/z/KMaYvW
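For readers without the godbolt link handy, here is a hand-written sketch of the shape of the problem (an assumed reconstruction; the actual reduced test case uses creduce-mangled names, while the names below follow the kernel):

```c
/* numa_nodes_parsed lives in .init.data and is only referenced from
 * init-section code, so the reference is legal -- until the compiler
 * emits a specialized copy of test_bit() in plain .text that bakes in
 * the global's address. */
typedef struct { unsigned long bits[2]; } nodemask_t;

__attribute__((section(".init.data")))
nodemask_t numa_nodes_parsed;

static inline int test_bit(long nr, const volatile unsigned long *addr)
{
	return 1UL & (addr[nr / 64] >> (nr % 64));
}

__attribute__((section(".init.text")))
int amd_numa_init(void)
{
	return test_bit(0, numa_nodes_parsed.bits);
}
```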
Sorry, I don't fully follow from the reduced test case; the issue was that … The first thing to do, then, is to find which pass creates these specialized functions. It may be a middle-end pass; emitting LLVM IR via …
Emitting the single specialized function seems a little odd at first glance. (Why not just inline the thing?) Adding

```c
extern d y;
void x(void) {
	test_bit(y.bits);
}
```

makes …
Protip (from @aeubanks): …

Looks indeed like this is related to the new pass manager (NPM).
It looks like IPSCCPPass was never run with the Legacy Pass Manager (LPM).
Never mind, it's spelled differently (ugh), so the LPM does the same transform; the NPM just didn't do the inlining. Next is to query the inliner.
I'll have to think about this more, though I still think perhaps IPSCCP should not be sinking parameters with section attributes, or into callees with different section attributes, since when inlining heuristics change we get these kinds of issues. Perhaps I need to ponder "what should happen if the callee wasn't so small?" Or perhaps …
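To make the concern concrete, a hand-written sketch of the IPSCCP transform under discussion (illustrative only, not compiler output; names reuse the sketch above):

```c
/* The lone caller always passed numa_nodes_parsed.bits, so IPSCCP sinks
 * the now-constant pointer argument into a specialized clone. The clone
 * hardcodes the .init.data global, and unless it gets inlined it is
 * emitted in plain .text -- exactly the .text -> .init.data reference
 * that modpost reports as a section mismatch. */
static int test_bit_after_ipsccp(long nr)
{
	return 1UL & (numa_nodes_parsed.bits[nr / 64] >> (nr % 64));
}
```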
Working on a patch in https://reviews.llvm.org/D97971.
I'm incredibly distracted right now with assembler-related issues in Android, but I was able to eliminate all of these with:

```diff
diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h
index 37363d570b9b..aef7c766f335 100644
--- a/include/asm-generic/bitops/instrumented-non-atomic.h
+++ b/include/asm-generic/bitops/instrumented-non-atomic.h
@@ -129,7 +129,7 @@ static inline bool __test_and_change_bit(long nr, volatile unsigned long *addr)
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static inline bool test_bit(long nr, const volatile unsigned long *addr)
+static __always_inline bool test_bit(long nr, const volatile unsigned long *addr)
 {
 	instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
 	return arch_test_bit(nr, addr);
diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index ac398e143c9a..a34be5098bbb 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -235,7 +235,7 @@ static inline int __nodes_full(const nodemask_t *srcp, unsigned int nbits)
 }
 
 #define nodes_weight(nodemask) __nodes_weight(&(nodemask), MAX_NUMNODES)
-static inline int __nodes_weight(const nodemask_t *srcp, unsigned int nbits)
+static __always_inline int __nodes_weight(const nodemask_t *srcp, unsigned int nbits)
 {
 	return bitmap_weight(srcp->bits, nbits);
 }
@@ -260,13 +260,13 @@ static inline void __nodes_shift_left(nodemask_t *dstp,
           > MAX_NUMNODES, then the silly min_ts could be dropped. */
 
 #define first_node(src) __first_node(&(src))
-static inline int __first_node(const nodemask_t *srcp)
+static __always_inline int __first_node(const nodemask_t *srcp)
 {
 	return min_t(int, MAX_NUMNODES, find_first_bit(srcp->bits, MAX_NUMNODES));
 }
 
 #define next_node(n, src) __next_node((n), &(src))
-static inline int __next_node(int n, const nodemask_t *srcp)
+static __always_inline int __next_node(int n, const nodemask_t *srcp)
 {
 	return min_t(int,MAX_NUMNODES,find_next_bit(srcp->bits, MAX_NUMNODES, n+1));
 }
```

That covers the test_bit() definition in include/asm-generic/bitops/instrumented-non-atomic.h, but other definitions of test_bit() exist elsewhere. include/linux/nodemask.h has a curious comment:
So it feels kind of suspicious to be kneecapping LLVM's constant propagation over a complex edge case, just to satisfy rules highly specific to the Linux kernel, when the kernel itself has a comment in these headers saying to use __always_inline.
I am a bit skeptical about whether that comment still applies, though. It was added in 2013, and compilers were quite a bit different back then. I would not be surprised if whatever bug happened back then is completely unreproducible with gcc-4.9 and higher. The hack I have locally to build cleanly just drops the __initdata annotation on numa_nodes_parsed ;-) Another possible hack would be to annotate amd_numa_init() as __attribute__((flatten)).
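A sketch of that flatten hack (assumed shape, reusing the names from the sketch earlier in the thread; the body is elided, as this only illustrates the annotation):

```c
/* flatten asks the compiler to inline every call inside this body, so no
 * out-of-line clone of test_bit()/__first_node()/... escapes the init
 * section in the first place. */
static int __attribute__((flatten)) amd_numa_init_sketch(void)
{
	return test_bit(0, numa_nodes_parsed.bits);
}
```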
That comment is also perfectly in line with Boris' feedback on … If marking all callees recursively …
Are you referring to a different comment in the same source file, or the one I posted? Because the comment I linked to looks totally reasonable to me, and I don't see how that would have changed in compilers since 2013.
allyesconfig minus CONFIG_UBSAN_OBJECT_SIZE fixes most (except for …). I do think it would be clearer for kernel maintainers, when sending them patches, to specify precisely which configs trigger this (rather than using less precise language like "clang sometimes decides to not inline"). Consider: AMD_NUMA+KASAN alone (on top of defconfig) is enough to trigger the warnings from … I also wonder why these warnings get printed twice. Does modpost have a bug where it's accidentally doing double the work?
A quick hack to LLVM to run LowerConstantIntrinsicsPass earlier (EDIT: this diff is now highly outdated):

```diff
diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index 52f7026c40eb..a41bf6419cd4 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -666,6 +666,7 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
   // Form SSA out of local memory accesses after breaking apart aggregates into
   // scalars.
   FPM.addPass(SROA());
+  FPM.addPass(LowerConstantIntrinsicsPass());
 
   // Catch trivial redundancies
   FPM.addPass(EarlyCSEPass(true /* Enable mem-ssa. */));
```

Fixes all of the observed warnings in Arnd's example and the kernel, except …
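The intuition, sketched at the C level (hand-written illustration; expensive_call() is a hypothetical stand-in): once llvm.is.constant is lowered early, the constant-folded branch goes dead before the inliner prices the function.

```c
extern long expensive_call(long v); /* hypothetical fallback libcall */

long probe(long v)
{
	if (__builtin_constant_p(v))  /* lowers to llvm.is.constant */
		return v * 2;         /* dead once the intrinsic folds to 0 */
	return expensive_call(v);     /* all the inliner should have to price */
}
```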
I happened to be reading through the source of modpost (scripts/mod/modpost.c); this is exactly the case its "pattern 5" whitelist tries to cover.
commit 4a3893d ("modpost: don't emit section mismatch warnings for compiler optimizations") is what added modpost pattern 5; I'd like to see if I can dig up the lore thread around its discussion.
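For reference, a paraphrased sketch of the table pattern 5 keys on (see scripts/mod/modpost.c for the real source):

```c
/* Symbols in text sections that may legitimately refer to init/exit
 * sections (pattern 5). GCC renames a constant-propagated clone of foo()
 * to something like "foo.constprop.0", so the wildcard matches it; LLVM
 * performs the same optimization but keeps the original symbol name, so
 * pattern 5 never fires for clang-built objects. */
static const char *optim_symbols[] = {
	"*.constprop.*",
	NULL
};
```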
I think we should just create a compiler-specific allowlist in modpost for this.
```diff
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 3e623ccc020b..c160f800f5e3 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -853,6 +853,27 @@ static int match(const char *sym, const char * const pat[])
 	return 0;
 }
 
+struct secref_exception {
+	const char * const fromsym, * const tosym;
+};
+
+static const struct secref_exception secref_allowlist[] = {
+	{ .fromsym = "__first_node", .tosym = "numa_nodes_parsed" },
+	{ .fromsym = "__next_node", .tosym = "numa_nodes_parsed" },
+	{ .fromsym = "__nodes_weight", .tosym = "numa_nodes_parsed" },
+	{ .fromsym = "early_get_smp_config", .tosym = "x86_init" },
+	{ .fromsym = "test_bit", .tosym = "numa_nodes_parsed" },
+};
+
+static int match_allowlist(const char *fromsym, const char *tosym)
+{
+	int i = 0, e = sizeof(secref_allowlist) / sizeof(secref_allowlist[0]);
+
+	for (; i != e; ++i)
+		if (!strcmp(secref_allowlist[i].fromsym, fromsym) &&
+		    !strcmp(secref_allowlist[i].tosym, tosym))
+			return 1;
+	return 0;
+}
+
 /* sections that we do not want to do full section mismatch check on */
 static const char *const section_white_list[] =
 {
@@ -1192,6 +1213,8 @@ static const struct sectioncheck *section_mismatch(
  *  tosec   = init section
  *  fromsec = text section
  *  refsymname = *.constprop.*
+ *  LLVM will do similar constant propagation, but it will not rename the
+ *  transformed callee.
  *
  * Pattern 6:
  *  Hide section mismatch warnings for ELF local symbols. The goal
@@ -1235,9 +1258,10 @@ static int secref_whitelist(const struct sectioncheck *mismatch,
 	/* Check for pattern 5 */
 	if (match(fromsec, text_sections) &&
-	    match(tosec, init_sections) &&
-	    match(fromsym, optim_symbols))
-		return 0;
+	    match(tosec, init_sections))
+		if (match(fromsym, optim_symbols) ||
+		    match_allowlist(fromsym, tosym))
+			return 0;
 
 	/* Check for pattern 6 */
 	if (strstarts(fromsym, ".L"))
```
@apinski-cavium in #gcc IRC helped me pinpoint some relevant GCC commits.
Is the plan to send the modpost patch or to have LLVM perform similar clone renaming?
Probably modpost. It's silly, IMO, to ask LLVM to change to some arbitrary convention that the kernel's modpost happens to rely on. I can hack up a change to LLVM quickly, but it results in 73+ existing test failures in LLVM. I can try to limit the new checking in modpost to object files built by clang.
Jotting down my research notes from today. I ran a ~4hr creduce on the defconfig issue: https://gist.github.com/nickdesaulniers/ea307c16f747938dbfad34cab6f04a72. What's interesting about this case is how the __builtin_constant_p (BCP) call is represented in IR. I have a fix in llvm/lib/Analysis/InlineCost.cpp, but this bug might actually be deeper, in llvm/lib/Analysis/ConstantFolding.cpp. Basically, in the above case, we know that the operand of the llvm.is.constant call is not a Constant and won't become one via inlining. Stated another way: as soon as we see a non-Constant operand that inlining can't make Constant, we should fold the intrinsic to false and stop charging the dead branch against the inline cost. That fixes the issue observed for the defconfig, but not the allconfigs. So there may be more than one bug here; perhaps object size is related for allyesconfig, or has a similar issue. Or my WIP local patch may have a bug, though it doesn't regress any existing LLVM tests. I will clean up the patch and post it tomorrow.
https://reviews.llvm.org/D111272 is the first part, which fixes the unusually high cost seen in the defconfigs. I'm now investigating the issues from the all*configs; I don't see any unusually high "costs" preventing inlining there. Instead, I'm seeing negative costs in defconfigs vs. just-over-threshold costs in allmodconfig. Let me try to see what can cause the costs to become negative, since that's a little curious; it might indicate what's missing in the all*configs. Absurdly, to the point of comedy, the mere act of questioning "why was this inlined or not" …
llvm.is.constant* intrinsics are evaluated to 0 or 1 integral values. A common use case for llvm.is.constant comes from the higher-level __builtin_constant_p. A common usage pattern of __builtin_constant_p in the Linux kernel is:

```c
void foo(int bar) {
	if (__builtin_constant_p(bar)) {
		// lots of code that will fold away to a constant.
	} else {
		// a little bit of code, usually a libcall.
	}
}
```

A minor issue in InlineCost calculations is that when `bar` is _not_ Constant, and still will not be after inlining, we don't discount the true branch, so the inline cost of `foo` ends up being the cost of both branches together rather than just the false branch. This penalizes code like the above, where inlining will not help prove `bar` Constant, but it still would be beneficial to inline `foo`, because the "true" branch is irrelevant from a cost perspective. For example, IPSCCP can sink a passed constant argument to `foo`:

```c
const int x = 42;
void bar(void) { foo(x); }
```

This improves our inlining decisions, and fixes a few head-scratching cases where the disassembly shows a relatively small `foo` not inlined into a lone caller. We could further improve this modeling by tracking whether the argument to llvm.is.constant* is a parameter of the function, and whether inlining would allow that parameter to become Constant. This idea is noted in a FIXME comment.

Link: ClangBuiltLinux/linux#1302
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D111272
Before this week's changes, we had warnings from: …
Part 1 has landed. This eliminates the issues with defconfig, but does nothing for allmodconfig. I have some ideas for making the analysis even more precise; it just involves a lot more work at compile time ("will inlining change what __builtin_constant_p evaluates to?" rather than "__builtin_constant_p is either 1 or 0, but not both/a runtime call"). The second issue I see is that calls to llvm.objectsize.* are counted against the inline cost even when they can be evaluated to a constant. That eliminates 2, 3, and 4 from allmodconfig. This leaves us with 2 cases from allmodconfig.
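For context on the objectsize side, a hypothetical fortify-style helper (a sketch, not kernel source) showing where llvm.objectsize.* calls come from:

```c
#include <string.h>

/* __builtin_object_size lowers to an llvm.objectsize.* call, and it can
 * very often be folded to a constant whether or not this helper gets
 * inlined -- so charging it against the inline cost mostly adds noise. */
static inline void copy_checked(void *dst, const void *src, size_t n)
{
	size_t avail = __builtin_object_size(dst, 0); /* -> llvm.objectsize */

	if (avail != (size_t)-1 && n > avail)
		__builtin_trap(); /* stand-in for the kernel's fortify report */
	memcpy(dst, src, n);
}
```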
Slight ping, as I'm running into this situation right now. Is there a status update on the various patches? |
https://reviews.llvm.org/rG9697f93587f46300814f1c6c68af347441d6e05d is what ultimately landed in LLVM; it allowed us to get more aggressive about inlining callees containing calls to llvm.is.constant*. https://reviews.llvm.org/D111456 (just pinged Kazu) is maybe still interesting if this comes up again, but I'm closing this out now. We no longer have any instances of section mismatch warnings for x86_64 allmodconfig due to inlining decisions.
https://reviews.llvm.org/rG9697f93587f46300814f1c6c68af347441d6e05d shipped in clang-14.
Very similar to https://reviews.llvm.org/D111272. We very often can evaluate calls to llvm.objectsize.* regardless of inlining. Don't count calls to llvm.objectsize.* against the InlineCost when we can evaluate the call to a constant.

Link: ClangBuiltLinux/linux#1302
Reviewed By: manojgupta
Differential Revision: https://reviews.llvm.org/D111456
These are all small functions marked with `inline`, so they should be getting inlined, but the new pass manager probably changed that.