
Section mismatch warnings around numa_nodes_parsed #1302

Closed
nathanchance opened this issue Feb 12, 2021 · 30 comments
Assignees
Labels
[ARCH] x86_64 This bug impacts ARCH=x86_64 [BUG] llvm A bug that should be fixed in upstream LLVM Clean build Issue needs to be fixed to get a warning-free all*config build [FIXED][LLVM] 14 This bug was fixed in LLVM 14.x [PATCH] Bitrot Patch is outdated and needs to be refreshed or revisited

Comments

@nathanchance

$ make -skj"$(nproc)" LLVM=1 O=build/x86_64 distclean allmodconfig vmlinux
...
WARNING: modpost: vmlinux.o(.text+0x14428e): Section mismatch in reference from the function test_bit() to the variable .init.data:numa_nodes_parsed
The function test_bit() references
the variable __initdata numa_nodes_parsed.
This is often because test_bit lacks a __initdata
annotation or the annotation of numa_nodes_parsed is wrong.

WARNING: modpost: vmlinux.o(.text+0x14436f): Section mismatch in reference from the function __first_node() to the variable .init.data:numa_nodes_parsed
The function __first_node() references
the variable __initdata numa_nodes_parsed.
This is often because __first_node lacks a __initdata
annotation or the annotation of numa_nodes_parsed is wrong.

WARNING: modpost: vmlinux.o(.text+0x1443b7): Section mismatch in reference from the function __next_node() to the variable .init.data:numa_nodes_parsed
The function __next_node() references
the variable __initdata numa_nodes_parsed.
This is often because __next_node lacks a __initdata
annotation or the annotation of numa_nodes_parsed is wrong.
...

These are all small functions marked inline, so they should be getting inlined, but the new pass manager probably changed that.

@nathanchance nathanchance added [BUG] Untriaged Something isn't working [ARCH] x86_64 This bug impacts ARCH=x86_64 labels Feb 12, 2021
@arndb

arndb commented Feb 14, 2021

As in #1301, the problem here is not that llvm decides against inlining the function, but that it creates a specialized version of the function that hardcodes the reference to a global variable in an incompatible section.

@nickdesaulniers

Synthesized functions should retain the section attribute of the function they were synthesized from; we can fix that in LLVM. A smaller test case would be helpful.

@arndb

arndb commented Feb 21, 2021

Reduced test case: https://godbolt.org/z/KMaYvW

@nickdesaulniers

nickdesaulniers commented Feb 21, 2021

Sorry, I don't fully follow from the reduced test case. The issue was that test_bit was not inlined into g (which is in .init.text), so the reference to a and b in test_bit is bad? Oh, the version of test_bit emitted has the reference to numa_nodes_parsed in it. That's new. Ah, and the specialized version is not in the same section of the caller it would have been in had it been inlined.

The first thing to do then is find which pass creates these specialized functions. It may be a middle-end pass; emitting LLVM IR via clang -emit-llvm -S ... -o foo.ll, then running opt -O2 -print-after-all foo.ll and observing which pass moved the reference to numa_nodes_parsed into test_bit, will tell us which pass we need to teach about inheriting section function attributes.

@nickdesaulniers

Emitting the single specialized function seems a little odd on first glance. (Why not just inline the thing?) Adding

extern d y;
void x(void) {
    test_bit(y.bits);
}

makes test_bit get inlined into both callers. Something curious seems to happen only when there's a single caller.

@nickdesaulniers

nickdesaulniers commented Feb 23, 2021

Protip (from @aeubanks) :

-flegacy-pass-manager to use the LPM
-fexperimental-new-pass-manager for explicitly using the NPM
(-fno-experimental-new-pass-manager is the same as -flegacy-pass-manager)

It does indeed look like this is NPM-related (--enable-new-pm/--enable-new-pm=0 for opt).
It looks like "IPSCCPPass"/ipsccp (aka "interprocedural sparse conditional constant propagation and merging") is the culprit. I wonder if it should be running after inlining?

@nickdesaulniers

It looks like IPSCCPPass was never run with the Legacy Pass Manager (LPM).

$ opt -O2 foo.ll -o - -S --enable-new-pm -print-after-all -print-module-scope 2>&1 | grep "IR Dump"

*** IR Dump After VerifierPass ***
*** IR Dump After Annotation2MetadataPass ***
*** IR Dump After ForceFunctionAttrsPass ***
*** IR Dump After InferFunctionAttrsPass ***
*** IR Dump After SimplifyCFGPass *** (function: g)
*** IR Dump After SROA *** (function: g)
*** IR Dump After EarlyCSEPass *** (function: g)
*** IR Dump After LowerExpectIntrinsicPass *** (function: g)
*** IR Dump After SimplifyCFGPass *** (function: test_bit)
*** IR Dump After SROA *** (function: test_bit)
*** IR Dump After EarlyCSEPass *** (function: test_bit)
*** IR Dump After LowerExpectIntrinsicPass *** (function: test_bit)
*** IR Dump After IPSCCPPass ***
...
$ opt -O2 foo.ll -o - -S -print-after-all -print-module-scope 2>&1 | grep "IR Dump"
...
*** IR Dump After Function Integration/Inlining ***
<test_bit is now inlined>

@nickdesaulniers

> It looks like IPSCCPPass was never run with the Legacy Pass Manager (LPM).

Never mind, it's spelled differently (ugh):

*** IR Dump After Interprocedural Sparse Conditional Constant Propagation ***

so the LPM does the same transform; the NPM just didn't do the inlining afterward.

Next is to query the inliner:
opt -passes=ipsccp,inline foo.ll -o - -S -debug-only=inline
(brb, I need a debug build of LLVM)

@nickdesaulniers

nickdesaulniers commented Feb 23, 2021

$ opt -passes=inline foo.ll -o - -S -debug-only=inline
Inlining calls in: g
    NOT Inlining (cost=55, threshold=45), Call:   call void @test_bit(i32* getelementptr inbounds (%struct.d, %struct.d* @numa_nodes_parsed, i32 0, i32 0, i64 0))
...
$ opt -O2 foo.ll -o - -S -debug-only=inline                              
Inlining calls in: g
Inlining calls in: g
    NOT Inlining (cost=45, threshold=45), Call:   call fastcc void @test_bit()
...
$ opt -O2 foo.ll -o - -S -debug-only=inline --enable-new-pm=0
Inliner visiting SCC: g: 1 call sites.
    Inlining (cost=-14955, threshold=225), Call:   call fastcc void @test_bit()
    -> Deleting dead function: test_bit
...

I'll have to think about this more, though I still think perhaps IPSCCP should not be sinking parameters with section attributes, or sinking them into callees with different section attributes, since when inlining heuristics change we get these kinds of issues. Perhaps I need to ponder "what should happen if the callee weren't so small?" Or perhaps test_bit should always be marked __always_inline so that it can be called from both __init code and regular code in .text?

(-debug-only=inline-cost is interesting to play with)

@nickdesaulniers

Working on a patch in https://reviews.llvm.org/D97971.

@nickdesaulniers nickdesaulniers added [BUG] llvm A bug that should be fixed in upstream LLVM [PATCH] Submitted A patch has been submitted for review and removed [BUG] Untriaged Something isn't working labels Mar 5, 2021
@nickdesaulniers nickdesaulniers self-assigned this Mar 5, 2021
@nickdesaulniers

I'm incredibly distracted right now with assembler-related issues in Android, but I was able to eliminate all of these with:

diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h
index 37363d570b9b..aef7c766f335 100644
--- a/include/asm-generic/bitops/instrumented-non-atomic.h
+++ b/include/asm-generic/bitops/instrumented-non-atomic.h
@@ -129,7 +129,7 @@ static inline bool __test_and_change_bit(long nr, volatile unsigned long *addr)
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static inline bool test_bit(long nr, const volatile unsigned long *addr)
+static __always_inline bool test_bit(long nr, const volatile unsigned long *addr)
 {
        instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
        return arch_test_bit(nr, addr);
diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index ac398e143c9a..a34be5098bbb 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -235,7 +235,7 @@ static inline int __nodes_full(const nodemask_t *srcp, unsigned int nbits)
 }
 
 #define nodes_weight(nodemask) __nodes_weight(&(nodemask), MAX_NUMNODES)
-static inline int __nodes_weight(const nodemask_t *srcp, unsigned int nbits)
+static __always_inline int __nodes_weight(const nodemask_t *srcp, unsigned int nbits)
 {
        return bitmap_weight(srcp->bits, nbits);
 }
@@ -260,13 +260,13 @@ static inline void __nodes_shift_left(nodemask_t *dstp,
           > MAX_NUMNODES, then the silly min_ts could be dropped. */
 
 #define first_node(src) __first_node(&(src))
-static inline int __first_node(const nodemask_t *srcp)
+static __always_inline int __first_node(const nodemask_t *srcp)
 {
        return min_t(int, MAX_NUMNODES, find_first_bit(srcp->bits, MAX_NUMNODES));
 }
 
 #define next_node(n, src) __next_node((n), &(src))
-static inline int __next_node(int n, const nodemask_t *srcp)
+static __always_inline int __next_node(int n, const nodemask_t *srcp)
 {
        return min_t(int,MAX_NUMNODES,find_next_bit(srcp->bits, MAX_NUMNODES, n+1));
 }

That the definition in include/asm-generic/bitops/instrumented-non-atomic.h triggered this, while other definitions of test_bit did not, seems like a smoking gun for one particular config. (We should find out what that is, then see if we still get the warning for a different definition of test_bit with allyesconfig minus that config.) bitmap_weight is itself __always_inline, so one-line callers of it probably should be, too.

include/linux/nodemask.h has a curious comment:

118 /*                                                                                                                                                                          
119  * The inline keyword gives the compiler room to decide to inline, or                                                                                                       
120  * not inline a function as it sees best.  However, as these functions                                                                                                      
121  * are called in both __init and non-__init functions, if they are not                                                                                                      
122  * inlined we will end up with a section mis-match error (of the type of                                                                                                    
123  * freeable items not being freed).  So we must use __always_inline here                                                                                                    
124  * to fix the problem.  If other functions in the future also end up in                                                                                                     
125  * this situation they will also need to be annotated as __always_inline                                                                                                    
126  */                                                                                                          

So it feels kind of suspicious to be kneecapping LLVM's constant propagation over a complex edge case, just to satisfy rules highly specific to the Linux kernel, when the kernel itself has a comment saying "use __always_inline if these are called from both __init and non-__init functions" in the same header that defines the problematic functions.

@arndb

arndb commented Mar 25, 2021

I am a bit skeptical about whether that comment still applies, though. It was added in 2013, and compilers were quite a bit different back then. I would not be surprised if whatever bug happened back then is completely unreproducible with gcc-4.9 and higher.

The hack I have locally to build cleanly just drops the __initdata annotation on numa_nodes_parsed ;-)

Another possible hack would be to annotate amd_numa_init() with __attribute__((flatten)).

@nickdesaulniers

That comment is also perfectly in line with Boris' feedback on get_smp_config. So I think it makes a good rule of thumb for the kernel: "if you have static inline functions called from __init and non-__init functions, you probably want __always_inline rather than inline."

If marking all callees recursively __always_inline is too painful, __attribute__((flatten)) can help, but boy it sure is an awfully big hammer. I suspect we don't need to completely flatten all callees into amd_numa_init.

I am a bit skeptical about whether that comment still applies though. This was added in 2013, and the compilers were quite a bit different back then. I would not be surprised if whatever bug happened back than was completely unreproducible with gcc-4.9 and higher.

Are you referring to a different comment in the same source file, or the one I posted? Because the comment I linked to looks totally reasonable to me, and I don't see how that would have changed in compilers since 2013.

@nickdesaulniers

nickdesaulniers commented Mar 25, 2021

> That the definition in include/asm-generic/bitops/instrumented-non-atomic.h triggered this, while other definitions of test_bit did not, seems like a smoking gun for one particular config.
> We should find out what that is, then see if we still get the warning for a different definition of test_bit with allyesconfig minus that config.

allyesconfig minus CONFIG_UBSAN_OBJECT_SIZE fixes most of them (except for test_bit), so Arnd's findings related to CONFIG_UBSAN_OBJECT_SIZE=y are definitely related to these. Perhaps focusing on addressing that in LLVM is better than modifying the constant propagation replacement to resolve this set of warnings.

I do think that, when sending kernel maintainers patches, it would be clearer to specify precisely which configs trigger this (rather than using vaguer language like "clang sometimes decides not to inline"). Consider:

For test_bit, there are 38 local copies made for x86 allyesconfig minus CONFIG_UBSAN_OBJECT_SIZE; they look huge due to setup and calls to __sanitizer_cov_trace_pc, __kasan_check_read, __sanitizer_cov_trace_const_cmp1, and __ubsan_handle_load_invalid_value. So the response to "how could test_bit not be inlined, it's so tiny!?" is "yeah, but CONFIG_KCOV=y (__sanitizer_cov_trace_pc, __sanitizer_cov_trace_const_cmp1), CONFIG_KASAN=y (__kasan_check_read), and CONFIG_UBSAN_SANITIZE_ALL=y (__ubsan_handle_load_invalid_value) have conspired to mess that up." The code coverage runtime calls look the most expensive, but it's actually all three that need to be disabled for test_bit to be inlined into the relevant caller.

AMD_NUMA+KASAN alone (on top of defconfig) are enough to trigger the warnings from test_bit. (It's always the sanitizers).

I also wonder why these warnings get printed twice? Does modpost have a bug where it's accidentally doing double the work?

@nickdesaulniers

nickdesaulniers commented Mar 25, 2021

A quick hack to LLVM to run LowerConstantIntrinsicsPass before inlining (really, at an arbitrary point, but still before inlining):

EDIT this diff is now highly outdated...

diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index 52f7026c40eb..a41bf6419cd4 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -666,6 +666,7 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
   // Form SSA out of local memory accesses after breaking apart aggregates into
   // scalars.
   FPM.addPass(SROA());
+  FPM.addPass(LowerConstantIntrinsicsPass());
 
   // Catch trivial redundancies
   FPM.addPass(EarlyCSEPass(true /* Enable mem-ssa. */));

Fixes all of the observed warnings in Arnd's example, and in the kernel except test_bit (sanitizers), but also produces tons of linkage errors related to failed compile-time asserts (BUILD_BUG_ON). I'm pretty sure LowerConstantIntrinsicsPass needs to run post-inlining so that __builtin_constant_p can work at all like it does in GCC; perhaps, if the whole pass cannot be rerun, we could create a new pass that handles @llvm.objectsize and runs before inlining (splitting the pass in two and moving one part before inlining). (But that seems like a lot of work when four __always_inline annotations would also solve the warning.)

@nickdesaulniers

I happened to be reading through the source of modpost (scripts/mod/modpost.c); this is exactly Pattern 5 that is whitelisted, though it relies on GCC renaming the optimized callee.

@nickdesaulniers

commit 4a3893d ("modpost: don't emit section mismatch warnings for compiler optimizations")

is what added modpost pattern 5; I'd like to see if I can dig up the lore thread around its discussion.

@nickdesaulniers

I think we should just create a compiler specific allow list in modpost for this.

@nickdesaulniers

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 3e623ccc020b..c160f800f5e3 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -853,6 +853,27 @@ static int match(const char *sym, const char * const pat[])
        return 0;
 }
 
+struct secref_exception {
+       const char * const fromsym, * const tosym;
+};
+
+static const struct secref_exception secref_allowlist[] = {
+       { .fromsym = "__first_node", .tosym = "numa_nodes_parsed" },
+       { .fromsym = "__next_node", .tosym = "numa_nodes_parsed" },
+       { .fromsym = "__nodes_weight", .tosym = "numa_nodes_parsed" },
+       { .fromsym = "early_get_smp_config", .tosym = "x86_init" },
+       { .fromsym = "test_bit", .tosym = "numa_nodes_parsed" },
+};
+
+static int match_allowlist(const char *fromsym, const char *tosym) {
+       int i = 0, e = sizeof(secref_allowlist) / sizeof(secref_allowlist[0]);
+       for (; i != e; ++i)
+               if (!strcmp(secref_allowlist[i].fromsym, fromsym) &&
+                   !strcmp(secref_allowlist[i].tosym, tosym))
+                       return 1;
+       return 0;
+}
+
 /* sections that we do not want to do full section mismatch check on */
 static const char *const section_white_list[] =
 {
@@ -1192,6 +1213,8 @@ static const struct sectioncheck *section_mismatch(
  *   tosec   = init section
  *   fromsec = text section
  *   refsymname = *.constprop.*
+ *   LLVM will do similar constant propagation, but it will not rename the
+ *   transformed callee.
  *
  * Pattern 6:
  *   Hide section mismatch warnings for ELF local symbols.  The goal
@@ -1235,9 +1258,10 @@ static int secref_whitelist(const struct sectioncheck *mismatch,
 
        /* Check for pattern 5 */
        if (match(fromsec, text_sections) &&
-           match(tosec, init_sections) &&
-           match(fromsym, optim_symbols))
-               return 0;
+           match(tosec, init_sections))
+               if (match(fromsym, optim_symbols) ||
+                   match_allowlist(fromsym, tosym))
+                       return 0;
 
        /* Check for pattern 6 */
        if (strstarts(fromsym, ".L"))

@nickdesaulniers

nickdesaulniers commented Jul 30, 2021

@apinski-cavium in #gcc irc helped me pinpoint some relevant GCC commits
eb598ce634300e2e6633769ffda6205406169281
036546e58ac96338a9167fb1b239670fdca99ab3
9187e02deb92bfe982c845a682468fce39c8bf9b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44324
https://gcc.gnu.org/pipermail/gcc-patches/2010-May/284756.html

@kees

kees commented Sep 16, 2021

Is the plan to send the modpost patch or to have LLVM perform similar clone renaming?

@kees kees added the CONFIG_WERROR Has in an error with CONFIG_WERROR (all{mod,yes}config) (or emits a non-compiler warning) label Sep 17, 2021
@kees kees added Clean build Issue needs to be fixed to get a warning-free all*config build and removed CONFIG_WERROR Has in an error with CONFIG_WERROR (all{mod,yes}config) (or emits a non-compiler warning) labels Sep 22, 2021
@nickdesaulniers

Probably modpost. It's silly IMO to ask LLVM to change to some arbitrary convention that the kernel's modpost happens to rely on.

I can hack up a change quickly to LLVM, but this results in 73+ existing test failures in LLVM. I can try to minimize the checking in modpost to object files built by clang.

@nickdesaulniers nickdesaulniers added the [PATCH] Rejected The submitted patch was rejected label Sep 30, 2021
@nickdesaulniers

nickdesaulniers commented Oct 6, 2021

Jotting down my research notes from today. I ran a ~4hr creduce on the defconfig issue: https://gist.github.com/nickdesaulniers/ea307c16f747938dbfad34cab6f04a72. What's interesting about this case is that the BCP representation in IR, %1 = call i1 @llvm.is.constant.i64(i64 %0), will actually never be true, but we don't figure that out until after inlining; meanwhile, during inlining, we decided the true case was too big, so we don't inline. Inline cost analysis basically does aggressive constant propagation itself in order to recommend whether inlining would be beneficial, and it seems to be mishandling a special case.

I have a fix in llvm/lib/Analysis/InlineCost.cpp, but this bug might actually be deeper in llvm/lib/Analysis/ConstantFolding.cpp.

Basically, in the above case, we know that i64 %0 is not Constant, but instead of folding call i1 @llvm.is.constant.i64(i64 %0) to i1 0, we bail (evaluating the full size of both branches of the conditional, rather than aggressively folding the whole thing)! (If i64 %0 were Constant, then we would have folded call i1 @llvm.is.constant.i64(i64 %0) to i1 1, removing the false case and further simplifying the true case.)

Stated another way, as soon as we see a non-llvm::Constant parameter to a function call, we pessimize the CallAnalyzer's ability to simplify a call site. BUT! @llvm.is.constant.i64 is actually a special case where we can replace the call to that intrinsic with 0. CallAnalyzer::simplifyCallSite or even ConstantFoldCall should be able to fold a call to @llvm.is.constant.i64 when its argument isn't dependent on a parameter of the Function parent of the CallInst.

That fixes the issue observed for the defconfig, but not the allconfig. So there may be more than one bug here; perhaps object size is related for allyesconfig or has a similar issue. Or my WIP local patch may have a bug; though I don't regress any existing LLVM tests. I will clean up the patch and post it tomorrow.

@nickdesaulniers

nickdesaulniers commented Oct 6, 2021

https://reviews.llvm.org/D111272 is the first part, which fixes the unusually high cost seen in the defconfigs.

I'm now investigating the issues from the all*configs; I don't see any unusually high "costs" preventing inlining. Instead, I'm seeing negative costs in defconfigs vs. just-over-threshold costs in allmodconfig. Let me try to see what can cause the costs to become negative, since that's a little curious. That might indicate whether something is missing in the all*configs.

Absurdly to the point of comedy, the mere act of questioning "why was this inlined or not" (-Rpass-missed=inline) changes the cost model. WTF!

nickdesaulniers added a commit to llvm/llvm-project that referenced this issue Oct 8, 2021
llvm.is.constant* intrinsics are evaluated to 0 or 1 integral values.

A common use case for llvm.is.constant comes from the higher level
__builtin_constant_p. A common usage pattern of __builtin_constant_p in
the Linux kernel is:

    void foo (int bar) {
      if (__builtin_constant_p(bar)) {
        // lots of code that will fold away to a constant.
      } else {
        // a little bit of code, usually a libcall.
      }
    }

A minor issue in InlineCost calculations is when `bar` is _not_ Constant
and still will not be after inlining, we don't discount the true branch
and the inline cost of `foo` ends up being the cost of both branches
together, rather than just the false branch.

This leads to code like the above where inlining will not help prove bar
Constant, but it still would be beneficial to inline foo, because the
"true" branch is irrelevant from a cost perspective.

For example, IPSCCP can sink a passed constant argument to foo:

    const int x = 42;
    void bar (void) { foo(x); }

This improves our inlining decisions, and fixes a few head scratching
cases where the disassembly shows a relatively small `foo` not inlined
into a lone caller.

We could further improve this modeling by tracking whether the argument
to llvm.is.constant* is a parameter of the function, and if inlining
would allow that parameter to become Constant. This idea is noted in a
FIXME comment.

Link: ClangBuiltLinux/linux#1302

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D111272
@nickdesaulniers

Before this week's changes, we had warnings from:

  1. test_bit (allmodconfig)
  2. __nodes_weight (allmodconfig, defconfig)
  3. early_get_smp_config (allmodconfig)
  4. __first_node (allmodconfig)
  5. __next_node (allmodconfig)

Part 1 has landed. This eliminates the issues with defconfig, but does nothing for allmodconfig.

I have some ideas for making the analysis even more precise, it just involves a lot more work at compile time. ("Will inlining change what __builtin_constant_p evaluates to?" rather than "__builtin_constant_p is either 1 or 0, but not both/a runtime call").

The second issue I see is that calls to @llvm.objectsize. (UBSAN) aren't being modeled/simplified either: https://reviews.llvm.org/D111456 (i.e., no amount of inlining would change that call; it's not a runtime call).

That eliminates 2, 3, and 4 from allmodconfig.

This leaves us with two cases from allmodconfig: test_bit and __next_node. It looks like they each have one caller, so they should be getting a severe discount toward inline cost, but they're not; it seems to have something to do with block frequency and the inliner thinking the BBs the calls exist in are cold. I suspect that all the handling for KASAN is messing up the block frequency calculations (e.g., the KASAN runtime handlers having the same expected frequency even though they should probably never run except in exceptional cases).

@bwendling

Slight ping, as I'm running into this situation right now. Is there a status update on the various patches?

github-actions bot pushed a commit to tstellar/llvm-project that referenced this issue Jan 7, 2022
tstellar pushed a commit to llvm/llvm-project that referenced this issue Jan 12, 2022
@nickdesaulniers

note that -fsanitize=object-size was removed in
commit 69d0db0 ("ubsan: remove CONFIG_UBSAN_OBJECT_SIZE")
so after that https://reviews.llvm.org/D111456 might be less pressing.

s194604 pushed a commit to s194604/patmos-llvm-project that referenced this issue Apr 10, 2022
bryanpkc pushed a commit to flang-compiler/classic-flang-llvm-project that referenced this issue Apr 20, 2022
@nickdesaulniers

https://reviews.llvm.org/rG9697f93587f46300814f1c6c68af347441d6e05d is what ultimately landed in LLVM; it allowed us to get more aggressive about inlining callees containing calls to __builtin_constant_p.

https://reviews.llvm.org/D111456 (just pinged Kazu) may still be interesting if this comes up again, but I am closing this out now. We no longer have any instances of section mismatch warnings for x86_64 allmodconfig due to inlining decisions.

@nickdesaulniers nickdesaulniers added [FIXED][LLVM] 14 This bug was fixed in LLVM 14.x [PATCH] Bitrot Patch is outdated and needs to be refreshed or revisited and removed [PATCH] Submitted A patch has been submitted for review [PATCH] Rejected The submitted patch was rejected labels Sep 10, 2022
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022
nickdesaulniers added a commit to llvm/llvm-project that referenced this issue Jan 24, 2023
Very similar to https://reviews.llvm.org/D111272. We very often can
evaluate calls to llvm.objectsize.* regardless of inlining. Don't count
calls to llvm.objectsize.* against the InlineCost when we can evaluate
the call to a constant.

Link: ClangBuiltLinux/linux#1302

Reviewed By: manojgupta

Differential Revision: https://reviews.llvm.org/D111456
CarlosAlbertoEnciso pushed a commit to SNSystems/llvm-debuginfo-analyzer that referenced this issue Jan 25, 2023
veselypeta pushed a commit to veselypeta/cherillvm that referenced this issue Jun 17, 2024
arichardson pushed a commit to arichardson/llvm-project that referenced this issue Nov 17, 2024