Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

issue with volatile? #1566

Closed
nickdesaulniers opened this issue Jan 12, 2022 · 10 comments
Closed

issue with volatile? #1566

nickdesaulniers opened this issue Jan 12, 2022 · 10 comments
Assignees
Labels
[ARCH] x86_64 This bug impacts ARCH=x86_64 [BUG] linux A bug that should be fixed in the mainline kernel. [PATCH] Submitted A patch has been submitted for review [TOOL] objtool warning is produced by the kernel's objtool

Comments

@nickdesaulniers
Copy link
Member

Boris cc'ed us on an x86 change (now in mainline as dcce50e) from @jpoimboe. The commit message sounds ominous about volatile qualified asm.

It mentions CONFIG_TRACE_BRANCH_PROFILING. If I revert dcce50e though, defconfig+CONFIG_TRACE_BRANCH_PROFILING=y isn't enough to repro for me in ToT clang (clang-14) or clang-11.

Looking at the archives for handle_xfd_event, it looks like there's 2 0day bot reports. Both are randconfig builds. If I take the first config, then revert dcce50e, I can reproduce the flood of objtool warnings.

I didn't make much progress today, but I'm curious:

  1. what set of configs does this take to go from defconfig to these warnings? (kconfig reduction)
  2. what changes in dcce50e affect the actual codegen?
@nickdesaulniers nickdesaulniers added [BUG] Untriaged Something isn't working [ARCH] x86_64 This bug impacts ARCH=x86_64 [TOOL] objtool warning is produced by the kernel's objtool labels Jan 12, 2022
@nathanchance
Copy link
Member

I'll see if I can reduce the set of Kconfig changes on top of defconfig with config-bisect.pl.

@nathanchance
Copy link
Member

$ make -skj"$(nproc)" LLVM=1 defconfig

$ scripts/config -d BRANCH_PROFILE_NONE -e PROFILE_ANNOTATED_BRANCHES -d GENERIC_CPU -e MPSC

$ make -skj"$(nproc)" LLVM=1 olddefconfig arch/x86/kernel/traps.o
arch/x86/kernel/traps.o: warning: objtool: handle_xfd_event()+0xb0: unreachable instruction

$ git diff --no-index ../defconfig .config
diff --git a/../defconfig b/.config
index 7763c13a14f8..d650c3511a9e 100644
--- a/../defconfig
+++ b/.config
@@ -331,12 +331,13 @@ CONFIG_X86_SUPPORTS_MEMORY_FAILURE=y
 CONFIG_SCHED_OMIT_FRAME_POINTER=y
 # CONFIG_HYPERVISOR_GUEST is not set
 # CONFIG_MK8 is not set
-# CONFIG_MPSC is not set
+CONFIG_MPSC=y
 # CONFIG_MCORE2 is not set
 # CONFIG_MATOM is not set
-CONFIG_GENERIC_CPU=y
-CONFIG_X86_INTERNODE_CACHE_SHIFT=6
-CONFIG_X86_L1_CACHE_SHIFT=6
+# CONFIG_GENERIC_CPU is not set
+CONFIG_X86_INTERNODE_CACHE_SHIFT=7
+CONFIG_X86_L1_CACHE_SHIFT=7
+CONFIG_X86_P6_NOP=y
 CONFIG_X86_TSC=y
 CONFIG_X86_CMPXCHG64=y
 CONFIG_X86_CMOV=y
@@ -4728,9 +4729,11 @@ CONFIG_FTRACE=y
 # CONFIG_MMIOTRACE is not set
 # CONFIG_FTRACE_SYSCALLS is not set
 # CONFIG_TRACER_SNAPSHOT is not set
-CONFIG_BRANCH_PROFILE_NONE=y
-# CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
+CONFIG_TRACE_BRANCH_PROFILING=y
+# CONFIG_BRANCH_PROFILE_NONE is not set
+CONFIG_PROFILE_ANNOTATED_BRANCHES=y
 # CONFIG_PROFILE_ALL_BRANCHES is not set
+# CONFIG_BRANCH_TRACER is not set
 CONFIG_BLK_DEV_IO_TRACE=y
 CONFIG_KPROBE_EVENTS=y
 CONFIG_UPROBE_EVENTS=y

@nathanchance
Copy link
Member

When run through cvise:

$ cat arch/x86/kernel/traps.i
handle_xfd_event() {
  asm(".byte 15, 0x0b" : : "i"(sizeof(int)));
  asm("259"
      ":\n\t"
      ".pushsection .discard.reachable\n\t"
      ".long "
      "259"
      "b - .\n\t"
      ".popsection\n\t");
  arch_local_irq_disable();
}

$ clang -O2 -c -o traps.{o,i}
...

$ ./objtool orc generate --no-fp --uaccess traps.o

$ clang -O2 -march=nocona -c -o traps.{o,i}
...

$ ./objtool orc generate --no-fp --uaccess traps.o
traps.o: warning: objtool: handle_xfd_event()+0x2: unreachable instruction

@nickdesaulniers
Copy link
Member Author

nickdesaulniers commented Jan 13, 2022

When run through cvise:

The disassembly is literally the same between -march=nocona (and not) for .text. For .discard.reachable, it looks like the relocation differs.

$ clang -O2 -c traps.i -w
$ llvm-objdump -drj .discard.reachable traps.o
traps.o:        file format elf64-x86-64

Disassembly of section .discard.reachable:

0000000000000000 <.discard.reachable>:
       0: 00 00                         addb    %al, (%rax)
                0000000000000000:  R_X86_64_PC32        .text+0x2
       2: 00 00                         addb    %al, (%rax)
$ clang -O2 -c traps.i -w -march=nocona
$ llvm-objdump -drj .discard.reachable traps.o

traps.o:        file format elf64-x86-64

Disassembly of section .discard.reachable:

0000000000000000 <.discard.reachable>:
       0: 00 00                         addb    %al, (%rax)
                0000000000000000:  R_X86_64_PC32        .text+0x4
       2: 00 00                         addb    %al, (%rax)
$ llvm-objdump -dr traps.o                    

traps.o:        file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <handle_xfd_event>:
       0: 0f 0b                         ud2
       2: 31 c0                         xorl    %eax, %eax
       4: e9 00 00 00 00                jmp     0x9 <handle_xfd_event+0x9>
                0000000000000005:  R_X86_64_PLT32       arch_local_irq_disable-0x4

Note: gcc generates the former, not the latter.

Dumping the LLVM IR from either, then running that through llc -O2, I can see the generated assembler moves xorl above the second asm block for the nocona case. Adding -print-after-all to the llc invocation of the nocona case, I see that post-RA-sched is moving the xor above the asm statements. There's no difference in MIR before post-RA-sched between the two.

So having a schedule, we seem to be moving the setting of return value above the second asm statement.

So I think there was a mixup between volatile vs a compiler barrier. A memory clobber is a compiler barrier that prevents reordering around an asm statement, not volatile.

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 429dcebe2b99..af287f92550f 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -118,18 +118,18 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
 #define __stringify_label(n) #n
 
 #define __annotate_reachable(c) ({                                     \
-       asm volatile(__stringify_label(c) ":\n\t"                       \
+       asm (__stringify_label(c) ":\n\t"                               \
                     ".pushsection .discard.reachable\n\t"              \
                     ".long " __stringify_label(c) "b - .\n\t"          \
-                    ".popsection\n\t" : : "i" (c));                    \
+                    ".popsection\n\t" ::: "memory");                   \
 })
 #define annotate_reachable() __annotate_reachable(__COUNTER__)
 
 #define __annotate_unreachable(c) ({                                   \
-       asm volatile(__stringify_label(c) ":\n\t"                       \
+       asm (__stringify_label(c) ":\n\t"                               \
                     ".pushsection .discard.unreachable\n\t"            \
                     ".long " __stringify_label(c) "b - .\n\t"          \
-                    ".popsection\n\t" : : "i" (c));                    \
+                    ".popsection\n\t" ::: "memory");                   \
 })
 #define annotate_unreachable() __annotate_unreachable(__COUNTER__)

Though reading the commit message of d0c2e69, it sounds like volatile may still be needed to avoid de-duplication of asm? The gcc manual mentions that this is what the %= "assembler template" is for. In that case, I might go so far as to change this to:

commit 36cd505d1917eca7d56bb2382b306556dcb9e192 (HEAD -> master)
Author: Nick Desaulniers <ndesaulniers@google.com>
Date:   Wed Jan 12 16:16:33 2022 -0800

    compiler.h: prefer memory clobber & %= to volatile & __COUNTER__
    
    commit dcce50e6cc4d ("compiler.h: Fix annotation macro misplacement with Clang")
    mentions:
    
    > 'volatile' is ignored for some reason and Clang feels free to move the
    > reachable() annotation away from its intended location.
    
    Indeed, volatile is not a compiler barrier. Particularly once `-march=`
    flags are used under certain configs, LLVM's machine schedule can be
    observed moving instructions across the asm statement meant to point to
    known reachable or unreachable code, as reported by 0day bot.
    
    Prefer a memory clobber which is a compiler barrier that prevents these
    re-orderings and remove the volatile qualifier.
    
    Looking closer, the use of __COUNTER__ seems to have been used to
    prevent de-duplication of these asm statements. The GCC manual mentions:
    
    > Under certain circumstances, GCC may duplicate (or remove duplicates
    > of) your assembly code when optimizing. This can lead to unexpected
    > duplicate symbol errors during compilation if your asm code defines
    > symbols or labels. Using ‘%=’ (see AssemblerTemplate) may help resolve
    > this problem.
    >
    > ‘%=’ Outputs a number that is unique to each instance of the asm
    > statement in the entire compilation. This option is useful when
    > creating local labels and referring to them multiple times in a single
    > template that generates multiple assembler instructions.
    
    commit 3d1e236022cc ("objtool: Prevent GCC from merging annotate_unreachable()")
    
    Mentions that
    
    > The inline asm ‘%=’ token could be used for that, but unfortunately
    > older versions of GCC don't support it.
    
    From testing all versions of GCC available on godbolt.org, GCC 4.1+
    seems to support 4.1. Since the minimum supported version of GCC at the
    moment is GCC 5.1, it sounds like this is no longer a concern.
    
    Prefer the %= assembler template to having to stringify __COUNTER__.
    
    This commit is effectively a revert of the following commits:
    commit dcce50e6cc4d ("compiler.h: Fix annotation macro misplacement with Clang")
    commit f1069a8756b9 ("compiler.h: Avoid using inline asm operand modifiers")
    commit d0c2e691d1cb ("objtool: Add a comment for the unreachable annotation macros")
    commit ec1e1b610917 ("objtool: Prevent GCC from merging annotate_unreachable(), take 2")
    commit 3d1e236022cc ("objtool: Prevent GCC from merging annotate_unreachable()")
    
    Link: https://github.com/ClangBuiltLinux/linux/issues/1566
    Link: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Volatile
    Link: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#AssemblerTemplate
    Link: https://lore.kernel.org/llvm/202112080834.XFYU8b5Q-lkp@intel.com/
    Link: https://lore.kernel.org/llvm/202111300857.IyINAyJk-lkp@intel.com/
    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 429dcebe2b99..b87f841d0c10 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -108,30 +108,19 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
 # define barrier_before_unreachable() do { } while (0)
 #endif
 
-/* Unreachable code */
+/* These macros help objtool understand GCC code flow for unreachable code. */ 
 #ifdef CONFIG_STACK_VALIDATION
-/*
- * These macros help objtool understand GCC code flow for unreachable code.
- * The __COUNTER__ based labels are a hack to make each instance of the macros
- * unique, to convince GCC not to merge duplicate inline asm statements.
- */
-#define __stringify_label(n) #n
-
-#define __annotate_reachable(c) ({                                     \
-       asm volatile(__stringify_label(c) ":\n\t"                       \
-                    ".pushsection .discard.reachable\n\t"              \
-                    ".long " __stringify_label(c) "b - .\n\t"          \
-                    ".popsection\n\t" : : "i" (c));                    \
-})
-#define annotate_reachable() __annotate_reachable(__COUNTER__)
-
-#define __annotate_unreachable(c) ({                                   \
-       asm volatile(__stringify_label(c) ":\n\t"                       \
-                    ".pushsection .discard.unreachable\n\t"            \
-                    ".long " __stringify_label(c) "b - .\n\t"          \
-                    ".popsection\n\t" : : "i" (c));                    \
-})
-#define annotate_unreachable() __annotate_unreachable(__COUNTER__)
+#define annotate_reachable() \
+       asm (".Lreachable%=:\n\t"                       \
+            ".pushsection .discard.reachable\n\t"      \
+            ".long .Lreachable%= - .\n\t"              \
+            ".popsection\n\t" ::: "memory");
+
+#define annotate_unreachable()                         \
+       asm (".Lunreachable%=:\n\t"                     \
+            ".pushsection .discard.unreachable\n\t"    \
+            ".long .Lunreachable%= - .\n\t"            \
+            ".popsection\n\t" ::: "memory");
 
 #define ASM_UNREACHABLE                                                        \
        "999:\n\t"                                                      \

I will test this more before sending.

I probably want to do this to include/linux/instrumentation.h, too. ie revert of c199f64 and friends.

@nickdesaulniers nickdesaulniers added [BUG] linux A bug that should be fixed in the mainline kernel. [PATCH] Exists There is a patch that fixes this issue and removed [BUG] Untriaged Something isn't working labels Jan 13, 2022
@nickdesaulniers nickdesaulniers self-assigned this Jan 13, 2022
@nickdesaulniers
Copy link
Member Author

@nickdesaulniers nickdesaulniers added [PATCH] Submitted A patch has been submitted for review and removed [PATCH] Exists There is a patch that fixes this issue labels Jan 14, 2022
@nickdesaulniers
Copy link
Member Author

nickdesaulniers commented Jan 14, 2022

thinking about this more, this isn't the only place where the kernel relies on the inline asm having a label refer to C statements following the inline asm. Pretty sure the exception fixup stuff works this way, too. Hmm.

EDIT: at least that uses asm goto to actually inform the compiler about possible control flow. Here we have a label in inline asm hoping to refer to C code after it, but compiler CAN reorder instructions across asm statements as long as the expressed constraints are satisfied.

@nickdesaulniers
Copy link
Member Author

filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104236 for %= in gcc

@apinski-cavium
Copy link

There is no GCC bug here, rather Linux inline-asm is again broken even by the documentation which has not changed in years.

@nickdesaulniers
Copy link
Member Author

@nickdesaulniers
Copy link
Member Author

I'm not ruling out the presence of compiler bugs, but there is no bug in volatile in clang as reported in dcce50e AFAICT. @nathanchance cites the relevant passage from GCC's documentation:

Note that the compiler can move even volatile asm instructions relative
to other code, including across jump instructions.

I sent a patch to diagnose when volatile doesn't change the semantics of an asm statement, but I don't think it will land, and I don't feel strongly enough about it to push on it.

I started a thread on adding some kernel documentation; it's not been shown that there's a compiler barrier that can prevent all such instruction reordering. I'm not sure yet personally whether instrumentation_{begin|end} even care about instructions that form call-parameter setup or simply just call instructions. I've met with @jyknight and been on an email thread with @isanbard and @efriedma_quic; it doesn't sound like we have such an expression for such a barrier in LLVM IR today.

@nathanchance made the comment on IRC:

nathanchance: kees: It would be nice if there was a standard comment tag for compiler workarounds so that they could be easily eliminated when updating to the minimum version
ndesaulniers: nathanchance: last night I was thinking "I should have a talk at plumbers about how toolchain issues should be handled, that involves at the minimum a comment to a link to a bug report that's been filed, and if possible updating the comment to denote what version of a compiler was fixed"
nathanchance: ndesaulniers: Getting that into the kernel documentation would be nice :)

So I think I should still pursue the documentation patches, and there's perhaps more cleanup or refactoring work to be done, but I'm going to close this out. Happy to continue the discussion here though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[ARCH] x86_64 This bug impacts ARCH=x86_64 [BUG] linux A bug that should be fixed in the mainline kernel. [PATCH] Submitted A patch has been submitted for review [TOOL] objtool warning is produced by the kernel's objtool
Projects
None yet
Development

No branches or pull requests

3 participants