-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
9934a5b2ed5aa6e6bbb2e55c3cd98839722c226e breaks x86_64 ThinLTO #1440
Comments
I am not sure that bisect is right. I can boot up five times in a row at
I have reported this upstream: https://reviews.llvm.org/D106056 |
Indeed, that teaches me not to bisect things before caffeine... |
hi @samitolvanen @nathanchance After some simple digging, the log [ 0.652417] jump_label: Fatal kernel bug, unexpected op at jump_label_test+0x1d/0xe3 [ffffffffb3b89aaf] (e9 a9 00 00 00 != 0f 1f 44 00 00)) size:5 type:1 with memcmp(addr, expect, size) shows we may get wrong addr. BTW, the https://reviews.llvm.org/D106056 just try to removes unreachable default case. Do you fail to boot without ThinLTO? I'll try to reproduce this. |
@samitolvanen @nathanchance comfirmed, it shows that LLVM trunk remove some default case in cfg80211_edmg_chandef_valid, which is correct tranformation. however, some operation in tools/objtool/check.c emit "vmlinux.o: warning: objtool: cfg80211_edmg_chandef_valid()+0x169: can't find jump dest instruction at .text.cfg80211_edmg_chandef_valid+0x17b", and the boot failed. |
BTW, I also find some similar to this issue, https://lore.kernel.org/lkml/20190612025227.lxumqqtqao6iqms3@treble/T/ |
I do believe this undefined behavior caused by libtool, it passed when i removed the default switch case in cfg80211_edmg_chandef_valid |
I seems that the value of chandef->edmg.bw_config is out of the range [4, 15] at runtime in https://github.com/torvalds/linux/blob/master/net/wireless/chan.c#L86. However, with thinlto, we have proved that value range is range [4, 15} and remove the default case into unreachable() which cause undefined behavior. |
diff --git a/net/wireless/chan.c b/net/wireless/chan.c
index 869c43d4414c..53e8ed341381 100644
--- a/net/wireless/chan.c
+++ b/net/wireless/chan.c
@@ -108,9 +108,6 @@ static bool cfg80211_edmg_chandef_valid(const struct cfg80211_chan_def *chandef)
if (max_contiguous < 4)
return false;
break;
-
- default:
- return false;
}
/* check bw_config against aggregated (non contiguous) edmg channels */
@@ -134,8 +131,6 @@ static bool cfg80211_edmg_chandef_valid(const struct cfg80211_chan_def *chandef)
if (num_of_enabled < 4 || max_contiguous < 2)
return false;
break;
- default:
- return false;
}
return true; |
https://reviews.llvm.org/D106056 has been reverted, FYI |
Jotting down notes; I've been looking into this yesterday evening and this afternoon. I got distracted by #1446 , but it looks like there's a patch out now for that, and I can just use GNU objdump for now. Nothing conclusive yet, but still some notes:
but doesn't touch it. (Some MBB's get renamed or moved, so the result is nearly the same):
So if we have a successor that we won't visit 100.00% because it's unreachable, shouldn't
(I guess technically that would make the |
Can you give me a standalone reproducer that only involves |
(I followed up with Roman on IRC) This looks like a follow up to https://llvm.org/pr43129; in fact, I spent all day looking at code that was added in https://reviews.llvm.org/D68131. I did come up with this test case, reduced from Nathan's reproducer: @foo = global i32 0
define i32 @baz(i32 %0) {
switch i32 %0, label %if.then.unreachabledefault [
i32 4, label %sw.epilog8
i32 5, label %sw.epilog8
i32 8, label %sw.bb2
i32 9, label %sw.bb2
i32 12, label %sw.bb4
i32 13, label %sw.bb4
]
sw.bb2:
br label %return
sw.bb4:
br label %return
sw.epilog8:
br label %return
if.then.unreachabledefault:
unreachable
return:
%retval.0 = phi i32 [ 1, %sw.epilog8 ], [ 0, %sw.bb2 ], [ 0, %sw.bb4 ]
ret i32 %retval.0
} $ llc -O2 foo.ll -o -
```asm
# ...
baz:
.cfi_startproc
xorl %eax, %eax
movl $13056, %ecx
btl %edi, %ecx
jae .LBB0_1
retq
.LBB0_1:
movl $48, %eax
btl %edi, %eax
jae .LBB0_4 # oops!
movl $1, %eax
retq
.LBB0_4:
.Lfunc_end0:
# ... cc @zmodem |
Please help test https://reviews.llvm.org/D109103 and https://reviews.llvm.org/D109106 with https://reviews.llvm.org/D106056 re-applied locally. |
I tested diff 370104 and diff 370343 of https://reviews.llvm.org/D109103 with https://reviews.llvm.org/D106056 reapplied and the kernel booted in QEMU and I did not see any |
Upload a test that shows ISEL taking a SwitchInst that has an unreachable BB for a default target being lowered to an unconditional jump off the end of a function. Link: https://bugs.llvm.org/show_bug.cgi?id=50080 Link: ClangBuiltLinux/linux#679 Link: ClangBuiltLinux/linux#1440 Reviewed By: craig.topper, hans Differential Revision: https://reviews.llvm.org/D109106
I am seeing this problem again.
I think it's caused by the particular static key in Kernel 6.0 with LLVM/Clang 15.0.2 |
Hmm, I can reproduce this problem using as far back as clang 13. I don't have older clang available from my system's package management. For now it's possible to workaround by disabling CONFIG_ZSWAP. |
Upload a test that shows ISEL taking a SwitchInst that has an unreachable BB for a default target being lowered to an unconditional jump off the end of a function. Link: https://bugs.llvm.org/show_bug.cgi?id=50080 Link: ClangBuiltLinux/linux#679 Link: ClangBuiltLinux/linux#1440 Reviewed By: craig.topper, hans Differential Revision: https://reviews.llvm.org/D109106
…n is unreachable Otherwise we end up with an extra conditional jump, following by an unconditional jump off the end of a function. ie. bb.0: BT32rr .. JCC_1 %bb.4 ... bb.1: BT32rr .. JCC_1 %bb.2 ... JMP_1 %bb.3 bb.2: ... bb.3.unreachable: bb.4: ... Should be equivalent to: bb.0: BT32rr .. JCC_1 %bb.4 ... JMP_1 %bb.2 bb.1: bb.2: ... bb.3.unreachable: bb.4: ... This can occur since at the higher level IR (Instruction) SwitchInsts are required to have BBs for default destinations, even when it can be deduced that such BBs are unreachable. For most programs, this isn't an issue, just wasted instructions since the unreachable has been statically proven. The x86_64 Linux kernel when built with CONFIG_LTO_CLANG_THIN=y fails to boot though once D106056 is re-applied. D106056 makes it more likely that correlation-propagation (CVP) can deduce that the default case of SwitchInsts are unreachable. The x86_64 kernel uses a binary post processor called objtool, which emits this warning: vmlinux.o: warning: objtool: cfg80211_edmg_chandef_valid()+0x169: can't find jump dest instruction at .text.cfg80211_edmg_chandef_valid+0x17b I haven't debugged precisely why this causes a failure at boot time, but fixing this very obvious jump off the end of the function fixes the warning and boot problem. Link: https://bugs.llvm.org/show_bug.cgi?id=50080 Fixes: ClangBuiltLinux/linux#679 Fixes: ClangBuiltLinux/linux#1440 Reviewed By: hans Differential Revision: https://reviews.llvm.org/D109103
Starting with LLVM commit 9934a5b2ed5aa6e6bbb2e55c3cd98839722c226e ("CVP] processSwitch: Remove default case when switch cover all possible values."), ToT x86_64 defconfig + ThinLTO fails to boot:
@nickdesaulniers @nathanchance
Edit: Updated to refer to the correct commit, which Nathan pointed out below, to avoid confusion.
The text was updated successfully, but these errors were encountered: