RISC-V boot failure after LLVM commit e87f33d9ce785668223c3bcc4e06956985cccda1 #1965
Comments
Maybe a patch to the compiler like this can give some clue
That does trigger it seems.
@nathanchance can you attach the files from the crash report?
@topperc Here's one of the preprocessed files from the build system directly with a simplified set of flags.
The particular place that's failing is
We no longer emit a relocation for this jump because of the norelax. We still need to emit a relocation so the jump target can be resolved when we relax everything else. An easy fix is to OR @MaskRay what are your thoughts?
tl;dr I suspect that the kernel makes a brittle assumption, but I am not familiar enough with the kernel to find it.
The reduced example is:

cat > tcp.ll <<'eof'
target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"
target triple = "riscv64-unknown-linux-gnu"
define i32 @tcp_orphan_count_sum() {
entry:
callbr void asm sideeffect "886 :\0A.option push\0A.option norvc\0A.option norelax\0Aj ${0:l}\0A.option pop\0A887 :\0A.if 1 == 1\0A.pushsection .alternative, \22a\22\0A.4byte\09((886b) - .) \0A.4byte\09((888f) - .) \0A.2byte\090\0A.2byte 889f - 888f\0A.4byte\0930\0A.popsection\0A.subsection 1\0A888 :\0A.option push\0A.option norvc\0A.option norelax\0Anop\0A.option pop\0A889 :\0A.org\09. - (887b - 886b) + (889b - 888b)\0A.org\09. - (889b - 888b) + (887b - 886b)\0A.previous\0A.endif\0A", "!i"()
to label %do.body [label %do.body]
do.body: ; preds = %entry, %entry
ret i32 0
}
eof
/tmp/Rel/bin/llc -filetype=obj -mtriple=riscv64-linux-gnu -mattr=+relax,+c tcp.ll -o onepass.o
/tmp/Rel/bin/llc -mtriple=riscv64-linux-gnu -mattr=+relax,+c tcp.ll -o tcp.s && /tmp/Rel/bin/llvm-mc -filetype=obj -triple=riscv64-linux-gnu -mattr=+relax,+c tcp.s -o tcp.o
In Ideally Unfortunately, my attempt to force
(
)
It is possible that there is an assumption around I will note those two commits came into the kernel relatively recently but I see boot problems in 5.10, which is quite old... I do see a
Okay "goes haywire" may have been underselling it. Now that I have compared what happens in On the good kernel (in
On the bad kernel:
I have uploaded the object files of https://gist.github.com/nathanchance/7e22ced9a7bca1c69a372ca11f822104
In continuing to try and debug this further, I have noticed two things.
I can avoid this issue on 5.15 by disabling
cc @hiraditya as rolling past llvm/llvm-project@e87f33d will likely break booting of the RISC-V Linux kernel for Android.
Confirmed,
This isn't caused by the Linux kernel's alternatives patching, as the crash occurs prior to that. I've put together a standalone reproducer:

file1.c:

extern void f(void);
int main()
{
    asm goto(
        ".option push\n"
        ".option norvc\n"
        ".option norelax\n"
        "j %[label]\n"
        ".option pop\n" :::: label);
    f();
    f();
    f();
    f();
label:
    return 0;
}

file2.c:

void f(void)
{
}

commands:

clang -target riscv64-linux-gnu -O2 -c file1.c
clang -target riscv64-linux-gnu -O2 -c file2.c
ld.lld file1.o file2.o
llvm-objdump -trd file1.o
llvm-objdump -d a.out

The object file
But after linking, the jump is jumping off the end of the function:
The distance jumped is still 0x24, so apparently the bug is that when the code was shrunk by "relaxations", the jump distance was not decreased accordingly. Passing
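To make the arithmetic behind that 0x24 concrete, here is a rough model of the reproducer's layout. It is only a sketch under assumptions that are not stated in the thread: each `f()` call is taken to be an 8-byte `auipc`+`jalr` pair in the object file that the linker relaxes to a single 4-byte `jal`, and the `j` itself stays 4 bytes because of `.option norvc`. The function and constant names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Instruction sizes in bytes. The call sizes are assumptions about how
 * the reproducer is laid out, not values taken from the thread. */
enum {
    J_SIZE = 4,              /* `j` under .option norvc */
    CALL_SIZE_UNRELAXED = 8, /* assumed auipc + jalr pair in file1.o */
    CALL_SIZE_RELAXED = 4    /* assumed single jal after linker relaxation */
};

/* Offset the assembler bakes into `j` when it resolves the label itself
 * (that is, when no relocation is emitted for the jump). */
int32_t assembled_jump_offset(int ncalls)
{
    assert(ncalls >= 0);
    return J_SIZE + ncalls * CALL_SIZE_UNRELAXED;
}

/* Real distance from `j` to the label once the linker has shrunk the calls. */
int32_t relaxed_label_offset(int ncalls)
{
    assert(ncalls >= 0);
    return J_SIZE + ncalls * CALL_SIZE_RELAXED;
}

/* How far past the label the stale jump lands. */
int32_t stale_jump_overshoot(int ncalls)
{
    return assembled_jump_offset(ncalls) - relaxed_label_offset(ncalls);
}
```

With four calls, as in file1.c, this model gives an assembled offset of 0x24, which matches the distance reported above, while the label would sit only 0x14 bytes after the `j` once the calls are relaxed. Under these assumptions the jump overshoots by 16 bytes, which would be consistent with it landing past the end of `main`.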
…Relax

Regarding
```
.option norelax
j label
.option relax
// relaxable instructions
label:
```
The J instruction needs a relocation to ensure the target is correct after linker relaxation. This is related to a limitation in the assembler: RISCVAsmBackend::shouldForceRelocation decides upfront whether a relocation is needed, instead of checking more information (whether there are relaxable fragments in between).

Despite the limitation, `j label` produces a relocation in direct object emission mode, but was broken by llvm#73721 due to the shouldForceRelocation limitation.

Add a workaround to RISCVTargetELFStreamer to emulate the previous behavior.

Link: ClangBuiltLinux/linux#1965
I can confirm that llvm/llvm-project#77436 resolves this issue for me.
…Relax (#77436)

Regarding
```
.option norelax
j label
.option relax
// relaxable instructions
// For assembly input, RISCVAsmParser::ParseInstruction will set ForceRelocs (https://reviews.llvm.org/D46423).
// For direct object emission, ForceRelocs is not set after #73721
label:
```
The J instruction needs a relocation to ensure the target is correct after linker relaxation. This is related to a limitation in the assembler: RISCVAsmBackend::shouldForceRelocation decides upfront whether a relocation is needed, instead of checking more information (whether there are relaxable fragments in between).

Despite the limitation, `j label` produces a relocation in direct object emission mode, but was broken by #73721 due to the shouldForceRelocation limitation.

Add a workaround to RISCVTargetELFStreamer to emulate the previous behavior.

Link: ClangBuiltLinux/linux#1965
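For anyone trying to follow the commit message, the decision it describes can be modelled roughly as below. This is a simplified sketch and not the LLVM code: the struct, its fields, and both functions are hypothetical names, and the only behaviour modelled is the "relax enabled OR a sticky force-relocs flag" check that the thread and the commit message talk about.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for the assembler state. */
struct asm_state {
    bool relax_enabled; /* current .option relax / norelax setting */
    bool force_relocs;  /* sticky flag: relaxation is enabled somewhere */
};

/* Upfront decision: emit a relocation if relaxation is on for this
 * instruction, or if the sticky flag says it is on elsewhere. */
static bool should_force_relocation(const struct asm_state *s)
{
    return s->relax_enabled || s->force_relocs;
}

/* Models `j label` inside `.option norelax` in direct object emission.
 * streamer_workaround models setting the sticky flag up front whenever
 * the subtarget has relaxation enabled (an assumption about the shape
 * of the fix, based on the commit message). */
bool norelax_jump_gets_relocation(bool subtarget_has_relax,
                                  bool streamer_workaround)
{
    struct asm_state s = {
        .relax_enabled = subtarget_has_relax,
        .force_relocs = false
    };

    if (streamer_workaround && subtarget_has_relax)
        s.force_relocs = true;

    s.relax_enabled = false; /* effect of .option norelax on the jump */
    return should_force_relocation(&s);
}
```

In this model, a relax-enabled subtarget without the workaround gives `false` for the `norelax` jump, which is the broken case, and gives `true` once the sticky flag is set up front.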
@ebiggers Thanks for the detailed reproducer, @nathanchance for testing and @topperc for approving llvm/llvm-project#77436. This issue has been resolved by llvm/llvm-project#77436
After llvm/llvm-project@e87f33d, I see a boot failure with `ARCH=riscv defconfig`:

That commit seems rather innocuous, so it seems likely that this has just exposed some other issue related to linker relaxation, because if I apply the following diff that adds `-mno-relax` to `KBUILD_CFLAGS` and `KBUILD_AFLAGS`, the kernel boots fine.

cc @topperc @MaskRay just in case you have any immediate ideas of where I should look to see what is going on here.
Bisect log