compiler code model for kernel #275
Comments
@pcc might be interested in it.
Sorry, @ardbiesheuvel, all of my links for this look irrelevant. Do you have a lore link or something written up for your idea for a kernel code model for aarch64? I think some of the aggressive optimizations @MaskRay has been working on for x86 might play in with some of your ideas for aarch64.
I don't have any links at hand, but I can provide some background.

This issue came up when I discussed the assumption in the Linux/arm64 build system that AArch64 code generated by GCC without the -fpic or -fpie flags set is suitable for linking with -pie, so that we can emit dynamic relocations into the bare metal binary, which it can use to self-relocate at boot, for KASLR. Ramana (who is [still] at ARM but no longer works on GCC, so I won't pull him into this discussion) pointed out that this is risky, and that it would be better to generate -fpic code.

However, PIC code generation is heavily geared towards shared objects in hosted executables, resulting in suboptimal code: in a bare metal binary, there is no ELF symbol preemption, text relocations are not a problem, and executable code is never shared, so reducing the memory footprint of COW'ed sections is unnecessary. This means that emitting GOT indirections is pointless, but inhibiting that with -fpic is cumbersome: -fvisibility=hidden only affects definitions, not declarations, and the visibility pragma (which does affect declarations too) can only be emitted via a .h file, which needs to be pulled in using -include, etc.

So this is when we first discussed introducing -mcmodel=kernel for AArch64, which could imply whichever internal options we need to get small model code but without all the GOT and .so stuff.
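As a rough illustration of the visibility pragma workaround described above, a header along the following lines could be force-included with -include. This is only a minimal sketch, not the kernel's actual header: the names `kernel_counter` and `read_counter` are hypothetical, and the header part and the translation unit are shown in one file purely to keep the example self-contained.

```c
/* --- contents of a hypothetical header pulled in via -include --- */
/* Everything declared after this pragma gets hidden visibility,
 * including plain extern declarations, which -fvisibility=hidden
 * alone does not cover. */
#pragma GCC visibility push(hidden)

/* --- ordinary translation unit follows --- */
extern int kernel_counter;   /* declaration only: now STV_HIDDEN */
int read_counter(void);

int kernel_counter = 41;     /* hypothetical variable, for illustration */

int read_counter(void)
{
    /* A hidden symbol is non-preemptible, so under -fpic the compiler
     * may reference it PC-relatively instead of through the GOT. */
    return kernel_counter + 1;
}

/* Popped here only so the sketch does not affect any code that
 * follows it; a force-included header could leave it pushed. */
#pragma GCC visibility pop
```

Whether the GOT indirection actually disappears can be checked by compiling with -fpic and inspecting the relocations in the resulting object file.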
For a STB_GLOBAL/STB_WEAK symbol,

STV_DEFAULT: both compiler & linker need to assume such symbols can be preempted in -fpic mode. The compiler emits GOT indirection by default.
STV_PROTECTED: GCC -fpic uses GOT indirection for data symbols, regardless of defined or undefined. This pessimization is to make a misfeature "copy relocation on protected data symbol" work (https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected#protected-data-symbols-and-direct-accesses). Clang code generation treats STV_PROTECTED the same way as STV_HIDDEN.
STV_HIDDEN: non-preemptible, regardless of defined or undefined. The compiler suppresses GOT indirection, unless undefined STB_WEAK.

For defined symbols, -fno-pic/-fpie can avoid GOT indirection for STV_DEFAULT (and GCC STV_PROTECTED).
For undefined symbols, -fpie/-fpic use GOT indirection by default. Clang -fno-direct-access-external-data (discussed in my article) can avoid GOT indirection.
If you -fpic -fno-direct-access-external-data & ld -shared, you'll need additional linker options to make the linker know defined non-STB_LOCAL STV_DEFAULT symbols are non-preemptible.
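The binding and visibility cases listed above can be spelled out in C with GCC/Clang attributes. The following is only a sketch: the variable names are hypothetical, and the comments restate the behavior described in this comment rather than guaranteeing what any particular compiler version emits.

```c
/* STB_GLOBAL, STV_DEFAULT: assumed preemptible under -fpic, so
 * accesses go through the GOT by default. */
int default_var = 1;

/* STB_GLOBAL, STV_PROTECTED: per the description above, GCC -fpic
 * still uses the GOT for data, while Clang treats it like hidden. */
__attribute__((visibility("protected"))) int protected_var = 2;

/* STB_GLOBAL, STV_HIDDEN: non-preemptible, no GOT indirection. */
__attribute__((visibility("hidden"))) int hidden_var = 3;

/* STB_WEAK, STV_DEFAULT: weak binding, still default visibility. */
__attribute__((weak)) int weak_var = 4;

/* Touches every variable so that each access pattern shows up in
 * the generated code for this one function. */
int sum_all(void)
{
    return default_var + protected_var + hidden_var + weak_var;
}
```

Compiling this once with -fno-pic and once with -fpic and comparing the relocations for `sum_all` is one way to see which accesses end up using the GOT.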
The use case is similar to a userspace static no-pie executable (-fno-pic -no-pie) or static pie (-fpie -pie).
Why is -fpie risky?
Thanks for more information. Questions (naive, perhaps, but I appreciate the feedback):
It seems to also produce an excessive growth in the number of relocations in debug info sections, which accounts for a significant growth in size of the binary (when debug info is not stripped or produced separately). The change in file size of vmlinux from enabling CONFIG_RELOCATABLE can be ~95% attributed to growth in .rela.debug_* sections, at least on x86 and DWARFv4.
Does
Right, hence
I understand; does this result in suboptimal code gen, in your experience? Also, I'm curious whether such a code model would no longer support CONFIG_RELOCATABLE=n. As in, only PIC-like relative references? Or would there still be a use case for non-PIC-like code? Perhaps folks don't want KASLR support (though that's what the command line option is for, I suppose). Do Clang and GCC both not implement
I believe this is the link in the first comment: Here are lore links for all of the other posts: https://lore.kernel.org/r/CAKv+Gu_tuYcikQ07QKP-N+rd+DpoucSYn6TG+OJ-jm9CVGaDxg@mail.gmail.com
...
I don't see a difference with -fno-semantic-interposition, either on GCC or Clang. In both cases, a reference to an undefined symbol is emitted using an entry in the GOT.
Not to my knowledge, no.
Yes, through the generation of GOT entries. With a GOT, all relocated quantities are close together, which reduces the footprint of pages that are CoW'ed due to relocation processing. Without CoW, this GOT just takes up more space and results in more memory accesses, but without the benefit.
The point is really that AArch64's ADRP/ADD pairs are position independent by their very nature, which is why we currently don't need to use -fpic or -fpie to obtain object files that can be linked with -pie. In other words, the object code is identical, and the only difference is in the additional RELA sections and metadata emitted by the linker. Fundamentally, this code model should equally support CONFIG_RELOCATABLE=n, because that code model should codify the current behavior of -mcmodel=small, but with future guarantees that the resulting object files can always be linked using -pie, and that absolute references are only emitted when strictly needed (i.e., not for jump tables).
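To make the two cases above concrete, here is a small sketch in C. The names `boot_flags`, `flags_base` and `dispatch` are hypothetical, and the instruction sequence in the comment is what a small-model AArch64 compiler typically emits, not something guaranteed for every compiler or version.

```c
/* Hypothetical global, used only for illustration. */
int boot_flags[4] = {1, 2, 3, 4};

/* Taking the address of a global. With -mcmodel=small on AArch64
 * this is typically an ADRP/ADD pair, roughly:
 *     adrp x0, boot_flags
 *     add  x0, x0, :lo12:boot_flags
 * ADRP is PC-relative at 4 KiB page granularity, so the pair keeps
 * working if the image is moved by a multiple of the page size. */
const int *flags_base(void)
{
    return boot_flags;
}

/* A dense switch like this may be lowered to a jump table. A table
 * of absolute code addresses would need one dynamic relocation per
 * entry when linked with -pie; a table of relative offsets would
 * not. Which form is used is the compiler's choice. */
int dispatch(int op, int x)
{
    switch (op) {
    case 0:  return x + 1;
    case 1:  return x * 2;
    case 2:  return x - 3;
    case 3:  return -x;
    case 4:  return x * x;
    default: return 0;
    }
}
```

Disassembling the object file for an AArch64 target shows whether the ADRP/ADD pair and a relative jump table are actually what a given compiler produced.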
Both accept it for AArch64 targets, but I don't see any difference in the generated code.
My previous comment mentioned the semantics.
No, as my previous comment mentioned.
Is
Right, if a compiler uses absolute references for jump tables when compiling as |
The opposite
(Fixing a typo: |
$ ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- make LLVM=1 LLVM_IAS=1 -j72 KCFLAGS=-fdirect-access-external-data
built, booted (in QEMU), and no one died (this time) (I think).
Checking the object files' relocations, which undefined symbols use relocations that reference the GOT? This is on the caller's side that
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-November/613617.html
Updated links: #275 (comment)