implement a ubsan runtime for better error messages #22488

Rexicon226 · 2025-01-14T05:32:02Z

So far, I've opted to copy the error messages that LLVM's ubsan runtime uses verbatim. I think they make sense, but we might be able to improve some of them.

As discussed on the Zulip, we've opted for the following strategies.

Regarding the logic for including the runtime, since ubsan-rt pulls in a lot of the standard library for the stack traces, I've set it up to do the following:

zig build-exe foo.c -> build libubsan.a and links
zig build-obj foo.c -> doesn't build anything, just emits references to ubsan runtime
zig build-lib foo.c -static -> build ubsan.o and link it
zig build-exe foo.zig bar.c -> import ubsan-rt into the ZCU
zig build-obj foo.zig bar.c -> import ubsan-rt into the ZCU
zig build-lib foo.zig bar.c -> import ubsan-rt into the ZCU

Regarding what defaults to use for including ubsan:

In Debug mode, the full runtime is included, with only a couple of specific checks disabled for reasons explained in Compilation.zig
In ReleaseSafe mode, the full runtime is disabled opting instead to trap on detected UB.
In other modes, ReleaseFast and ReleaseSmall, the ubsan runtime is not included in any capacity.

@mlugg has a proposal for allowing enabling other configurations of the ubsan, maybe he'll open it up as a proposal :)

src/Compilation.zig

test/link/elf.zig

andrewrk · 2025-02-24T04:29:31Z

Good work, this is quite mergeable as-is. I made a couple minor adjustments but this is ready to land.

Rexicon226 · 2025-02-24T04:29:58Z

Thank you ❤️

alexrp · 2025-02-24T05:51:40Z

lib/std/heap.zig

-    .freestanding => switch (builtin.cpu.arch) {
+    .freestanding, .other => switch (builtin.cpu.arch) {
        .wasm32, .wasm64 => 64 << 10,
+        .x86, .x86_64 => 4 << 10,
+        .aarch64, .aarch64_be => 4 << 10,
        else => null,
    },


Can you elaborate on your thinking here? freestanding means there is no OS so there is no guarantee that the concept of a "page" is even meaningful. For all the standard library knows, paging might not be set up at all. So I don't see how this change is correct.

For other there's a similar problem: The implication is that there is an OS (as opposed to freestanding), but we still don't know what that OS is, so we can't make any valid claims about its minimum page size.

My understanding is that the page size is a CPU architecture feature - the OS has a limited set of choices available based on the CPU architecture. The OS can't pick any arbitrary value for page size. Do you have reason to believe this is not the case?

The interpretation of "minimum and maximum page size" that we're using here is basically "the minimum and maximum granularity that mmap can be configured to operate at" (e.g. at kernel build time). We chose this interpretation specifically because if we just went by the CPU architecture's minimum/maximum, we would get into absurd territory quickly; most CPU architectures support page sizes that are completely impractical as the normal page size of a general-purpose OS, and so these page sizes are either not supported at all by OSs, or only supported with special opt-in flags (e.g. MAP_HUGE*).

Obviously this interpretation is not very meaningful in freestanding because for all we know there's no mmap syscall/function at all. But that's fine, because for all we know there may not be any paging at all anyway. And that implies that the standard library must not exploit knowledge about the supposed safety of overlong reads within a page, for example. So for freestanding specifically, it just doesn't really make sense to speak of page size. If there actually is one in effect, the programmer needs to tell us about it. Otherwise we're making a potentially invalid assumption.

For other, the situation is similar. We know there's an OS, but not much else; there's a good chance that it does have mmap of some description, but it could certainly also be an OS without paging. We just don't know. (This makes me think that there might not be as much utility in the other OS tag as was originally thought, but I digress.)

(I suppose you could also argue that it's possible for an OS to have mmap operate at a granularity that has nothing to do with the page sizes supported by the CPU architecture. But I don't know why you would do something that silly, and I don't think anyone has, so this probably need not be a consideration.)

(I suppose you could also argue that it's possible for an OS to have mmap operate at a granularity that has nothing to do with the page sizes supported by the CPU architecture. But I don't know why you would do something that silly, and I don't think anyone has, so this probably need not be a consideration.)

I'm arguing precisely the opposite: it's not possible for an OS to have mmap operate at a granularity different than the page sizes supported by the CPU architecture, which means that page_size_min being 4096 on x86_64 is correct, including on freestanding.

Let's make room for this PR to land, and then hash this out separately. std.mem.indexOfPos is an important function, so we need to give it the appropriate amount of engineering effort it deserves. That means even though I'm tempted to simply say, "this violates pointer provenance, fuck it," I still need to consider whether a page-aware implementation should be considered. It absolutely needs to work on freestanding by default however.

@tiehuis, if available, should be involved in that discussion.

I'm arguing precisely the opposite: it's not possible for an OS to have mmap operate at a granularity different than the page sizes supported by the CPU architecture, which means that page_size_min being 4096 on x86_64 is correct, including on freestanding.

Just to be clear, what I was getting at with that paragraph is that you can do some incredibly silly stuff with page protection and instruction emulation/resumption at the supervisor level to make it appear to userspace as if the machine has page sizes that don't look anything like the architectural page sizes. It'd be very slow and very stupid, which is why I also said it probably shouldn't be a real consideration.

Let's make room for this PR to land, and then hash this out separately.

👍 But just out of curiosity, why were these changes needed in this PR? That's not clear from the commit messages.

It absolutely needs to work on freestanding by default however.

Agree; I just think the answer here is to avoid the page-size-dependent code paths altogether in the freestanding case.

👍 But just out of curiosity, why were these changes needed in this PR? That's not clear from the commit messages.

Ah, sorry I thought you saw - ubsan_rt depends on std.mem.indexOfSentinel, which was calling std.heap.pageSize(). When that function was implemented, it was accessing a comptime value; in the runtime page size changeset, it was carelessly updated to be a runtime-known value, regressing the function on freestanding/other targets.

https://github.com/ziglang/zig/actions/runs/13497343926/job/37707452929

👍

When that function was implemented, it was accessing a comptime value; in the runtime page size changeset, it was carelessly updated to be a runtime-known value, regressing the function on freestanding/other targets.

Wait, was there ever a fully comptime pageSize() function? I thought it was just a page_size constant before. FWIW, it's intentional that pageSize() is potentially runtime: The actual page size in effect can be different (and often is, e.g. on aarch64) from the page_size_min/page_size_max constants. The latter only tell you the bounds of possible values for pageSize().

Sorry, by "that function", I meant std.mem.indexOfSentinel.

lib/ubsan_rt.zig

test/link/elf.zig

also seems to work around aarch64 LLVM miscompilation 🤔

Unlike `compiler-rt`, `ubsan` uses the standard library quite a lot. Using a similar approach to how `compiler-rt` is handled today, where it's compiled into its own object and then linked would be sub-optimal as we'd be introducing a lot of code bloat. This approach always "imports" `ubsan` if the ZCU, if it exists. If it doesn't such as the case where we're compiling only C code, then we have no choice other than to compile it down to an object and link. There's still a tiny optimization we can do in that case, which is when compiling to a static library, there's no need to construct an archive with a single object. We'd only go back and parse out ubsan from the archive later in the pipeline. So we compile it to an object instead and link that to the static library. TLDR; - `zig build-exe foo.c` -> build `libubsan.a` and links - `zig build-obj foo.c` -> doesn't build anything, just emits references to ubsan runtime - `zig build-lib foo.c -static` -> build `ubsan.o` and link it - `zig build-exe foo.zig bar.c` -> import `ubsan-rt` into the ZCU - `zig build-obj foo.zig bar.c` -> import `ubsan-rt` into the ZCU - `zig build-lib foo.zig bar.c` -> import `ubsan-rt` into the ZCU

Problem here is if zig is asked to create multiple static libraries, it will build the runtime multiple times and then they will conflict. Instead we want to build the runtime exactly once.

simply use page_size_min instead. better yet, this logic would avoid depending on page size entirely...

x86_64 and aarch64 have safe values for page_size_min

Rexicon226 force-pushed the ubsan-rt branch 4 times, most recently from 6c7f099 to 19d1ec1 Compare January 16, 2025 17:23

Rexicon226 marked this pull request as draft January 17, 2025 04:26

Rexicon226 force-pushed the ubsan-rt branch 5 times, most recently from 08295dd to 22a087e Compare January 27, 2025 20:16

Rexicon226 marked this pull request as ready for review January 28, 2025 04:24

alexrp reviewed Jan 29, 2025

View reviewed changes

src/Compilation.zig Show resolved Hide resolved

alexrp reviewed Jan 29, 2025

View reviewed changes

test/link/elf.zig Show resolved Hide resolved

alexrp added this to the 0.14.0 milestone Jan 29, 2025

Rexicon226 force-pushed the ubsan-rt branch 5 times, most recently from a1292f7 to 8da1898 Compare February 6, 2025 06:01

andrewrk force-pushed the ubsan-rt branch from 8da1898 to 04e42a7 Compare February 24, 2025 04:28

andrewrk enabled auto-merge February 24, 2025 04:29

andrewrk added the release notes This PR should be mentioned in the release notes. label Feb 24, 2025

alexrp reviewed Feb 24, 2025

View reviewed changes

linusg reviewed Feb 24, 2025

View reviewed changes

lib/ubsan_rt.zig Show resolved Hide resolved

lib/ubsan_rt.zig Outdated Show resolved Hide resolved

lib/ubsan_rt.zig Outdated Show resolved Hide resolved

test/link/elf.zig Outdated Show resolved Hide resolved

andrewrk disabled auto-merge February 24, 2025 10:27

andrewrk force-pushed the ubsan-rt branch from 5cbecc7 to c793033 Compare February 24, 2025 20:35

Rexicon226 force-pushed the ubsan-rt branch from c793033 to d3863fe Compare February 24, 2025 20:41

andrewrk enabled auto-merge February 24, 2025 21:00

Rexicon226 and others added 20 commits February 25, 2025 11:22

Compilation: use the minimal runtime in ReleaseSafe

c27b797

ubsan: implement some more checks

babee5f

move libubsan to lib/ and integrate it into -fubsan-rt

95720f0

mem: add @branchHint to indexOfSentinel

fc77678

also seems to work around aarch64 LLVM miscompilation 🤔

ubsan: implement more checks

590c613

ubsan: extend ptr before adding to avoid overflow

658fba9

ubsan: clean-up and remove the unused handlers

50b9556

ubsan: resolve the last of the TODOs

a468929

main: add -f{no-}ubsan-rt to the usage text

1417847

ubsan: clean-up a bit more

d669b95

build: add bundle_ubsan_rt

9432a9b

Compilation: correct when to include ubsan

9311784

correct some bugs

35b9db3

build: add comments explaining why we disable ubsan

44d3b5a

ubsan: avoid depending on @returnAddress combined with inline

d4413e3

ubsan: don't create ubsan in every static lib by default

e18c7f9

Problem here is if zig is asked to create multiple static libraries, it will build the runtime multiple times and then they will conflict. Instead we want to build the runtime exactly once.

std.mem.indexOfSentinel: don't ask the OS the page size

faf256e

simply use page_size_min instead. better yet, this logic would avoid depending on page size entirely...

std.heap.page_size_min: relax freestanding restriction

2447b87

x86_64 and aarch64 have safe values for page_size_min

ubsan: update wording

ca83f52

auto-merge was automatically disabled February 25, 2025 19:22
Head branch was pushed to by a user without write access

Rexicon226 force-pushed the ubsan-rt branch from 88eee58 to ca83f52 Compare February 25, 2025 19:22

andrewrk enabled auto-merge February 25, 2025 19:23

andrewrk disabled auto-merge February 26, 2025 08:08

andrewrk merged commit c45dcd0 into ziglang:master Feb 26, 2025
9 checks passed

This comment was marked as resolved.

Sign in to view

Rexicon226 mentioned this pull request Feb 27, 2025

self-hosted wasm linker bug #23025

Open

alexrp mentioned this pull request Mar 2, 2025

link.MachO.UnwindInfo: Handle u24 overflow for CU records pointing to DWARF. #23011

Merged

rpkak mentioned this pull request Mar 2, 2025

Mitigation of #15108 in std.heap.page_size_min #23043

Merged

Rexicon226 deleted the ubsan-rt branch March 4, 2025 03:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

implement a ubsan runtime for better error messages #22488

implement a ubsan runtime for better error messages #22488

Rexicon226 commented Jan 14, 2025

andrewrk commented Feb 24, 2025

Rexicon226 commented Feb 24, 2025

alexrp Feb 24, 2025

andrewrk Feb 24, 2025

alexrp Feb 24, 2025

andrewrk Feb 24, 2025

andrewrk Feb 24, 2025 •

edited

Loading

alexrp Feb 25, 2025

andrewrk Feb 25, 2025

alexrp Feb 25, 2025

andrewrk Feb 25, 2025

This comment was marked as resolved.

implement a ubsan runtime for better error messages #22488

implement a ubsan runtime for better error messages #22488

Conversation

Rexicon226 commented Jan 14, 2025

andrewrk commented Feb 24, 2025

Rexicon226 commented Feb 24, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

andrewrk Feb 24, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

This comment was marked as resolved.

andrewrk Feb 24, 2025 •

edited

Loading