Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement a ubsan runtime for better error messages #22488

Merged
merged 22 commits into from
Feb 26, 2025

Conversation

Rexicon226
Copy link
Contributor

closes #5163

So far, I've opted to copy the error messages that LLVM's ubsan runtime uses verbatim. I think they make sense, but we might be able to improve some of them.

As discussed on the Zulip, we've opted for the following strategies.

Regarding the logic for including the runtime, since ubsan-rt pulls in a lot of the standard library for the stack traces, I've set it up to do the following:

  • zig build-exe foo.c -> build libubsan.a and links
  • zig build-obj foo.c -> doesn't build anything, just emits references to ubsan runtime
  • zig build-lib foo.c -static -> build ubsan.o and link it
  • zig build-exe foo.zig bar.c -> import ubsan-rt into the ZCU
  • zig build-obj foo.zig bar.c -> import ubsan-rt into the ZCU
  • zig build-lib foo.zig bar.c -> import ubsan-rt into the ZCU

Regarding what defaults to use for including ubsan:

  • In Debug mode, the full runtime is included, with only a couple of specific checks disabled for reasons explained in Compilation.zig
  • In ReleaseSafe mode, the full runtime is disabled opting instead to trap on detected UB.
  • In other modes, ReleaseFast and ReleaseSmall, the ubsan runtime is not included in any capacity.

@mlugg has a proposal for allowing enabling other configurations of the ubsan, maybe he'll open it up as a proposal :)

@Rexicon226 Rexicon226 force-pushed the ubsan-rt branch 4 times, most recently from 6c7f099 to 19d1ec1 Compare January 16, 2025 17:23
@Rexicon226 Rexicon226 marked this pull request as draft January 17, 2025 04:26
@Rexicon226 Rexicon226 force-pushed the ubsan-rt branch 5 times, most recently from 08295dd to 22a087e Compare January 27, 2025 20:16
@Rexicon226 Rexicon226 marked this pull request as ready for review January 28, 2025 04:24
@alexrp alexrp added this to the 0.14.0 milestone Jan 29, 2025
@Rexicon226 Rexicon226 force-pushed the ubsan-rt branch 5 times, most recently from a1292f7 to 8da1898 Compare February 6, 2025 06:01
@andrewrk
Copy link
Member

Good work, this is quite mergeable as-is. I made a couple minor adjustments but this is ready to land.

@andrewrk andrewrk enabled auto-merge February 24, 2025 04:29
@andrewrk andrewrk added the release notes This PR should be mentioned in the release notes. label Feb 24, 2025
@Rexicon226
Copy link
Contributor Author

Thank you ❤️

Comment on lines -834 to 837
.freestanding => switch (builtin.cpu.arch) {
.freestanding, .other => switch (builtin.cpu.arch) {
.wasm32, .wasm64 => 64 << 10,
.x86, .x86_64 => 4 << 10,
.aarch64, .aarch64_be => 4 << 10,
else => null,
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on your thinking here? freestanding means there is no OS so there is no guarantee that the concept of a "page" is even meaningful. For all the standard library knows, paging might not be set up at all. So I don't see how this change is correct.

For other there's a similar problem: The implication is that there is an OS (as opposed to freestanding), but we still don't know what that OS is, so we can't make any valid claims about its minimum page size.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that the page size is a CPU architecture feature - the OS has a limited set of choices available based on the CPU architecture. The OS can't pick any arbitrary value for page size. Do you have reason to believe this is not the case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interpretation of "minimum and maximum page size" that we're using here is basically "the minimum and maximum granularity that mmap can be configured to operate at" (e.g. at kernel build time). We chose this interpretation specifically because if we just went by the CPU architecture's minimum/maximum, we would get into absurd territory quickly; most CPU architectures support page sizes that are completely impractical as the normal page size of a general-purpose OS, and so these page sizes are either not supported at all by OSs, or only supported with special opt-in flags (e.g. MAP_HUGE*).

Obviously this interpretation is not very meaningful in freestanding because for all we know there's no mmap syscall/function at all. But that's fine, because for all we know there may not be any paging at all anyway. And that implies that the standard library must not exploit knowledge about the supposed safety of overlong reads within a page, for example. So for freestanding specifically, it just doesn't really make sense to speak of page size. If there actually is one in effect, the programmer needs to tell us about it. Otherwise we're making a potentially invalid assumption.

For other, the situation is similar. We know there's an OS, but not much else; there's a good chance that it does have mmap of some description, but it could certainly also be an OS without paging. We just don't know. (This makes me think that there might not be as much utility in the other OS tag as was originally thought, but I digress.)

(I suppose you could also argue that it's possible for an OS to have mmap operate at a granularity that has nothing to do with the page sizes supported by the CPU architecture. But I don't know why you would do something that silly, and I don't think anyone has, so this probably need not be a consideration.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I suppose you could also argue that it's possible for an OS to have mmap operate at a granularity that has nothing to do with the page sizes supported by the CPU architecture. But I don't know why you would do something that silly, and I don't think anyone has, so this probably need not be a consideration.)

I'm arguing precisely the opposite: it's not possible for an OS to have mmap operate at a granularity different than the page sizes supported by the CPU architecture, which means that page_size_min being 4096 on x86_64 is correct, including on freestanding.

Copy link
Member

@andrewrk andrewrk Feb 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make room for this PR to land, and then hash this out separately. std.mem.indexOfPos is an important function, so we need to give it the appropriate amount of engineering effort it deserves. That means even though I'm tempted to simply say, "this violates pointer provenance, fuck it," I still need to consider whether a page-aware implementation should be considered. It absolutely needs to work on freestanding by default however.

@tiehuis, if available, should be involved in that discussion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm arguing precisely the opposite: it's not possible for an OS to have mmap operate at a granularity different than the page sizes supported by the CPU architecture, which means that page_size_min being 4096 on x86_64 is correct, including on freestanding.

Just to be clear, what I was getting at with that paragraph is that you can do some incredibly silly stuff with page protection and instruction emulation/resumption at the supervisor level to make it appear to userspace as if the machine has page sizes that don't look anything like the architectural page sizes. It'd be very slow and very stupid, which is why I also said it probably shouldn't be a real consideration.

Let's make room for this PR to land, and then hash this out separately.

👍 But just out of curiosity, why were these changes needed in this PR? That's not clear from the commit messages.

It absolutely needs to work on freestanding by default however.

Agree; I just think the answer here is to avoid the page-size-dependent code paths altogether in the freestanding case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 But just out of curiosity, why were these changes needed in this PR? That's not clear from the commit messages.

Ah, sorry I thought you saw - ubsan_rt depends on std.mem.indexOfSentinel, which was calling std.heap.pageSize(). When that function was implemented, it was accessing a comptime value; in the runtime page size changeset, it was carelessly updated to be a runtime-known value, regressing the function on freestanding/other targets.

https://github.com/ziglang/zig/actions/runs/13497343926/job/37707452929

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

When that function was implemented, it was accessing a comptime value; in the runtime page size changeset, it was carelessly updated to be a runtime-known value, regressing the function on freestanding/other targets.

Wait, was there ever a fully comptime pageSize() function? I thought it was just a page_size constant before. FWIW, it's intentional that pageSize() is potentially runtime: The actual page size in effect can be different (and often is, e.g. on aarch64) from the page_size_min/page_size_max constants. The latter only tell you the bounds of possible values for pageSize().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, by "that function", I meant std.mem.indexOfSentinel.

Rexicon226 and others added 20 commits February 25, 2025 11:22
also seems to work around aarch64 LLVM miscompilation 🤔
Unlike `compiler-rt`, `ubsan` uses the standard library quite a lot.
Using a similar approach to how `compiler-rt` is handled today, where it's
compiled into its own object and then linked would be sub-optimal as we'd
be introducing a lot of code bloat.

This approach always "imports" `ubsan` if the ZCU, if it exists. If it doesn't
such as the case where we're compiling only C code, then we have no choice other
than to compile it down to an object and link. There's still a tiny optimization
we can do in that case, which is when compiling to a static library, there's no
need to construct an archive with a single object. We'd only go back and parse out
ubsan from the archive later in the pipeline. So we compile it to an object instead
and link that to the static library.

TLDR;
- `zig build-exe foo.c` -> build `libubsan.a` and links
- `zig build-obj foo.c` -> doesn't build anything, just emits references to ubsan runtime
- `zig build-lib foo.c -static` -> build `ubsan.o` and link it
- `zig build-exe foo.zig bar.c` -> import `ubsan-rt` into the ZCU
- `zig build-obj foo.zig bar.c` -> import `ubsan-rt` into the ZCU
- `zig build-lib foo.zig bar.c` -> import `ubsan-rt` into the ZCU
Problem here is if zig is asked to create multiple static libraries, it
will build the runtime multiple times and then they will conflict.
Instead we want to build the runtime exactly once.
simply use page_size_min instead.

better yet, this logic would avoid depending on page size entirely...
x86_64 and aarch64 have safe values for page_size_min
auto-merge was automatically disabled February 25, 2025 19:22

Head branch was pushed to by a user without write access

@andrewrk andrewrk enabled auto-merge February 25, 2025 19:23
@andrewrk andrewrk disabled auto-merge February 26, 2025 08:08
@andrewrk andrewrk merged commit c45dcd0 into ziglang:master Feb 26, 2025
9 checks passed
@rohlem

This comment was marked as resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release notes This PR should be mentioned in the release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Give more info when a UB trap is detected in zig cc debug mode
5 participants