Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collection of cases of deliberate UB #158

Closed
RalfJung opened this issue Jul 7, 2019 · 20 comments · Fixed by #448
Closed

Collection of cases of deliberate UB #158

RalfJung opened this issue Jul 7, 2019 · 20 comments · Fixed by #448
Labels
C-list Category: A list/collection of some sort. Please help maintain it!

Comments

@RalfJung
Copy link
Member

RalfJung commented Jul 7, 2019

Sometimes crates deliberately cause UB because there is no other way to do what they need, or because the alternatives are unacceptably worse in some sense. This is not great, but it is valuable feedback for us -- we should try to either make those things not UB, or provide UB-free alternatives that are good enough!

Some cases:

  • crossbeam's AtomicCell does various things with uninitialized memory. Most of this would be fine if we allowed uninitialized integers and adapted LLVM's handling of data races, but compare_exchange is worse.
  • bytes does a non-atomic plain load that races, because relaxed loads cost too much performance. (Note that LLVM's handling of data races is not enough here, data races still return garbage data. Also see this thread on using "unordered".)
  • Every offset_of macro everywhere, in particular
  • Unwinding through FFI boundaries is used in mozjpeg-sys and possibly other image-related libs. One of many long discussions on the issue is here.

In this issue, please let's only discuss whether something is UB according to current rules. if you want to talk about ways to improve the code or the spec to avoid UB, open a new issue (or there might already be one).

@RalfJung RalfJung added the C-list Category: A list/collection of some sort. Please help maintain it! label Jul 7, 2019
@BatmanAoD
Copy link
Member

Unwinding through FFI boundaries is used in mozjpeg-sys and possibly other image-related libs. One of many long discussions on the issue is here.

@Shnatsel
Copy link
Member

smallvec deliberately triggers UB as a hint for the compiler that the code is unreachable: https://github.com/servo/rust-smallvec/blob/3ee6d1e3e5a8b5a6eafa5f396c397d5038a9c97e/lib.rs#L130-L139

Some people are confused about why it's done that way: servo/rust-smallvec#158

@RalfJung
Copy link
Member Author

RalfJung commented Aug 27, 2019

That does not look like triggering UB to me. Like, no UB actually happens.

By "triggering UB" I mean to actually hit that execution path (on the Abstract Machine). It's not UB it it's in an if false.

@comex
Copy link

comex commented Aug 27, 2019

For the bytes example, perhaps link this LLVM bug report about optimizations that, if implemented, would probably remove the penalty for relaxed atomics compared to normal loads.

@jonhoo
Copy link

jonhoo commented Aug 30, 2019

So, here's something that's definitely UB that I just re-discovered. Should probably fix it, though I'm not sure if MaybeUninit will be sufficient?
https://github.com/mit-pdos/noria/blob/ba60aae9183e8b47d47a96d5136a27798ec2ba5d/noria-server/dataflow/src/backlog/multiw.rs#L84-L105
(DataType is defined here)

I believe this makes it okay?

@@ -90,19 +90,21 @@ impl Handle {
                 use std::mem;
                 use std::ptr;
                 unsafe {
-                    let mut stack_key: (DataType, DataType) = mem::uninitialized();
+                    let mut stack_key: (mem::MaybeUninit<DataType>, mem::MaybeUninit<DataType>) =
+                        unsafe { (mem::MaybeUninit::uninit(), mem::MaybeUninit::uninit()) };
                     ptr::copy_nonoverlapping(
                         &key[0] as *const DataType,
-                        &mut stack_key.0 as *mut DataType,
+                        stack_key.0.as_mut_ptr(),
                         1,
                     );
                     ptr::copy_nonoverlapping(
                         &key[1] as *const DataType,
-                        &mut stack_key.1 as *mut DataType,
+                        stack_key.1.as_mut_ptr(),
                         1,
                     );
+                    let stack_key =
+                        unsafe { mem::transmute::<_, &(DataType, DataType)>(&stack_key) };
                     let v = h.meta_get_and(&stack_key, then);
-                    mem::forget(stack_key);
                     v
                 }
             }

@RalfJung
Copy link
Member Author

Yeah, they should just use MaybeUnunit. Why are you transmuting a reference instead of a value (or doing a raw ptr cast)?

That's not really "crate causes UB because it cannot be avoided", it's more "crate hasn't been updated to MyabeUninit yet". (Which is fair, that one is not stable for very long yet.)

@jonhoo
Copy link

jonhoo commented Sep 1, 2019

Not sure what you mean? Why would transmuting the value (stack_key) make a difference?

@RalfJung
Copy link
Member Author

RalfJung commented Sep 2, 2019

it's just unnecessary to transmute references, and usually preferred to do raw pointer casts instead. And usually I see people doing value-transmutes for the "assume_init" step. So I wondered why you followed a different style.

Either way though, this doesn't look like a candidate for the list above. There is a good alternative available to this crate, and AFAIK the only reason they are not doing this is that the crate hasn't been updated to follow the Rust-1.36+ style.

@HadrienG2
Copy link

HadrienG2 commented Oct 18, 2019

I have recently tried to reduce the amount of UB in abomonation, but there is one particular area where I have hit what I think is a language wall that is partially related to the offset_of problem.

Deserializing an "abomonated" object entails fixing dangling pointers inside of it to have them point into other data in the serialized bytes. By definition, objects with dangling pointers are not valid, so constructing a reference to them is not okay according to our current UB rules. But I have no idea how to, say, access the inside of an invalid Option to patch its inner pointers without constructing Rust references in the process.

In the case of structs, this will be resolved by &raw once it is available and stable. But in the case of, say, enums, or tuples, or even slices, I have no idea how this UB could be eventually fixed.

See the FIXMEs in https://github.com/TimelyDataflow/abomonation/pull/22/files for more details.

@RalfJung
Copy link
Member Author

Deserializing an "abomonated" object entails fixing dangling pointers inside of it to have them point into other data in the serialized bytes. By definition, objects with dangling pointers are not valid, so constructing a reference to them is not okay according to our current UB rules.

That is only true for pointers at reference type. Like, a dangling NonNull<T> is not a problem at all.

I think a concrete example would help.

tuples

Why should tuples be any different than structs?

slices

For slices, given a *mut T to the beginning of a slice of T, calling .offset(n) gives you a raw pointer to the n-th element.

@HadrienG2
Copy link

HadrienG2 commented Oct 19, 2019

You can find many examples near the "FIXME" comments of the linked pull request.

Regarding tuples, the problem is basically that it is difficult to generate the sequence tup.0, tup.1, tup.2... in a declarative macro, and that the most commonly used workaround uses references. This one might be soluble by introducing a private procedural macro utility, but the associated splitting in two crates will make the code even more obscure. I think some sort of variadic generics is needed to cleanly resolve this, but improvements to macros might do the trick as well.

For slices, the problem is that abomonation gives me, say, a p: NonNull<&[u8]> where the inner &[u8] is dangling, and that I need to know the slice length in order to deserialize the subsequent data. Since the layout of fat pointers is unstable, I believe I have no other way to do this than p.as_ref().len() + crossing fingers that the compiler won't exploit the UB to assume that the dangling data pointer of the slice is valid.

@RalfJung
Copy link
Member Author

RalfJung commented Oct 19, 2019

For slices, the problem is that abomonation gives me, say, a p: NonNull<&[u8]> where the inner &[u8] is dangling, and that I need to know the slice length

Ah, yes -- we definitely need a len method on raw slices, complementing rust-lang/rust#60667. That is purely a library issue though; the language has all the tools you need to do this on raw slices.

You can find many examples near the "FIXME" comments of the linked pull request.

The issue is, that's in the middle of tons of code that I don't know anything about. I was asking for something self-contained -- in typical "minimal reproducible example" style. My guess is, you have a *const Option<T>, and you want a *const T if there's a Some? That is a very reasonable demand and indeed not covered by &raw. Could you bring this up in the &raw tracking issue?

@RalfJung
Copy link
Member Author

Also, can we continue that discussion in #77 ? That fits better for this particular case of UB.

@elichai
Copy link

elichai commented Nov 4, 2019

Another example of offset_of rust-lang/rust-bindgen#1651
Here they use a null pointer, dereference, takes a reference to a field in the struct, then cast to usize and compare to zero.

@gnzlbg
Copy link
Contributor

gnzlbg commented Nov 4, 2019

Notice that such code is only used in #[test]s. libstd/libc do the same.

@the8472
Copy link
Member

the8472 commented Nov 24, 2019

bytes does a non-atomic plain load that races, because relaxed loads cost too much performance. (Note that LLVM's handling of data races is not enough here, data races still return garbage data.)

Would llvm's unordered atomics help here?

@RalfJung
Copy link
Member Author

Well... "help" for some notion of help. See this forum discussion about "unordered".

@Aaron1011
Copy link
Member

The Ruffle project uses an unsound custom DST (implement as a #[repr(transparent)] struct containing [()]. See ruffle-rs/ruffle#5339 (comment) and https://github.com/ruffle-rs/ruffle/blob/2ad5c644b02a1a65bb677d2385da780a6335f1e6/core/src/string/ptr.rs#L36-L50

@thomcc
Copy link
Member

thomcc commented Dec 18, 2021

That's unsound under stacked borrows, but not under other potential Rust aliasing models — some of the discussions in #256 mention ways it (stacked borrows) might be fixed to accommodate such code.

bors bot added a commit to taiki-e/atomic-memcpy that referenced this issue Feb 13, 2022
1: Remove miri hack r=taiki-e a=taiki-e

Use currently use a hack to avoid rust-lang/rust#69488 and to make sure that Miri errors for  atomic load/store of integers containing uninitialized bytes (which is probably not a problem and uncharted territory at best [1] [2] [3], and can be detected by `-Zmiri-check-number-validity` [4]), do not mask Miri errors for the use of uninitialized bytes (which is definitely a problem).

https://github.com/taiki-e/atomic-memcpy/blob/3507fef17534e4825b2b303d04702b4678e29dd0/src/lib.rs#L426-L450

[1]: crossbeam-rs/crossbeam#315 
[2]: rust-lang/unsafe-code-guidelines#158 
[3]: rust-lang/unsafe-code-guidelines#71 
[4]: rust-lang/miri#1904 

However, this actually causes another "unsupported operation" Miri error.

```
error: unsupported operation: unable to turn pointer into raw bytes
   --> /Users/taiki/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:701:9
    |
701 |         copy_nonoverlapping(src, tmp.as_mut_ptr(), 1);
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ unable to turn pointer into raw bytes
    |
    = help: this is likely not a bug in the program; it indicates that the program performed an operation that the interpreter does not support
```


Co-authored-by: Taiki Endo <te316e89@gmail.com>
@RalfJung
Copy link
Member Author

RalfJung commented Aug 8, 2023

I'll turn this into a markdown file in the repo, and also add this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-list Category: A list/collection of some sort. Please help maintain it!
Projects
None yet
Development

Successfully merging a pull request may close this issue.