Disabling the out-of-bounds validator #1175

Closed
jacob-hughes opened this issue Feb 15, 2020 · 7 comments
Labels
C-support Category: Not necessarily a bug, but someone asking for support

Comments

@jacob-hughes

I'm writing a Garbage Collector in Rust which uses quite a bit of unsafe code. I'm very keen to use Miri to check that I've got the stacked borrows part right, but before I can get to the meat of it, Miri notices (correctly 😉 ) that I'm performing out-of-bounds accesses early on.

I'm already aware of these out-of-bounds accesses: they exist by design. However, it looks like certain warnings can't be suppressed yet [#788], and I already use MSAN + valgrind to catch the unintended existence of this class of bugs so I'm not too bothered about them in Miri. Is there a -Z option to turn this off completely that I've overlooked? If not, is this something I can help to implement, or is it incompatible with Miri's design? Perhaps this won't be an issue when #797 is fixed?

Thanks for the help!

P.s. One could argue that I should instead fix my code, and ordinarily I would agree. Unfortunately, a garbage collector needs to scan and access arbitrary addresses (such as on the application stack and in registers), which tends to upset memory sanitizers.

@RalfJung
Member

RalfJung commented Feb 15, 2020

I can only guess what it is you are doing concretely, but there is little Miri can do when you are accessing non-existent memory. It's not like it can just "pretend" that there is memory there that you can read from. Memory, in Miri, is literally one Vec<u8> (+more) per allocation, and if your access goes outside of that and we didn't check the bounds beforehand, Miri would just ICE.
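To make the point above concrete, here is a toy model of "one byte vector per allocation, checked before every access". It is only a sketch, not Miri's actual implementation: `Allocation`, `AccessError`, and `read` are hypothetical names invented for this illustration. The only idea taken from the comment is that there is simply no backing memory outside the vector, so an unchecked out-of-bounds access would have nothing to read.

```rust
/// Toy model (hypothetical, not Miri's real types): each allocation owns
/// exactly one byte vector and nothing else.
struct Allocation {
    bytes: Vec<u8>,
}

/// Hypothetical error type describing a rejected access.
#[derive(Debug, PartialEq)]
enum AccessError {
    OutOfBounds { offset: usize, len: usize, size: usize },
}

impl Allocation {
    /// Creates a zero-filled allocation of `size` bytes.
    fn new(size: usize) -> Self {
        Allocation { bytes: vec![0; size] }
    }

    /// Reads `len` bytes starting at `offset`, checking bounds first.
    /// Without this check, indexing past the vector would panic, which is
    /// the analogue of the interpreter crashing instead of reporting UB.
    fn read(&self, offset: usize, len: usize) -> Result<&[u8], AccessError> {
        let end = offset
            .checked_add(len)
            .filter(|&end| end <= self.bytes.len());
        match end {
            Some(end) => Ok(&self.bytes[offset..end]),
            None => Err(AccessError::OutOfBounds {
                offset,
                len,
                size: self.bytes.len(),
            }),
        }
    }
}

fn main() {
    let alloc = Allocation::new(8);
    // In-bounds read succeeds.
    assert!(alloc.read(0, 8).is_ok());
    // One byte past the end is rejected: there is no memory there to return.
    assert!(alloc.read(4, 5).is_err());
}
```

In this model, "turning the check off" would not yield the neighbouring bytes of some other allocation; it would only turn a reported error into a crash.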

So, I think this falls under "incompatible with Miri's design". We could, in principle, do things like treat such memory as uninitialized, but that would come with a lot of extra complications in some of the deeper parts of the engine, and honestly I'd rather not complicate that code to implement a feature that I do not think Miri should have.

But it would help to have a small piece of code demonstrating what you are doing and the kind of fault you think Miri should have the ability to ignore. Maybe I am misunderstanding what exactly is happening. But it sure sounds like you are causing UB, so I feel the urge to spend the rest of this post with a lengthy warning not to do that.

> P.s. One could argue that I should instead fix my code, and ordinarily I would agree. Unfortunately, a garbage collector needs to scan and access arbitrary addresses (such as on the application stack and in registers), which tends to upset memory sanitizers.

I'm afraid that's not how things work in Rust (or C/C++, for that matter). You do not get to reason in terms of "I know how the stack is organized so this is okay". Before you get there, the compiler already assumes that your code will never, ever perform an out-of-bounds memory access, and if you fail to uphold that promise, the result of compilation is just garbage. It may look like assembly code, and it may even behave like you expect it to, but that is just coincidence. You are not just "upsetting memory sanitizers", you are literally violating a fundamental assumption that the Rust compiler is making about your code, and your compiler does replace your code with unicorns and rainbows. You just happen to be lucky that the unicorns are doing the right thing right now, but there is nothing preventing this from changing with any new compiler release, or indeed with any entirely unrelated change anywhere else in the code you are compiling.

Since you were talking about the stack, here is an example of some LLVM IR that tries to exploit knowledge about stack layout. The out-of-bounds access causes this program to be ill-formed, so the compilation result is not what one might naively expect. There is no way to protect against such "miscompilations" (in quotes because the compiler has done nothing wrong, so calling it a miscompilation isn't accurate) other than not causing UB.

I also wrote a lengthy blogpost on the subject. The upshot is that if you think your code is running on an x86 or ARM CPU, you are mistaken -- your code first and foremost runs on the Rust Abstract Machine, and only if that execution doesn't cause UB do you get to think of your program as running on something lower-level.

> a garbage collector needs to scan and access arbitrary addresses

There is no way to access arbitrary memory in Rust. That is just not an operation the language supports. (And C/C++ do not support it either.) It may seem like you can cheat your way around that limitation by "just" ignoring UB, but that is a false impression. If you need to do something that Rust cannot do, the only options are (a) inline assembly (or really FFI to any other language that can do these things, but assembly is the only one I know of that does), and (b) extending Rust with support for whatever it is you want to do (e.g. this is what people are working on for unwinding across FFI boundaries).

Now, I am aware that sometimes one needs to cause deliberate UB. But generally this should come with some kind of idea for what would need to be done to the language to not make this UB, or the operation is "just" a variant of one that is UB-free (such as replacing a non-atomic racy access by a relaxed atomic access). For out-of-bounds accesses I don't see that. If the GC you are talking about is for a language you are managing inside Rust, it should be possible without OOB access (though once you start JIT'ing you are leaving the realm of what Miri can emulate). If this is something like the Boehm GC, then from what I know we are deep in the realm of UB and if Miri tells you as much then it works as intended. ;)
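As an illustration of the "UB-free variant" mentioned above, here is a minimal sketch of replacing a racy non-atomic access with a relaxed atomic one. The function name `count_with_relaxed` and the counter scenario are assumptions made up for this example; they are not taken from the code under discussion. Incrementing a plain `usize` from several threads without synchronization would be a data race and therefore UB, while the same operation on an `AtomicUsize` with `Ordering::Relaxed` is well-defined.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Hypothetical helper: several threads bump one shared counter.
/// With a plain `usize` this would be a data race (UB); with a relaxed
/// atomic it is defined behaviour, even though no ordering is imposed
/// on other memory accesses.
fn count_with_relaxed(threads: usize, per_thread: usize) -> usize {
    let counter = AtomicUsize::new(0);
    // Scoped threads may borrow `counter` because they are joined
    // before `thread::scope` returns.
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    counter.load(Ordering::Relaxed)
}

fn main() {
    // Every increment is an atomic read-modify-write, so none are lost.
    assert_eq!(count_with_relaxed(4, 1000), 4000);
}
```

Nothing comparable exists for out-of-bounds accesses: there is no "relaxed" flavour of reading memory that does not belong to the allocation.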

@RalfJung RalfJung added the C-support Category: Not necessarily a bug, but someone asking for support label Feb 15, 2020
@oli-obk
Contributor

oli-obk commented Feb 16, 2020

Orthogonal question: would reading through /proc/self/mem be sound, since it goes through file system APIs?

@RalfJung
Member

lol, the most hilarious things can happen when a program in Miri goes through /proc/self/mem. It has full access to the interpreter internal state that way.^^

@oli-obk
Contributor

oli-obk commented Feb 16, 2020

Oh right. I meant exposing the interpreted memory instead, because that's what would happen in a real execution.

@jacob-hughes
Author

Thanks for the reply @RalfJung. There are some interesting points for me to take into consideration here, and I appreciate the detailed explanation. It may be worth clarifying one use-case that Miri identifies as out-of-bounds UB and that I'm still unsure about.

I am implementing GlobalAlloc's alloc method so that it writes a small usize-sized header at the start of each allocated block and then returns a pointer to the address immediately after that header. I maintain the invariant in my application that this header is accessible by taking the pointer returned from alloc and subtracting a usize-sized offset.

Miri identifies this as an out-of-bounds access, however. I assume this is because it treats the pointer returned from alloc as the base of a block. Does this make what I'm doing here undefined behaviour? Is it illegal to return a derived pointer from alloc and access the base of the block, provided you are sure you have not gone past the beginning? Or is this a case of Miri being overly conservative?
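A minimal sketch of the header scheme described above may help. It is not the code from the project in question: `HeaderAlloc`, `combined`, and `header_of` are hypothetical names, the wrapper delegates to `std::alloc::System`, and storing the requested size in the header is just a stand-in for whatever metadata a collector would keep. The sketch calls the allocator directly rather than registering it with `#[global_allocator]`.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::mem::size_of;

/// Hypothetical wrapper allocator: one `usize` header sits directly
/// before every pointer handed out by `alloc`.
struct HeaderAlloc;

impl HeaderAlloc {
    /// Layout of header plus payload, and the payload's offset from the
    /// real base of the block. The offset is always a multiple of
    /// `size_of::<usize>()`, so the header slot stays aligned.
    fn combined(layout: Layout) -> Option<(Layout, usize)> {
        Layout::new::<usize>().extend(layout).ok()
    }
}

unsafe impl GlobalAlloc for HeaderAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let (full, offset) = match Self::combined(layout) {
            Some(pair) => pair,
            None => return std::ptr::null_mut(),
        };
        unsafe {
            let base = System.alloc(full);
            if base.is_null() {
                return base;
            }
            // The payload pointer is derived from `base`, so stepping
            // back by one `usize` stays inside the same allocation.
            let payload = base.add(offset);
            payload
                .sub(size_of::<usize>())
                .cast::<usize>()
                .write(layout.size()); // example header value only
            payload
        }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        let (full, offset) =
            Self::combined(layout).expect("layout was accepted by alloc");
        // Recover the real base before handing the block back.
        unsafe { System.dealloc(ptr.sub(offset), full) }
    }
}

/// Reads the header belonging to a pointer returned by `HeaderAlloc::alloc`.
unsafe fn header_of(payload: *const u8) -> usize {
    unsafe { payload.sub(size_of::<usize>()).cast::<usize>().read() }
}

fn main() {
    let layout = Layout::from_size_align(24, 8).unwrap();
    unsafe {
        let p = HeaderAlloc.alloc(layout);
        assert!(!p.is_null());
        assert_eq!(header_of(p), 24);
        HeaderAlloc.dealloc(p, layout);
    }
}
```

The header read is in bounds of the block actually obtained from the underlying allocator; whether a checker sees it that way depends on which allocation it believes the returned pointer belongs to.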

@RalfJung
Member

Hm, interesting. There might be some aliasing issues there, but that does not seem to be what you are running into.

Right now Miri actually ignores the #[global_allocator] and always directly creates a "native" allocation that is subject to "all the checks". Probably that is the issue you are having. As already mentioned, if you could provide an example demonstrating your issue, that would help tremendously as I wouldn't have to guess. ;)

If my guess is right, could you open a new issue asking to support that attribute, ideally with a self-contained testcase?

@jacob-hughes
Author

Hi @RalfJung , apologies for the late reply!

> Right now Miri actually ignores the #[global_allocator] and always directly creates a "native" allocation that is subject to "all the checks".

Aha, this sounds like what I'm encountering. I've raised an issue here #1207 and will close this discussion.

Thanks for the help!


3 participants