Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[libc++] std::expected: operations may overwrite bytes of unrelated objects #70494

Closed
jiixyj opened this issue Oct 27, 2023 · 14 comments · Fixed by #69673
Closed

[libc++] std::expected: operations may overwrite bytes of unrelated objects #70494

jiixyj opened this issue Oct 27, 2023 · 14 comments · Fixed by #69673
Assignees
Labels
ABI Application Binary Interface libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.

Comments

@jiixyj
Copy link
Contributor

jiixyj commented Oct 27, 2023

Some operations, like emplace(), might overwrite bytes of unrelated objects (example from #68733 (comment)):

struct bool_with_padding {
    alignas(8) bool b = false;
};

struct s : std::expected<bool_with_padding, bool> {
    const bool please_dont_overwrite_me[6] =  //
        {true, true, true, true, true, true};
};

static_assert(sizeof(s) == sizeof(bool_with_padding));

int main() {
    s s;

    s.emplace();

    assert(s.please_dont_overwrite_me[5]); // will abort
    assert(s.please_dont_overwrite_me[4]);
    assert(s.please_dont_overwrite_me[3]);
    assert(s.please_dont_overwrite_me[2]);
    assert(s.please_dont_overwrite_me[1]);
    assert(s.please_dont_overwrite_me[0]);
}

This happens because currently, std::__libcppdatasize<std::expected<bool_with_padding, bool>>::value == 2: One byte because of std::__libcppdatasize<bool_with_padding>::value == 1 and one additional byte for expected's "has value flag.

However, the emplace() operation destructs/constructs another bool_with_padding in place -- and that will overwrite all sizeof(bool_with_padding) == 8 bytes of the object, clobbering the six unrelated bools that are stored in expected's tail padding.

std::expected has to make sure that all of its tail padding (if it has some) cannot be touched by any destructor/constructor calls of the value/error.

There is a PR up that has a work in progress fix and some more discussion: #69673

@github-actions github-actions bot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Oct 27, 2023
danakj added a commit to danakj/subspace that referenced this issue Nov 20, 2023
…ternally

llvm/llvm-project#70494 points out that `[[no_unique_address]]` on a
union member can cause its placement-new construction to clobber data in
its tail padding, as ctors can and do write their entire `sizeof()`
size, not just their data-size.

So we do some wild things to prevent this from occurring:

The `T` and `E` union are grouped with the state into a struct, which we
call `Inner`. This structure is always constructed together as a group
when changing the state to `Ok` or `Err`. This ensures that we construct
`T` or `E` and then set the state (which follows in the struct) thus
avoiding clobbering of the state.

Second, the `Inner` struct needs to be protected from clobbering other
things in its tail padding. So it is wrapped in another struct that
conditionally applies `[[no_unique_address]]` to it, which we call
`MaybeNoUniqueAddress`.

We mark `Inner` as `[[no_unique_address]]` only when there is no tail
padding in `T` or `E`. Then the tail padding of `Inner` which is
introduced by the state flag can be used by things external to the
Result if `T` are without tail padding.

All of this is put into the outermost structure `StorageVoid` or
`StorageNonVoid` for a void `T` or not, respectively. Everything is
marked `[[no_unique_address]]` except `Inner` which is conditionally
marked as mentioned above.

In order to do this, we've rewritten the entire implementation of
`Result`'s storage. In the process, copy/move operations and destructors
are all trivial if the same for both `T` and `E` are trivial, which
fixes chromium#403.
danakj added a commit to danakj/subspace that referenced this issue Nov 20, 2023
…ternally

llvm/llvm-project#70494 points out that `[[no_unique_address]]` on a
union member can cause its placement-new construction to clobber data in
its tail padding, as ctors can and do write their entire `sizeof()`
size, not just their data-size.

So we do some wild things to prevent this from occurring:

The `T` and `E` union are grouped with the state into a struct, which we
call `Inner`. This structure is always constructed together as a group
when changing the state to `Ok` or `Err`. This ensures that we construct
`T` or `E` and then set the state (which follows in the struct) thus
avoiding clobbering of the state.

Second, the `Inner` struct needs to be protected from clobbering other
things in its tail padding. So it is wrapped in another struct that
conditionally applies `[[no_unique_address]]` to it, which we call
`MaybeNoUniqueAddress`.

We mark `Inner` as `[[no_unique_address]]` only when there is no tail
padding in `T` or `E`. Then the tail padding of `Inner` which is
introduced by the state flag can be used by things external to the
Result if `T` are without tail padding.

All of this is put into the outermost structure `StorageVoid` or
`StorageNonVoid` for a void `T` or not, respectively. Everything is
marked `[[no_unique_address]]` except `Inner` which is conditionally
marked as mentioned above.

In order to do this, we've rewritten the entire implementation of
`Result`'s storage. In the process, copy/move operations and destructors
are all trivial if the same for both `T` and `E` are trivial, which
fixes chromium#403.
danakj added a commit to danakj/subspace that referenced this issue Nov 21, 2023
…ternally

llvm/llvm-project#70494 points out that `[[no_unique_address]]` on a
union member can cause its placement-new construction to clobber data in
its tail padding, as ctors can and do write their entire `sizeof()`
size, not just their data-size.

So we do some wild things to prevent this from occurring:

The `T` and `E` union are grouped with the state into a struct, which we
call `Inner`. This structure is always constructed together as a group
when changing the state to `Ok` or `Err`. This ensures that we construct
`T` or `E` and then set the state (which follows in the struct) thus
avoiding clobbering of the state.

Second, the `Inner` struct needs to be protected from clobbering other
things in its tail padding. So it is wrapped in another struct that
conditionally applies `[[no_unique_address]]` to it, which we call
`MaybeNoUniqueAddress`.

We mark `Inner` as `[[no_unique_address]]` only when there is no tail
padding in `T` or `E`. Then the tail padding of `Inner` which is
introduced by the state flag can be used by things external to the
Result if `T` are without tail padding.

All of this is put into the outermost structure `StorageVoid` or
`StorageNonVoid` for a void `T` or not, respectively. Everything is
marked `[[no_unique_address]]` except `Inner` which is conditionally
marked as mentioned above.

In order to do this, we've rewritten the entire implementation of
`Result`'s storage. In the process, copy/move operations and destructors
are all trivial if the same for both `T` and `E` are trivial, which
fixes chromium#403.
danakj added a commit to danakj/subspace that referenced this issue Nov 25, 2023
…ternally

llvm/llvm-project#70494 points out that `[[no_unique_address]]` on a
union member can cause its placement-new construction to clobber data in
its tail padding, as ctors can and do write their entire `sizeof()`
size, not just their data-size.

So we do some wild things to prevent this from occurring:

The `T` and `E` union are grouped with the state into a struct, which we
call `Inner`. This structure is always constructed together as a group
when changing the state to `Ok` or `Err`. This ensures that we construct
`T` or `E` and then set the state (which follows in the struct) thus
avoiding clobbering of the state.

Second, the `Inner` struct needs to be protected from clobbering other
things in its tail padding. So it is wrapped in another struct that
conditionally applies `[[no_unique_address]]` to it, which we call
`MaybeNoUniqueAddress`.

We mark `Inner` as `[[no_unique_address]]` only when there is no tail
padding in `T` or `E`. Then the tail padding of `Inner` which is
introduced by the state flag can be used by things external to the
Result if `T` are without tail padding.

All of this is put into the outermost structure `StorageVoid` or
`StorageNonVoid` for a void `T` or not, respectively. Everything is
marked `[[no_unique_address]]` except `Inner` which is conditionally
marked as mentioned above.

In order to do this, we've rewritten the entire implementation of
`Result`'s storage. In the process, copy/move operations and destructors
are all trivial if the same for both `T` and `E` are trivial, which
fixes chromium#403.
danakj added a commit to chromium/subspace that referenced this issue Nov 25, 2023
…ternally

llvm/llvm-project#70494 points out that `[[no_unique_address]]` on a
union member can cause its placement-new construction to clobber data in
its tail padding, as ctors can and do write their entire `sizeof()`
size, not just their data-size.

So we do some wild things to prevent this from occurring:

The `T` and `E` union are grouped with the state into a struct, which we
call `Inner`. This structure is always constructed together as a group
when changing the state to `Ok` or `Err`. This ensures that we construct
`T` or `E` and then set the state (which follows in the struct) thus
avoiding clobbering of the state.

Second, the `Inner` struct needs to be protected from clobbering other
things in its tail padding. So it is wrapped in another struct that
conditionally applies `[[no_unique_address]]` to it, which we call
`MaybeNoUniqueAddress`.

We mark `Inner` as `[[no_unique_address]]` only when there is no tail
padding in `T` or `E`. Then the tail padding of `Inner` which is
introduced by the state flag can be used by things external to the
Result if `T` are without tail padding.

All of this is put into the outermost structure `StorageVoid` or
`StorageNonVoid` for a void `T` or not, respectively. Everything is
marked `[[no_unique_address]]` except `Inner` which is conditionally
marked as mentioned above.

In order to do this, we've rewritten the entire implementation of
`Result`'s storage. In the process, copy/move operations and destructors
are all trivial if the same for both `T` and `E` are trivial, which
fixes #403.
@ldionne ldionne added the ABI Application Binary Interface label Nov 30, 2023
@ldionne
Copy link
Member

ldionne commented Dec 1, 2023

@huixie90 and I just spent some time going through this issue again while working on #70506, which is basically the same issue but for __movable_box. We currently have two questions related to fixing this issue.

First, is the following code valid?

#include <cstring>

struct Empty {
    void zero() { std::memset(this, 0, sizeof(*this)); }
};

struct CompressedPair : Empty {
    CompressedPair(Empty e, bool x) : Empty(e), b(x) { }
    bool b;
};

void f(CompressedPair& p) {
    p.zero(); // this also overrides the bool member of the compressed pair
}

If this code is valid, the EBO is arguably broken. One the one hand std::memset(this, 0, sizeof(*this)) is super fishy, but if a type is allowed to take for granted that it owns its padding bytes, then this should be valid (unless prohibited by some other rule)?

Second question: Is the following code valid?

#include <memory>
#include <new>

struct Foo {
    Foo(int i, bool b) : i_(i), b_(b) { }

    Foo& operator=(Foo const& other) {
        std::destroy_at(this);
        std::construct_at(this, other.i_, other.b_);
        return *std::launder(this);
    }

    int i_;
    bool b_;
};

This is clearly not a great way to implement operator=. But if this is legal, then we basically can't use [[no_unique_address]] whenever we don't know exactly what type we're given, since calling operator= of that type would overwrite the next object that sits in its padding. Note that this is not limited to assignment -- any other member function has the same problem.

This is very similar to the case brought up by @huixie90 in #68733 (comment):

std::expected<T, E> e = ...;
T& val = *e;
val.~T();
std::construct_at(&val, args...); // this still overwrites has_value

There, @jwakely had said that it wasn't (in his opinion) something we want to support, and I agree. However, while the use case immediately above is hard to defend, something like an assignment operator using a destroy-and-construct pattern is probably much more common (although I don't know yet if it's technically legal). If it is legal, then I suspect we can't use [[no_unique_address]] at all in our generic types, which would be a bummer.

I'm going to CC some folks here who might have knowledge/opinions about how this should be handled, sorry if this ends up being a bit spammy but we need to statute on this before we can make progress with either this issue or #70506.

CC @jiixyj @huixie90 @philnik777 @frederick-vs-ja @zygoloid @jwakely

Edit: Added std::launder in question 2 as required by https://eel.is/c++draft/basic.life#note-5

@ldionne
Copy link
Member

ldionne commented Dec 1, 2023

@huixie90 and I went over https://eel.is/c++draft/basic.life#8, and it seems to say that the answer to question 2 is "Yes, this is valid". If that's true, then we can kiss the [[no_unique_address]] optimizations goodbye. Or, alternatively, we could change something in the Standard to give us that kind of freedom.

For now, I'm hoping we read https://eel.is/c++draft/basic.life#8 wrong but I am looking forward to hearing what other people think.

@philnik777
Copy link
Contributor

philnik777 commented Dec 1, 2023

@huixie90 and I just spent some time going through this issue again while working on #70506, which is basically the same issue but for __movable_box. We currently have two questions related to fixing this issue.

First, is the following code valid?

#include <cstring>

struct Empty {
    void zero() { std::memset(this, 0, sizeof(*this)); }
};

struct CompressedPair : Empty {
    CompressedPair(Empty e, bool x) : Empty(e), b(x) { }
    bool b;
};

void f(CompressedPair& p) {
    p.zero(); // this also overrides the bool member of the compressed pair
}

If this code is valid, the EBO is arguably broken. One the one hand std::memset(this, 0, sizeof(*this)) is super fishy, but if a type is allowed to take for granted that it owns its padding bytes, then this should be valid (unless prohibited by some other rule)?

I'm pretty sure this is UB. The standard seems to specifically exclude potentially-overlapping subobjects from any guarantees it makes regarding the value representation of objects. The thing you're most likely trying to do here is ending the lifetime of Empty and starting the lifetime of a new one by memsetting the memory. The problem is that Empty is potentially overlapping in this context, so it's not transparently replaceable (http://eel.is/c++draft/basic.life#8). This means that you're not just ending the lifetime of Empty, but also of CompressedPair.

Second question: Is the following code valid?

#include <memory>
#include <new>

struct Foo {
    Foo(int i, bool b) : i_(i), b_(b) { }

    Foo& operator=(Foo const& other) {
        std::destroy_at(this);
        std::construct_at(this, other.i_, other.b_);
        return *std::launder(this);
    }

    int i_;
    bool b_;
};

I'd say no, for the same reason outlined above.

@ldionne
Copy link
Member

ldionne commented Dec 1, 2023

I'm pretty sure this is UB. The standard seems to specifically exclude potentially-overlapping subobjects from any guarantees it makes regarding the value representation of objects. The thing you're most likely trying to do here is ending the lifetime of Empty and starting the lifetime of a new one by memsetting the memory. The problem is that Empty is potentially overlapping in this context, so it's not transparently replaceable (http://eel.is/c++draft/basic.life#8). This means that you're not just ending the lifetime of Empty, but also of CompressedPair.

But as long as we access the new object via the pointer we obtained from std::launder, I think we're good, aren't we? https://eel.is/c++draft/basic.life#note-5

@philnik777
Copy link
Contributor

I'm pretty sure this is UB. The standard seems to specifically exclude potentially-overlapping subobjects from any guarantees it makes regarding the value representation of objects. The thing you're most likely trying to do here is ending the lifetime of Empty and starting the lifetime of a new one by memsetting the memory. The problem is that Empty is potentially overlapping in this context, so it's not transparently replaceable (http://eel.is/c++draft/basic.life#8). This means that you're not just ending the lifetime of Empty, but also of CompressedPair.

But as long as we access the new object via the pointer we obtained from std::launder, I think we're good, aren't we? https://eel.is/c++draft/basic.life#note-5

No, because Empty is a potentially-overlapping subobject. This basically means that you might also end the lifetime of CompressedPair, in which case you can't assume that it's still holding it's values. The same applies to expected. You can destroy the subobject and create a new one, but there isn't a std::expected in that place any longer.

@philnik777
Copy link
Contributor

Maybe to make it a bit clearer: [basic.life]#8 only holds if your object is transparently replacable. This is not the case for Empty because it's a potentially-overlapping subobject. This means that you have to end the lifetime of the first parent object that is not potentially-overlapping.

@ldionne
Copy link
Member

ldionne commented Dec 1, 2023

Okay, let's say we agree on that. But it still means that whether the Foo example is valid code basically depends on whether we decide to use it as a potentially-overlapping subobject or not. So while it's not great code to begin with, this code is valid on its own -- it just stops being valid if you use it as a potentially-overlapping subobject.

Whether we're allowed to make this a potentially-overlapping subobject in the standard library is not clear to me, and it seems like we might need some wording to make that clear. And assuming that's what we want, because while it produces a great object representation, the flipside is that code written naively by users may instead exhibit rather subtle undefined behavior.

@philnik777
Copy link
Contributor

Okay, let's say we agree on that. But it still means that whether the Foo example is valid code basically depends on whether we decide to use it as a potentially-overlapping subobject or not. So while it's not great code to begin with, this code is valid on its own -- it just stops being valid if you use it as a potentially-overlapping subobject.

Given that standard libraries already do this for decades seems to imply that we can. I'm also not aware of the standard giving any general restrictions on how standard library objects are represented other than it being a representation within the bounds of the standard (e.g. we can't change the alignment). I think some clarification would be good - especially around containers, but I don't think that should stop us from using a good representation.

Whether we're allowed to make this a potentially-overlapping subobject in the standard library is not clear to me, and it seems like we might need some wording to make that clear. And assuming that's what we want, because while it produces a great object representation, the flipside is that code written naively by users may instead exhibit rather subtle undefined behavior.

Yes, to some degree. But I would also argue that construct_at, destroy_at and launder are really not tools most people should ever touch. Especially launder is a tool that is understood by very few people (from my experience at least) and is required to make your second example conforming. With these kinds of shenanigans you're simply inviting subtle UB. We as standard library authors don't always fully understand when to use these tools and how to use them correctly, and I'm pretty sure we understand the language better than 99% of C++ programmers.

@ldionne
Copy link
Member

ldionne commented Dec 1, 2023

Yes, to some degree. But I would also argue that construct_at, destroy_at and launder are really not tools most people should ever touch. Especially launder is a tool that is understood by very few people (from my experience at least) and is required to make your second example conforming. With these kinds of shenanigans you're simply inviting subtle UB. We as standard library authors don't always fully understand when to use these tools and how to use them correctly, and I'm pretty sure we understand the language better than 99% of C++ programmers.

I'm not sure whether this makes me want to use [[no_unique_address]] more, or ban it completely from our code base. I agree with everything you've said here, but you can also read this to say that we shouldn't do subtle shenanigans in our implementation that will blow up when the 99% of C++ programmers write their code in a slightly naive way. I think one important consideration here goes beyond the pedantic interpretation of the Standard -- is this going to cause a lot of actual problems? If we think it will, then the very least would be for the Standard to call out explicitly that we're allowed to do this, not rely on a pretty impenetrable paragraph in basic.life#8 and an under-specified notion of exposition-only members to give us good conscience.

@huixie90
Copy link
Contributor

huixie90 commented Dec 1, 2023

Okay, let's say we agree on that. But it still means that whether the Foo example is valid code basically depends on whether we decide to use it as a potentially-overlapping subobject or not. So while it's not great code to begin with, this code is valid on its own -- it just stops being valid if you use it as a potentially-overlapping subobject.

Given that standard libraries already do this for decades seems to imply that we can. I'm also not aware of the standard giving any general restrictions on how standard library objects are represented other than it being a representation within the bounds of the standard (e.g. we can't change the alignment). I think some clarification would be good - especially around containers, but I don't think that should stop us from using a good representation.

Whether we're allowed to make this a potentially-overlapping subobject in the standard library is not clear to me, and it seems like we might need some wording to make that clear. And assuming that's what we want, because while it produces a great object representation, the flipside is that code written naively by users may instead exhibit rather subtle undefined behavior.

Yes, to some degree. But I would also argue that construct_at, destroy_at and launder are really not tools most people should ever touch. Especially launder is a tool that is understood by very few people (from my experience at least) and is required to make your second example conforming. With these kinds of shenanigans you're simply inviting subtle UB. We as standard library authors don't always fully understand when to use these tools and how to use them correctly, and I'm pretty sure we understand the language better than 99% of C++ programmers.

the user does not need to write ‘std::launder’ to make its code valid. It is just valid on its own. It is us who make it potentially overlapping. the very paragraph you quoted in the standard has almost exact same example as Foo without std::launder and with comment “well defined”

@zygoloid
Copy link
Collaborator

zygoloid commented Dec 1, 2023

Whether we're allowed to make this a potentially-overlapping subobject in the standard library is not clear to me, and it seems like we might need some wording to make that clear. And assuming that's what we want, because while it produces a great object representation, the flipside is that code written naively by users may instead exhibit rather subtle undefined behavior.

It seems to me that:

  • When an object of a user-defined type is created and held by a standard library type, it is not specified whether that object is a potentially-overlapping subobject.
  • It's obviously unreasonable for user code to end the lifetime of a standard library type, and is undefined behavior by [res.on.objects]/2.
  • Therefore operations on a user-defined type that would end the lifetime of an enclosing object if the user-defined object is potentially-overlapping result in undefined behavior (if the standard library type is currently being accessed or is ever accessed again afterwards).

So I think it's fine to store objects of user-defined types as potentially-overlapping subobjects, and the burden is on the library user to not trample outside their own object.

If libc++ wants to support user-defined types that memset(this, 0, sizeof(*this)); or similar as an extension, it of course can, but that'd be a big ABI break -- all the EBO stuff for allocators violates that, every use of compressed_pair violates that, etc.

@ldionne
Copy link
Member

ldionne commented Dec 1, 2023

It seems to me that:

  • When an object of a user-defined type is created and held by a standard library type, it is not specified whether that object is a potentially-overlapping subobject.

That's true. But we should call it out explicitly, which I don't think we do right now. And honestly I'd be curious to hear what the general LEWG constituency would have to say about this if they were here -- I suspect they might find it surprising. Basically, it feels like this "affordance" is given to us by means of a subtle under-specification of the library -- it's not clear to me that this is the "design intent".

  • It's obviously unreasonable for user code to end the lifetime of a standard library type, and is undefined behavior by [res.on.objects]/2.

100%

  • Therefore operations on a user-defined type that would end the lifetime of an enclosing object if the user-defined object is potentially-overlapping result in undefined behavior (if the standard library type is currently being accessed or is ever accessed again afterwards).

I think I don't understand where the Standard says that the lifetime of the enclosing object will end if the lifetime of its potentially-overlapping subobject ends and then another object is put in its place. I looked at all of basic.life but I can't seem to find that or logically arrive at that conclusion. It seems like both you and @philnik777 agree on that so I don't question it, but I'd like to understand.

Is it http://eel.is/c++draft/basic.life#1.5 which says:

the storage which the object occupies is released, or is reused by an object that is not nested within o ([intro.object]).

and somehow a potentially-overlapping subobject is not considered to be nested within its enclosing object for that purpose?

If libc++ wants to support user-defined types that memset(this, 0, sizeof(*this)); or similar as an extension, it of course can, but that'd be a big ABI break -- all the EBO stuff for allocators violates that, every use of compressed_pair violates that, etc.

I think the major difference here is that an empty object doing memset(this, 0, sizeof(*this)) is a lot less likely to happen than an object happening to have padding (struct { int i; bool b; }) and doing the destroy-and-then-construct pattern.

Don't get me wrong -- I want us to have the efficient representation for expected and I think we all agree about the fact that the library should be allowed to do these tricks. But I also don't want us to do it via what feels like shenanigans exploiting an under-specification in the Standard, given this has potentially serious implications on the correctness of user code (and TBH the failure mode here is the worst possible).

I'd be really curious to hear what @jwakely has to say here -- do you think I'm being overprotective of users doing something stupid? Would you take a LWG issue that asks to clarify this? Or perhaps this would need to be an actual paper?

@zygoloid
Copy link
Collaborator

zygoloid commented Dec 1, 2023

  • When an object of a user-defined type is created and held by a standard library type, it is not specified whether that object is a potentially-overlapping subobject.

That's true. But we should call it out explicitly, which I don't think we do right now.

Definitely, it seems like a good idea to me for the standard to explicitly call out that library types can hold user-defined types as potentially-overlapping subobjects -- or, better, to say exactly when that's permitted. Maybe the elements of a vector can't be, for example, meaning that it's fine to memset over [data(), data() + size()) if it's fine for the element type, and conversely that the library implementation is not allowed to store the capacity in the tail padding of vector elements, even if it would fit there. But probably the elements of a tuple or variant should be permitted to be potentially-overlapping, and certainly allocators, comparators, hashers, etc. should.

And honestly I'd be curious to hear what the general LEWG constituency would have to say about this if they were here -- I suspect they might find it surprising. Basically, it feels like this "affordance" is given to us by means of a subtle under-specification of the library -- it's not clear to me that this is the "design intent".

If we set aside [[no_unique_address]] for the moment, and think only of EBO and similar derivation-based layout optimizations, the exact same concerns apply there too[*]. I think it's reasonable to say that it's de facto design intent at this point that the standard library can use derivation-based tricks to compress object representations for things like allocators and comparators. (Simply because if something is done for long enough and ubiquitously enough, it becomes hard to say that it's an unintended technique.)

[[no_unique_address]] was designed specifically to support those same EBO tricks without the problems of inheritance, and indeed a big part of the design intent of [[no_unique_address]] was to permit its use in the library, replacing the use of inheritance. But that intent was presented to EWG, not (as I recall) to LEWG.

[*]: I say the exact same, but that's not quite true. Prior to [[no_unique_address]], things like memset-over-self probably would have worked fine if your type was final or effectively final in some way (eg, if your type has a virtual base with a private destructor, or if your type has a final virtual destructor), but with [[no_unique_address]] even those cases can't be sure that they own their padding bytes.

I think I don't understand where the Standard says that the lifetime of the enclosing object will end if the lifetime of its potentially-overlapping subobject ends and then another object is put in its place. I looked at all of basic.life but I can't seem to find that or logically arrive at that conclusion. It seems like both you and @philnik777 agree on that so I don't question it, but I'd like to understand.

Is it http://eel.is/c++draft/basic.life#1.5 which says:

the storage which the object occupies is released, or is reused by an object that is not nested within o ([intro.object]).

and somehow a potentially-overlapping subobject is not considered to be nested within its enclosing object for that purpose?

That's one half of the rule. The other half is the "transparent replacement" rule in [basic.life]/8, which describes the circumstances under which a new object (eg, created by construct_at) is nested within an existing object. (The chain of logic is a bit longer than that, and in fact there's a CWG issue along the way: [basic.life]/8 defines "transparently replaceable", and then [intro.object]/2 tries to define the same thing but gets it a bit wrong. But the intent is that [intro.object]/2 says that if you transparently replace a subobject, then you are a subobject. And then [intro.object]/4 says that if you are a subobject of an object, you're nested within that object.)

The (intended) upshot is: if you create an object in some storage, then either:

  1. You manage to transparently replace an existing object, and all references to that existing object now refer to you.
  2. You create an unrelated object in the storage, and the lifetimes end for any existing objects that it overlaps.

(There are actually a few more cases here, but those are the main ones. For example, you can create an object inside a byte array without ending the lifetime of the array or things enclosing it. And there are cases where the new object is a member subobject of an existing object, but the name of the member can't be used to access it -- or at least, there were, when we didn't allow transparent replacement of const subobjects.)

For potentially-overlapping objects, you can't transparently replace them ([basic.life]/8.4), so you end up in case 2.

@ldionne
Copy link
Member

ldionne commented Dec 7, 2023

If we set aside [[no_unique_address]] for the moment, and think only of EBO and similar derivation-based layout optimizations, the exact same concerns apply there too[*]. I think it's reasonable to say that it's de facto design intent at this point that the standard library can use derivation-based tricks to compress object representations for things like allocators and comparators. (Simply because if something is done for long enough and ubiquitously enough, it becomes hard to say that it's an unintended technique.)

Yes, I agree with this and I am not questioning the design intent when it comes to EBO. All implementations do it and it's expected by users -- in fact a library that doesn't do it would be considered to have a poor QOI.

For [[no_unique_address]], I feel like it's a bit different because you can manage to shoot yourself in the foot much more easily. EBO'd classes are required to be empty, which means that the likelihood you're doing something funky with your byte of storage is very low. With [[no_unique_address]], literally any type that happens to have padding could fall victim to this issue. I still agree this is what we want, but I'm just saying it's a lot easier to fall victim to overlapping padding bytes now than it used to be, and with code that looks a lot less suspicious for a user.

I think I don't understand where the Standard says that the lifetime of the enclosing object will end if the lifetime of its potentially-overlapping subobject ends and then another object is put in its place. [...]

That's one half of the rule. The other half is the "transparent replacement" rule in [basic.life]/8, which describes the circumstances under which a new object (eg, created by construct_at) is nested within an existing object. (The chain of logic is a bit longer than that, and in fact there's a CWG issue along the way: [basic.life]/8 defines "transparently replaceable", and then [intro.object]/2 tries to define the same thing but gets it a bit wrong. But the intent is that [intro.object]/2 says that if you transparently replace a subobject, then you are a subobject. And then [intro.object]/4 says that if you are a subobject of an object, you're nested within that object.)

[...]

For potentially-overlapping objects, you can't transparently replace them ([basic.life]/8.4), so you end up in case 2.

I see, thanks a lot for the clear explanation, I appreciate it. So basically, we're doing a non-transparent replacement of the sub-object, which means that we're not considered "nested" inside the expected anymore, and since we're not nested within it, [basic.life]/1.5 says that we essentially just reused the storage of the expected for a completely unrelated object and ended the expected's lifetime. It makes sense, it's just that we're so close to being transparently-replaceable that this whole chain of consequences hangs by a thread. But I get it now, thanks for explaining.

Okay, so I suggest the following plan:

  1. We will start by landing @huixie90 's patch to fix __movable_box ([libc++][ranges][abi-break] Fix movable_box overwriting memory of data that lives in the tail padding #71314). It's a simpler patch that the expected issue. As part of that patch, we will settle on the mechanism we want to use for signalling this ABI break, which will probably be to introduce a special ABI tag and tag all the user-visible types that contain a __movable_box.
  2. Then we can land the fix to std::expected and follow the same ABI signalling mechanism.
  3. I will write a short paper that suggests adding something to the standard to bless implementations doing this.

@ldionne ldionne added this to the LLVM 18.0.X Release milestone Jan 16, 2024
ldionne added a commit that referenced this issue Jan 22, 2024
Currently std::expected can have some padding bytes in its tail due to
[[no_unique_address]]. Those padding bytes can be used by other objects.
For example, in the current implementation:

  sizeof(std::expected<std::optional<int>, bool>) == 
    sizeof(std::expected<std::expected<std::optional<int>, bool>, bool>)

As a result, the data layout of an
  std::expected<std::expected<std::optional<int>, bool>, bool> 
can look like this:

              +-- optional "has value" flag
              |        +--padding
  /---int---\ |        |
  00 00 00 00 01 00 00 00
                |  |
                |  +- "outer" expected "has value" flag
                |
                +- expected "has value" flag

This is problematic because `emplace()`ing the "inner" expected can not
only overwrite the "inner" expected "has value" flag (issue #68552) but
also the tail padding where other objects might live.

This patch fixes the problem by ensuring that std::expected has no tail
padding, which is achieved by conditional usage of [[no_unique_address]]
based on the tail padding that this would create.

This is an ABI breaking change because the following property changes:

  sizeof(std::expected<std::optional<int>, bool>) <
    sizeof(std::expected<std::expected<std::optional<int>, bool>, bool>)

Before the change, this relation didn't hold. After the change, the relation
does hold, which means that the size of std::expected in these cases increases
after this patch. The data layout will change in the following cases where
tail padding can be reused by other objects:

  class foo : std::expected<std::optional<int>, bool> {
    bool b;
  };

or using [[no_unique_address]]:

  struct foo {
    [[no_unique_address]] std::expected<std::optional<int>, bool> e;
    bool b;
  };

The vendor communication is handled in #70820.
Fixes: #70494

Co-authored-by: philnik777 <nikolasklauser@berlin.de>
Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ABI Application Binary Interface libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants