Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Change the semantics of @frame to allow for inlining async calls #5277

Open
SpexGuy opened this issue May 5, 2020 · 9 comments
Open
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@SpexGuy
Copy link
Contributor

SpexGuy commented May 5, 2020

I went spelunking in some generated async assembly last weekend, and came out with the realization that the current semantics of async, and @frame in particular, are preventing potential optimization opportunities. I think we can change them without breaking any actual use cases.

The problem comes from three properties:

  1. @frame() returns @Frame(@thisFn()), which means each async function must have its own frame type
  2. pointers to frames must cast to anyframe
  3. resume must work on anyframe, which means each frame must have self-identifying information

These three things taken together mean that an async function can never be inlined into another async function, because doing so would make @frame() return the wrong type. So for this code:

var the_frame: anyframe = undefined;
fn main() void {
    _ = async a();
    resume the_frame;
}
fn a() void { // async
    b();
}
fn b() void { // async
    suspend {
        the_frame = @frame();
    }
}
  • b must always have its own frame type, so it cannot be inlined into a
  • a must have its own frame type (the return type from the async call), so it cannot be inlined into main.

This means that wrapping suspend in a library will always have a cost that can not be optimized away, which is a problem.

Obviously properties (2) and (3) are core parts of Zig's async, and shouldn't be changed. But I think we can get away with relaxing (1). If we change @frame() to return anyframe, it means that async functions are now eligible for inlining by expanding the frame of the parent function. We would also have to slightly change the definition of @Frame(foo) to be "the type returned by async foo()".

This would also allow a potential further optimization where frames are only generated by the keyword async, and all of the data needed to resume a frame is stored in one place at the beginning of that "call tree frame". Which would have the nice side-effect of making var x = async a(); resume x; valid if a() calls b() and b suspends.

Since copying, moving, or examining the contents of existing frames are not supported use cases, and awaiting your own frame is not allowed, I don't think this change breaks any existing use cases.

@Vexu Vexu added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label May 6, 2020
@Vexu Vexu added this to the 0.7.0 milestone May 6, 2020
@SpexGuy
Copy link
Contributor Author

SpexGuy commented May 6, 2020

Here's an example of the sort of optimization that is impossible:

fn a() u64 {
    return b(4);
}
fn b(val: u32) u32 {
    suspend { global_frame = @frame(); }
    return val;
}

In the optimal case, this would be optimized to

fn a() u64 {
    suspend { global_frame = @frame(); }
    return 4;
}

But this code is not equivalent with the current semantics. @frame() in the optimized example is a different type. Not only is it a different type, but it's an observably different type. In the second example, other code saying await @ptrCast(anyframe->u64, global_frame) is allowed. But in the first example, this is not valid. The frame has changed types in an observable way, so the compiler cannot ever make this optimization. But if @frame() is only guaranteed to return anyframe, and no guarantees are made about being able to cast it, then the compiler can start making these sorts of optimizations.

@rohlem
Copy link
Contributor

rohlem commented May 7, 2020

I played around with a couple of test cases. To document my own conclusions (5&6):

  1. Frames can be resume-ed, await-ed, or manually deallocated (if they were manually allocated).
  2. Per current semantics, there is only ever supposed to be (at most?) one awaiter per frame (instance lifetime).
  3. A frame's return value is only observable by its awaiter.
  4. Calling an async function as in b() is equivalent to await async b(). (The frame is stack-allocated and awaited by the enclosing caller function making it suspend on callee suspension).
  5. The proposed optimization is trivial (and mainly applicable only) for eliding (possibly after other optimizations) a directly sequential await async-pair (i.e. with no code executing in between). In almost all cases this will be scenario (4).
  6. A direct await async-pair implies that the enclosing function is our only awaiter, and also responsible for deallocating it. Therefore all other uses of the frame (via @frame()) are only interested in resuming it, which doesn't require type information (beyond anyframe). Therefore it's safe to change the type of @frame() in these cases.

Also, if the primary use case of @frame is only for resume-ing it, maybe having a second builtin @awaitableFrame()/@frameAwaitable() with full type info would be more ergonomic? That way @frame() could always return anyframe, without optimization-dependent type changes (which will be sufficient most of the time).

@SpexGuy
Copy link
Contributor Author

SpexGuy commented May 7, 2020

That all makes sense. I think that because of the colorless function abstraction, direct async await pairs will be very common and should be easy to optimize.
Using a separate builtin @awaitableFrame()/@frameAwaitable() would allow the optimizer to realize more easily whether a function actually needs its own frame type for all calls, so this would be a good compromise. But their existence constrains the implementation of async wherever they are used. I'm not sure that there is any valid use case for these builtins, so if we could remove them entirely it would be a nice simplification.

@SpexGuy
Copy link
Contributor Author

SpexGuy commented May 7, 2020

I think we can accomplish getting the frame explicitly into the function with a wrapper like this if it's really necessary:

fn asyncCallWithFrame(func: var, frame: *@Frame(func), other_args: var) void {
    _ = @asyncCall(asBytes(frame), func, .{frame} ++ other_args);
}

If there were a way to access the return location in a function, it could be made more ergonomic in the case of stack-allocated frames:

fn asyncCallWithFrame(func: var, other_args: var) @Frame(func) {
    const frame = @resultLocation();
    _ = @asyncCall(asBytes(frame), func, .{frame} ++ other_args);
}

That said, there is only one thing I can think of that you can do that would require an awaitable reference to your own frame, which is to spawn an async function that will await the function that spawned it.
This path is terrible and filled with footguns. Equivalent behavior can be accomplished without the footguns by just using the existing mechanics and not trying to be fancy. I'll try to illustrate why.
Here's an example where an async function attempts to spawn its own continuation:

fn frameAwareAsyncFn(frame: *@Frame(frameAwareAsyncFn), allocator: *Allocator) !void {
    try doStuffPart1();

    // kick off a continuation function that will await this one

    // Footgun: the frame must be heap allocated because the continuation function will
    // outlive this frame.
    const continuingFrame = try allocator.create(@Frame(doMoreStuff));
    // Footgun: If any `try` fails above this point, the caller must await this frame to
    // recover the error.  If the try succeeds, the async call will await this frame and
    // the caller must not.  This can potentially be resolved by the programmer with a
    // lot of work, but it turns into a huge mess of stuff that is essentially
    // reimplementing continuations at user level.

    _ = @asyncCall(asBytes(continuingFrame), doMoreStuff, .{allocator});

    // Footgun: somehow, `continuingFrame` must be awaited.  We cannot await it here
    // because it is waiting on us, so that would cause a circular await chain that would
    // never resume.  So now we have to store it in some external state that knows it
    // must await it from a different execution path.
    global_awaiters.enqueue(continuingFrame);
    
    try doStuffPart2();
}

fn doMoreStuff(prev_frame: anyframe->error{...}!void) !void {
    try doStuffPart3();
    // Footgun: If this fails, the caller must do the await, same as above.
    // If it succeeds, this function will await and the caller must not await.
    // Doing `defer await prev_frame` would be ideal, but prev_frame returns an error
    // and you can't put a `try` in a `defer`.
    
    try await prev_frame;
    
    try doStuffPart4();
}

The following code calls all of the doStuff parts with the same ordering guarantees, but is both clearer and less error prone.

fn doStuffTheEasyWay() !void {
    try doStuffPart1();
    var part3 = async doStuffPart3();
    var part2 = async doStuffPart2();
    (await part3) catch |err| { (await part2) catch {}; return err; };
    try await part2;
    try doStuffPart4();
}

@rohlem
Copy link
Contributor

rohlem commented May 7, 2020

Using a separate builtin @awaitableFrame()/@frameAwaitable() would allow the optimizer to realize more easily whether a function actually needs its own frame type for all calls

Note that casting the anyframe returned by @frame() to an awaitable type (its actual type or an anyframe->T) is still supported behaviour though, so I don't think this alone is much help.

In fact, since @as(anyframe->ResultType, @frame()) is an option, we wouldn't even need a second builtin. Although seeing how verbose that is, I'm not sure it's worth it - it really only makes await-ing @frame() a bit more obscure, and I don't think this is actually hiding a footgun, it's just that "usually anyframe is enough".

@SpexGuy
Copy link
Contributor Author

SpexGuy commented May 7, 2020

Sorry, I misunderstood. I interpreted the suggestion as "make downcasting @frame() and awaiting it UB, and introduce the new builtins for when you actually need an awaitable frame". If downcasting @frame() needs to remain valid then there's no real benefit to changing its return type to anyframe.
I don't think there is any use case where any variant of var x = @frame(); ... await x; is ever necessary, so I think awaiting the result of @frame() could safely be made UB, and thus its return type could be changed to anyframe with no casting guarantees.

@SpexGuy
Copy link
Contributor Author

SpexGuy commented May 7, 2020

I'm becoming more and more convinced that being able to await @frame() is a footgun. Awaiting it from within the function that called it will always cause an infinite suspend (if it's not UB), and passing the frame to some other execution path to await makes assumptions that

  • the other execution path can never fail to store the frame
  • the caller will never await the frame, unless storing the frame fails and the caller finds out through magic.
  • the caller will allocate the frame in some place that has a lifetime outliving the await in the other execution path.

Alternatively and equivalently, the caller could async call the function with a heap-allocated frame and pass that frame to the external await execution path. This can even be wrapped if you always want to set things up that way. Now you have much more explicit guarantees:

  • the caller of an async function knows they must await or pass the frame to some other code to await. This is always true in all Zig code.
  • if the code path to set up the await fails, the error handler can and must await it.
  • the caller knows that the lifetime of the frame they use must meet or exceed the await call that they will set up.

More generally, if awaiting the frame is invalid, it gives the nice property that any call to async must be matched to an await by the caller, just like allocation. It doesn't make sense for an alloc function to schedule it's own free, and it doesn't make sense for an async function to schedule its own await.

usually anyframe is enough

If there are any examples where anyframe is not enough from a language design standpoint, I would like to see them. I am not convinced that they exist.

@rohlem
Copy link
Contributor

rohlem commented May 8, 2020

I'm not sure the typing change would be all that helpful for this particular optimization, since inlining already makes a local copy to modify it in aggressive optimizations.

I think a separate issue/proposal would be a better place to discuss that change. In short I would support returning anyframe, but don't like the idea of declaring the downcast (and await-ing the casted value) "undefined behaviour", since that would add a new footgun to the most obvious way of implementing that flow/structure.

I think Andrew knows the implementation best, so he can provide input on whether or not the typing change to @frame() in itself (or what changes beyond that) would impact inlining opportunities.

@andrewrk andrewrk modified the milestones: 0.7.0, 0.8.0 Oct 27, 2020
@ghost
Copy link

ghost commented Nov 27, 2020

I think it's helpful to have an explicit statement of intent, so here's my interpretation, I will edit this if incorrect.

Key idea: async invocations produce awaitable handles, @frame() produces resumable handles. These roles are distinct, and by default should not be interchanged, unless the programmer has audited the code and deemed it to be safe. This means that:

  • async invocations should return @Frame(func) (status quo)
  • *@Frame(func) should coerce to anyframe->T (status quo)
  • @frame() should return anyframe, not *@Frame(func) (change)
  • anyframe->T should be awaitable, anyframe should not (status quo)
  • anyframe should be resumable, anyframe->T should not (change)
  • anyframe should not coerce to anyframe->T, although explicit @ptrCasting is allowed (status quo)
  • anyframe->T should not coerce to anyframe, see above (change)

Additionally, in safe modes, we include a four-state field in a frame to indicate whether it is:

  • running
  • suspended, resumable
  • suspended, awaiting
  • completed

In unsafe modes, the first three are collapsed, and it becomes a flag (for await suspension elision, nosuspend and the like).

@andrewrk andrewrk modified the milestones: 0.8.0, 0.9.0 May 19, 2021
@andrewrk andrewrk modified the milestones: 0.9.0, 0.10.0 Nov 23, 2021
@andrewrk andrewrk modified the milestones: 0.10.0, 0.11.0 Apr 16, 2022
@andrewrk andrewrk modified the milestones: 0.11.0, 0.12.0 Apr 9, 2023
@andrewrk andrewrk modified the milestones: 0.13.0, 0.12.0 Jul 9, 2023
@andrewrk andrewrk modified the milestones: 0.14.0, 0.16.0 Feb 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

No branches or pull requests

4 participants