Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial async work #279

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

martindevans
Copy link
Contributor

This PR adds support for async-wasm (see async.h).

Async support was added to the C API in bytecodealliance/wasmtime#7106 which has not yet made it into a main wasmtime release, so testing this will require new DLLs until that's released.

Main Changes:

Config

Changes here are quite basic, just some new config properties to enable and configure async mode (e.g. WithAsync(bool)). Once async has been enabled you must use Linker.InstantiateAsync to create a module.

Linker

Added InstantiateAsync which creates an AsyncInstance. Also added generation for all the DefineAsyncFunction overloads.

AsyncInstance

An Instance which was created in async mode. This only exposes the new async calls, so it can't be used wrong (e.g. trying to call in non-async mode).

AsyncFunction

A Function which is async. This can currently only be invoked by wrapping it, we'll probably want to introduce an equivalent of object? Invoke(ReadOnlySpan<ValueBox> arguments) later.

@martindevans
Copy link
Contributor Author

martindevans commented Oct 18, 2023

CI is failing since I didn't include updated binaries. I have no idea how the macos-release one has managed to pass :/

Copy link
Contributor

@kpreisser kpreisser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi!

I haven't yet looked at all changes in detail, but it seems when I try to run the async tests with a debugger (Visual Studio) and set breakpoints in the async callback, it will cause the process to crash with an Access Violation on Windows when wasmtime_call_future_poll is called, although I don't see anything obvious in the code that would indicate e.g. a memory issue.

The docs of wasmtime_linker_define_async_func say:

The callback cb will be invoked on another stack (fiber for Windows).

I'm not familiar with fibers on Windows, but it looks like they can be changing the thread's stack when invoking such a function. Do you know if the .NET CLR is compatible with that? Otherwise, I'm not sure if it's possible to correctly implement support for async Wasmtime functions in wasmtime-dotnet.

CI is failing since I didn't include updated binaries. I have no idea how the macos-release one has managed to pass :/

I think by default wasmtime-dotnet will download the dev release of Wasmtime, which already includes the async support, so the failures probably shouldn't be caused by old wasmtime binaries.

Thanks!

return trap;
}

await Task.Yield();
Copy link
Contributor

@kpreisser kpreisser Oct 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this will produce high CPU usage by calling Task.Yield() in a loop in order to check when the future has completed. I haven't looked much into how the new async support works in the Wasmtime C API, but maybe is there a different way of get notified when a future might have completed?

From a quick look at the async.h documentation, I was thinking that a returned wasmtime_call_future_t might complete once one of the async callback functions has completed. Maybe it would be possible to trigger checking of the wasmtime_call_future_t once one of the wasmtime_async_continuation_t is set to completed (as I guess the call of wasmtime_call_future_poll will trigger Wasmtime to call the still active wasmtime_func_async_continuation_callback_t callbacks).

@kpreisser
Copy link
Contributor

kpreisser commented Oct 19, 2023

Do you know if the .NET CLR is compatible with that? Otherwise, I'm not sure if it's possible to correctly implement support for async Wasmtime functions in wasmtime-dotnet.

I found this on the .NET documentation on Managed and unmanaged threading in Windows:

The .NET threading model does not support fibers. You should not call into any unmanaged function that is implemented by using fibers. Such calls may result in a crash of the .NET runtime.

Unfortunatly, with this it might not be possible to implement async support in wasmtime-dotnet.

@martindevans
Copy link
Contributor Author

The .NET threading model does not support fibers. You should not call into any unmanaged function that is implemented by using fibers. Such calls may result in a crash of the .NET runtime.

Unfortunatly, with this it might not be possible to implement async support in wasmtime-dotnet.

That would be very unfortunate, but it does look like you're right.

I'm a little bit surprised my tests aren't flakey. I guess since they're only ever running one async call at a time there's no real stack swapping going on.

Co-authored-by: Konstantin Preißer <kpreisser@users.noreply.github.com>
@martindevans
Copy link
Contributor Author

it will cause the process to crash with an Access Violation on Windows when wasmtime_call_future_poll is called

Does it reliably crash for you when you do this? I've been using the debugger while developing this and have had the tests passing.

@martindevans
Copy link
Contributor Author

I've just added a test with multiple async calls and that seems to crash every time (SEHException).

@kpreisser
Copy link
Contributor

kpreisser commented Oct 19, 2023

Does it reliably crash for you when you do this?

Yes, when I set a breakpoint in the async callback in the InvokeAsyncNoArgsNoResults test, like this:
grafik

Then when I use "Debug" on this test, the breakpoint doesn't get hit, and it always crashes with an SEHException and then an access violation:

Exception thrown: 'System.Runtime.InteropServices.SEHException' in Unknown Module.
The program '[15980] testhost.exe' has exited with code 3221225477 (0xc0000005) 'Access violation'.

When I remove the breakpoint and use Debug on this test, it doesn't crash; which I think is probably a result of the async callback being called using fibers.

@martindevans
Copy link
Contributor Author

How do we want to proceed with this?

We could work on getting it added as a non-Windows only feature, but that just seems like it would be confusing for users.

@peterhuene do you know if wasmtime fibers on Windows are likely to move away from using Windows native fibers? It looks like some features are missing because of native fibers (e.g. custom stack allocation) so I would guess the long term ambition is to move away from native-fibers anyway?

@kpreisser
Copy link
Contributor

We could work on getting it added as a non-Windows only feature, but that just seems like it would be confusing for users.

I tried using VS Code (with the C# Dev Kit Extension) on Ubuntu 22.04 with .NET SDK 7.0.110, and then set two breakpoints in the async callback in the InvokeAsyncNoArgsNoResults like this, and then used "Debug Test" on that test:

grafik

Then, the debugger seemed to pause but didn't stop at the breakpoint, and it couldn't show the current thread's call stack (single stepping failed as well):

grafik

When continuing execution, the second breakpoint was then hit (as that code is run by a regular worker thread of the .NET ThreadPool), and the test passed afterwards.
grafik

So maybe at least the debug experience on Linux also doesn't fully support the stack switching when the async callback is called.

@peterhuene
Copy link
Member

@peterhuene do you know if wasmtime fibers on Windows are likely to move away from using Windows native fibers? It looks like some features are missing because of native fibers (e.g. custom stack allocation) so I would guess the long term ambition is to move away from native-fibers anyway?

I don't think moving away from native fibers on Windows is on anyone's radar to implement and I can't recall if there were a technical reason we are using the native fiber implementation on Windows or if it was used simply because it was available at the time. @alexcrichton might recall as the implementer.

I do wonder if the CLR might not appreciate the stack being switched out from under it even with a custom fiber implementation; I think we'd have to dig deeper into exactly what's driving the limitation from the CLR side before using the limitation as justification for moving away from the native fiber implementation on Windows.

@alexcrichton
Copy link
Member

I was personally under the impression that there's no safe and future-compatible way to do stack switching on Windows except for the native fibers API that's provided in userspace. Given that it's not even an option to move away from native fibers because we want to correctly operate on Windows.

That being said I'm no Windows expert, so if this assumption is false then we can definitely move away from native fibers. I don't know what, if anything, other projects are doing with respect to native stack switching on Windows.

@kpreisser
Copy link
Contributor

kpreisser commented Oct 23, 2023

I don't have much knowledge in low-level functionality like this, but maybe one reason that the .NET threading model doesn't support fibers on Windows is due to how the GC works regarding object references on the stack.

For example, when running the following code based on this PR on Windows:

void Run()
{
    using var config = new Config().WithAsync(true);
    using var engine = new Engine(config);

    using var module = Module.FromText(
        engine,
        "hello",
        """
        (module 
            (func $hello (import "" "hello") (result i32))
            (func (export "run") (result i32)
                call $hello
            )
        )
        """);

    using var linker = new Linker(engine);
    using var store = new Store(engine);

    var cleanUpThread = new Thread(() =>
    {
        for (int i = 0; i < 10; i++)
        {
            Thread.Sleep(100);
            GC.Collect();
            GC.WaitForPendingFinalizers();
        }
    });
    cleanUpThread.Start();

    linker.DefineFunction(
        "",
        "hello",
        () =>
        {
            cleanUpThread.Join();
            Console.WriteLine("Callback 1! Wait completed.");
            return 12345;
        });

    var instance = linker.InstantiateAsync(store, module).GetAwaiter().GetResult();
    var run = instance.GetFunction("run")!.WrapFunc<int>()!;
    
    var obj = new MyCollectibleObj();

    var runTask = run();
    int result = runTask.GetAwaiter().GetResult();
    Console.WriteLine("After await runTask. MyCollectibleObj shouldn't have been collected until now.");

    GC.KeepAlive(obj);
}

Run();

class MyCollectibleObj
{
    public MyCollectibleObj()
    {
        Console.WriteLine("Created MyCollectibleObj.");
    }
    ~MyCollectibleObj()
    {
        Console.WriteLine("Collected MyCollectibleObj!");
    }
}

then even without running under a debugger, the app will crash most of the time with various errors (like "Fatal error. Internal CLR error.", "AccessViolationException"), and from the console output you can see that the GC has collected the MyCollectibleObj instance even though the code keeps it alive (on the stack) during the call, and so shouldn't have been collected until awaiting the runTask. It seems due to the stack switch, the GC isn't able to track that the object is still alive.

I can't reproduce this on Ubuntu 22.04, so maybe the GC in .NET (or the stack switching) is working differently there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants