Vulkan segfault while exiting #1439
Comments
Sigh, of course as soon as I clicked submit and walked away from the computer, I got a new idea. The place where I had inserted that delay still had "self" in scope, which meant that while all drawing calls had been made, the swapchain, pipelines, device, queue, and surface hadn't been dropped yet. So I tried manually dropping self before sending the notification to the other thread (using a channel under the hood), and this fixes the segfault. It seems this crash only occurs when the main thread runs the exit handlers while another thread is still cleaning up the wgpu structures it owned. I hope this helps narrow down the issue; the good news is that my project no longer segfaults with this workaround, so it's not impacting me anymore.
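The workaround described above can be sketched with only the standard library. This is a minimal illustration, not the author's actual code: `Renderer` is a hypothetical stand-in for the wgpu objects the render thread owns, its `Drop` impl just records an event, and `RenderEvent` is a hypothetical message type sent over a `std::sync::mpsc` channel. The key step is calling `drop` on the resources before sending the notification, so the main thread only tears down the window once they are gone.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

type Log = Arc<Mutex<Vec<&'static str>>>;

// Hypothetical stand-in for the wgpu objects owned by the render thread
// (swapchain, pipelines, device, queue, surface).
struct Renderer {
    log: Log,
}

impl Drop for Renderer {
    fn drop(&mut self) {
        self.log.lock().unwrap().push("renderer dropped");
    }
}

// Hypothetical message telling the main thread that rendering has finished.
enum RenderEvent {
    Finished,
}

fn run_shutdown() -> Vec<&'static str> {
    let log: Log = Arc::new(Mutex::new(Vec::new()));
    let (tx, rx) = mpsc::channel();

    let render_log = Arc::clone(&log);
    let render_thread = thread::spawn(move || {
        let renderer = Renderer { log: render_log };
        // ... the render loop would run here until shutdown is requested ...

        // Workaround: release the GPU resources explicitly *before*
        // notifying the main thread, rather than at the end of this scope.
        drop(renderer);
        tx.send(RenderEvent::Finished).unwrap();
    });

    // Main thread: only tear down the window after the notification arrives.
    match rx.recv().unwrap() {
        RenderEvent::Finished => log.lock().unwrap().push("window destroyed"),
    }
    render_thread.join().unwrap();

    let events = log.lock().unwrap().clone();
    events
}

fn main() {
    println!("{:?}", run_shutdown());
}
```

Because the drop happens before `send`, and the main thread only proceeds after `recv`, `run_shutdown()` always returns `["renderer dropped", "window destroyed"]`. Without the explicit `drop`, the renderer would be released at the end of the closure, after the main thread had already been told to exit.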
This is still affecting wgpu 0.9. I've also discovered that dropping a TextureView in similar circumstances can trigger a similar crash. Thankfully, I think I've now figured out everything I need to drop manually in my separate thread to prevent crashes. My codebase has changed since I last reported this, and these workarounds are now in the main branch. If anyone wants to debug this, the location to disable my workarounds is here; you can also search the source for this issue's URL to find the current location. Comment out the drop calls, then run any of the examples. I used the … Upon running the app, close the window. It will sometimes break into the debugger inside winit's signal-handling code (but that doesn't cause an actual segfault when running the app). Other times, you'll get a SIGSEGV inside the drop of a Buffer:
Other times, it happens inside the drop of a BindGroup:
These crashes occur in the thread where those drop calls are written, which is not the main winit thread. Moving the drops so they happen before I tell the main thread to exit prevents the crashes.
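A related detail when deciding what to drop, and when: if the window and the GPU objects ever end up owned by the same struct, Rust drops struct fields in declaration order, so declaring the window last makes it outlive the resources that depend on it. This does not by itself fix the cross-thread case in this thread, where the window lives on the main thread. The sketch uses a hypothetical `Tracked` type that records its name when dropped; the field names mirror the wgpu types mentioned above but are illustrative, not wgpu's API.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<&'static str>>>;

// Hypothetical resource that records its name when dropped.
struct Tracked {
    name: &'static str,
    log: Log,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Field names are illustrative, not wgpu's API. Rust drops struct fields in
// declaration order, so everything listed before `window` is released first.
struct App {
    bind_group: Tracked,
    buffer: Tracked,
    texture_view: Tracked,
    surface: Tracked,
    window: Tracked, // declared last, therefore dropped last
}

fn drop_order() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let make = |name: &'static str| Tracked { name, log: Rc::clone(&log) };
    let app = App {
        bind_group: make("bind_group"),
        buffer: make("buffer"),
        texture_view: make("texture_view"),
        surface: make("surface"),
        window: make("window"),
    };
    drop(app);
    let order = log.borrow().clone();
    order
}

fn main() {
    println!("{:?}", drop_order());
}
```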
So in your case the resources are dropped after the context/device have all been cleaned up?
I don't believe so (sorry, I have a lot of projects and this is some of my oldest code). In my setup, this thread does all the rendering; the main thread only handles winit events. I don't believe any other thread holds references to wgpu types. The other threads work on other types that eventually get rendered by this thread. After digging through the code to re-familiarize myself with it, I believe what's happening is that my shutdown signal lets the main thread go on to destroy the window itself. Since winit controls when the window goes away, and that happens on the main thread, I believe the window is sometimes destroyed before the wgpu resources are. If that's true, is this actually a bug? Can wgpu even protect me from myself in that situation?
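Assuming the main thread can afford to block briefly at shutdown, another way to rule out the ordering problem described here is for it to join the render thread before it lets the window be destroyed. `JoinHandle::join` returns only after the thread's closure has returned, and by then every local value the closure owned has been dropped. In this sketch, `GpuResources` is a hypothetical stand-in for the wgpu objects, and the "window destroyed" entry stands for whatever the main thread does to tear the window down.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

type Log = Arc<Mutex<Vec<&'static str>>>;

// Hypothetical stand-in for the wgpu objects owned by the render thread.
struct GpuResources {
    log: Log,
}

impl Drop for GpuResources {
    fn drop(&mut self) {
        self.log.lock().unwrap().push("gpu resources dropped");
    }
}

fn shutdown_with_join() -> Vec<&'static str> {
    let log: Log = Arc::new(Mutex::new(Vec::new()));

    let render_log = Arc::clone(&log);
    let render_thread = thread::spawn(move || {
        let _resources = GpuResources { log: render_log };
        // ... render until asked to stop, then return ...
        // `_resources` is dropped when this closure returns.
    });

    // `join` returns only after the closure has returned, so the GPU
    // resources are gone before the window is torn down below.
    render_thread.join().expect("render thread panicked");
    log.lock().unwrap().push("window destroyed");

    let events = log.lock().unwrap().clone();
    events
}

fn main() {
    println!("{:?}", shutdown_with_join());
}
```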
Great question! This is basically #1463 (cc @pythonesque)
It's "not a bug" in the sense that wgpu doesn't currently provide a safe interface to surface creation, so in theory all bets are off. In practice, though, I think it is one, as it's very unlikely that an average wgpu user who needs windowing will be able to use it safely. Fortunately, I think the proposal @kvark linked (or a minor tweak of it) can solve the issue: basically, for safety, we just need to make sure the surface holds a reference to the window, preventing the window from being destroyed until the wgpu context is destroyed (it's more complicated than that, but that's the basic idea).
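The idea of having the surface keep the window alive can be sketched with reference counting. This is only an illustration of the concept, not wgpu's implementation or the exact proposal: `Window` and `Surface` are hypothetical types whose `Drop` impls record events, and the surface holds an `Arc<Window>` so that releasing the event loop's handle does not destroy the window while the surface still exists.

```rust
use std::sync::{Arc, Mutex};

type Log = Arc<Mutex<Vec<&'static str>>>;

// Hypothetical stand-in for a platform window.
struct Window {
    log: Log,
}

impl Drop for Window {
    fn drop(&mut self) {
        self.log.lock().unwrap().push("window destroyed");
    }
}

// Hypothetical surface that keeps its window alive by holding an `Arc` to it.
struct Surface {
    _window: Arc<Window>,
    log: Log,
}

impl Surface {
    fn new(window: Arc<Window>, log: Log) -> Self {
        Surface { _window: window, log }
    }
}

impl Drop for Surface {
    fn drop(&mut self) {
        // Runs before the fields are dropped, i.e. before `_window` is released.
        self.log.lock().unwrap().push("surface destroyed");
    }
}

fn release_order() -> Vec<&'static str> {
    let log: Log = Arc::new(Mutex::new(Vec::new()));
    let window = Arc::new(Window { log: Arc::clone(&log) });
    let surface = Surface::new(Arc::clone(&window), Arc::clone(&log));

    // The event loop gives up its handle first...
    drop(window);
    // ...but the window is only destroyed once the surface releases it.
    drop(surface);

    let events = log.lock().unwrap().clone();
    events
}

fn main() {
    println!("{:?}", release_order());
}
```

Even though the window handle is dropped first, `release_order()` returns `["surface destroyed", "window destroyed"]`, because the last `Arc` reference is the one held by the surface.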
I read through the linked issue, and it sounds great to me. While there's no bug in wgpu's implementation, I think there's a bug in the documentation: the safety requirements aren't fully documented. Currently it only says that the raw window handle must be valid when the surface is created. I'm a prime example of someone who didn't make the connection that the handle must also remain valid for the lifetime of the surface. Would a PR adding that note to the documentation's safety section be worth doing?
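The missing requirement could be stated in a `# Safety` section on an unsafe constructor and, where the API allows it, enforced by the compiler with a lifetime that ties the surface to its window. The sketch below is hypothetical throughout: `Window`, `RawHandle`, `Surface<'w>`, `Surface::new`, and `Surface::from_raw` are illustrative names, not wgpu's API.

```rust
use std::marker::PhantomData;

// Hypothetical stand-in for a platform window.
struct Window {
    id: u32,
}

// Hypothetical stand-in for a raw window handle.
#[derive(Clone, Copy, Debug, PartialEq)]
struct RawHandle(u32);

impl Window {
    fn raw_handle(&self) -> RawHandle {
        RawHandle(self.id)
    }
}

/// A surface that cannot outlive the window it was created from.
struct Surface<'w> {
    handle: RawHandle,
    // Ties the surface to a borrow of the window without storing the reference.
    _window: PhantomData<&'w Window>,
}

impl<'w> Surface<'w> {
    /// Safe constructor: the borrow checker keeps `window` alive for `'w`.
    fn new(window: &'w Window) -> Self {
        Surface { handle: window.raw_handle(), _window: PhantomData }
    }
}

impl Surface<'static> {
    /// Creates a surface from a raw handle.
    ///
    /// # Safety
    ///
    /// `handle` must refer to a valid window when this function is called,
    /// and that window must remain valid until the returned surface has been
    /// dropped.
    unsafe fn from_raw(handle: RawHandle) -> Self {
        Surface { handle, _window: PhantomData }
    }
}

fn main() {
    let window = Window { id: 1 };
    let surface = Surface::new(&window);
    // drop(window); // rejected by the borrow checker: `surface` still borrows `window`
    // SAFETY: `window` outlives `raw_surface`, which is dropped first.
    let raw_surface = unsafe { Surface::from_raw(window.raw_handle()) };
    println!("{:?} {:?}", surface.handle, raw_surface.handle);
}
```

With the lifetime-based constructor, uncommenting the `drop(window)` line fails to compile because `surface` is used afterwards; the unsafe constructor can only document the rule and rely on the caller.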
Sure, we'll be happy to have the documentation corrected!
Description
I have a segfault on exit that occurs while none of my code is interacting with wgpu (as far as I can find). It's affecting my library, kludgine.
Repro steps
Sadly, I can't reproduce any crashes or similar-looking valgrind errors with any of the examples. To reproduce, use the redux branch of kludgine (technically the main branch exhibits this behavior too, but the redux branch has been simplified significantly compared to main) and run:
valgrind -v ./target/debug/examples/simple
Expected vs observed behavior
I expect Vulkan to shut down properly without crashing.
Extra materials
Here's a snippet from the valgrind report that I find interesting:
Many of the contexts it prints errors for stem from destroy_swapchain_khr. I can confirm that setting the winit ControlFlow to Exit happens after I exit the render loop in the thread that drives rendering. As far as I can tell, none of my code is executing while the app is shutting down. I've tried inserting a delay after closing the window and telling winit to exit, and it makes no difference. From what I can tell, it's not due to a race condition with in-flight rendering code.
I can try to narrow this down further, but I'm not sure how to dig deeper at this point.
Platform