Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential bug when a source is closed while its poller is waiting #51

Closed
notgull opened this issue Nov 30, 2022 · 5 comments
Closed

Potential bug when a source is closed while its poller is waiting #51

notgull opened this issue Nov 30, 2022 · 5 comments

Comments

@notgull
Copy link
Member

notgull commented Nov 30, 2022

Take this code:

use polling::{Poller, Event};
use std::net::TcpStream;
use std::thread::{sleep, spawn};
use std::time::Duration;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a poller.
    let poller = Poller::new()?;

    // Register a TCP stream into it.
    let stream = TcpStream::connect(("google.com", 80))?;
    stream.set_nonblocking(true)?;
    let key = 1;
    poller.add(&stream, Event::readable(key))?;

    // Close the TCP stream on another thread.
    spawn(move || {
        sleep(Duration::from_secs(1));
        drop(stream);
    });

    // Wait for the stream to become readable.
    let mut events = Vec::new();
    poller.wait(&mut events, None)?;
    
    Ok(())
}

On Linux (Linux 5.15.0-53-generic #59-Ubuntu) this code simply hangs forever. I'm not sure how Windows reacts; I do not have a Windows machine accessible to me right now for testing, and Wine falls victim to a separate bug that I will be filing shortly (Edit: #52).

I realized that, once a file descriptor is registered into polling, there's no mechanism preventing that file descriptor from being deleted. While this is not an issue in most use cases (e.g. async-io), it may lead to buggy or even undefined behavior. The man page for epoll_wait says:

For a discussion of what may happen if a file descriptor in an epoll instance being monitored by epoll_wait() is closed in another thread, see select(2).

The man page for select says (emphasis mine):

If a file descriptor being monitored by select() is closed in another thread, the result is unspecified. [...] On Linux (and some other systems), closing the file descriptor in another thread has no effect on select(). In summary, any application that relies on a particular behavior in this scenario must be considered buggy.

I'm not entirely sure how kqueue or wepoll react in this case. In any case, I think that we need to decide:

  • Should this behavior be allowed? This originally came up during the question of porting portions of this code to rustix, which uses a system where the Epoll (in this case) either takes ownership of the source or borrows it for the duration of the polling process.
  • If this shouldn't be allowed, how should the new system be implemented? rustix's pattern doesn't translate that well to async-io's use case, since the Poller is 'static, ruling out borrowing, and the Async still needs access to the source for get_ref() and other functions. Perhaps there could be some kind of reference counting system, or a unique token.
  • If this should be allowed, we should have a single case of defined behavior when this happens. What should that case be?
@taiki-e
Copy link
Collaborator

taiki-e commented Nov 30, 2022

(Maybe related to the behavior I mentioned in smol-rs/async-io#64 (comment)?)

@notgull
Copy link
Member Author

notgull commented Dec 1, 2022

I finally got my mitts on a Windows PC, and it looks like it hangs there too.

@notgull
Copy link
Member Author

notgull commented Dec 28, 2022

I've tested it and it looks like, for most of our backends, this isn't an issue. It just keeps polling if an FD is dropped while polling. This may be an issue for the poll() backend, since, if an FD is closed and a new one is reopened, can lead to unexpected behavior.

@fogti
Copy link
Member

fogti commented Dec 30, 2022

for kqueue, this shouldn't be an issue because the kernel will deregister any event selectors (idk how they're called in this context) when the associated file descriptor is closed, afaik (at least that is what I conclude from docs + BSD kernel code)...
the same applies to epoll.
but if no other fd is registered, it will just hang, because nothing can wake it up (although depending on that is also a bad idea, because on some systems it might cause a wakeup, which might also depend on the registered event type).

@notgull
Copy link
Member Author

notgull commented Aug 10, 2023

As of #123 this is explicitly undefined behavior.

@notgull notgull closed this as completed Aug 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants