`poll_oneoff()` ambiguity · wasi

Stream: wasi

Topic: `poll_oneoff()` ambiguity

Nathaniel McCallum (Jun 16 2022 at 15:35):

@Dan Gohman (and everyone else) I want to make you aware that we have uncovered an ambiguity in the poll_oneoff() definition. It is currently blocking our work on mio/tokio, so we'd like to resolve this ASAP. The full details are available here: https://github.com/tokio-rs/mio/pull/1580#issuecomment-1157797095

fix(wasi): don't fail select() on empty subscriptions by haraldh · Pull Request #1580 · tokio-rs/mio

return `Ok(())` rather than `Err(EINVAL)`. Signed-off-by: Harald Hoyer harald@profian.com
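A minimal sketch of the behavior the PR proposes, under a simplified model: `select`, `Errno`, and the injected `poll_oneoff` closure are hypothetical stand-ins, not mio's actual WASI selector or the real WASI bindings. With an empty subscription list, the wrapper returns `Ok(())` with zero events instead of passing the host's EINVAL up to the caller.

```rust
#[derive(Debug, PartialEq)]
enum Errno {
    Inval, // hypothetical stand-in for WASI's EINVAL
}

/// Hypothetical model of the PR's change: when there are no subscriptions
/// to hand to `poll_oneoff`, report "no events" (`Ok(())`) instead of
/// surfacing the host's EINVAL. `poll_oneoff` is injected as a closure so
/// the sketch does not depend on any particular WASI binding.
fn select<S, F>(subs: &[S], events: &mut Vec<u64>, poll_oneoff: F) -> Result<(), Errno>
where
    F: FnOnce(&[S]) -> Result<Vec<u64>, Errno>,
{
    events.clear();
    if subs.is_empty() {
        // Nothing to wait on: return immediately with zero events.
        return Ok(());
    }
    // Otherwise forward to the host and propagate any error it reports.
    events.extend(poll_oneoff(subs)?);
    Ok(())
}
```

Note that the empty case never reaches the host at all, which is why the PR sidesteps the question of what `poll_oneoff` itself should do.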

Dan Gohman (Jun 16 2022 at 15:36):

@Nathaniel McCallum Thanks for raising this. I'll take a look.

Dan Gohman (Jun 16 2022 at 15:45):

My initial instinct is to say that we should return success immediately in that case. But I'm looking into a few more things.

Dan Gohman (Jun 16 2022 at 15:53):

The history here is that it used to have the block-infinitely behavior, but we removed that because WASI has no signals, so it's never useful to just block infinitely on an empty subscription list. But since poll is "wait for any event", the natural thing to do in an empty poll call would be to wait indefinitely, so instead EINVAL is returned, meaning "that particular request isn't supported".
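To make the host-side behavior concrete, here is a toy `poll_oneoff` that only understands relative clock timeouts. It is a hypothetical sketch (`ClockSub`, `Errno`, and the sleep-based wait are assumptions, not Wasmtime's implementation): with at least one subscription it waits for the earliest timeout, while an empty list, which no event could ever complete, is rejected with EINVAL rather than blocking forever.

```rust
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
enum Errno {
    Inval, // hypothetical stand-in for WASI's EINVAL
}

/// Hypothetical clock-only subscription; real WASI subscriptions can also
/// wait on file-descriptor readiness.
struct ClockSub {
    userdata: u64,
    timeout: Duration, // relative timeout
}

/// Toy host-side `poll_oneoff`: waits until the earliest timeout expires and
/// returns the `userdata` of every subscription that is due by then.
fn poll_oneoff(subs: &[ClockSub]) -> Result<Vec<u64>, Errno> {
    // No subscriptions means no event can ever fire, and WASI has no signals
    // that could interrupt the wait, so reject the request up front.
    let earliest = subs.iter().map(|s| s.timeout).min().ok_or(Errno::Inval)?;
    let start = Instant::now();
    thread::sleep(earliest);
    // `sleep` should not return early, but clamp anyway so the earliest
    // subscription is always reported.
    let now = start.elapsed().max(earliest);
    Ok(subs
        .iter()
        .filter(|s| s.timeout <= now)
        .map(|s| s.userdata)
        .collect())
}
```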

Dan Gohman (Jun 16 2022 at 17:08):

Now I'm also considering saying that what Wasmtime is doing is correct, and that the proposed change in the PR is ok.

Dan Gohman (Jun 16 2022 at 17:58):

@Nathaniel McCallum Can you say more about what use case you have for calling mio::Poll::poll with no events?

Nathaniel McCallum (Jun 16 2022 at 17:59):

Honestly no. Harald is currently on paternity leave.

Dan Gohman (Jun 16 2022 at 18:00):

Having it return immediately isn't consistent with mio::Poll::poll's documented behavior, or its behavior on other platforms. But having it hang is useless on a platform where it can never be woken up.

Dan Gohman (Jun 16 2022 at 18:29):

I've posted what I know in the GitHub thread. There isn't an obvious answer here, so I expect we'll need to look further up the stack to see what the code that calls this needs it to do.

Dan Gohman (Jun 16 2022 at 19:14):

@George Kulakowski One subtle difference between mutexes deadlocking and poll_oneoff deadlocking is that with mutexes, the problem is typically a mundane logic error, while with poll_oneoff, passing in zero events seems to indicate a mismatch between what the application is expecting the platform to do and what the platform could actually do.

George Kulakowski (Jun 16 2022 at 19:15):

I know that

Dan Gohman (Jun 16 2022 at 19:16):

Ah, I likely misunderstood your post.

George Kulakowski (Jun 16 2022 at 19:18):

Yeah, I realized belatedly that for that reason it may not have been the best example. Analogies are tricky like that. Typing more now.

George Kulakowski (Jun 16 2022 at 20:14):

Posting this here because posting it on other people's GitHub seems noisy.

I've realized that (as you pointed out) a big difference between the poll and mutex examples I gave is that they typically stem from different kinds of errors rooted in different expectations, and that's a distraction from the point I was attempting to make. I should have said looping forever ;p So lemme expound.

I agree that if I deploy some code which calls WASI poll on no events, I am going to want some mechanism to tell me that my code can't make forward progress. This is a goal at the level of a human writing software, or a group of them: software engineers and devops.

What's not a priori clear to me is that the best place for that is the spec'd semantics of wasi poll_oneoff.

More subtly, I do believe there's often a lot of value in having straightforward semantics with no edge conditions in APIs like this. Having "at least one of the events occurred" as a postcondition of poll_oneoff is a more powerful tool to reason with than "at least one of the events occurred OR no events were provided". I don't want other callers of poll_oneoff needing to assert that some particular postcondition (the "no events were provided" state) is unreachable.
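A hypothetical caller illustrating the postcondition point (the `event_loop` name and its `wait` closure are made up for this sketch): code written against "at least one of the events occurred" needs an extra branch as soon as the spec also allows returning with no events, otherwise an unbounded version of this loop would busy-spin instead of blocking.

```rust
/// Hypothetical event loop written against the postcondition
/// "at least one of the events occurred". `wait` stands in for a
/// poll_oneoff-style call that returns the ids of ready events.
fn event_loop<F>(mut wait: F, rounds: usize) -> Result<usize, &'static str>
where
    F: FnMut() -> Vec<u64>,
{
    let mut handled = 0;
    for _ in 0..rounds {
        let ready = wait();
        // Only needed if the spec lets the wait return with zero events:
        // this is the extra state every caller would have to rule out.
        if ready.is_empty() {
            return Err("wait returned no events");
        }
        handled += ready.len();
    }
    Ok(handled)
}
```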

In the other direction, I don't see what the cost is of providing an interface that lets you block forever. How's it distinguishable from looping forever? That's similarly "useless". What's so bad about it being possible to express "block forever"?

One answer to that comes back to the point about the lived experience of the people writing and deploying the software. And again I agree that their needs must be addressed. I would argue that there are other ways to address them than specifying poll_oneoff(no events) to return immediately and handling that case in code.

One point is that, generally, I imagine people will want to observe or detect lack of forward progress, and that this is the domain of the runtime and its diagnostics. I want to be told if I'm wedged for any reason. Can blocking indefinitely via poll_oneoff be reported via one of those mechanisms?

Finally I acknowledge that maybe the ship has sailed on the semantics of poll_oneoff. I don't want to suggest relitigating that if it's not appropriate. I'll add that I certainly can't tell from the documentation that you intended to remove the block-indefinitely semantics, and that I clearly have a philosophy about this sort of thing that would have taken me to a different conclusion :)

Most spicy part: One aspect of that philosophy: I keyed onto this discussion in the first place because of specific phrasing around the call to the API being "useless". My experience has been that attempting to detect whether a particular call, or sequence of calls, to a low-level system interface is "useless" is a bit fraught, not always the best investment in improving the experience of your clients, and a bit of a slippery slope, compared to approaches that consider these experiences across the whole lifecycle of the people writing and shipping code onto your platform.

(To return to the admittedly strained mutex analogy: the mutex API itself may not report a deadlock condition, but the runtime may instead provide some other (maybe optional!) out-of-band mechanism to report on it, like a deadlock-detecting mode that I can explicitly turn on knowing that there's a performance cost or whatever, and that may inform me via a trap or a log or something more indirect. That beats rewriting all my locking code to check, at every single mutex lock guard, that no deadlock is possible. Doing it that way would be invasive and expensive.)
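A rough sketch of the out-of-band approach described above, using only the Rust standard library: the monitored code sends heartbeats, and a separate watchdog reports when they stop, so no individual wait call has to decide whether it is "useless". The `watchdog` function and its heartbeat channel are hypothetical illustrations, not an existing Wasmtime or WASI diagnostic.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical out-of-band progress monitor. The monitored code never asks
/// "am I wedged?" itself; it just sends a heartbeat whenever it makes progress.
/// Returns `Some(n)` (n = heartbeats seen) if progress stalls for longer than
/// `stall_after`, or `None` if the monitored code finished and hung up.
fn watchdog(heartbeats: mpsc::Receiver<()>, stall_after: Duration) -> Option<usize> {
    let mut seen = 0;
    loop {
        match heartbeats.recv_timeout(stall_after) {
            Ok(()) => seen += 1,
            // No progress within the window: report it (log, trap, metric, ...).
            Err(RecvTimeoutError::Timeout) => return Some(seen),
            // All senders dropped: the monitored code exited normally.
            Err(RecvTimeoutError::Disconnected) => return None,
        }
    }
}
```

Like the opt-in deadlock-detecting mode in the analogy, this costs a thread and a channel, and it can be switched on only when wanted.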

Last updated: Apr 10 2025 at 05:03 UTC