wasmtime / issue #9667 wasmtime-wasi: Unexpected tcp stre... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / issue #9667 wasmtime-wasi: Unexpected tcp stre...

Wasmtime GitHub notifications bot (Nov 28 2024 at 09:05):

Heap-Hop edited issue #9667:

https://github.com/bytecodealliance/wasmtime/blob/67674881db5fbdbba8594feb52655aaa351a5f77/crates/wasi/src/tcp.rs#L719-L723

Returning 0 for std::io::ErrorKind::WouldBlock causes downstream to interpret it as a closed stream, see:
https://github.com/yoshuawuyts/wstd/issues/25#issuecomment-2493294087.
https://github.com/yoshuawuyts/wstd/blob/5ce367add5e7bcb569b6487453cb9ba94468dc99/src/io/copy.rs#L12

This is also found in:
https://github.com/bytecodealliance/wasmtime/blob/67674881db5fbdbba8594feb52655aaa351a5f77/crates/test-programs/src/bin/preview2_tcp_streams.rs#L17-L22

~~Given that the wit files already include many would-block errors, would it make sense to extend stream-error to include a would-block?~~

Wasmtime GitHub notifications bot (Nov 28 2024 at 14:29):

Heap-Hop commented on issue #9667:

@pchickey https://github.com/bytecodealliance/wasmtime/actions/runs/12070585313/job/33660435175?pr=9691#step:18:419

Wasmtime GitHub notifications bot (Nov 28 2024 at 16:45):

pchickey commented on issue #9667:

Thanks! Its a holiday weekend here so I’ll get to this next week.

Wasmtime GitHub notifications bot (Jan 22 2025 at 18:09):

alexcrichton commented on issue #9667:

I apologize it's taken me quite some time to take a look at this. Now that I've read over this and https://github.com/bytecodealliance/wasmtime/pull/9691 I personally don't think there's a bug in the wasmtime-wasi implementation here. I don't think WASI can reasonably provide super strict guarantees about exactly when things become ready and exactly the states before/after reads. Attempting to do this might not even be feasible in the face of an implementation across multiple platforms.

For example the test in https://github.com/bytecodealliance/wasmtime/pull/9691 looks like this:
        use test_programs::wasi::clocks::monotonic_clock::subscribe_duration;
        let timeout_100ms = 100_000_000;

        // Send some data to the server
        client.output.blocking_write_and_flush(b"Hi!").unwrap();

        server.input.subscribe().block();
        let res = server.input.read(512).unwrap();
        assert_eq!(res, b"Hi!", "Expected to receive data");

        // Don't send any data

        let res = server
            .input
            .subscribe()
            .block_until(&subscribe_duration(timeout_100ms));
        assert!(res.is_err(), "Expected to time out cause no data was sent");
I don't believe that this can be made to work reliably. Here after the initial "Hi!" message is ready it's not actually known whether the stream is readable or not. Determining this would involve actually consulting the kernel with some sort of syscall or something like that. It's a pretty important perf optimization that in async servers you don't do anything with the kernel until absolutely necessary.

For example here the kernel was consulted during the first server.input.subscribe().block(). At this point the stream is considered readable. The issued read then again consults the kernel and reads out "Hi!". At this point though the stream is notably considered still readable by Tokio and Wasmtime. It's not known whether this was a "short read" or whether more data is coming soon. This is why the next subscribe() immediately says "yes this is readable". Doing otherwise would require unnecessarily consulting the kernel.

Working robustly with async is certainly an art but personally I don't think that there's a bug here that needs fixing. I would personally think we can close https://github.com/bytecodealliance/wasmtime/pull/9691 and this issue as "working as intended".

Wasmtime GitHub notifications bot (Jan 22 2025 at 19:52):

pchickey commented on issue #9667:

I apologize I never got back to this after saying I would. I started a new job and have a whole host of new problems to work on, so I dropped some of the old ones.

Thanks Alex for digging into this and helping me understand why the existing behavior is there. My conclusion is that my specification of "a ready pollable implies input-stream.read will succeed" is too strong of a guarantee to implement efficiently on top of POSIX. In discussion with Alex, he suggested that only the behavior of read shows whether input was ready or not, and the pollable's readiness is a best-effort indicator that provides an optimization over busy-looping on read. I expect that we can easily accommodate the weakening of that guarantee in existing code (at least in wstd), and any existing binaries can't possibly be relying on that guarantee since its never been upheld by implementations. We will have to take that discussion out to the spec repo in order to nail it all down.

Wasmtime GitHub notifications bot (Jan 22 2025 at 20:30):

badeend commented on issue #9667:

I would personally think we can close (...) this issue as "working as intended".

For non-blocking APIs: agree. There's not much wasmtime can do (efficiently).

Maybe this could also justify the removal of starting an arbitrary-sized read in the FileInputStream implementation that was put in place to fulfill this guarantee?

For blocking APIs: there a few places where wasmtime blindly trusts that ready().await actually implies readiness. For example:

https://github.com/bytecodealliance/wasmtime/blob/ca9557646aac2d625a6d1d040e6caf065d0340f6/crates/wasi-io/src/streams.rs#L24-L29

In these cases wasmtime _is_ in a position to "fix" spurious wakeups by wrapping it in a loop.

Wasmtime GitHub notifications bot (Jan 22 2025 at 20:44):

alexcrichton commented on issue #9667:

Yeah I would agree that blocking_read should have a loop to handle situations like this

Wasmtime GitHub notifications bot (Jan 23 2025 at 07:56):

Heap-Hop commented on issue #9667:

This is why the next subscribe() immediately says "yes this is readable". Doing otherwise would require unnecessarily consulting the kernel.

@alexcrichton Thank you so much for the clear explanation, it resolved my confusion about why readable() in Tokio is not explicit.

It would be great if the blocking_ functions returned the expected results.

I would be happy to submit a PR to modify the blocking_ functions if no one else is currently working on it.

Wasmtime GitHub notifications bot (Jan 23 2025 at 17:00):

alexcrichton commented on issue #9667:

That'd be most welcome, thank you!

Wasmtime GitHub notifications bot (Jan 25 2025 at 07:52):

Heap-Hop commented on issue #9667:

#10113
Sorry, I haven't found an easy way to write a test that might block the process, as there isn’t a _until function for blocking_.
Is there an existing test case for this? The only approach I can think of is externally controlling the timeout for Wasmtime.

Last updated: Apr 18 2025 at 20:03 UTC