ospencer opened PR #8072 from ospencer:oscar/fix-fd-write-iovs
to bytecodealliance:main
:
<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
I noticed that the implementation offd_write
in the preview1 adapter only writes the first iov in the list passed. This change writes all of them. I'm happy to add a test for this, though I couldn't find any tests for the adapter. I tested this locally via wasm-tools.
ospencer requested alexcrichton for a review on PR #8072.
ospencer requested wasmtime-core-reviewers for a review on PR #8072.
Writing in a loop is incompatible with POSIX. POSIX mandates that each writev is an atomic operation. While it may only write the first n bytes, it has to write all n bytes at once without the possibility of any other write getting interleaved with it. If you want to write everything, you did have to copy all buffers into one large buffer and then write this large buffer, which may actually be slower than both the status quo and your PR. See also https://github.com/bytecodealliance/wasmtime/issues/8037#issuecomment-1973512565
alexcrichton commented on PR #8072:
To add another case I would be worried about in addition to what @bjorn3 already mentioned: in the PR as-is if you have 4 buffers and successfully write the first two but then fail writing the third then the fact that the first two were successfully written is lost. At that point you've got a number of successfully written bytes plus an error and it's not clear which should be returned since only one can be returned.
ospencer commented on PR #8072:
That makes sense. It looks like some implementations of
readv
/writev
do allocate a temporary large buffer in the case that the number of provided iovs exceeds the limit of the system, so that probably makes sense here too.I agree that the current implementation we have of
fd_write
is technically correct, but I think the (totally allowable) behavior change from older preview1 versions of wasmtime is a little surprising to less resilient code, shown by the couple of issues/PRs opened about this.I attempted to implement this so we could at least test it out/consider this approach, but I forgot that the adapter can't really use
Vec
s because of the panics. I'm not quite a seasoned-enough Rustacean to know how to do this without vecs in safe Rust, but if someone could give me a pointer that'd be much appreciated.
alexcrichton commented on PR #8072:
I agree it's unfortunate that this is causing issues, but IMO our hands are unfortunately tied here to the point that there's not much we can do. I think there's one thing we can do, which is to have a statically allocated or somewhere slice of bytes which the adapter copies into and then calls out to perform a single write. That would respect the semantics desired for languages and such here, at the cost of copying data. It's not clear to me when the copy is worth it vs when not, so I'm not sure how to best make such a decision.
The adapter has various bits and pieces of scratch space in its
State
and we can put a fixed-size buffer in there for copying if necessary.
I agree it's unfortunate that this is causing issues, but IMO our hands are unfortunately tied here to the point that there's not much we can do. I think there's one thing we can do, which is to have a statically allocated or somewhere slice of bytes which the adapter copies into and then calls out to perform a single write. That would respect the semantics desired for languages and such here, at the cost of copying data. It's not clear to me when the copy is worth it vs when not, so I'm not sure how to best make such a decision.
I hit this recently and thought I had made a mistake using the API; multiple iovecs are supported in at least three other engines. Though they are rare, supporting that functionality for application migration, even if the writes are not atomic, would be an improvement in the status quo and make the ecosystem more uniform. Given that I think these are rare, if we would like to make the POSIX atomicity guarantee, it seems like a copy into a temporary buffer would be a reasonable implementation--especially if the I/O is large, it's likely to be a small relative overhead.
alexcrichton commented on PR #8072:
I definitely agree that this can be a footgun, so I don't mean to discount that. That being said I continue to think "just call write twice" is not an acceptable solution. While that solves the footgun it means that if anyone tries to robustly handle errors/atomicity it's impossible to do so. I'm not sure @titzer if your issue is derivative of the same file changed in this PR, but this PR changes an adapter which converts WASIp1 to WASIp2. The WASIp1 APIs "looked like POSIX" and had its semantics, and callers may not even realize they're being wrapped to work in a WASIp2 world. In that sense callers with it can't be known in this context if callers with multiple iovecs care about atomicity or not.
This is why I mention that one possible solution is to copy some data. Perhaps when it's below a certain threshold or something like that it's copied into a different space and then issued as a single write. That doesn't have any atomicity issues and it's just a minor perf loss. I think it'd be reasonable to prototype that out and see what it would look like.
Last updated: Jan 24 2025 at 00:11 UTC