Stream: git-wasmtime

Topic: wasmtime / issue #11701 Weird behavior of wasi::filesyste...


Wasmtime GitHub notifications bot (Sep 16 2025 at 12:49):

vigoo opened issue #11701:

I would like to report a very weird issue that I ran into while debugging through Golem (which uses wasmtime under the hood). During the investigation I realized that the issue can be reproduced purely with wasmtime, even with the latest published version.

However, the reproducer is a bit fragile:

Even compiling in debug vs release mode seems to affect which of the two outcomes happens (the "error 55" failure or the hang, both shown below).

Test Case

I'm attaching a cargo-component crate that reproduces both of the above cases for me with rustc 1.89 and cargo-component 0.21.1.

Steps to Reproduce

Reproducing the "error 55" case with debug build:
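
(The exact build/run commands aren't included above; the following is a sketch of what they would typically look like for a cargo-component project, assuming a wasm32-wasip1 target and guessing the module name from the file-server-*.wasm attachments below.)

```sh
# Build the reproducer in debug mode
cargo component build

# Run it, preopening /tmp/py so the guest can create and delete files there
# (the `host::guest` form of --dir maps a host directory to a guest path)
wasmtime run --dir /tmp/py::/tmp/py target/wasm32-wasip1/debug/file_server.wasm
```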

Output:

Trying to create directory /tmp/py/modules/0/mytest/__pycache__
Finished creating directory /tmp/py/modules/0/mytest/__pycache__
Ok(())
Creating files
Ok(())
Ok(())
Ok(())
Ok(())
Removing all
print_tree "/tmp/py/modules/0"
📁 mytest
print_tree "/tmp/py/modules/0/mytest"
  📄 __init__.py
  📁 __pycache__
print_tree "/tmp/py/modules/0/mytest/__pycache__"
    📄 mymodule.rustpython-01.pyc
    📄 __init__.rustpython-01.pyc
  📄 mymodule.py
Err("Directory not empty (os error 55)")
()

Reproducing the infinite loop with a release build:
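
(Same sketch as above, under the same assumptions, but with a release build:)

```sh
cargo component build --release
wasmtime run --dir /tmp/py::/tmp/py target/wasm32-wasip1/release/file_server.wasm
```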

Output:

Trying to create directory /tmp/py/modules/0/mytest/__pycache__
Finished creating directory /tmp/py/modules/0/mytest/__pycache__
Ok(())
Creating files
Ok(())
Ok(())
Ok(())
Ok(())
Removing all
print_tree "/tmp/py/modules/0"
📁 mytest
print_tree "/tmp/py/modules/0/mytest"
  📄 __init__.py
  📁 __pycache__
print_tree "/tmp/py/modules/0/mytest/__pycache__"
    📄 mymodule.rustpython-01.pyc
    📄 __init__.rustpython-01.pyc
  📄 mymodule.py

and it hangs here.

Note that even removing things like prints from the code can make it fail rather than hang, so I'm not sure how stable this reproducer is on other machines.
Also, the attached code contains many other functions which were originally used in different tests - I left them in because removing them made the "hanging case" irreproducible for me.

I can attach the actual two WASMs if it helps.

Expected Results

The directory structure is deleted and the guest returns without error.

Versions and Environment

Wasmtime version: tried with 33.0.0 (what we use internally) and the latest published (36.0.2)

Operating system: Darwin Kernel Version 24.6.0
Architecture: arm64

reproducer.zip

Wasmtime GitHub notifications bot (Sep 16 2025 at 12:49):

vigoo added the bug label to Issue #11701.

Wasmtime GitHub notifications bot (Sep 16 2025 at 12:55):

vigoo commented on issue #11701:

file-server-debug.wasm.zip
file-server-release.wasm.zip

The WASMs for the two cases above, which for me reproduce the two different bad behaviors.

Wasmtime GitHub notifications bot (Sep 16 2025 at 18:52):

alexcrichton commented on issue #11701:

Ok, this is kind of a wild bug. My understanding at this point is that the true bug lies here for the wasip1 target and here for the wasip2 target. I cannot yet explain the difference between --debug and --release, nor can I explain why this appears to be platform-specific. Various other learnings:

In the meantime though, what to do about this? Unfortunately I think we're in a bit of a problematic situation. I'm not sure how to make some sort of host-side change to fix this with the WASIp1 fd_readdir API. That's unfortunately what needs to get fixed here, since the Rust standard library goes through WASIp1 for reading directories, which is implemented through the WASIp1-to-WASIp2 adapter. This means we've got the two lines to fix in Wasmtime mentioned at the start of this comment (one in the native implementation, one in the adapter). The difficulty is that fd_readdir, as specified, would effectively require buffering the entire directory's contents within the WASIp1-to-WASIp2 adapter, which we basically can't do since dynamic allocations aren't possible there. Without that buffering I don't believe we can implement what fd_readdir needs here, which is to read the directory at most once as a continuous stream of entries.

Well, that's at least as far as I've gotten so far on this.
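
(For context, here is a rough guest-side sketch of the fd_readdir contract, written against the wasi 0.11 crate; this is illustrative, not Wasmtime's implementation. The only state that survives across calls is the d_next cookie of the last entry read, which is exactly what goes wrong once the directory is mutated between calls.)

```rust
// Sketch: how a WASIp1 guest walks a directory with fd_readdir.
// Each entry in `buf` is a wasi::Dirent header followed by d_namlen name
// bytes; `d_next` is the cookie passed back in to resume iteration. The
// host receives no stream handle -- only this cookie -- so it must
// re-derive "the entry after this cookie" on every call, even if the
// directory changed in the meantime.
unsafe fn list_dir(fd: wasi::Fd) -> Result<Vec<String>, wasi::Errno> {
    let mut names = Vec::new();
    let mut cookie = wasi::DIRCOOKIE_START;
    let mut buf = vec![0u8; 4096];
    loop {
        let n = wasi::fd_readdir(fd, buf.as_mut_ptr(), buf.len(), cookie)?;
        let header = std::mem::size_of::<wasi::Dirent>();
        let mut off = 0;
        while off + header <= n {
            let dirent = (buf.as_ptr().add(off) as *const wasi::Dirent).read_unaligned();
            let name_end = off + header + dirent.d_namlen as usize;
            if name_end > n {
                break; // entry truncated by the buffer; refill from `cookie`
            }
            names.push(String::from_utf8_lossy(&buf[off + header..name_end]).into_owned());
            cookie = dirent.d_next;
            off = name_end;
        }
        if n < buf.len() {
            return Ok(names); // short read means end of directory
        }
    }
}
```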

Wasmtime GitHub notifications bot (Sep 16 2025 at 19:04):

alexcrichton commented on issue #11701:

Ok, well, further staring turned up https://github.com/bytecodealliance/wasmtime/pull/11702, which explains the release-mode-vs-debug-mode difference. With that I'm confident now that the only issue is the broken implementations of fd_readdir in this repo.

Wasmtime GitHub notifications bot (Sep 16 2025 at 19:34):

alexcrichton commented on issue #11701:

cc @vados-cosmonic and @sunfishcode I'm curious, as co-champions of wasi-filesystem, to get your take on this. My question is about WASIp1, so if y'all would rather not care about that, feel free to ignore this. Specifically the fd_readdir function: how should an implementation deal with the fact that, between invocations of fd_readdir, mutations to the directory might be made? I can personally think of two somewhat-viable paths forward:

Hm ok different question: as co-champions of wasi-filesystem how do y'all feel about declaring this API as dead-and-broken? I realize WASIp1 is sort of already in that state but it would be useful to have this on-record somewhere if only in an issue or something like that.

Wasmtime GitHub notifications bot (Sep 16 2025 at 19:45):

alexcrichton added the wasi:impl label to Issue #11701.

Wasmtime GitHub notifications bot (Sep 16 2025 at 19:46):

alexcrichton commented on issue #11701:

@vigoo in the meantime if you're interested in getting this fixed in the near-term I think the quickest fix will be "don't use std::fs::remove_dir_all" if that's possible. If that's in the bowels of some other crate you're using, however, the next-quickest fix would be to propose a change to Rust's libstd, but that's a pretty big step down in terms of "quickest"
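
(A possible guest-side workaround, sketched here as an illustration rather than anything from the thread: collect each directory's entries before deleting any of them, so the directory is never modified while it is still being iterated -- the interleaving that std::fs::remove_dir_all appears to trigger.)

```rust
use std::{fs, io, path::Path};

// Buffered replacement for std::fs::remove_dir_all: read the whole
// directory first, then delete, so readdir never observes a mutation.
fn remove_dir_all_buffered(path: &Path) -> io::Result<()> {
    let entries: Vec<fs::DirEntry> = fs::read_dir(path)?.collect::<Result<_, _>>()?;
    for entry in entries {
        if entry.file_type()?.is_dir() {
            remove_dir_all_buffered(&entry.path())?;
        } else {
            fs::remove_file(entry.path())?;
        }
    }
    fs::remove_dir(path)
}
```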

Wasmtime GitHub notifications bot (Sep 16 2025 at 19:52):

bjorn3 commented on issue #11701:

fd_readdir can be implemented directly in terms of getdents when not using the wasip1 to wasip2 shim, right? The d_off field for getdents is directly equivalent to d_next for fd_readdir. Maybe wasip2 could add a cookie field populated by the d_off field of getdents to directory-entry and then the wasip1 to wasip2 shim can use this cookie field to seek to the right entry in directory-entry-stream rather than using an integer index within the directory as d_next?

Wasmtime GitHub notifications bot (Sep 16 2025 at 19:53):

bjorn3 edited a comment on issue #11701:

fd_readdir can be implemented directly in terms of getdents when not using the wasip1 to wasip2 shim, right? The d_off field for getdents is directly equivalent to d_next for fd_readdir. Maybe wasip2 could add a cookie field populated by the d_off field of getdents to directory-entry and then the wasip1 to wasip2 shim can use this cookie field to seek to the right entry in directory-entry-stream rather than using an integer index within the directory as d_next? Or maybe the entry name could be hashed and used as the cookie as a temporary workaround. That would probably cause issues with hash collisions though.
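
(A sketch of what that proposal might look like in wasi-filesystem's WIT; the cookie field is hypothetical and not part of any published API:)

```wit
record directory-entry {
    %type: descriptor-type,
    name: string,
    // hypothetical: an opaque host offset (e.g. getdents' d_off) that the
    // WASIp1-to-WASIp2 adapter could hand back as fd_readdir's d_next cookie
    cookie: u64,
}
```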

Wasmtime GitHub notifications bot (Sep 16 2025 at 20:26):

vigoo commented on issue #11701:

@vigoo in the meantime if you're interested in getting this fixed in the near-term I think the quickest fix will be "don't use std::fs::remove_dir_all" if that's possible. If that's in the bowels of some other crate you're using, however, the next-quickest fix would be to propose a change to Rust's libstd, but that's a pretty big step down in terms of "quickest"

Thanks for looking into it so quickly! I did that as a workaround already (not using std::fs::remove_dir_all in the piece of code that triggered my investigation), although of course I cannot guarantee our users will never use it.

Most importantly I just wanted to let you know about the issue.

Wasmtime GitHub notifications bot (Sep 17 2025 at 00:46):

alexcrichton commented on issue #11701:

@bjorn3 it looks like getdents is Linux-specific, which is already one major blocker, but another is that once you've read d_off there's no guarantee the file at that offset isn't deleted. If it's deleted, for example, then the iterator would be truncated without visiting anything else, since the next seek wouldn't find anything.

Effectively, WASIp1's fd_readdir differs from getdents in that it's not stateful: you pass in a cookie, and at least on my reading of it you can arbitrarily seek around when reading a directory. With that lack of state, however, we can't maintain a single object on the host whose stream we're following. I don't know how we can take an arbitrary cookie and seek the actual stream in the face of modifications between calls to fd_readdir. WASIp{2,3} are much easier here since they return a stream/iterator and nothing else -- no seeking allowed.
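
(For reference, this is roughly the shape of the WASIp2 API being contrasted here, abbreviated from wasi:filesystem/types 0.2.x:)

```wit
resource descriptor {
    // returns a forward-only iterator; there is no seek operation
    read-directory: func() -> result<directory-entry-stream, error-code>;
    // ...
}

resource directory-entry-stream {
    // `none` signals the end of the stream
    read-directory-entry: func() -> result<option<directory-entry>, error-code>;
}
```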

Wasmtime GitHub notifications bot (Sep 17 2025 at 16:44):

vados-cosmonic commented on issue #11701:

A bit late but with regards to this note:

Hm ok different question: as co-champions of wasi-filesystem how do y'all feel about declaring this API as dead-and-broken? I realize WASIp1 is sort of already in that state but it would be useful to have this on-record somewhere if only in an issue or something like that.

This certainly seems like the right first step -- at the very least this is a big enough footgun that it should be documented somewhere.

Looking at the P1 interface I can't see a way to solve this that others haven't already mentioned here. One thing I was thinking about: would it be possible to use some bits of the cookie to store some state?

