dbaeumer added the bug label to Issue #8392.
dbaeumer opened issue #8392:
Executing a wasm application on wasmtime that has multiple threads where on thread does file I/O results in a hang on the first file I/O operation.
Disclaimer: I am not a Rust expert. But since the application runs correctly as a OS process I assume that the Rust code is OK. I am also not sure if this is a Rust toolchain problem or a wasmtime problem.
Test Case
Clone https://github.com/dbaeumer/wasm-threads.git
Steps to Reproduce
npm run build:wasm
to build the wasm filenpm run wasmtime
to execute the wasm file in wasmtimeExpected Results
The application doesn't hang on the
fs::File::open
callI expect the same result as executing:
npm run build:os
npm run run:os
Actual Results
The application does hang.
Versions and Environment
Rust:
Default host: x86_64-unknown-linux-gnu rustup home: /xxxx/.rustup installed targets for active toolchain -------------------------------------- wasm32-unknown-unknown wasm32-wasi wasm32-wasi-preview1-threads x86_64-unknown-linux-gnu active toolchain ---------------- stable-x86_64-unknown-linux-gnu (default) rustc 1.77.2 (25ef9e3d8 2024-04-09)
Wasmtime: wasmtime-cli 19.0.2 (040ec9209 2024-04-11)
Operating system:
Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy
dbaeumer edited issue #8392:
Executing a wasm application on wasmtime that has multiple threads where on thread does file I/O results in a hang on the first file I/O operation.
Disclaimer: I am not a Rust expert. But since the application runs correctly as a OS process I assume that the Rust code is OK. I am also not sure if this is a Rust toolchain problem or a wasmtime problem.
Test Case
Clone https://github.com/dbaeumer/wasm-threads.git
Steps to Reproduce
npm run build:wasm
to build the wasm filenpm run run:wasmtime
to execute the wasm file in wasmtimeExpected Results
The application doesn't hang on the
fs::File::open
callI expect the same result as executing:
npm run build:os
npm run run:os
Actual Results
The application does hang.
Versions and Environment
Rust:
Default host: x86_64-unknown-linux-gnu rustup home: /xxxx/.rustup installed targets for active toolchain -------------------------------------- wasm32-unknown-unknown wasm32-wasi wasm32-wasi-preview1-threads x86_64-unknown-linux-gnu active toolchain ---------------- stable-x86_64-unknown-linux-gnu (default) rustc 1.77.2 (25ef9e3d8 2024-04-09)
Wasmtime: wasmtime-cli 19.0.2 (040ec9209 2024-04-11)
Operating system:
Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy
alexcrichton commented on issue #8392:
Using gdb/logging it looks like the main thread is blocked in
memory.atomic.wait32(addr=0x104198, expected=0x80000002, timeout=None)
. Given that I'm not sure that the issue is in Wasmtime and this looks like a wasi-libc issue perhaps? (e.g. something perhaps related to locking in the initialization of files and such.
tschneidereit commented on issue #8392:
@abrown this seems relevant to your interests :)
abrown commented on issue #8392:
I ran @dbaeumer's code today on the latest version of Wasmtime (@dbaeumer, thanks for the great reproducer). I believe the hang is due to incorrect locking in wasi-libc. This is not a Wasmtime issue, but I'll explain what I figured out here:
When running with
WASMTIME_LOG=wasmtime_runtime::threads=trace
, I see onememory.atomic.wait
that is not corresponded elsewhere:2024-04-26T19:11:26.438492Z TRACE wasmtime_runtime::threads::shared_memory: memory.atomic.wait32(addr=0x103f2c, expected=2147483650, timeout=None)
The stack trace at this instruction looks like:
__wait __lock internal_register_preopened_fd_unlocked __wasilibc_populate_preopens __wasilibc_find_abspath __wasilibc_find_relpath_alloc __wasilibc_find_relpath std::sys::pal::wasi::fs::open_parent::{{closure}}::h2a891633185b03b0 std::sys::pal::wasi::fs::File::open::h57333908bc41373d std::fs::OpenOptions::_open::h0ba6508e7f3dab94 std::fs::OpenOptions::open::h917f7ccf95a22000 std::fs::File::open::h2bd1932e28feed10 wasm_threads::main::hd67815b1e283a1b0 core::ops::function::FnOnce::call_once::h0dba6a9e804e86ee std::sys_common::backtrace::__rust_begin_short_backtrace::h884974562e00ebc8 std::rt::lang_start::{{closure}}::h79de5d68ac140761 std::rt::lang_start_internal::h301b28581f6245a7 std::rt::lang_start::h8217158c4515bb59 __main_void _start
This points to a problem in
internal_register_preopened_fd_unlocked
. From the file name, it would appear that no locking should happen here because its parent,__wasilibc_populate_preopens
, has already acquired the lock. But, in fact,resize
attempts to reacquire it despite it being held in the parent.Looking at Git blame, I added the additional locking very early on in the wasi-threads implementation effort (#311) and it looks like the mistake comes from not understanding how these preopens work correctly. I believe we can safely remove the lock in
resize
since it is not used elsewhere but I would like to reproduce end-to-end with the Rust toolchain; @alexcrichton, can you remind me how to usecargo
with a modified version of wasi-libc?
abrown edited a comment on issue #8392:
I ran @dbaeumer's code today on the latest version of Wasmtime (@dbaeumer, thanks for the great reproducer). I believe the hang is due to incorrect locking in wasi-libc. This is not a Wasmtime issue, but I'll explain what I figured out here:
When running with
WASMTIME_LOG=wasmtime_runtime::threads=trace
, I see onememory.atomic.wait
that is not corresponded elsewhere:2024-04-26T19:11:26.438492Z TRACE wasmtime_runtime::threads::shared_memory: memory.atomic.wait32(addr=0x103f2c, expected=2147483650, timeout=None)
The stack trace at this instruction looks like:
__wait __lock internal_register_preopened_fd_unlocked __wasilibc_populate_preopens __wasilibc_find_abspath __wasilibc_find_relpath_alloc __wasilibc_find_relpath std::sys::pal::wasi::fs::open_parent::{{closure}}::h2a891633185b03b0 std::sys::pal::wasi::fs::File::open::h57333908bc41373d std::fs::OpenOptions::_open::h0ba6508e7f3dab94 std::fs::OpenOptions::open::h917f7ccf95a22000 std::fs::File::open::h2bd1932e28feed10 wasm_threads::main::hd67815b1e283a1b0 core::ops::function::FnOnce::call_once::h0dba6a9e804e86ee std::sys_common::backtrace::__rust_begin_short_backtrace::h884974562e00ebc8 std::rt::lang_start::{{closure}}::h79de5d68ac140761 std::rt::lang_start_internal::h301b28581f6245a7 std::rt::lang_start::h8217158c4515bb59 __main_void _start
This points to a problem in
internal_register_preopened_fd_unlocked
. From the file name, it would appear that no locking should happen here because its parent,__wasilibc_populate_preopens
, has already acquired the lock. But, in fact,resize
attempts to reacquire it despite it being held in the parent.Looking at Git blame, I added the additional locking very early on in the wasi-threads implementation effort (#311) and it looks like the mistake comes from not understanding how these preopens work correctly. I believe we can safely remove the lock in
resize
since it is not used elsewhere but I would like to reproduce end-to-end with the Rust toolchain; @alexcrichton, can you remind me how to usecargo
with a modified version of wasi-libc?=====
EDIT: scratch the Git blame analysis; looking at this more, I think this was actually introduced in a refactoring in #408. @sunfishcode, it probably wasn't clear that
resize
retakes the lock? But it should be fine to remove it, right?
abrown edited a comment on issue #8392:
I ran @dbaeumer's code today on the latest version of Wasmtime (@dbaeumer, thanks for the great reproducer). I believe the hang is due to incorrect locking in wasi-libc. This is not a Wasmtime issue, but I'll explain what I figured out here:
When running with
WASMTIME_LOG=wasmtime_runtime::threads=trace
, I see onememory.atomic.wait
that is not corresponded elsewhere:2024-04-26T19:11:26.438492Z TRACE wasmtime_runtime::threads::shared_memory: memory.atomic.wait32(addr=0x103f2c, expected=2147483650, timeout=None)
The stack trace at this instruction looks like:
__wait __lock internal_register_preopened_fd_unlocked __wasilibc_populate_preopens __wasilibc_find_abspath __wasilibc_find_relpath_alloc __wasilibc_find_relpath std::sys::pal::wasi::fs::open_parent::{{closure}}::h2a891633185b03b0 std::sys::pal::wasi::fs::File::open::h57333908bc41373d std::fs::OpenOptions::_open::h0ba6508e7f3dab94 std::fs::OpenOptions::open::h917f7ccf95a22000 std::fs::File::open::h2bd1932e28feed10 wasm_threads::main::hd67815b1e283a1b0 core::ops::function::FnOnce::call_once::h0dba6a9e804e86ee std::sys_common::backtrace::__rust_begin_short_backtrace::h884974562e00ebc8 std::rt::lang_start::{{closure}}::h79de5d68ac140761 std::rt::lang_start_internal::h301b28581f6245a7 std::rt::lang_start::h8217158c4515bb59 __main_void _start
This points to a problem in
internal_register_preopened_fd_unlocked
. From the file name, it would appear that no locking should happen here because its parent,__wasilibc_populate_preopens
, has already acquired the lock. But, in fact,resize
attempts to reacquire it despite it being held in the parent.Looking at Git blame, I added the additional locking very early on in the wasi-threads implementation effort (#311) and it looks like the mistake comes from not understanding how these preopens work correctly. I believe we can safely remove the lock in
resize
since it is not used elsewhere but I would like to reproduce end-to-end with the Rust toolchain; @alexcrichton, can you remind me how to usecargo
with a modified version of wasi-libc?
EDIT: scratch the Git blame analysis; looking at this more, I think this was actually introduced in a refactoring in #408. @sunfishcode, it probably wasn't clear that
resize
retakes the lock? But it should be fine to remove it, right?
alexcrichton commented on issue #8392:
@alexcrichton, can you remind me how to use cargo with a modified version of wasi-libc?
The "easiest" way is perhaps to:
- Check-out wasi-sdk
- Update/edit the wasi-libc submodule with your changes
- Build wasi-sdk
- Set
WASI_SDK_PATH
and build the Rust targetYou can probably also finagle this with various sysroot flags and Rust configuration as well and get away with only building wasi-libc if you'd like to.
abrown commented on issue #8392:
Ok, here's how I verified that WebAssembly/wasi-libc#491 actually fixes this issue:
- built wasi-libc with threads support:
make THREAD_MODEL=posix
- in each of the
sysroot/*
directories, I had to "rename" the target to what Rust believes it to be:ln -s wasm32-wasi-threads wasm32-wasip1-threads
with the rust repository cloned, configure
config.toml
to point to the modifiedwasi-libc
:
toml lld = true [target.wasm32-wasip1-threads] wasi-root = ".../wasi-libc/sysroot"
build
rustc
:./x.py build --stage 1 --target wasm32-wasip1-threads
- inform
rustup
of the new compiler:rustup toolchain link stage1 build/x86_64-unknown-linux-gnu/stage1
- with @dbaeumer's repository cloned, build the reproducer with the modified
rustc
andwasi-libc
:rustc +stage1 -v --target=wasm32-wasip1-threads src/main.rs
- run the compiled reproducer with any recent Wasmtime:
console $ wasmtime --dir .::. --wasi threads=y --wasm threads=y main.wasm Before file open After file open hi number 1 from the spawned thread! hi number 2 from the spawned thread! hi number 3 from the spawned thread! hi number 4 from the spawned thread! hi number 5 from the spawned thread! hi number 6 from the spawned thread! hi number 7 from the spawned thread! hi number 8 from the spawned thread! hi number 9 from the spawned thread! Thread finished
Having run these steps, I'm a bit more confident that WebAssembly/wasi-libc#491 does not miss anything and, once that merges, I propose we close this.
abrown closed issue #8392:
Executing a wasm application on wasmtime that has multiple threads where on thread does file I/O results in a hang on the first file I/O operation.
Disclaimer: I am not a Rust expert. But since the application runs correctly as a OS process I assume that the Rust code is OK. I am also not sure if this is a Rust toolchain problem or a wasmtime problem.
Test Case
Clone https://github.com/dbaeumer/wasm-threads.git
Steps to Reproduce
npm run build:wasm
to build the wasm filenpm run run:wasmtime
to execute the wasm file in wasmtimeExpected Results
The application doesn't hang on the
fs::File::open
callI expect the same result as executing:
npm run build:os
npm run run:os
Actual Results
The application does hang.
Versions and Environment
Rust:
Default host: x86_64-unknown-linux-gnu rustup home: /xxxx/.rustup installed targets for active toolchain -------------------------------------- wasm32-unknown-unknown wasm32-wasi wasm32-wasi-preview1-threads x86_64-unknown-linux-gnu active toolchain ---------------- stable-x86_64-unknown-linux-gnu (default) rustc 1.77.2 (25ef9e3d8 2024-04-09)
Wasmtime: wasmtime-cli 19.0.2 (040ec9209 2024-04-11)
Operating system:
Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy
Last updated: Dec 23 2024 at 12:05 UTC