Stream: git-wasmtime

Topic: wasmtime / issue #8392 Wasmtime hangs when doing file I/O...


view this post on Zulip Wasmtime GitHub notifications bot (Apr 17 2024 at 09:17):

dbaeumer added the bug label to Issue #8392.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 17 2024 at 09:17):

dbaeumer opened issue #8392:

Executing a wasm application on wasmtime that has multiple threads where on thread does file I/O results in a hang on the first file I/O operation.

Disclaimer: I am not a Rust expert. But since the application runs correctly as a OS process I assume that the Rust code is OK. I am also not sure if this is a Rust toolchain problem or a wasmtime problem.

Test Case

Clone https://github.com/dbaeumer/wasm-threads.git

Steps to Reproduce

Expected Results

The application doesn't hang on the fs::File::open call

I expect the same result as executing:

npm run build:os
npm run run:os

Actual Results

The application does hang.

Versions and Environment

Rust:

Default host: x86_64-unknown-linux-gnu
rustup home:  /xxxx/.rustup

installed targets for active toolchain
--------------------------------------

wasm32-unknown-unknown
wasm32-wasi
wasm32-wasi-preview1-threads
x86_64-unknown-linux-gnu

active toolchain
----------------

stable-x86_64-unknown-linux-gnu (default)
rustc 1.77.2 (25ef9e3d8 2024-04-09)

Wasmtime: wasmtime-cli 19.0.2 (040ec9209 2024-04-11)

Operating system:

Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy

view this post on Zulip Wasmtime GitHub notifications bot (Apr 17 2024 at 09:18):

dbaeumer edited issue #8392:

Executing a wasm application on wasmtime that has multiple threads where on thread does file I/O results in a hang on the first file I/O operation.

Disclaimer: I am not a Rust expert. But since the application runs correctly as a OS process I assume that the Rust code is OK. I am also not sure if this is a Rust toolchain problem or a wasmtime problem.

Test Case

Clone https://github.com/dbaeumer/wasm-threads.git

Steps to Reproduce

Expected Results

The application doesn't hang on the fs::File::open call

I expect the same result as executing:

npm run build:os
npm run run:os

Actual Results

The application does hang.

Versions and Environment

Rust:

Default host: x86_64-unknown-linux-gnu
rustup home:  /xxxx/.rustup

installed targets for active toolchain
--------------------------------------

wasm32-unknown-unknown
wasm32-wasi
wasm32-wasi-preview1-threads
x86_64-unknown-linux-gnu

active toolchain
----------------

stable-x86_64-unknown-linux-gnu (default)
rustc 1.77.2 (25ef9e3d8 2024-04-09)

Wasmtime: wasmtime-cli 19.0.2 (040ec9209 2024-04-11)

Operating system:

Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy

view this post on Zulip Wasmtime GitHub notifications bot (Apr 17 2024 at 14:47):

alexcrichton commented on issue #8392:

Using gdb/logging it looks like the main thread is blocked in memory.atomic.wait32(addr=0x104198, expected=0x80000002, timeout=None). Given that I'm not sure that the issue is in Wasmtime and this looks like a wasi-libc issue perhaps? (e.g. something perhaps related to locking in the initialization of files and such.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 17 2024 at 14:52):

tschneidereit commented on issue #8392:

@abrown this seems relevant to your interests :)

view this post on Zulip Wasmtime GitHub notifications bot (Apr 26 2024 at 21:56):

abrown commented on issue #8392:

I ran @dbaeumer's code today on the latest version of Wasmtime (@dbaeumer, thanks for the great reproducer). I believe the hang is due to incorrect locking in wasi-libc. This is not a Wasmtime issue, but I'll explain what I figured out here:

When running with WASMTIME_LOG=wasmtime_runtime::threads=trace, I see one memory.atomic.wait that is not corresponded elsewhere:

2024-04-26T19:11:26.438492Z TRACE wasmtime_runtime::threads::shared_memory: memory.atomic.wait32(addr=0x103f2c, expected=2147483650, timeout=None)

The stack trace at this instruction looks like:

__wait
__lock
internal_register_preopened_fd_unlocked
__wasilibc_populate_preopens
__wasilibc_find_abspath
__wasilibc_find_relpath_alloc
__wasilibc_find_relpath
std::sys::pal::wasi::fs::open_parent::{{closure}}::h2a891633185b03b0
std::sys::pal::wasi::fs::File::open::h57333908bc41373d
std::fs::OpenOptions::_open::h0ba6508e7f3dab94
std::fs::OpenOptions::open::h917f7ccf95a22000
std::fs::File::open::h2bd1932e28feed10
wasm_threads::main::hd67815b1e283a1b0
core::ops::function::FnOnce::call_once::h0dba6a9e804e86ee
std::sys_common::backtrace::__rust_begin_short_backtrace::h884974562e00ebc8
std::rt::lang_start::{{closure}}::h79de5d68ac140761
std::rt::lang_start_internal::h301b28581f6245a7
std::rt::lang_start::h8217158c4515bb59
__main_void
_start

This points to a problem in internal_register_preopened_fd_unlocked. From the file name, it would appear that no locking should happen here because its parent, __wasilibc_populate_preopens, has already acquired the lock. But, in fact, resize attempts to reacquire it despite it being held in the parent.

Looking at Git blame, I added the additional locking very early on in the wasi-threads implementation effort (#311) and it looks like the mistake comes from not understanding how these preopens work correctly. I believe we can safely remove the lock in resize since it is not used elsewhere but I would like to reproduce end-to-end with the Rust toolchain; @alexcrichton, can you remind me how to use cargo with a modified version of wasi-libc?

view this post on Zulip Wasmtime GitHub notifications bot (Apr 26 2024 at 22:01):

abrown edited a comment on issue #8392:

I ran @dbaeumer's code today on the latest version of Wasmtime (@dbaeumer, thanks for the great reproducer). I believe the hang is due to incorrect locking in wasi-libc. This is not a Wasmtime issue, but I'll explain what I figured out here:

When running with WASMTIME_LOG=wasmtime_runtime::threads=trace, I see one memory.atomic.wait that is not corresponded elsewhere:

2024-04-26T19:11:26.438492Z TRACE wasmtime_runtime::threads::shared_memory: memory.atomic.wait32(addr=0x103f2c, expected=2147483650, timeout=None)

The stack trace at this instruction looks like:

__wait
__lock
internal_register_preopened_fd_unlocked
__wasilibc_populate_preopens
__wasilibc_find_abspath
__wasilibc_find_relpath_alloc
__wasilibc_find_relpath
std::sys::pal::wasi::fs::open_parent::{{closure}}::h2a891633185b03b0
std::sys::pal::wasi::fs::File::open::h57333908bc41373d
std::fs::OpenOptions::_open::h0ba6508e7f3dab94
std::fs::OpenOptions::open::h917f7ccf95a22000
std::fs::File::open::h2bd1932e28feed10
wasm_threads::main::hd67815b1e283a1b0
core::ops::function::FnOnce::call_once::h0dba6a9e804e86ee
std::sys_common::backtrace::__rust_begin_short_backtrace::h884974562e00ebc8
std::rt::lang_start::{{closure}}::h79de5d68ac140761
std::rt::lang_start_internal::h301b28581f6245a7
std::rt::lang_start::h8217158c4515bb59
__main_void
_start

This points to a problem in internal_register_preopened_fd_unlocked. From the file name, it would appear that no locking should happen here because its parent, __wasilibc_populate_preopens, has already acquired the lock. But, in fact, resize attempts to reacquire it despite it being held in the parent.

Looking at Git blame, I added the additional locking very early on in the wasi-threads implementation effort (#311) and it looks like the mistake comes from not understanding how these preopens work correctly. I believe we can safely remove the lock in resize since it is not used elsewhere but I would like to reproduce end-to-end with the Rust toolchain; @alexcrichton, can you remind me how to use cargo with a modified version of wasi-libc?

=====

EDIT: scratch the Git blame analysis; looking at this more, I think this was actually introduced in a refactoring in #408. @sunfishcode, it probably wasn't clear that resize retakes the lock? But it should be fine to remove it, right?

view this post on Zulip Wasmtime GitHub notifications bot (Apr 26 2024 at 22:02):

abrown edited a comment on issue #8392:

I ran @dbaeumer's code today on the latest version of Wasmtime (@dbaeumer, thanks for the great reproducer). I believe the hang is due to incorrect locking in wasi-libc. This is not a Wasmtime issue, but I'll explain what I figured out here:

When running with WASMTIME_LOG=wasmtime_runtime::threads=trace, I see one memory.atomic.wait that is not corresponded elsewhere:

2024-04-26T19:11:26.438492Z TRACE wasmtime_runtime::threads::shared_memory: memory.atomic.wait32(addr=0x103f2c, expected=2147483650, timeout=None)

The stack trace at this instruction looks like:

__wait
__lock
internal_register_preopened_fd_unlocked
__wasilibc_populate_preopens
__wasilibc_find_abspath
__wasilibc_find_relpath_alloc
__wasilibc_find_relpath
std::sys::pal::wasi::fs::open_parent::{{closure}}::h2a891633185b03b0
std::sys::pal::wasi::fs::File::open::h57333908bc41373d
std::fs::OpenOptions::_open::h0ba6508e7f3dab94
std::fs::OpenOptions::open::h917f7ccf95a22000
std::fs::File::open::h2bd1932e28feed10
wasm_threads::main::hd67815b1e283a1b0
core::ops::function::FnOnce::call_once::h0dba6a9e804e86ee
std::sys_common::backtrace::__rust_begin_short_backtrace::h884974562e00ebc8
std::rt::lang_start::{{closure}}::h79de5d68ac140761
std::rt::lang_start_internal::h301b28581f6245a7
std::rt::lang_start::h8217158c4515bb59
__main_void
_start

This points to a problem in internal_register_preopened_fd_unlocked. From the file name, it would appear that no locking should happen here because its parent, __wasilibc_populate_preopens, has already acquired the lock. But, in fact, resize attempts to reacquire it despite it being held in the parent.

Looking at Git blame, I added the additional locking very early on in the wasi-threads implementation effort (#311) and it looks like the mistake comes from not understanding how these preopens work correctly. I believe we can safely remove the lock in resize since it is not used elsewhere but I would like to reproduce end-to-end with the Rust toolchain; @alexcrichton, can you remind me how to use cargo with a modified version of wasi-libc?


EDIT: scratch the Git blame analysis; looking at this more, I think this was actually introduced in a refactoring in #408. @sunfishcode, it probably wasn't clear that resize retakes the lock? But it should be fine to remove it, right?

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2024 at 06:36):

alexcrichton commented on issue #8392:

@alexcrichton, can you remind me how to use cargo with a modified version of wasi-libc?

The "easiest" way is perhaps to:

You can probably also finagle this with various sysroot flags and Rust configuration as well and get away with only building wasi-libc if you'd like to.

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2024 at 21:03):

abrown commented on issue #8392:

Ok, here's how I verified that WebAssembly/wasi-libc#491 actually fixes this issue:

Having run these steps, I'm a bit more confident that WebAssembly/wasi-libc#491 does not miss anything and, once that merges, I propose we close this.

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2024 at 21:09):

abrown closed issue #8392:

Executing a wasm application on wasmtime that has multiple threads where on thread does file I/O results in a hang on the first file I/O operation.

Disclaimer: I am not a Rust expert. But since the application runs correctly as a OS process I assume that the Rust code is OK. I am also not sure if this is a Rust toolchain problem or a wasmtime problem.

Test Case

Clone https://github.com/dbaeumer/wasm-threads.git

Steps to Reproduce

Expected Results

The application doesn't hang on the fs::File::open call

I expect the same result as executing:

npm run build:os
npm run run:os

Actual Results

The application does hang.

Versions and Environment

Rust:

Default host: x86_64-unknown-linux-gnu
rustup home:  /xxxx/.rustup

installed targets for active toolchain
--------------------------------------

wasm32-unknown-unknown
wasm32-wasi
wasm32-wasi-preview1-threads
x86_64-unknown-linux-gnu

active toolchain
----------------

stable-x86_64-unknown-linux-gnu (default)
rustc 1.77.2 (25ef9e3d8 2024-04-09)

Wasmtime: wasmtime-cli 19.0.2 (040ec9209 2024-04-11)

Operating system:

Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy


Last updated: Nov 22 2024 at 16:03 UTC