roman-kashitsyn opened Issue #1768:
This seems to be a race condition that happens quite rarely.
We have a large test suite that creates many instances of Wasm modules in parallel, executes one function, and destroys the instances. The instances don't share any state: each time we create a new Store, a new Module, and then an Instance, call the function, and destroy everything.
The code of the test suite is closed-source for now. This issue has only been seen twice, so even if the test suite were openly available, it wouldn't be of much help.
The version of Wasmtime we use is:

```
wasmtime 0.15.0 (git+https://github.com/bytecodealliance/wasmtime?rev=4fdc4349109721f3a3751fa4fedc4732ea5813f8)
```

Environment: Linux x86_64, Debug build.
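For illustration, here is a minimal sketch of the per-test pattern described above. It is written against a recent wasmtime API rather than the 0.15 API from this report, and the module text and export name (`run`) are placeholders:

```rust
use wasmtime::{Engine, Instance, Module, Store};

// One test iteration: everything is created from scratch and dropped at the end.
// The test harness runs many of these in parallel with nothing shared between them.
fn run_one_test(wat: &str) -> anyhow::Result<()> {
    let engine = Engine::default();
    let module = Module::new(&engine, wat)?;           // fresh compilation
    let mut store = Store::new(&engine, ());           // fresh store
    let instance = Instance::new(&mut store, &module, &[])?;
    let f = instance.get_typed_func::<(), ()>(&mut store, "run")?;
    f.call(&mut store, ())?;
    Ok(())                                             // Instance, Store, and Module all dropped here
}
```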
roman-kashitsyn labeled Issue #1768.
bjorn3 commented on Issue #1768:
Do you have a backtrace or full panic message (including assertion location)?
yurydelendik commented on Issue #1768:
FWIW I saw this happen if the same module is instantiated on different threads. I encountered and fixed that in https://github.com/bytecodealliance/wasmtime/pull/1761
lostman commented on Issue #1768:
@yurydelendik That's a very interesting looking PR. `SendableModule` looks like something that can be kept around in a cache to avoid re-JIT-ing modules. Is that correct?
yurydelendik commented on Issue #1768:
> `SendableModule` looks like something that can be kept around in a cache to avoid re-JIT-ing modules. Is that correct?
Correct. Keep in mind that it is a work-in-progress though.
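For context on the caching question: in recent wasmtime releases `Module` itself is `Send + Sync`, so a compiled module can be cached and instantiated from many threads without recompiling. A sketch against the current API (not the work-in-progress `SendableModule` from the PR), with a placeholder module:

```rust
use std::sync::Arc;
use std::thread;
use wasmtime::{Engine, Instance, Module, Store};

fn main() -> anyhow::Result<()> {
    let engine = Engine::default();
    // Compile (JIT) once and keep the result around...
    let module = Arc::new(Module::new(&engine, r#"(module (func (export "run")))"#)?);

    // ...then instantiate the cached module from several threads without re-JIT-ing.
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let engine = engine.clone();
            let module = Arc::clone(&module);
            thread::spawn(move || -> anyhow::Result<()> {
                let mut store = Store::new(&engine, ());
                let instance = Instance::new(&mut store, &module, &[])?;
                let run = instance.get_typed_func::<(), ()>(&mut store, "run")?;
                run.call(&mut store, ())?;
                Ok(())
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap()?;
    }
    Ok(())
}
```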
lostman commented on Issue #1768:
@bjorn3 here are the backtraces from some of the failed tests:
This one points to the lock which gets poisoned:
https://github.com/bytecodealliance/wasmtime/blob/4fdc4349109721f3a3751fa4fedc4732ea5813f8/crates/api/src/frame_info.rs#L182

```
[2020-05-22 14:48:51] stack backtrace:
[2020-05-22 14:48:51]    0: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
[2020-05-22 14:48:51]    1: core::fmt::write
[2020-05-22 14:48:51]    2: std::io::Write::write_fmt
[2020-05-22 14:48:51]    3: std::io::impls::<impl std::io::Write for alloc::boxed::Box<W>>::write_fmt
[2020-05-22 14:48:51]    4: std::panicking::default_hook::{{closure}}
[2020-05-22 14:48:51]    5: std::panicking::default_hook
[2020-05-22 14:48:51]    6: std::panicking::rust_panic_with_hook
[2020-05-22 14:48:51]    7: rust_begin_unwind
[2020-05-22 14:48:51]    8: core::panicking::panic_fmt
[2020-05-22 14:48:51]    9: core::result::unwrap_failed
[2020-05-22 14:48:51]   10: core::result::Result<T,E>::unwrap
[2020-05-22 14:48:51]        at /build/rustc-1.41.1-src/src/libcore/result.rs:956
[2020-05-22 14:48:51]   11: wasmtime::frame_info::register
[2020-05-22 14:48:51]        at /nix/store/hc8ml9nz5f0wpl8x0qf58z3ag5narmnb-crates-io/wasmtime-wasmtime-profiling-wasmtime-cli-wasmtime-fuzz-4fdc4349109721f3a3751fa4fedc4732ea5813f8/src/frame_info.rs:182
[2020-05-22 14:48:51]   12: wasmtime::module::Module::register_frame_info
[2020-05-22 14:48:51]        at /nix/store/hc8ml9nz5f0wpl8x0qf58z3ag5narmnb-crates-io/wasmtime-wasmtime-profiling-wasmtime-cli-wasmtime-fuzz-4fdc4349109721f3a3751fa4fedc4732ea5813f8/src/module.rs:494
[2020-05-22 14:48:51]   13: wasmtime::instance::Instance::new
[2020-05-22 14:48:51]        at /nix/store/hc8ml9nz5f0wpl8x0qf58z3ag5narmnb-crates-io/wasmtime-wasmtime-profiling-wasmtime-cli-wasmtime-fuzz-4fdc4349109721f3a3751fa4fedc4732ea5813f8/src/instance.rs:139
```
And this one to the assertion:
https://github.com/bytecodealliance/wasmtime/blob/4fdc4349109721f3a3751fa4fedc4732ea5813f8/crates/api/src/frame_info.rs#L186

```
[2020-05-22 14:48:51] stack backtrace:
[2020-05-22 14:48:51]    0: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
[2020-05-22 14:48:51]    1: core::fmt::write
[2020-05-22 14:48:51]    2: std::io::Write::write_fmt
[2020-05-22 14:48:51]    3: std::io::impls::<impl std::io::Write for alloc::boxed::Box<W>>::write_fmt
[2020-05-22 14:48:51]    4: std::panicking::default_hook::{{closure}}
[2020-05-22 14:48:51]    5: std::panicking::default_hook
[2020-05-22 14:48:51]    6: std::panicking::rust_panic_with_hook
[2020-05-22 14:48:51]    7: std::panicking::begin_panic
[2020-05-22 14:48:51]        at /build/rustc-1.41.1-src/src/libstd/panicking.rs:404
[2020-05-22 14:48:51]    8: wasmtime::frame_info::register
[2020-05-22 14:48:51]        at /nix/store/hc8ml9nz5f0wpl8x0qf58z3ag5narmnb-crates-io/wasmtime-wasmtime-profiling-wasmtime-cli-wasmtime-fuzz-4fdc4349109721f3a3751fa4fedc4732ea5813f8/src/frame_info.rs:186
[2020-05-22 14:48:51]    9: wasmtime::module::Module::register_frame_info
[2020-05-22 14:48:51]        at /nix/store/hc8ml9nz5f0wpl8x0qf58z3ag5narmnb-crates-io/wasmtime-wasmtime-profiling-wasmtime-cli-wasmtime-fuzz-4fdc4349109721f3a3751fa4fedc4732ea5813f8/src/module.rs:494
[2020-05-22 14:48:51]   10: wasmtime::instance::Instance::new
[2020-05-22 14:48:51]        at /nix/store/hc8ml9nz5f0wpl8x0qf58z3ag5narmnb-crates-io/wasmtime-wasmtime-profiling-wasmtime-cli-wasmtime-fuzz-4fdc4349109721f3a3751fa4fedc4732ea5813f8/src/instance.rs:139
```
> the same module is instantiated on different threads
@yurydelendik How can this happen though? We don't move anything between threads. Each test does everything from scratch: create Store, Engine, Instance, Module, etc. Nothing is shared.
yurydelendik commented on Issue #1768:
> How can this happen though? We don't move anything between threads. Each test does everything from scratch: create Store, Engine, Instance, Module, etc. Nothing is shared.

My guess: if `GlobalFrameInfoRegistration` is not dropped, e.g. `Module` (is not freed) but `Store` released its resources, and new compilation allocated `CodeMemory` in the same region -- you may see a similar failure.
yurydelendik edited a comment on Issue #1768:
> How can this happen though? We don't move anything between threads. Each test does everything from scratch: create Store, Engine, Instance, Module, etc. Nothing is shared.

My guess: if `GlobalFrameInfoRegistration` is not dropped (e.g. `Module` is not freed) but `Store` released its resources, and a new compilation allocated `CodeMemory` in the same region -- you may see a similar failure.
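To make that failure mode concrete, here is a hypothetical, much-simplified sketch (not the actual wasmtime frame_info.rs code) of a global registry of compiled-code regions. The `unwrap()` on the lock corresponds to the poisoned-lock backtrace, and the overlap assertion to the second backtrace: if a stale registration is never removed and a later compilation reuses the same address range, the assertion fires and poisons the lock for every other thread.

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;

// Live code regions, keyed by end address; the value is the start address.
static REGISTRY: Mutex<BTreeMap<usize, usize>> = Mutex::new(BTreeMap::new());

/// Register the code region [start, end).
fn register(start: usize, end: usize) {
    // Once one thread panics while holding the lock, every later unwrap()
    // fails with a PoisonError (the frame_info.rs:182 flavour of the crash).
    let mut map = REGISTRY.lock().unwrap();
    // The first live region whose end is past our start must begin at or
    // after our end; otherwise the two regions overlap (the frame_info.rs:186
    // flavour of the crash).
    if let Some((&prev_end, &prev_start)) = map.range(start + 1..).next() {
        assert!(
            prev_start >= end,
            "region {start:#x}..{end:#x} overlaps live region {prev_start:#x}..{prev_end:#x}"
        );
    }
    map.insert(end, start);
}

/// Dropping the registration handle is supposed to remove the entry
/// before the code memory can be freed and its address range reused.
fn unregister(end: usize) {
    REGISTRY.lock().unwrap().remove(&end);
}

fn main() {
    register(0x1000, 0x2000);
    unregister(0x2000);       // skip this (a leaked registration) and the next call panics
    register(0x1000, 0x2000); // a new module allocated in the freed region
}
```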
alexcrichton commented on Issue #1768:
I tried to reproduce this locally with some tests but wasn't able to get too far. I think this diff may fix the issue, since I think we're dropping fields in the wrong order by accident, but I'm not sure. @lostman if you're unable to reduce this to share a test case, can you test out that patch and see if it fixes the issue for you?
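For readers unfamiliar with the drop-order point: Rust drops struct fields in declaration order, so swapping two fields changes whether the registry entry or the code memory goes away first. A hypothetical illustration with stand-in types (not the actual wasmtime structs):

```rust
// Stand-in types: dropping one "unregisters" frame info, dropping the
// other "frees" the code region that the registration refers to.
struct Registration(&'static str);
impl Drop for Registration {
    fn drop(&mut self) { println!("unregister frame info for {}", self.0); }
}
struct CodeMemory(&'static str);
impl Drop for CodeMemory {
    fn drop(&mut self) { println!("free code region for {}", self.0); }
}

// Fields drop in declaration order, so the registration is removed before
// the code region is freed and its addresses can be reused. Declared the
// other way around, a stale registration could outlive the region.
#[allow(dead_code)]
struct ModuleInner {
    registration: Registration,
    code: CodeMemory,
}

fn main() {
    let _module = ModuleInner {
        registration: Registration("module A"),
        code: CodeMemory("module A"),
    };
    // Prints "unregister frame info for module A" then "free code region for module A".
}
```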
lostman commented on Issue #1768:
@alexcrichton AFAIK this has only appeared twice when running our CI test suite. We've been on this version of Wasmtime since May 1st (and will update shortly).
We have hundreds of tests and this error isn't specific to any single one of them. There were many backtraces like the above in the log for that build.
Also, the PR that triggered this error was only a few lines long and had nothing to do with Wasmtime.
As such I don't think I can provide any further insight into what happened.
I hope the diff you linked fixes it!
alexcrichton commented on Issue #1768:
Ok, no worries! In that case I think it's best to defer this fix to #1761, which should land soon anyway.
yurydelendik commented on Issue #1768:
@roman-kashitsyn was the issue resolved by #1761?
lostman commented on Issue #1768:
@yurydelendik @roman-kashitsyn and I work together. I'm just finishing updating our code to the latest Wasmtime and will be testing this tomorrow.
lostman commented on Issue #1768:
I think this can be closed now. I've updated to the latest Wasmtime and we haven't experienced this error again. If it ever comes back, we can reopen the issue.
yurydelendik closed Issue #1768.