Stream: git-wasmtime

Topic: wasmtime / Issue #1768 assertion failed: prev.start > max...


view this post on Zulip Wasmtime GitHub notifications bot (May 27 2020 at 10:06):

roman-kashitsyn opened Issue #1768:

This seems to be some race condition that happens quite rarely.

We have a large test suite that creates many instances of Wasm modules in parallel, executes one function and destroys the instances. The instances don't share any state, each time we create a new Store, a new Module and then an Instance, call the function, and destroy everything.
The code of the test suite is closed for now.

This issue was only seen twice, so even if the test suite was openly available, it wouldn't be of much help.

The version of Wasmtime we use is
```
wasmtime 0.15.0 (git+https://github.com/bytecodealliance/wasmtime?rev=4fdc4349109721f3a3751fa4fedc4732ea5813f8)

Environment: Linux x86_64, Debug build.
~~~

view this post on Zulip Wasmtime GitHub notifications bot (May 27 2020 at 10:06):

roman-kashitsyn labeled Issue #1768:

This seems to be some race condition that happens quite rarely.

We have a large test suite that creates many instances of Wasm modules in parallel, executes one function and destroys the instances. The instances don't share any state, each time we create a new Store, a new Module and then an Instance, call the function, and destroy everything.
The code of the test suite is closed for now.

This issue was only seen twice, so even if the test suite was openly available, it wouldn't be of much help.

The version of Wasmtime we use is
```
wasmtime 0.15.0 (git+https://github.com/bytecodealliance/wasmtime?rev=4fdc4349109721f3a3751fa4fedc4732ea5813f8)

Environment: Linux x86_64, Debug build.
~~~

view this post on Zulip Wasmtime GitHub notifications bot (May 27 2020 at 10:09):

bjorn3 commented on Issue #1768:

Do you have a backtrace or full panic message (including assertion location)?

view this post on Zulip Wasmtime GitHub notifications bot (May 27 2020 at 13:42):

yurydelendik commented on Issue #1768:

FWIW I saw this happens if the same module is instantiated on different threads. I encountered and fixed that at https://github.com/bytecodealliance/wasmtime/pull/1761

view this post on Zulip Wasmtime GitHub notifications bot (May 27 2020 at 14:00):

lostman commented on Issue #1768:

@yurydelendik That's a very interesting looking PR. SendableModule looks like something that can be kept around in a cache to avoid re-JIT-ing modules. Is that correct?

view this post on Zulip Wasmtime GitHub notifications bot (May 27 2020 at 14:56):

yurydelendik commented on Issue #1768:

SendableModule looks like something that can be kept around in a cache to avoid re-JIT-ing modules. Is that correct?

Correct. Keep in mind that it is a work-in-progress though.

view this post on Zulip Wasmtime GitHub notifications bot (May 27 2020 at 16:10):

lostman commented on Issue #1768:

@bjorn3 here are the backtraces from some of the failed tests:

This one points to the lock which gets poisoned:
https://github.com/bytecodealliance/wasmtime/blob/4fdc4349109721f3a3751fa4fedc4732ea5813f8/crates/api/src/frame_info.rs#L182

[2020-05-22 14:48:51] stack backtrace:
[2020-05-22 14:48:51]    0: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
[2020-05-22 14:48:51]    1: core::fmt::write
[2020-05-22 14:48:51]    2: std::io::Write::write_fmt
[2020-05-22 14:48:51]    3: std::io::impls::<impl std::io::Write for alloc::boxed::Box<W>>::write_fmt
[2020-05-22 14:48:51]    4: std::panicking::default_hook::{{closure}}
[2020-05-22 14:48:51]    5: std::panicking::default_hook
[2020-05-22 14:48:51]    6: std::panicking::rust_panic_with_hook
[2020-05-22 14:48:51]    7: rust_begin_unwind
[2020-05-22 14:48:51]    8: core::panicking::panic_fmt
[2020-05-22 14:48:51]    9: core::result::unwrap_failed
[2020-05-22 14:48:51]   10: core::result::Result<T,E>::unwrap
[2020-05-22 14:48:51]              at /build/rustc-1.41.1-src/src/libcore/result.rs:956
[2020-05-22 14:48:51]   11: wasmtime::frame_info::register
[2020-05-22 14:48:51]              at /nix/store/hc8ml9nz5f0wpl8x0qf58z3ag5narmnb-crates-io/wasmtime-wasmtime-profiling-wasmtime-cli-wasmtime-fuzz-4fdc4349109721f3a3751fa4fedc4732ea5813f8/src/frame_info.rs:182
[2020-05-22 14:48:51]   12: wasmtime::module::Module::register_frame_info
[2020-05-22 14:48:51]              at /nix/store/hc8ml9nz5f0wpl8x0qf58z3ag5narmnb-crates-io/wasmtime-wasmtime-profiling-wasmtime-cli-wasmtime-fuzz-4fdc4349109721f3a3751fa4fedc4732ea5813f8/src/module.rs:494
[2020-05-22 14:48:51]   13: wasmtime::instance::Instance::new
[2020-05-22 14:48:51]              at /nix/store/hc8ml9nz5f0wpl8x0qf58z3ag5narmnb-crates-io/wasmtime-wasmtime-profiling-wasmtime-cli-wasmtime-fuzz-4fdc4349109721f3a3751fa4fedc4732ea5813f8/src/instance.rs:139

And this one to the assertion:
https://github.com/bytecodealliance/wasmtime/blob/4fdc4349109721f3a3751fa4fedc4732ea5813f8/crates/api/src/frame_info.rs#L186

[2020-05-22 14:48:51] stack backtrace:
[2020-05-22 14:48:51]    0: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
[2020-05-22 14:48:51]    1: core::fmt::write
[2020-05-22 14:48:51]    2: std::io::Write::write_fmt
[2020-05-22 14:48:51]    3: std::io::impls::<impl std::io::Write for alloc::boxed::Box<W>>::write_fmt
[2020-05-22 14:48:51]    4: std::panicking::default_hook::{{closure}}
[2020-05-22 14:48:51]    5: std::panicking::default_hook
[2020-05-22 14:48:51]    6: std::panicking::rust_panic_with_hook
[2020-05-22 14:48:51]    7: std::panicking::begin_panic
[2020-05-22 14:48:51]              at /build/rustc-1.41.1-src/src/libstd/panicking.rs:404
[2020-05-22 14:48:51]    8: wasmtime::frame_info::register
[2020-05-22 14:48:51]              at /nix/store/hc8ml9nz5f0wpl8x0qf58z3ag5narmnb-crates-io/wasmtime-wasmtime-profiling-wasmtime-cli-wasmtime-fuzz-4fdc4349109721f3a3751fa4fedc4732ea5813f8/src/frame_info.rs:186
[2020-05-22 14:48:51]    9: wasmtime::module::Module::register_frame_info
[2020-05-22 14:48:51]              at /nix/store/hc8ml9nz5f0wpl8x0qf58z3ag5narmnb-crates-io/wasmtime-wasmtime-profiling-wasmtime-cli-wasmtime-fuzz-4fdc4349109721f3a3751fa4fedc4732ea5813f8/src/module.rs:494
[2020-05-22 14:48:51]   10: wasmtime::instance::Instance::new
[2020-05-22 14:48:51]              at /nix/store/hc8ml9nz5f0wpl8x0qf58z3ag5narmnb-crates-io/wasmtime-wasmtime-profiling-wasmtime-cli-wasmtime-fuzz-4fdc4349109721f3a3751fa4fedc4732ea5813f8/src/instance.rs:139

the same module is instantiated on different threads

@yurydelendik How can this happen though? We don't move anything between threads. Each test does everything from scratch: create Store, Engine, Instance, Module, etc. Nothing is shared.

view this post on Zulip Wasmtime GitHub notifications bot (May 27 2020 at 19:08):

yurydelendik commented on Issue #1768:

How can this happen though? We don't move anything between threads. Each test does everything from scratch: create Store, Engine, Instance, Module, etc. Nothing is shared.

My guess, if GlobalFrameInfoRegistration is not dropped, e.g. Module (is not freed) but Store released its resources, and new compilation allocated CodeMemory in the same region -- you may see similar failure.

view this post on Zulip Wasmtime GitHub notifications bot (May 27 2020 at 19:09):

yurydelendik edited a comment on Issue #1768:

How can this happen though? We don't move anything between threads. Each test does everything from scratch: create Store, Engine, Instance, Module, etc. Nothing is shared.

My guess, if GlobalFrameInfoRegistration is not dropped (e.g. Module is not freed) but Store released its resources, and new compilation allocated CodeMemory in the same region -- you may see similar failure.

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2020 at 15:44):

alexcrichton commented on Issue #1768:

I tried to reproduce this locally with some tests but wasn't able to get too too far. I think this diff may fix the issue since I think we're dropping fields in the wrong order by accident, but I'm not sure. @lostman if you're unable to reduce this to share a test case, can you test out that patch and see if it fixes the issue for you?

view this post on Zulip Wasmtime GitHub notifications bot (May 30 2020 at 15:14):

lostman commented on Issue #1768:

@alexcrichton AFAIK this has only appeared twice when running our CI test suite. We've been on this version of Wasmtime since May 1st (and will update shortly).

We have hundreds of tests and this error isn't specific to any single one of them. There were many backtraces like the above in the log for that build.

Also, PR that triggered this error was only few lines long and had nothing to do with Wasmtime.

As such I don't think I can provide any further insight into what happened.

I hope the diff you linked fixes it!

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2020 at 17:07):

alexcrichton commented on Issue #1768:

Ok no worries! In that case I think it's best to defer this fix to #1761 which should land soon anyway

view this post on Zulip Wasmtime GitHub notifications bot (Jun 09 2020 at 15:16):

yurydelendik commented on Issue #1768:

@roman-kashitsyn was the issue resolved by #1761 ?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 09 2020 at 16:19):

lostman commented on Issue #1768:

@yurydelendik @roman-kashitsyn and I work together. I’m just finishing updating our code to latest Wasmtime and will be testing this tomorrow

view this post on Zulip Wasmtime GitHub notifications bot (Jun 16 2020 at 07:42):

lostman commented on Issue #1768:

I think this could be closed now. I've updated to latest Wasmtime and we haven't experienced this error again. If it ever comes back we can reopen the issue.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 16 2020 at 13:07):

yurydelendik closed Issue #1768:

This seems to be some race condition that happens quite rarely.

We have a large test suite that creates many instances of Wasm modules in parallel, executes one function and destroys the instances. The instances don't share any state, each time we create a new Store, a new Module and then an Instance, call the function, and destroy everything.
The code of the test suite is closed for now.

This issue was only seen twice, so even if the test suite was openly available, it wouldn't be of much help.

The version of Wasmtime we use is
```
wasmtime 0.15.0 (git+https://github.com/bytecodealliance/wasmtime?rev=4fdc4349109721f3a3751fa4fedc4732ea5813f8)

Environment: Linux x86_64, Debug build.
~~~

Last updated: Nov 22 2024 at 16:03 UTC