zacharywhitley opened issue #12511:
SIGABRT crashes when creating many engines/modules due to GLOBAL_CODE registry address reuse
Summary
When creating and destroying many `wasmtime::Engine` and `wasmtime::Module` instances in a single process, the `GLOBAL_CODE` registry's `assert!()` statements cause SIGABRT crashes after approximately 350-400 iterations. This is caused by virtual address reuse before `Arc` references are fully released.
Environment
- Wasmtime version: 41.0.1 (also confirmed in recent main)
- Platform: macOS Darwin 24.5.0, Linux
- Configuration: `signals_based_traps(false)` (required for JVM integration)
Reproduction

```rust
use wasmtime::{Config, Engine, Module};

fn main() {
    let wat = "(module (func (export \"test\") (result i32) i32.const 42))";
    for i in 0..500 {
        let mut config = Config::new();
        config.signals_based_traps(false);
        let engine = Engine::new(&config).unwrap();
        let _module = Module::new(&engine, wat).unwrap();
        // Engine and module dropped here
        if i % 100 == 0 {
            eprintln!("Iteration {}", i);
        }
    }
    println!("Completed all iterations");
}
```

Expected: All 500 iterations complete successfully.
Actual: Process aborts with SIGABRT around iteration 350-400.
Root Cause Analysis
The Problem
In `crates/wasmtime/src/runtime/module/registry.rs`:

```rust
pub fn register_code(image: &Arc<CodeMemory>, address: Range<usize>) {
    // ...
    let prev = global_code().write().insert(end, (start, image.clone()));
    assert!(prev.is_none()); // ABORTS if duplicate key
}

pub fn unregister_code(address: Range<usize>) {
    // ...
    let code = global_code().write().remove(&end);
    assert!(code.is_some()); // ABORTS if key not found
}
```

Why This Happens
1. Engine A allocates code memory at virtual address range `[0x1000, 0x2000)`
2. `register_code` registers this with key `0x1FFF` (end - 1)
3. Engine A is "dropped" but `Arc<CodeMemory>` references may still exist (from Module, Store, etc.)
4. Engine B is created; the OS reuses virtual address `[0x1000, 0x2000)` for its code
5. `register_code` tries to insert key `0x1FFF` again
6. `assert!(prev.is_none())` fails → SIGABRT

The reverse can also happen:

1. The old `Arc<CodeMemory>` finally deallocates, calling `unregister_code`
2. But the new engine already re-registered at that address
3. `unregister_code` removes the new engine's registration
4. Later, the new engine's drop calls `unregister_code`
5. `assert!(code.is_some())` fails → SIGABRT

Why ~350-400 Iterations?
This threshold corresponds to when virtual address reuse becomes statistically likely given:
- macOS/Linux mmap allocation patterns
- Accumulated Arc references not yet fully released by Rust's deferred deallocation
Proposed Fix
Make `register_code` and `unregister_code` idempotent by tracking registered addresses:

```rust
fn registered_addresses() -> &'static RwLock<BTreeSet<usize>> {
    static REGISTERED: OnceLock<RwLock<BTreeSet<usize>>> = OnceLock::new();
    REGISTERED.get_or_init(Default::default)
}

pub fn register_code(image: &Arc<CodeMemory>, address: Range<usize>) {
    if address.is_empty() {
        return;
    }
    let start = address.start;
    let end = address.end - 1;
    // Check if already registered - make operation idempotent
    {
        let mut tracked = registered_addresses().write();
        if tracked.contains(&end) {
            return; // Already registered, skip
        }
        tracked.insert(end);
    }
    global_code().write().insert(end, (start, image.clone()));
}

pub fn unregister_code(address: Range<usize>) {
    if address.is_empty() {
        return;
    }
    let end = address.end - 1;
    // Check if registered - make operation idempotent
    {
        let mut tracked = registered_addresses().write();
        if !tracked.contains(&end) {
            return; // Not registered, skip
        }
        tracked.remove(&end);
    }
    global_code().write().remove(&end);
}
```

Why This Fix is Safe
- No functionality change: The registry still correctly tracks all live code regions
- Minimal overhead: BTreeSet lookup is O(log n), same as the existing BTreeMap
- Thread-safe: Uses the same RwLock pattern as the existing code
- Backward compatible: No API changes
Impact
This issue affects:
- JVM integrations (Java, Kotlin, Scala) where `signals_based_traps(false)` is required
- Long-running servers that dynamically load/unload WASM modules
- Test suites with many engine/module creation tests
- Any application creating 350+ engines/modules in a single process
Workarounds
- Engine reuse: Share a singleton engine across the application (mitigates but doesn't eliminate)
- Process isolation: Run tests in separate processes (inconvenient for CI)
Neither workaround fully solves the issue for libraries that can't control caller behavior.
Related
- Wasmtime's trap handling architecture in `docs/contributing-architecture.md`
- `ModuleRegistry` and `GLOBAL_MODULES` for similar patterns
Additional Context
We discovered this issue while building Java bindings for wasmtime (wasmtime4j). Our test suite has ~860 tests, many creating engines and modules. The suite consistently crashed around test 350-400 before we implemented this fix.
We've been running with this fix in production and all tests now pass reliably.
alexcrichton commented on issue #12511:
Thanks for the report, but I'd like to ask you to set aside the LLM on this. This is a lot of text to wade through, much of it I think is inaccurate, and it detracts from the problem you're having.
Can you share your reproduction steps for this issue? I ran the program on macOS and Linux in debug and release mode and never got a crash. I ran ~5.5M iterations on Linux and also never saw a crash. Are you able to reproduce with the program listed here? If not can you share a stack trace or similar?
This might be a problem with bindings to wasmtime from the JVM, such as a use-after-free in the C API, or something like that. If you're able to share more details about that it might help narrow it down.
zacharywhitley commented on issue #12511:
Sorry about the wall of text. I'm running the Rust API on OS X. I'll fire up my Linux machine when I get home this evening, try it there, and see if I can get some more information.
I'm working on wasmtime JVM bindings. I have both JNI and Panama interfaces and it's coming along fairly well. A while ago I went to use https://github.com/kawamuray/wasmtime-java and https://github.com/bluejekyll/wasmtime-java for https://github.com/semantalytics/stardog-webfunction-plugin, and they worked as a proof of concept, but they are very minimal implementations. They haven't been maintained in years and I always thought they sold wasmtime short. I wanted to do something more than just a proof of concept and had a bunch of other ideas that I wanted to explore.
zacharywhitley commented on issue #12511:
I’m a bit burned out on this and may be missing some nuance in the mechanistic explanation. The concrete, reproducible issue is that Wasmtime can SIGABRT via assert!(prev.is_none()) / assert!(code.is_some()) in GLOBAL_CODE under repeated create/drop churn. I've added tests to my branch. The problem does not manifest with Rust but will for any FFI.
I have a patch that prevents the host abort by making (un)registration tolerant to collisions / missing entries. I’m happy to revise the explanation and/or adjust the fix to match the intended invariants — the main goal is ensuring a production embedder cannot be taken down by internal asserts.
tschneidereit commented on issue #12511:
@zacharywhitley I appreciate that you're trying what you can to address an issue you're running into, and we want to help as much as we can. However, precisely because Wasmtime is the kind of project where bugs can quickly turn into severe security or availability issues, we definitely can't land PRs that maintainers don't think are correct, and that no human (and in particular none of the project maintainers) fully understands. Otherwise we run the risk of the change causing unintended effects elsewhere.
The problem does not manifest with Rust but will for any FFI.
If you could provide a test case here that is reduced, ideally using a language like C or C++ that directly uses the FFI, and directly reproduces the issue, that would help greatly in understanding the underlying issue and fully addressing it. An LLM generated analysis of the cause isn't a replacement for that step, and puts a lot of burden on maintainers, as I'm sure you appreciate.
zacharywhitley commented on issue #12511:
Totally appreciate that. I just wanted to make sure that I responded when I said that I would and I would be more than happy to assist as much as I can so that the wasmtime devs can devote as much time to wasmtime as possible. I'm a huge fan and have been using (and abusing) it quite a bit lately and having a great time. I just might take a little bit to work this up because this is a nights and weekends thing....for now. :)
alexcrichton commented on issue #12511:
One thing I might recommend is taking a close look at memory management in the FFI layer. A double-free (e.g. calling `wasmtime_*_free` twice on the same object) could likely cause an issue like this. I'm not aware of any way to trigger these asserts from typical API usage, but invalid API usage, which would result in UB, could trigger something like this.
fitzgen commented on issue #12511:
> One thing I might recommend is taking a close look at memory management in the FFI layer. A double-free (e.g. calling `wasmtime_*_free` twice on the same object) could likely cause an issue like this. I'm not aware of any way to trigger these asserts from typical API usage, but invalid API usage, which would result in UB, could trigger something like this.

And running your FFI embedding with ASan can help surface these kinds of bugs in situations where they might otherwise silently go unnoticed in the moment and only cause problems much later down the line.
FWIW, here are the cmake flags for building Wasmtime's C API with ASan:
zacharywhitley commented on issue #12511:
Thanks for the suggestions, I’ve reduced this to a pure C API repro (no JVM, no GC) and updated the branch here:
https://github.com/tegmentum/wasmtime/tree/codex/create-minimal-rust-test-for-global_code-error
Build + run (Linux):

```shell
cmake -S crates/c-api -B target/c-api-build-offline -DBUILD_TESTS=ON -DWASMTIME_CAPI_ENABLE_GTEST_TESTS=OFF
cmake --build target/c-api-build-offline --target test-global-code -j4
ctest --test-dir target/c-api-build-offline -R test-global-code --output-on-failure
```

The test is contract-clean (each `wasmtime_*_delete` called exactly once; no use-after-delete; threads joined before teardown).
On Linux this fails with a stable condition (expected 0 post-drop mapped addresses, got N) and under lldb consistently breaks in the threaded post-drop path.
The /proc/self/maps check is included as extra evidence only. The core issue is that under repeated create/drop churn the GLOBAL_CODE invariants can be violated, leading to internal assertion failures.
alexcrichton commented on issue #12511:
Thanks! The `test_unregisters_under_threaded_pressure` test is the culprit there, and it's been helpful to read over and poke at locally. Unfortunately though I'm not sure this is enough of a smoking gun to know what to do in Wasmtime. This is using `/proc/self/maps` to show that something is still mapped after a module/engine are destroyed, but that's "action at a distance" of a sort, where it's showing that Wasmtime might have problems but the problem still isn't within Wasmtime. My guess, without knowing specifics, is that `/proc/self/maps` is showing something stale rather than something that's up-to-date. My rough guess for that is that it has something to do with the fact that other threads in the same process also have `/proc/self/maps` open and there might be some sort of caching/queueing/etc. at the kernel level. In Wasmtime though, as far as I can tell, we're correctly purging all mappings related to all modules.

If what I'm suspecting is the case, however, then it would also mean that this shouldn't be a problem in Wasmtime. The kernel is responsible for ensuring that it only hands out unmapped mappings when we ask for a new mapping, so nothing should be overlapping accidentally or anything like that.
Are you able to write up a test which aborts/panics within Wasmtime itself?
zacharywhitley commented on issue #12511:
Thanks for taking a look. That makes sense, and I agree that /proc/self/maps is indirect evidence and not something Wasmtime should treat as a contract. I included it only as supporting information while trying to localize the behavior under pressure.
The original symptom that led me here was Wasmtime aborting internally via the GLOBAL_CODE assertions (assert!(prev.is_none()) / assert!(code.is_some())) under repeated create/drop churn. The /proc/self/maps checks were added later to help understand timing and reuse, not as the primary failure condition.
I can work on reducing this further to a test that triggers a panic/abort within Wasmtime itself (via the registry invariants), without relying on /proc/self/maps at all. I’ll follow up once I have that.
Appreciate you digging into this, the feedback is helpful for narrowing what’s actually actionable on the Wasmtime side.
Last updated: Feb 24 2026 at 04:36 UTC