zacharywhitley opened issue #12511:
SIGABRT crashes when creating many engines/modules due to GLOBAL_CODE registry address reuse
Summary
When creating and destroying many `wasmtime::Engine` and `wasmtime::Module` instances in a single process, the `GLOBAL_CODE` registry's `assert!()` statements cause SIGABRT crashes after approximately 350-400 iterations. This is caused by virtual address reuse before `Arc` references are fully released.
Environment
- Wasmtime version: 41.0.1 (also confirmed in recent main)
- Platform: macOS Darwin 24.5.0, Linux
- Configuration: `signals_based_traps(false)` (required for JVM integration)
Reproduction

```rust
use wasmtime::{Config, Engine, Module};

fn main() {
    let wat = "(module (func (export \"test\") (result i32) i32.const 42))";
    for i in 0..500 {
        let mut config = Config::new();
        config.signals_based_traps(false);
        let engine = Engine::new(&config).unwrap();
        let _module = Module::new(&engine, wat).unwrap();
        // Engine and module dropped here
        if i % 100 == 0 {
            eprintln!("Iteration {}", i);
        }
    }
    println!("Completed all iterations");
}
```

Expected: All 500 iterations complete successfully.
Actual: Process aborts with SIGABRT around iteration 350-400.
Root Cause Analysis
The Problem
In `crates/wasmtime/src/runtime/module/registry.rs`:

```rust
pub fn register_code(image: &Arc<CodeMemory>, address: Range<usize>) {
    // ...
    let prev = global_code().write().insert(end, (start, image.clone()));
    assert!(prev.is_none()); // ABORTS if duplicate key
}

pub fn unregister_code(address: Range<usize>) {
    // ...
    let code = global_code().write().remove(&end);
    assert!(code.is_some()); // ABORTS if key not found
}
```

Why This Happens
1. Engine A allocates code memory at virtual address range `[0x1000, 0x2000)`
2. `register_code` registers this with key `0x1FFF` (end - 1)
3. Engine A is "dropped" but `Arc<CodeMemory>` references may still exist (from Module, Store, etc.)
4. Engine B is created; the OS reuses virtual address `[0x1000, 0x2000)` for its code
5. `register_code` tries to insert key `0x1FFF` again
6. `assert!(prev.is_none())` fails → SIGABRT

The reverse can also happen:

1. The old `Arc<CodeMemory>` finally deallocates, calling `unregister_code`
2. But the new engine already re-registered at that address
3. `unregister_code` removes the new engine's registration
4. Later, the new engine's drop calls `unregister_code`
5. `assert!(code.is_some())` fails → SIGABRT

Why ~350-400 Iterations?
This threshold corresponds to when virtual address reuse becomes statistically likely given:
- macOS/Linux mmap allocation patterns
- Accumulated Arc references not yet fully released by Rust's deferred deallocation
Proposed Fix
Make `register_code` and `unregister_code` idempotent by tracking registered addresses:

```rust
fn registered_addresses() -> &'static RwLock<BTreeSet<usize>> {
    static REGISTERED: OnceLock<RwLock<BTreeSet<usize>>> = OnceLock::new();
    REGISTERED.get_or_init(Default::default)
}

pub fn register_code(image: &Arc<CodeMemory>, address: Range<usize>) {
    if address.is_empty() {
        return;
    }
    let start = address.start;
    let end = address.end - 1;
    // Check if already registered - make operation idempotent
    {
        let mut tracked = registered_addresses().write();
        if tracked.contains(&end) {
            return; // Already registered, skip
        }
        tracked.insert(end);
    }
    global_code().write().insert(end, (start, image.clone()));
}

pub fn unregister_code(address: Range<usize>) {
    if address.is_empty() {
        return;
    }
    let end = address.end - 1;
    // Check if registered - make operation idempotent
    {
        let mut tracked = registered_addresses().write();
        if !tracked.contains(&end) {
            return; // Not registered, skip
        }
        tracked.remove(&end);
    }
    global_code().write().remove(&end);
}
```

Why This Fix is Safe
- No functionality change: The registry still correctly tracks all live code regions
- Minimal overhead: BTreeSet lookup is O(log n), same as the existing BTreeMap
- Thread-safe: Uses the same RwLock pattern as the existing code
- Backward compatible: No API changes
Impact
This issue affects:
- JVM integrations (Java, Kotlin, Scala) where `signals_based_traps(false)` is required
- Long-running servers that dynamically load/unload WASM modules
- Test suites with many engine/module creation tests
- Any application creating 350+ engines/modules in a single process
Workarounds
- Engine reuse: Share a singleton engine across the application (mitigates but doesn't eliminate)
- Process isolation: Run tests in separate processes (inconvenient for CI)
Neither workaround fully solves the issue for libraries that can't control caller behavior.
Related
- Wasmtime's trap handling architecture in `docs/contributing-architecture.md`
- `ModuleRegistry` and `GLOBAL_MODULES` for similar patterns
Additional Context
We discovered this issue while building Java bindings for wasmtime (wasmtime4j). Our test suite has ~860 tests, many creating engines and modules. The suite consistently crashed around test 350-400 before we implemented this fix.
We've been running with this fix in production and all tests now pass reliably.
alexcrichton commented on issue #12511:
Thanks for the report, but I'd like to ask you to set aside the LLM on this. This is a lot of text to wade through, much of it I think is inaccurate, and it detracts from the problem you're having.
Can you share your reproduction steps for this issue? I ran the program on macOS and Linux in debug and release mode and never got a crash. I ran ~5.5M iterations on Linux and also never saw a crash. Are you able to reproduce with the program listed here? If not can you share a stack trace or similar?
This might be a problem with bindings to wasmtime from the JVM, such as a use-after-free in the C API, or something like that. If you're able to share more details about that it might help narrow it down.
zacharywhitley commented on issue #12511:
Sorry about the wall of text. I'm running the Rust API on OS X. I'll fire up my Linux machine when I get home this evening, try it there, and see if I can get some more information.
I'm working on wasmtime JVM bindings. I have both JNI and Panama interfaces and it's coming along fairly well. A while ago I went to use https://github.com/kawamuray/wasmtime-java and https://github.com/bluejekyll/wasmtime-java for https://github.com/semantalytics/stardog-webfunction-plugin, and they worked as a proof of concept, but they are very minimal implementations. They haven't been maintained in years and I always thought they sold wasmtime short. I wanted to do something more than just a proof of concept and had a bunch of other ideas that I wanted to explore.
zacharywhitley commented on issue #12511:
I’m a bit burned out on this and may be missing some nuance in the mechanistic explanation. The concrete, reproducible issue is that Wasmtime can SIGABRT via assert!(prev.is_none()) / assert!(code.is_some()) in GLOBAL_CODE under repeated create/drop churn. I've added tests to my branch. The problem does not manifest with Rust but will for any FFI.
I have a patch that prevents the host abort by making (un)registration tolerant to collisions / missing entries. I’m happy to revise the explanation and/or adjust the fix to match the intended invariants — the main goal is ensuring a production embedder cannot be taken down by internal asserts.
tschneidereit commented on issue #12511:
@zacharywhitley I appreciate that you're trying what you can to address an issue you're running into, and we want to help as much as we can. However, precisely because Wasmtime is the kind of project where bugs can quickly turn into severe security or availability issues, we definitely can't land PRs that maintainers don't think are correct, and that no human (and in particular none of the project maintainers) fully understands. Otherwise we run the risk of the change causing unintended effects elsewhere.
The problem does not manifest with Rust but will for any FFI.
If you could provide a test case here that is reduced, ideally using a language like C or C++ that directly uses the FFI, and directly reproduces the issue, that would help greatly in understanding the underlying issue and fully addressing it. An LLM generated analysis of the cause isn't a replacement for that step, and puts a lot of burden on maintainers, as I'm sure you appreciate.
zacharywhitley commented on issue #12511:
Totally appreciate that. I just wanted to make sure that I responded when I said that I would and I would be more than happy to assist as much as I can so that the wasmtime devs can devote as much time to wasmtime as possible. I'm a huge fan and have been using (and abusing) it quite a bit lately and having a great time. I just might take a little bit to work this up because this is a nights and weekends thing....for now. :)
alexcrichton commented on issue #12511:
One thing I might recommend is taking a close look at memory management in the FFI layer. A double-free (e.g. calling `wasmtime_*_free` twice on the same object) could likely cause an issue like this. I'm not aware of any way to trigger these asserts from typical API usage, but invalid API usage, which would result in UB, could trigger something like this.
fitzgen commented on issue #12511:
> One thing I might recommend is taking a close look at memory management in the FFI layer. A double-free (e.g. calling `wasmtime_*_free` twice on the same object) could likely cause an issue like this. I'm not aware of any way to trigger these asserts from typical API usage, but invalid API usage, which would result in UB, could trigger something like this.

And running your FFI embedding with ASan can help surface these kinds of bugs in situations where they might otherwise silently go unnoticed in the moment and only cause problems much later down the line.
FWIW, here are the cmake flags for building Wasmtime's C API with ASan:
zacharywhitley commented on issue #12511:
Thanks for the suggestions, I’ve reduced this to a pure C API repro (no JVM, no GC) and updated the branch here:
https://github.com/tegmentum/wasmtime/tree/codex/create-minimal-rust-test-for-global_code-error
Build + run (Linux):

```shell
cmake -S crates/c-api -B target/c-api-build-offline -DBUILD_TESTS=ON -DWASMTIME_CAPI_ENABLE_GTEST_TESTS=OFF
cmake --build target/c-api-build-offline --target test-global-code -j4
ctest --test-dir target/c-api-build-offline -R test-global-code --output-on-failure
```

The test is contract-clean (each `wasmtime_*_delete` called exactly once; no use-after-delete; threads joined before teardown).
On Linux this fails with a stable condition (expected 0 post-drop mapped addresses, got N) and under lldb consistently breaks in the threaded post-drop path.
The /proc/self/maps check is included as extra evidence only. The core issue is that under repeated create/drop churn the GLOBAL_CODE invariants can be violated, leading to internal assertion failures.
alexcrichton commented on issue #12511:
Thanks! The `test_unregisters_under_threaded_pressure` test is the culprit there, and it's been helpful to read over and poke at locally. Unfortunately though I'm not sure this is enough of a smoking gun to know what to do in Wasmtime. This is using `/proc/self/maps` to show that something is still mapped after a module/engine are destroyed, but that's "action at a distance" of a sort, where it's showing that Wasmtime might have problems but the problem still isn't within Wasmtime. My guess, without knowing specifics, is that `/proc/self/maps` is showing something stale rather than something that's up-to-date. My rough guess for that is that it has something to do with the fact that other threads in the same process also have `/proc/self/maps` open and there might be some sort of caching/queueing/etc. at the kernel level. In Wasmtime though, as far as I can tell, we're correctly purging all mappings related to all modules.

If what I'm suspecting is the case, however, then it would also mean that this shouldn't be a problem in Wasmtime. The kernel is responsible for ensuring that it only hands out unmapped mappings when we ask for a new mapping, so nothing should be overlapping accidentally or anything like that.
Are you able to write up a test which aborts/panics within Wasmtime itself?
zacharywhitley commented on issue #12511:
Thanks for taking a look. That makes sense, and I agree that /proc/self/maps is indirect evidence and not something Wasmtime should treat as a contract. I included it only as supporting information while trying to localize the behavior under pressure.
The original symptom that led me here was Wasmtime aborting internally via the GLOBAL_CODE assertions (assert!(prev.is_none()) / assert!(code.is_some())) under repeated create/drop churn. The /proc/self/maps checks were added later to help understand timing and reuse, not as the primary failure condition.
I can work on reducing this further to a test that triggers a panic/abort within Wasmtime itself (via the registry invariants), without relying on /proc/self/maps at all. I’ll follow up once I have that.
Appreciate you digging into this, the feedback is helpful for narrowing what’s actually actionable on the Wasmtime side.
Last updated: Feb 24 2026 at 04:36 UTC