Stream: git-wasmtime

Topic: wasmtime / issue #12076 Cranelift: Non-deterministic JIT ...


Wasmtime GitHub notifications bot (Nov 24 2025 at 13:12):

darmie opened issue #12076:

Summary

JIT-compiled code execution on ARM64 macOS (Apple Silicon) fails non-deterministically, with a failure rate of approximately 44% in multi-threaded scenarios. The failures manifest as SIGBUS, incorrect results, or silent corruption.

Environment

Observed Behavior

In a real-world JIT compiler (Rayzor - a Haxe-to-native compiler using cranelift-jit), we observe:

The failures manifest as:

Note on Minimal Reproduction

A simple test case may not reliably reproduce the issue because:

  1. The failure is non-deterministic and depends on timing, memory layout, and CPU scheduling
  2. Simple tests may not trigger the problematic code paths
  3. The issue is more likely with:

    • Complex JIT-compiled functions with multiple blocks
    • Multiple functions compiled together
    • Closures and captured variables passed to threads
    • Runtime library functions called from JIT code

Complex Reproduction (Rayzor Compiler)

The issue was observed and fixed in the Rayzor compiler, a Haxe-to-native compiler using cranelift-jit. The compiler:

E2E Test Case: compiler/examples/test_rayzor_stdlib_e2e.rs

Commits for testing:

# Test BEFORE fix (~56% success rate)
git clone https://github.com/darmie/rayzor
cd rayzor
git checkout 0eb9472
cargo build --release --package compiler --example test_rayzor_stdlib_e2e

# Run stability test
passed=0; failed=0
for i in {1..50}; do
    if timeout 120 ./target/release/examples/test_rayzor_stdlib_e2e 2>&1 | grep -q "All tests passed"; then
        passed=$((passed+1))
    else
        echo "Run $i: FAILED"
        failed=$((failed+1))
    fi
done
echo "Before fix - Passed: $passed/50, Failed: $failed/50"

# Test AFTER fix (100% success rate)
git checkout 9a0e80e
cargo build --release --package compiler --example test_rayzor_stdlib_e2e

passed=0; failed=0
for i in {1..50}; do
    if timeout 120 ./target/release/examples/test_rayzor_stdlib_e2e 2>&1 | grep -q "All tests passed"; then
        passed=$((passed+1))
    else
        echo "Run $i: FAILED"
        failed=$((failed+1))
    fi
done
echo "After fix - Passed: $passed/50, Failed: $failed/50"

Results:

Commit  | Configuration                   | Success Rate
0eb9472 | Upstream cranelift (no MAP_JIT) | ~56% (28/50)
9a0e80e | darmie/wasmtime fix-plt-aarch64 | 100% (50/50)

Simple Test Case (May Not Reliably Fail)

For reference, here's a minimal test that exercises the same code paths:

use cranelift::prelude::*;
use cranelift_jit::{JITBuilder, JITModule};
use cranelift_module::{Linkage, Module, FuncId};
use std::thread;

fn define_function(module: &mut JITModule, name: &str, op: &str) -> FuncId {
    let mut sig = module.make_signature();
    sig.params.push(AbiParam::new(types::I64));
    sig.params.push(AbiParam::new(types::I64));
    sig.returns.push(AbiParam::new(types::I64));

    let func_id = module.declare_function(name, Linkage::Export, &sig).unwrap();

    let mut ctx = module.make_context();
    ctx.func.signature = sig;

    let mut builder_ctx = FunctionBuilderContext::new();
    {
        let mut builder = FunctionBuilder::new(&mut ctx.func, &mut builder_ctx);
        let block = builder.create_block();
        builder.append_block_params_for_function_params(block);
        builder.switch_to_block(block);
        builder.seal_block(block);

        let a = builder.block_params(block)[0];
        let b = builder.block_params(block)[1];

        let result = match op {
            "add" => builder.ins().iadd(a, b),
            "sub" => builder.ins().isub(a, b),
            "mul" => builder.ins().imul(a, b),
            _ => builder.ins().iadd(a, b),
        };

        builder.ins().return_(&[result]);
        builder.finalize();
    }

    module.define_function(func_id, &mut ctx).unwrap();
    module.clear_context(&mut ctx);

    func_id
}

fn main() {
    let mut flag_builder = settings::builder();
    flag_builder.set("use_colocated_libcalls", "false").unwrap();
    flag_builder.set("is_pic", "false").unwrap();
    let isa_builder = cranelift_native::builder().unwrap();
    let isa = isa_builder.finish(settings::Flags::new(flag_builder)).unwrap();

    let builder = JITBuilder::with_isa(isa, cranelift_module::default_libcall_names());
    let mut module = JITModule::new(builder);

    let add_id = define_function(&mut module, "add", "add");
    let sub_id = define_function(&mut module, "sub", "sub");
    let mul_id = define_function(&mut module, "mul", "mul");

    module.finalize_definitions().unwrap();

    let add_fn: fn(i64, i64) -> i64 = unsafe {
        std::mem::transmute(module.get_finalized_function(add_id))
    };
    let sub_fn: fn(i64, i64) -> i64 = unsafe {
        std::mem::transmute(module.get_finalized_function(sub_id))
    };
    let mul_fn: fn(i64, i64) -> i64 = unsafe {
        std::mem::transmute(module.get_finalized_function(mul_id))
    };

    let handles: Vec<_> = (0..20).map(|thread_id| {
        thread::spawn(move || {
            for i in 0..5000 {
                let a = (thread_id * 1000 + i) as i64;
                let b = (i * 7) as i64;

                assert_eq!(add_fn(a, b), a + b, "add failed");
                assert_eq!(sub_fn(a, b), a - b, "sub failed");
                assert_eq!(mul_fn(a, b), a * b, "mul failed");
            }
        })
    }).collect();

    for handle in handles {
        handle.join().unwrap();
    }

    println!("All tests passed!");
}

Cargo.toml:

[package]
name = "jit_repro"
version = "0.1.0"
edition = "2021"

[dependencies]
cranelift = { version = "0.125", features = ["jit", "module", "native"] }
cranelift-jit = "0.125"
cranelift-module = "0.125"
cranelift-codegen = "0.125"
cranelift-frontend = "0.125"
cranelift-native = "0.125"

Root Cause Analysis

Two issues combine to cause this:

1. Missing MAP_JIT flag

cranelift-jit allocates executable memory using the standard allocator (alloc::alloc), which doesn't set the MAP_JIT flag. On Apple Silicon, memory intended for JIT execution must be allocated with:

mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON | MAP_JIT, -1, 0);

Without MAP_JIT (0x0800), the kernel cannot properly track the memory for W^X enforcement.
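
As a rough illustration of the allocation described above, here is a minimal Rust sketch that maps an anonymous region and adds MAP_JIT on macOS. It is not the cranelift-jit implementation. The helper names `alloc_jit_region` and `free_jit_region` are hypothetical, the flag values are copied by hand from the platform headers (real code would more likely take them from the `libc` crate), and the `mmap` declaration assumes a 64-bit `off_t`.

```rust
use std::ffi::c_void;
use std::os::raw::c_int;

// Flag values copied by hand from the platform headers (an assumption of this sketch).
const PROT_READ: c_int = 0x1;
const PROT_WRITE: c_int = 0x2;
const MAP_PRIVATE: c_int = 0x0002;

#[cfg(target_os = "macos")]
const MAP_ANON: c_int = 0x1000;
#[cfg(not(target_os = "macos"))]
const MAP_ANON: c_int = 0x20; // Linux value on x86_64 and aarch64

// MAP_JIT only exists on Apple platforms; elsewhere this sketch adds no extra flag.
#[cfg(target_os = "macos")]
const MAP_JIT: c_int = 0x0800;
#[cfg(not(target_os = "macos"))]
const MAP_JIT: c_int = 0;

unsafe extern "C" {
    // The `offset` parameter assumes a 64-bit `off_t`.
    fn mmap(
        addr: *mut c_void,
        len: usize,
        prot: c_int,
        flags: c_int,
        fd: c_int,
        offset: i64,
    ) -> *mut c_void;
    fn munmap(addr: *mut c_void, len: usize) -> c_int;
}

/// Hypothetical helper: map `size` bytes of anonymous read/write memory,
/// requesting MAP_JIT where the platform defines it.
pub fn alloc_jit_region(size: usize) -> Option<*mut u8> {
    let ptr = unsafe {
        mmap(
            std::ptr::null_mut(),
            size,
            PROT_READ | PROT_WRITE,
            MAP_PRIVATE | MAP_ANON | MAP_JIT,
            -1,
            0,
        )
    };
    // mmap reports failure with MAP_FAILED, which is (void *)-1.
    if ptr as isize == -1 {
        None
    } else {
        Some(ptr as *mut u8)
    }
}

/// Hypothetical helper: unmap a region returned by `alloc_jit_region`.
/// Returns true when munmap succeeds.
pub unsafe fn free_jit_region(ptr: *mut u8, size: usize) -> bool {
    unsafe { munmap(ptr as *mut c_void, size) == 0 }
}
```

A caller would copy machine code into the region and then change its protection before executing it; that step is outside this sketch.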

2. Missing W^X mode switch for spawned threads

Apple Silicon enforces W^X (Write XOR Execute) at the hardware level. Each thread has an independent write/execute mode:

Threads inherit write mode by default. The current implementation doesn't switch spawned threads to execute mode before calling JIT code, causing crashes.
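
The per-thread switch described above could be sketched as follows. This is only an illustration under the issue's own assumptions: `enter_jit_execute_mode` and `spawn_jit_thread` are hypothetical names, not cranelift-jit APIs. The only real call is `pthread_jit_write_protect_np`, which macOS declares in `<pthread.h>`; on every other target the helper is a no-op.

```rust
use std::thread;

#[cfg(all(target_os = "macos", target_arch = "aarch64"))]
unsafe extern "C" {
    // A nonzero argument write-protects MAP_JIT pages for the calling
    // thread, which is the "execute mode" referred to in the issue.
    fn pthread_jit_write_protect_np(enabled: std::os::raw::c_int);
}

/// Hypothetical helper: put the calling thread into execute mode.
#[cfg(all(target_os = "macos", target_arch = "aarch64"))]
pub fn enter_jit_execute_mode() {
    unsafe { pthread_jit_write_protect_np(1) }
}

/// On other targets there is no per-thread JIT mode, so this does nothing.
#[cfg(not(all(target_os = "macos", target_arch = "aarch64")))]
pub fn enter_jit_execute_mode() {}

/// Hypothetical wrapper: spawn a thread that switches to execute mode
/// before running `f`, which is expected to call into JIT code.
pub fn spawn_jit_thread<F, T>(f: F) -> thread::JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::spawn(move || {
        enter_jit_execute_mode();
        f()
    })
}
```

In the simple test case, `thread::spawn(move || { ... })` would become `spawn_jit_thread(move || { ... })`, leaving the loop body unchanged.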

Proposed Solution

  1. Use mmap with MAP_JIT for memory allocation on ARM64 macOS instead of the standard allocator
  2. Call pthread_jit_write_protect_np(1) after making memory executable to switch to execute mode
  3. Add memory barriers (DSB SY + ISB SY) for proper icache coherency on Apple Silicon's heterogeneous cores
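
Step 3 could look roughly like the sketch below. The name `jit_publish_barrier` is hypothetical, and the non-AArch64 fallback is a stand-in so the example compiles everywhere. The two barrier instructions only order memory accesses and flush the pipeline; they do not invalidate instruction-cache lines, so a real implementation would probably pair them with a cache-maintenance call such as macOS's `sys_icache_invalidate`.

```rust
/// Hypothetical helper: issue DSB SY followed by ISB SY after writing JIT code.
#[cfg(target_arch = "aarch64")]
pub fn jit_publish_barrier() {
    unsafe {
        // DSB SY waits for prior memory accesses to complete;
        // ISB SY then flushes the pipeline so later instructions are refetched.
        std::arch::asm!("dsb sy", "isb sy", options(nostack, preserves_flags));
    }
}

/// Fallback for other architectures: a sequentially consistent fence,
/// used here only so the sketch builds on every target.
#[cfg(not(target_arch = "aarch64"))]
pub fn jit_publish_barrier() {
    std::sync::atomic::fence(std::sync::atomic::Ordering::SeqCst);
}
```

The barrier would be called once after the code bytes are written and the memory is made executable, before any thread receives the function pointers.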

Technical References

Related Issues

Wasmtime GitHub notifications bot (Nov 24 2025 at 13:12):

darmie added the bug label to Issue #12076.

Wasmtime GitHub notifications bot (Nov 24 2025 at 13:12):

darmie added the cranelift label to Issue #12076.

Wasmtime GitHub notifications bot (Nov 24 2025 at 13:30):

bjorn3 commented on issue #12076:

> Apple Silicon enforces W^X (Write XOR Execute) at the hardware level.

It is enforced by the kernel and only when hardened runtime is enabled. You don't need MAP_JIT when hardened runtime is disabled for an application, as is the default outside of App Store apps.

Wasmtime GitHub notifications bot (Dec 05 2025 at 00:35):

alexcrichton commented on issue #12076:

I'm going to close this in favor of https://github.com/bytecodealliance/wasmtime/issues/11989 since I think it's a duplicate

Wasmtime GitHub notifications bot (Dec 05 2025 at 00:35):

alexcrichton closed issue #12076.

Last updated: Dec 06 2025 at 06:05 UTC