Stream: git-wasmtime

Topic: wasmtime / PR #12077 fix(jit): Add MAP_JIT support for 10...


Wasmtime GitHub notifications bot (Nov 24 2025 at 13:16):

darmie opened PR #12077 from darmie:fix-plt-aarch64 to bytecodealliance:main:

Title

fix(jit): Add MAP_JIT support for 100% stable JIT execution on ARM64 macOS

Related Issue: https://github.com/bytecodealliance/wasmtime/issues/12076

Summary

This PR fixes non-deterministic JIT execution failures on Apple Silicon by implementing proper memory allocation and W^X mode handling required by the platform.

Before: ~56% success rate
After: 100% success rate (verified over 50+ consecutive runs)

Problem

JIT-compiled code on ARM64 macOS fails non-deterministically in multi-threaded scenarios. Symptoms include:

Failure mode                      Frequency
SIGBUS on JIT function calls      ~30%
Silent incorrect results          ~10%
Segmentation fault                ~4%

Root Cause

Apple Silicon enforces W^X (Write XOR Execute) at the hardware level:

  1. Memory must be allocated with MAP_JIT flag so the kernel can track it for W^X enforcement
  2. Threads must switch to execute mode via pthread_jit_write_protect_np(1) before calling JIT code
  3. Memory barriers are required due to independent instruction caches on P-cores and E-cores
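Point 2 is a per-thread setting: pthread_jit_write_protect_np changes the JIT write-protection state only for the calling thread. Below is a std-only model of that behaviour, not the real API. `write_protect` and `is_write_protected` are hypothetical stand-ins backed by a thread-local flag so the sketch runs on any platform. The flag's initial value is an arbitrary choice of the model, not a claim about macOS defaults.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Hypothetical model of the per-thread JIT write-protect flag:
    // true = execute mode (writes blocked), false = write mode.
    // The initial value is an assumption of this model only.
    static WRITE_PROTECTED: Cell<bool> = Cell::new(true);
}

/// Stand-in for pthread_jit_write_protect_np: affects only the calling thread.
fn write_protect(enabled: bool) {
    WRITE_PROTECTED.with(|f| f.set(enabled));
}

fn is_write_protected() -> bool {
    WRITE_PROTECTED.with(|f| f.get())
}

fn main() {
    // The compiling thread switches to write mode to emit code...
    write_protect(false);
    assert!(!is_write_protected());

    // ...but a different thread keeps its own, independent state.
    let other = thread::spawn(is_write_protected).join().unwrap();
    assert!(other);

    // Switching back to execute mode also only touches this thread.
    write_protect(true);
    println!("main: {}, other: {}", is_write_protected(), other);
}
```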

The current implementation:

Solution

Changes to cranelift/jit/src/memory/system.rs

  1. New ARM64 macOS-specific with_size() - Uses mmap with MAP_JIT flag:
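#[cfg(all(target_arch = "aarch64", target_os = "macos", not(feature = "selinux-fix")))]
fn with_size(size: usize) -> io::Result<Self> {
    const MAP_JIT: libc::c_int = 0x0800;
    // Round the request up to whole pages; this is the length that is mapped
    // here and later passed to munmap in Drop.
    let alloc_size = size.next_multiple_of(region::page::size());
    let ptr = unsafe {
        libc::mmap(
            ptr::null_mut(),
            alloc_size,
            libc::PROT_READ | libc::PROT_WRITE,
            libc::MAP_PRIVATE | libc::MAP_ANON | MAP_JIT,
            -1,
            0,
        )
    };
    // mmap reports failure with MAP_FAILED, not a null pointer.
    if ptr == libc::MAP_FAILED {
        return Err(io::Error::last_os_error());
    }
    // ...
}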
  2. New ARM64 macOS-specific Drop - Uses munmap since memory was allocated with mmap:
#[cfg(all(target_arch = "aarch64", target_os = "macos", not(feature = "selinux-fix")))]
impl Drop for PtrLen {
    fn drop(&mut self) {
        if !self.ptr.is_null() {
            unsafe {
                let _ = region::protect(self.ptr, self.len, region::Protection::READ_WRITE);
                libc::munmap(self.ptr as *mut libc::c_void, self.len);
            }
        }
    }
}
  3. Memory barriers after finalize - Ensures icache coherency:
#[cfg(all(target_arch = "aarch64", target_os = "macos"))]
unsafe {
    std::arch::asm!("dsb sy", options(nostack, preserves_flags));
    std::arch::asm!("isb sy", options(nostack, preserves_flags));
}
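The mmap-based with_size() in item 1 maps whole pages, and the Drop in item 2 must pass munmap a length that covers the same mapping. The sketch below shows the usual round-up-to-page arithmetic. `round_up_to_page` is a hypothetical helper name and assumes a power-of-two page size. The example uses 16 KiB because that is the page size of macOS on Apple Silicon.

```rust
/// Round `size` up to a multiple of `page_size` (assumed to be a power of two).
fn round_up_to_page(size: usize, page_size: usize) -> usize {
    debug_assert!(page_size.is_power_of_two());
    // Add (page_size - 1), then clear the low bits to land on a page boundary.
    (size + page_size - 1) & !(page_size - 1)
}

fn main() {
    let page = 16 * 1024; // page size of macOS on Apple Silicon
    for &size in &[1usize, 16 * 1024, 16 * 1024 + 1, 100_000] {
        println!("{} -> {}", size, round_up_to_page(size, page));
    }
}
```

For any non-zero page size, `size.next_multiple_of(page_size)` (Rust 1.73+) gives the same result without the power-of-two requirement.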

Changes to cranelift/jit/src/memory/mod.rs

  1. DSB SY before clear_cache - Ensures data writes are visible before icache invalidation:
#[cfg(all(target_arch = "aarch64", target_os = "macos"))]
unsafe {
    std::arch::asm!("dsb sy", options(nostack, preserves_flags));
}
  2. Switch to execute mode - After making memory executable:
#[cfg(all(target_arch = "aarch64", target_os = "macos"))]
{
    unsafe extern "C" {
        fn pthread_jit_write_protect_np(enabled: libc::c_int);
    }
    unsafe {
        pthread_jit_write_protect_np(1);
    }
}
  3. Final barriers - After protection change and mode switch:
#[cfg(all(target_arch = "aarch64", target_os = "macos"))]
unsafe {
    std::arch::asm!("dsb sy", options(nostack, preserves_flags));
    std::arch::asm!("isb sy", options(nostack, preserves_flags));
}

Testing

Test Script

#!/bin/bash
passed=0
failed=0

for i in {1..50}; do
    result=$(timeout 120 ./target/release/jit_test 2>&1)
    if echo "$result" | grep -q "All tests passed"; then
        echo "Run $i: PASSED"
        passed=$((passed+1))
    else
        echo "Run $i: FAILED"
        failed=$((failed+1))
    fi
done

echo ""
echo "=== RESULTS ==="
echo "Passed: $passed/50, Failed: $failed/50"
echo "Success rate: $((passed * 100 / 50))%"

Results

Tested with the Rayzor compiler's stdlib e2e test suite (50+ JIT-compiled runtime functions, multi-threaded):

Configuration                                        Success rate
Before fix (standard allocator)                      ~56% (28/50)
After fix (MAP_JIT + pthread_jit_write_protect_np)   100% (50/50)
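The arithmetic below is a rough worked check of the reported numbers. It was added for illustration and is not part of the PR. It recomputes the script's truncating success rate and confirms that the failure modes in the Problem table account for the remaining 44%. It also shows that 0 failures in 50 runs still leaves room for a small failure rate. The exact one-sided 95% upper bound on the failure probability after 0 failures in n runs is 1 - 0.05^(1/n), which is close to the "rule of three" estimate 3/n. This assumes the runs are independent, which timing-dependent failures may not strictly satisfy.

```rust
/// Success rate in whole percent, truncating like the script's $((passed * 100 / 50)).
fn success_rate_percent(passed: u32, total: u32) -> u32 {
    passed * 100 / total
}

/// Exact one-sided upper confidence bound on the failure probability
/// when 0 failures were observed in `n` independent runs.
fn zero_failure_upper_bound(n: u32, confidence: f64) -> f64 {
    1.0 - (1.0 - confidence).powf(1.0 / n as f64)
}

fn main() {
    // Before the fix: 28 of 50 runs passed.
    let before = success_rate_percent(28, 50);
    println!("before: {}%", before); // 56%
    // SIGBUS + incorrect results + segfault from the Problem table.
    assert_eq!(30 + 10 + 4, 100 - before);

    // After the fix: 50 of 50 runs passed.
    println!("after: {}%", success_rate_percent(50, 50)); // 100%
    let bound = zero_failure_upper_bound(50, 0.95);
    println!("95% upper bound on failure rate: {:.1}%", bound * 100.0);
    println!("rule of three estimate: {:.1}%", 3.0 / 50.0 * 100.0);
}
```

With n = 50, the exact bound is roughly 5.8% and the rule of three gives 6.0%. A longer run (for example several hundred iterations) would tighten the bound proportionally.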

Note: Simple standalone tests may not reliably reproduce this issue. The failure is non-deterministic and depends on timing, memory layout, and CPU core scheduling (P-core vs E-core).

Platform Impact

All changes are gated behind #[cfg(all(target_arch = "aarch64", target_os = "macos"))].

Related Issues

References

Notes for Reviewers

  1. Thread safety consideration: The pthread_jit_write_protect_np(1) call in mod.rs handles the compiling thread. Applications spawning threads that call JIT code must also ensure those threads are in execute mode. This could be:

    • Documented as a requirement for users
    • Handled via a helper function in the public API
  2. Memory barriers: DSB SY + ISB SY may seem aggressive, but Apple Silicon's heterogeneous architecture (P-cores + E-cores with independent icaches) requires explicit synchronization.

  3. Deallocation: Memory allocated with mmap must be freed with munmap, hence the separate Drop implementation.
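Point 1 mentions a helper in the public API. One possible shape is a scope guard: constructing it switches the current thread to write mode, and dropping it switches back to execute mode, so the two calls cannot get out of step. The names are hypothetical: `JitWriteGuard` is invented for this sketch, and a `toggle` callback stands in for pthread_jit_write_protect_np so the example runs anywhere. The callback records calls instead of changing real memory permissions.

```rust
use std::cell::RefCell;

/// Hypothetical RAII guard: write mode while alive, execute mode after drop.
/// `toggle(enabled)` stands in for pthread_jit_write_protect_np(enabled).
struct JitWriteGuard<F: Fn(bool)> {
    toggle: F,
}

impl<F: Fn(bool)> JitWriteGuard<F> {
    fn new(toggle: F) -> Self {
        toggle(false); // disable write protection: this thread may write JIT memory
        JitWriteGuard { toggle }
    }
}

impl<F: Fn(bool)> Drop for JitWriteGuard<F> {
    fn drop(&mut self) {
        (self.toggle)(true); // re-enable write protection: this thread may execute JIT code
    }
}

/// Runs the write/execute sequence against a mock toggle and returns the call log.
fn run_demo() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let _guard = JitWriteGuard::new(|enabled: bool| {
            log.borrow_mut().push(format!("write_protect({})", enabled as i32));
        });
        // While the guard is alive, the thread is in write mode.
        log.borrow_mut().push("copy machine code".to_string());
    } // guard dropped here: execute mode restored
    log.borrow_mut().push("call JIT function".to_string());
    log.into_inner()
}

fn main() {
    println!("{:?}", run_demo());
}
```

Because the mode is per-thread, a real guard of this kind should not be moved to another thread, for example by making it !Send. The instance in `run_demo` happens to be !Send already, because its closure borrows a RefCell.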

Checklist

Wasmtime GitHub notifications bot (Nov 24 2025 at 13:16):

darmie requested abrown for a review on PR #12077.

Wasmtime GitHub notifications bot (Nov 24 2025 at 13:16):

darmie requested wasmtime-compiler-reviewers for a review on PR #12077.

Wasmtime GitHub notifications bot (Nov 24 2025 at 13:19):

bjorn3 submitted PR review.

Wasmtime GitHub notifications bot (Nov 24 2025 at 13:19):

bjorn3 created PR review comment:

Please revert this change. It is unrelated to the rest of the changes in this PR and incorrect. Arm64 doesn't support ArgumentPurpose::StructArgument because the arm64 ABI doesn't make use of it at all.

Wasmtime GitHub notifications bot (Nov 24 2025 at 13:23):

bjorn3 submitted PR review.

Wasmtime GitHub notifications bot (Nov 24 2025 at 13:23):

bjorn3 created PR review comment:

The issue is not a missing barrier before icache invalidation. The issue is that icache invalidation is currently only implemented on Linux and Windows arm64, not macOS arm64. A dsb sy doesn't ensure icache coherence in multi-threaded environments, while the membarrier we use on Linux already removes the need for dsb sy there.
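On macOS, the system library provides `sys_icache_invalidate(void *start, size_t len)`, declared in `<libkern/OSCacheControl.h>`, which is one candidate for filling that gap. The sketch below shows how a cfg-gated helper could dispatch to it. `flush_icache` is a hypothetical wrapper name. The non-macOS version is only a placeholder for the existing Linux/Windows mechanisms and does nothing here.

```rust
#[cfg(all(target_arch = "aarch64", target_os = "macos"))]
unsafe extern "C" {
    // From <libkern/OSCacheControl.h> (libSystem).
    fn sys_icache_invalidate(start: *mut std::ffi::c_void, len: usize);
}

/// Hypothetical helper: invalidate the instruction cache for freshly written code.
///
/// # Safety
/// `ptr..ptr + len` must be a valid mapped region containing the emitted code.
#[cfg(all(target_arch = "aarch64", target_os = "macos"))]
unsafe fn flush_icache(ptr: *mut u8, len: usize) {
    unsafe { sys_icache_invalidate(ptr as *mut std::ffi::c_void, len) }
}

/// Placeholder for other targets, which keep their existing mechanism
/// (e.g. membarrier on Linux); this sketch does not reproduce it.
#[cfg(not(all(target_arch = "aarch64", target_os = "macos")))]
unsafe fn flush_icache(ptr: *mut u8, len: usize) {
    let _ = (ptr, len);
}

fn main() {
    // A heap buffer stands in for a region of freshly written machine code.
    let mut code = vec![0u8; 64];
    unsafe { flush_icache(code.as_mut_ptr(), code.len()) };
    println!("flushed {} bytes", code.len());
}
```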

Wasmtime GitHub notifications bot (Nov 24 2025 at 13:26):

bjorn3 submitted PR review.

Wasmtime GitHub notifications bot (Nov 24 2025 at 13:26):

bjorn3 created PR review comment:

        if let target_lexicon::Architecture::Aarch64(_) = self.isa.triple().architecture {

And probably just import target_lexicon::Architecture.

Wasmtime GitHub notifications bot (Nov 24 2025 at 13:26):

bjorn3 created PR review comment:

All those PROBLEM/SOLUTION comments are unnecessarily verbose IMO.

Wasmtime GitHub notifications bot (Nov 24 2025 at 13:26):

bjorn3 submitted PR review.

Wasmtime GitHub notifications bot (Nov 24 2025 at 13:27):

bjorn3 submitted PR review.

Wasmtime GitHub notifications bot (Nov 24 2025 at 13:27):

bjorn3 created PR review comment:

Same applies to the PR description, which is even worse. Did you use AI?

Wasmtime GitHub notifications bot (Nov 24 2025 at 13:41):

darmie submitted PR review.

Wasmtime GitHub notifications bot (Nov 24 2025 at 13:41):

darmie created PR review comment:

Thanks for the clarification, @bjorn3. You are right that the DSB SY barriers alone don't solve icache coherence in multi-threaded scenarios. They actually didn't improve the results of my test case; the results only improved after adding the MAP_JIT flag and pthread_jit_write_protect_np(1). I will run the test again without the barriers and update the PR.

Wasmtime GitHub notifications bot (Nov 24 2025 at 14:03):

darmie updated PR #12077.

Wasmtime GitHub notifications bot (Nov 24 2025 at 14:10):

darmie submitted PR review.

Wasmtime GitHub notifications bot (Nov 24 2025 at 14:10):

darmie created PR review comment:

> Same applies to the PR description, which is even worse. Did you use AI?

I was worried that my original description was too brief and lacking context :)

Wasmtime GitHub notifications bot (Nov 24 2025 at 14:16):

darmie updated PR #12077.

Wasmtime GitHub notifications bot (Nov 24 2025 at 14:16):

darmie requested wasmtime-default-reviewers for a review on PR #12077.

Wasmtime GitHub notifications bot (Nov 24 2025 at 14:18):

darmie updated PR #12077.

Wasmtime GitHub notifications bot (Nov 24 2025 at 14:23):

darmie updated PR #12077.

Wasmtime GitHub notifications bot (Nov 24 2025 at 14:30):

darmie edited PR #12077:

Title

fix(jit): Add MAP_JIT support for 100% stable JIT execution on ARM64 macOS

Related Issue: https://github.com/bytecodealliance/wasmtime/issues/12076

Summary

This PR fixes non-deterministic JIT execution failures on Apple Silicon by implementing proper memory allocation and W^X mode handling required by the platform.

Before: ~56% success rate
After: 100% success rate (verified over 50+ consecutive runs)

Problem

JIT-compiled code on ARM64 macOS fails non-deterministically in multi-threaded scenarios. Symptoms include:

Failure mode                      Frequency
SIGBUS on JIT function calls      ~30%
Silent incorrect results          ~10%
Segmentation fault                ~4%

Root Cause

Apple Silicon enforces W^X (Write XOR Execute) at the hardware level:

  1. Memory must be allocated with MAP_JIT flag so the kernel can track it for W^X enforcement
  2. Threads must switch to execute mode via pthread_jit_write_protect_np(1) before calling JIT code

The current implementation:

Solution

cranelift/jit/src/memory/system.rs

Added an ARM64 macOS-specific PtrLen::with_size() implementation that uses mmap with the MAP_JIT flag (0x0800) instead of the standard allocator. This allows macOS to properly track the memory for W^X policy enforcement.

Also added a corresponding Drop implementation that uses munmap to deallocate the memory, since memory allocated with mmap cannot be freed with the standard allocator.

cranelift/jit/src/memory/mod.rs

After making memory executable in set_readable_and_executable(), added a call to pthread_jit_write_protect_np(1) to switch the current thread to execute mode. This is required by Apple's W^X enforcement: threads must explicitly opt into execute mode before running JIT code.

Testing

Test Script

#!/bin/bash
passed=0
failed=0

for i in {1..50}; do
    result=$(timeout 120 ./target/release/jit_test 2>&1)
    if echo "$result" | grep -q "All tests passed"; then
        echo "Run $i: PASSED"
        passed=$((passed+1))
    else
        echo "Run $i: FAILED"
        failed=$((failed+1))
    fi
done

echo ""
echo "=== RESULTS ==="
echo "Passed: $passed/50, Failed: $failed/50"
echo "Success rate: $((passed * 100 / 50))%"

Results

Tested with the Rayzor compiler's stdlib e2e test suite (50+ JIT-compiled runtime functions, multi-threaded):

Configuration                                        Success rate
Before fix (standard allocator)                      ~56% (28/50)
After fix (MAP_JIT + pthread_jit_write_protect_np)   100% (50/50)

Note: Simple standalone tests may not reliably reproduce this issue. The failure is non-deterministic and depends on timing, memory layout, and CPU core scheduling (P-core vs E-core).

Platform Impact

All changes are gated behind #[cfg(all(target_arch = "aarch64", target_os = "macos"))].

Related Issues

Wasmtime GitHub notifications bot (Nov 24 2025 at 14:58):

darmie requested fitzgen for a review on PR #12077.

Wasmtime GitHub notifications bot (Nov 24 2025 at 14:58):

darmie requested wasmtime-core-reviewers for a review on PR #12077.

Wasmtime GitHub notifications bot (Nov 24 2025 at 14:58):

darmie updated PR #12077.

Wasmtime GitHub notifications bot (Nov 24 2025 at 14:59):

darmie requested bjorn3 for a review on PR #12077.

Wasmtime GitHub notifications bot (Nov 24 2025 at 15:48):

bjorn3 submitted PR review.

Wasmtime GitHub notifications bot (Nov 24 2025 at 15:48):

bjorn3 created PR review comment:

This comment is still correct outside macOS.

Wasmtime GitHub notifications bot (Nov 24 2025 at 15:52):

bjorn3 submitted PR review.

Wasmtime GitHub notifications bot (Nov 24 2025 at 15:52):

bjorn3 created PR review comment:

You can use libc::MAP_JIT.

Wasmtime GitHub notifications bot (Nov 24 2025 at 15:53):

bjorn3 submitted PR review.

Wasmtime GitHub notifications bot (Nov 24 2025 at 15:53):

bjorn3 created PR review comment:

Mind making these regular comments? They don't describe the public API of the function, but rather the implementation.

Wasmtime GitHub notifications bot (Nov 24 2025 at 15:55):

bjorn3 created PR review comment:

The pthread_jit_write_protect_np(0) before writing to the mapped memory is missing, isn't it?

Wasmtime GitHub notifications bot (Nov 24 2025 at 15:55):

bjorn3 submitted PR review.

Wasmtime GitHub notifications bot (Nov 24 2025 at 16:12):

darmie submitted PR review.

Wasmtime GitHub notifications bot (Nov 24 2025 at 16:12):

darmie created PR review comment:

Oh yes, that's supposed to be in the with_size function.

Wasmtime GitHub notifications bot (Nov 24 2025 at 16:20):

darmie updated PR #12077.

Wasmtime GitHub notifications bot (Nov 24 2025 at 16:21):

darmie requested bjorn3 for a review on PR #12077.

Wasmtime GitHub notifications bot (Dec 05 2025 at 00:36):

alexcrichton commented on PR #12077:

@bjorn3 are you willing to take on review here as an owner of the cranelift-jit crate? I've written up my thoughts here on how this would all affect Wasmtime but this doesn't actually touch wasmtime except for the jit-icache-coherence crate, so my comment there is only tangentially applicable.


Last updated: Dec 06 2025 at 07:03 UTC