darmie opened PR #12077 from darmie:fix-plt-aarch64 to bytecodealliance:main:
Title
fix(jit): Add MAP_JIT support for 100% stable JIT execution on ARM64 macOS
Related Issue: https://github.com/bytecodealliance/wasmtime/issues/12076
Summary
This PR fixes non-deterministic JIT execution failures on Apple Silicon by implementing proper memory allocation and W^X mode handling required by the platform.
Before: ~56% success rate
After: 100% success rate (verified over 50+ consecutive runs)Problem
JIT-compiled code on ARM64 macOS fails non-deterministically in multi-threaded scenarios. Symptoms include:
Failure Mode Frequency SIGBUS on JIT function calls ~30% Silent incorrect results ~10% Segmentation fault ~4% Root Cause
Apple Silicon enforces W^X (Write XOR Execute) at the hardware level:
- Memory must be allocated with
MAP_JITflag so the kernel can track it for W^X enforcement- Threads must switch to execute mode via
pthread_jit_write_protect_np(1)before calling JIT code- Memory barriers are required due to independent instruction caches on P-cores and E-cores
The current implementation:
- Uses standard allocator (no
MAP_JIT)- Doesn't call
pthread_jit_write_protect_np()- Has insufficient memory barriers
Solution
Changes to
cranelift/jit/src/memory/system.rs
- New ARM64 macOS-specific
with_size()- UsesmmapwithMAP_JITflag:#[cfg(all(target_arch = "aarch64", target_os = "macos", not(feature = "selinux-fix")))] fn with_size(size: usize) -> io::Result<Self> { const MAP_JIT: libc::c_int = 0x0800; let ptr = unsafe { libc::mmap( ptr::null_mut(), alloc_size, libc::PROT_READ | libc::PROT_WRITE, libc::MAP_PRIVATE | libc::MAP_ANON | MAP_JIT, -1, 0, ) }; // ... }
- New ARM64 macOS-specific
Drop- Usesmunmapsince memory was allocated withmmap:#[cfg(all(target_arch = "aarch64", target_os = "macos", not(feature = "selinux-fix")))] impl Drop for PtrLen { fn drop(&mut self) { if !self.ptr.is_null() { unsafe { let _ = region::protect(self.ptr, self.len, region::Protection::READ_WRITE); libc::munmap(self.ptr as *mut libc::c_void, self.len); } } } }
- Memory barriers after finalize - Ensures icache coherency:
#[cfg(all(target_arch = "aarch64", target_os = "macos"))] unsafe { std::arch::asm!("dsb sy", options(nostack, preserves_flags)); std::arch::asm!("isb sy", options(nostack, preserves_flags)); }Changes to
cranelift/jit/src/memory/mod.rs
- DSB SY before
clear_cache- Ensures data writes are visible before icache invalidation:#[cfg(all(target_arch = "aarch64", target_os = "macos"))] unsafe { std::arch::asm!("dsb sy", options(nostack, preserves_flags)); }
- Switch to execute mode - After making memory executable:
#[cfg(all(target_arch = "aarch64", target_os = "macos"))] { unsafe extern "C" { fn pthread_jit_write_protect_np(enabled: libc::c_int); } unsafe { pthread_jit_write_protect_np(1); } }
- Final barriers - After protection change and mode switch:
#[cfg(all(target_arch = "aarch64", target_os = "macos"))] unsafe { std::arch::asm!("dsb sy", options(nostack, preserves_flags)); std::arch::asm!("isb sy", options(nostack, preserves_flags)); }Testing
Test Script
#!/bin/bash passed=0 failed=0 for i in {1..50}; do result=$(timeout 120 ./target/release/jit_test 2>&1) if echo "$result" | grep -q "All tests passed"; then echo "Run $i: PASSED" passed=$((passed+1)) else echo "Run $i: FAILED" failed=$((failed+1)) fi done echo "" echo "=== RESULTS ===" echo "Passed: $passed/50, Failed: $failed/50" echo "Success rate: $((passed * 100 / 50))%"Results
Tested with the Rayzor compiler's stdlib e2e test suite (50+ JIT-compiled runtime functions, multi-threaded):
Configuration Success Rate Before fix (standard allocator) ~56% (28/50) After fix (MAP_JIT + pthread_jit_write_protect_np) 100% (50/50) Note: Simple standalone tests may not reliably reproduce this issue. The failure is non-deterministic and depends on timing, memory layout, and CPU core scheduling (P-core vs E-core).
Platform Impact
- ARM64 macOS: Fixed (was broken)
- x86_64 macOS: No change (not affected)
- Linux (all arch): No change (not affected)
- Windows: No change (not affected)
All changes are gated behind
#[cfg(all(target_arch = "aarch64", target_os = "macos"))].Related Issues
- Fixes https://github.com/bytecodealliance/wasmtime/issues/12076
- Related to #2735 - Support PLT entries in
cranelift-jitcrate on aarch64- Related to #8852 - Cranelift: JIT assertion failure on macOS (A64)
- Related to #4000 - JIT relocations depend on system allocator behaviour
References
- Apple: Writing ARM64 Code for Apple Platforms
- Porting Just-In-Time Compilers to Apple Silicon
- MAP_JIT documentation
Notes for Reviewers
Thread safety consideration: The
pthread_jit_write_protect_np(1)call inmod.rshandles the compiling thread. Applications spawning threads that call JIT code must also ensure those threads are in execute mode. This could be:
- Documented as a requirement for users
- Handled via a helper function in the public API
Memory barriers: DSB SY + ISB SY may seem aggressive, but Apple Silicon's heterogeneous architecture (P-cores + E-cores with independent icaches) requires explicit synchronization.
Deallocation: Memory allocated with
mmapmust be freed withmunmap, hence the separateDropimplementation.Checklist
- [x] Code compiles without warnings
- [x] All existing tests pass
- [x] New functionality tested (50+ stability runs)
- [x] Changes are platform-specific (no impact on other platforms)
darmie requested abrown for a review on PR #12077.
darmie requested wasmtime-compiler-reviewers for a review on PR #12077.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
Please revert this change. It is unrelated to the rest of the changes in this PR and incorrect. Arm64 doesn't support
ArgumentPurpose::StructArgumentbecause the arm64 ABI doesn't make use of it at all.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
The issue is not a missing barier before icache invalidation. The issue is that icache invalidation is currently only implemented on Linux and Windows arm64, not macOS arm64. A
dsb sydoesn't ensure icache coherence in multi threaded environments, while themembarrierwe use on Linux is already removes the need fordsb sythere.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
if target_lexicon::Architecture::Aarch64(_) = self.isa.triple().architecture {And probably just import
target_lexicon::Architecture.
bjorn3 created PR review comment:
All those PROBLEM/SOLUTION are unnecessarily verbose IMO.
bjorn3 submitted PR review.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
Same applies to the PR description, which is even worse. Did you use AI?
darmie submitted PR review.
darmie created PR review comment:
Thanks for the clarification @bjorn3 .. You are right that the DSB SY barriers alone don't solve icache coherence in multi-threaded scenarios. It actually didn't improve the results of my test case. The result only improved after adding the MAP_JIT flag and
pthread_jit_write_protect_np(1). I will run the test again without the barriers and update the PR
darmie updated PR #12077.
darmie submitted PR review.
darmie created PR review comment:
Same applies to the PR description, which is even worse. Did you use AI?
I was worried that my original description was too brief and lacking context :)
darmie updated PR #12077.
darmie requested wasmtime-default-reviewers for a review on PR #12077.
darmie updated PR #12077.
darmie updated PR #12077.
darmie edited PR #12077:
Title
fix(jit): Add MAP_JIT support for 100% stable JIT execution on ARM64 macOS
Related Issue: https://github.com/bytecodealliance/wasmtime/issues/12076
Summary
This PR fixes non-deterministic JIT execution failures on Apple Silicon by implementing proper memory allocation and W^X mode handling required by the platform.
Before: ~56% success rate
After: 100% success rate (verified over 50+ consecutive runs)Problem
JIT-compiled code on ARM64 macOS fails non-deterministically in multi-threaded scenarios. Symptoms include:
Failure Mode Frequency SIGBUS on JIT function calls ~30% Silent incorrect results ~10% Segmentation fault ~4% Root Cause
Apple Silicon enforces W^X (Write XOR Execute) at the hardware level:
- Memory must be allocated with
MAP_JITflag so the kernel can track it for W^X enforcement- Threads must switch to execute mode via
pthread_jit_write_protect_np(1)before calling JIT codeThe current implementation:
- Uses standard allocator (no
MAP_JIT)- Doesn't call
pthread_jit_write_protect_np()Solution
cranelift/jit/src/memory/system.rsAdded an ARM64 macOS-specific
PtrLen::with_size()implementation that usesmmapwith theMAP_JITflag (0x0800) instead of the standard allocator. This allows macOS to properly track the memory for W^X policy enforcement.Also added a corresponding
Dropimplementation that usesmunmapto deallocate the memory, since memory allocated withmmapcannot be freed with the standard allocator.
cranelift/jit/src/memory/mod.rsAfter making memory executable in
set_readable_and_executable(), added a call topthread_jit_write_protect_np(1)to switch the current thread to execute mode. This is required by Apple's W^X enforcement - threads must explicitly opt into execute mode before running JIT code.Testing
Test Script
#!/bin/bash passed=0 failed=0 for i in {1..50}; do result=$(timeout 120 ./target/release/jit_test 2>&1) if echo "$result" | grep -q "All tests passed"; then echo "Run $i: PASSED" passed=$((passed+1)) else echo "Run $i: FAILED" failed=$((failed+1)) fi done echo "" echo "=== RESULTS ===" echo "Passed: $passed/50, Failed: $failed/50" echo "Success rate: $((passed * 100 / 50))%"Results
Tested with the Rayzor compiler's stdlib e2e test suite (50+ JIT-compiled runtime functions, multi-threaded):
Configuration Success Rate Before fix (standard allocator) ~56% (28/50) After fix (MAP_JIT + pthread_jit_write_protect_np) 100% (50/50) Note: Simple standalone tests may not reliably reproduce this issue. The failure is non-deterministic and depends on timing, memory layout, and CPU core scheduling (P-core vs E-core).
Platform Impact
- ARM64 macOS: Fixed (was broken)
- x86_64 macOS: No change (not affected)
- Linux (all arch): No change (not affected)
- Windows: No change (not affected)
All changes are gated behind
#[cfg(all(target_arch = "aarch64", target_os = "macos"))].Related Issues
- Fixes https://github.com/bytecodealliance/wasmtime/issues/12076
- Related to #2735 - Support PLT entries in
cranelift-jitcrate on aarch64- Related to #8852 - Cranelift: JIT assertion failure on macOS (A64)
- Related to #4000 - JIT relocations depend on system allocator behaviour
darmie requested fitzgen for a review on PR #12077.
darmie requested wasmtime-core-reviewers for a review on PR #12077.
darmie updated PR #12077.
darmie requested bjorn3 for a review on PR #12077.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
This comment is still correct outside macOS.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
You can use
libc::MAP_JIT.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
Mind making these regular comments? They don't describe the public api of the function, but rather the implementation.
bjorn3 created PR review comment:
The
pthread_jit_write_protect_np(0)before writing to the mapped memory is missing, isn't it?
bjorn3 submitted PR review.
darmie submitted PR review.
darmie created PR review comment:
Oh yes, that's supposed to be in the with_size function.
darmie updated PR #12077.
darmie requested bjorn3 for a review on PR #12077.
alexcrichton commented on PR #12077:
@bjorn3 are you willing to take on review here as an owner of the cranelift-jit crate? I've written up my thoughts here on how this would all affect Wasmtime but this doesn't actually touch wasmtime except for the jit-icache-coherence crate, so my comment there is only tangentially applicable.
Last updated: Dec 06 2025 at 07:03 UTC