Hi all,
Long-time admirer of Cranelift here. I'm curious whether arm64_32 (the ILP32 ABI used by Apple Watch Series 4–8) is on the roadmap at all, or whether it's considered out of scope.
I'm working on a TypeScript-to-native compiler that targets Apple Watch via Cranelift, and arm64 (Series 9+) works great. The older watches are the only gap, and I'd love to know whether it's worth waiting for upstream support or if I should plan around it.
Thanks!
Hi! The short answer is that it definitely is not going to happen without someone contributing it.
The longer answer is that Cranelift is actively maintained by the ~3-5 of us who work full-time on Wasmtime+Cranelift, but there is not much active Cranelift work going on these days except when motivated by Wasmtime needs (though some of us still have ideas we want to explore). There is definitely not a team with a "roadmap" of big features like new targets to implement -- we're stretched far too thin for that. So unfortunately there's no one here who will see this and add a 3-month project to their timeline, sorry. But PRs welcome!
Thanks for the answer, that makes sense :slight_smile:
@Ralph Küpper I have PRs up that contribute support for the interpreter. They're linked from this meta-issue I created: https://github.com/bytecodealliance/wasmtime/issues/13255
I've done a fair amount of benchmarking on Apple Watch 6/SE2, including targeting A12 (S8) and whole-program LTO. From my notes:
iPhone XS (A12) — Pulley wins: matmul SIMD (+90%), matmul FMA (only Pulley supports relaxed-simd), tail-call (+52%), convolution (+9%). WAMR wins: bulk_memory (2.2×), call_indirect (2×), audio_dsp (+28%), sieve (+26%).
Apple Watch SE2 (S8) — Pulley wins: matmul SIMD (2.4×), tail-call (+57%), sieve (+23%), fib (+25%). WAMR wins: bulk_memory (2×), call_indirect (+58%), audio_dsp (+39%).
Key shifts from M4 host:
because I'm targeting deployment on the App Store, I'm sticking to the interpreter and not working on the JIT aspect. (I could under paid contract, but I don't need it for my immediate purposes.)
@Matt Hargett interesting numbers -- could you clarify what the baseline is? e.g. in this
Pulley wins: matmul SIMD (+90%) [ ... ]
+90% over what? another interpreter?
Ah, I just saw WAMR in sibling thread? or WasmEdge?
against WAMR. I can add WasmEdge to my benchmarking app, since I know it works on arm64_32, if that's useful for the bytecode alliance / global community.
@Chris Fallin btw, versus your original IC branch I made the IC ARMv8-portable for non-Apple-silicon deploy targets, which appeared to fix a latent torn-pair race that the original IC had on weakly-ordered cores
I have all the proof points on my local hardware; I just need some feedback on how you all would like me to proceed from here. An OK answer would be "we don't want this, please keep it in your fork" -- just lmk! :D
(I'm assuming these messages are replies to the neighboring "call_indirect optimization" topic -- replying as such)
btw, versus your original IC branch I made the IC ARMv8-portable for non-Apple-silicon deploy targets, which appeared to fix a latent torn-pair race that the original IC had on weakly-ordered cores
I don't think that's right (or, say more please!): my approach was to put the cache in the vmctx, which is locally owned by the running instance. Thus there cannot be any racy accesses because there is only one thread touching the state at a time. (When an instance is running it holds a &mut Store borrow; a store cannot run multiple threads)
On the other hand, your prototyped approach of caching targets in the bytecode by making it mutable is absolutely racy, and you'll run into issues as soon as you have more than one instance running in a multithreaded context. (The cache-related explanation is also somewhat dubious to me as explained)
So my advice remains: the state has to be stored in vmctx, not the bytecode, and if there are good wins with that, we'd be interested in taking it. Thanks!
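For readers following along, the ownership argument above can be sketched in a few lines of Rust. This is only an illustration under stated assumptions, not Wasmtime's or Pulley's actual code: `Bytecode`, `InstanceState`, `call_indirect`, and `run_two_instances` are hypothetical names, and `InstanceState` merely stands in for the role vmctx plays. The point it shows is that the bytecode stays immutable and shared, while each instance mutates only a cache it exclusively owns through a `&mut` borrow, so no two threads ever write the same cache slot.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for compiled bytecode: immutable once built,
/// so it can be shared freely between threads.
struct Bytecode {
    /// Function table: table index -> function id.
    table: Vec<usize>,
    /// Number of call_indirect sites in the code.
    num_sites: usize,
}

/// Hypothetical stand-in for per-instance state (the role vmctx plays).
struct InstanceState {
    /// One cache slot per call site: (table index, resolved target).
    ic: Vec<Option<(u32, usize)>>,
    /// Slow-path lookups, counted only for illustration.
    misses: u32,
}

impl InstanceState {
    fn new(code: &Bytecode) -> Self {
        InstanceState { ic: vec![None; code.num_sites], misses: 0 }
    }
}

/// Resolve an indirect call. The cache is reached through `&mut InstanceState`,
/// so only the owning instance can touch it; the bytecode is read-only.
fn call_indirect(code: &Bytecode, state: &mut InstanceState, site: usize, index: u32) -> usize {
    if let Some((cached_index, target)) = state.ic[site] {
        if cached_index == index {
            return target; // fast path: cache hit
        }
    }
    // Slow path: look up the table, then fill this instance's own cache slot.
    state.misses += 1;
    let target = code.table[index as usize];
    state.ic[site] = Some((index, target));
    target
}

/// Run two instances of the same shared bytecode on two threads.
/// Each returns (last resolved target, number of cache misses).
fn run_two_instances() -> Vec<(usize, u32)> {
    let code = Arc::new(Bytecode { table: vec![10, 20, 30], num_sites: 1 });
    let handles: Vec<_> = (0..2u32)
        .map(|i| {
            let code = Arc::clone(&code);
            thread::spawn(move || {
                // Each thread owns its own cache; nothing mutable is shared.
                let mut state = InstanceState::new(&code);
                let mut last = 0;
                for _ in 0..1000 {
                    last = call_indirect(&code, &mut state, 0, i);
                }
                (last, state.misses)
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Calling `run_two_instances()` resolves the same call site from two threads, each against its own cache. If the cache slots instead lived inside the shared `Bytecode`, the compiler would reject the mutation without interior mutability or `unsafe`, which mirrors the race concern raised above.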
haven't forgotten about this, just doing a bunch of benchmarking so I have defensible/credible data
Last updated: Jun 01 2026 at 09:49 UTC