https://abacusnoir.com/2026/04/18/zero-copy-gpu-inference-from-webassembly-on-apple-silicon/
Link 3: Wasmtime lets you bring your own allocator. Wasmtime's MemoryCreator trait lets you control how linear memory is allocated. Instead of letting Wasmtime call mmap internally, you provide the backing memory yourself. I implement MemoryCreator to return our own mmap region, and Wasmtime's memory.data_ptr() returns exactly the pointer I handed it. The Wasm module reads and writes through Wasmtime's memory API; the GPU reads and writes through the Metal buffer; both are operating on the same bytes.

The composition: allocate an mmap region, hand it to both Wasmtime (as the actor's linear memory) and Metal (as a GPU buffer). The Wasm module writes data at known offsets, the GPU computes on it in place, and the results appear in the module's linear memory with no copies and no explicit data transfer.
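To make the aliasing in the quoted passage concrete, here is a toy sketch in Python. It uses neither Wasmtime nor Metal: one anonymous mmap region gets two views, standing in for the module's linear memory and the GPU buffer. The names `wasm_view`, `gpu_view`, `INPUT_OFFSET`, `OUTPUT_OFFSET` and `demo` are hypothetical, and the "GPU" step is just ordinary Python doubling four floats.

```python
import mmap
import struct

PAGE = 64 * 1024          # one WebAssembly page, in bytes
INPUT_OFFSET = 0          # hypothetical "known offsets" agreed on by both sides
OUTPUT_OFFSET = 1024


def demo():
    # One anonymous mapping plays the role of the shared mmap region.
    region = mmap.mmap(-1, PAGE)

    # Two views over the same bytes; neither one owns a copy.
    wasm_view = memoryview(region)  # stand-in for the module's linear memory
    gpu_view = memoryview(region)   # stand-in for the Metal buffer

    # "Wasm side": write four little-endian f32 inputs at the input offset.
    struct.pack_into("<4f", wasm_view, INPUT_OFFSET, 1.0, 2.0, 3.0, 4.0)

    # "GPU side": read the inputs through its own view and write results in place.
    inputs = struct.unpack_from("<4f", gpu_view, INPUT_OFFSET)
    struct.pack_into("<4f", gpu_view, OUTPUT_OFFSET, *(2.0 * x for x in inputs))

    # Back on the "Wasm side": the results are already there, with no copy step.
    return struct.unpack_from("<4f", wasm_view, OUTPUT_OFFSET)


if __name__ == "__main__":
    print(demo())
```

The property that makes this zero-copy is the same one that makes it delicate: whatever is written through one view is immediately visible through the other, and nothing on the "Wasm side" takes part in that write.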
That said, it sure looks to me like this dumps some boundaries we really want.
https://thepixelspulse.com/posts/zero-copy-wasm-apple-silicon-gpu-inference/ for a shoutout to the component model: "The Future: WebAssembly Component Model and True Zero-Copy"
Wouldn't be too surprised if that allowed a sandbox escape if the linear memory isn't marked as shared memory on the wasm side.
Last updated: May 03 2026 at 22:13 UTC