Hi! I was wondering if I could get a quick summary of the current state of winch's general production-readiness, at least for core wasm 1.0 instructions / excluding further extensions.
I see the aarch64 backend is still listed as lacking a "complete implementation" on the tiers-of-support page -- though judging by tracking bugs like https://github.com/bytecodealliance/wasmtime/issues/8321 it looks like it's _pretty close_ to done? Also, last I checked in there was some concern about the implementation possibly being too willing to panic on bad inputs. Is this still a fair characterization of the status? Any other place I ought to look to track status, or to find things I could lend a hand with? (I'll likely have some time to contribute soon.)
cc @Saúl Cabrera
("pretty close to done" accords with my outside-observer view but Saúl would be able to say authoritatively!)
looks like aarch64 check_stack has a real TODO on it in the code so that matches the task list :)
which in turn looks a bit like it's waiting on something shaped like PatchableAddToReg for aarch64, hmm. that would be fun to write :)
Hi @Graydon Hoare, for Core Wasm 1.0 without any extensions:
x86_64 is ready, continuously fuzzed and heavily tested with real production workloads (for our Wasm use-case at Shopify).
aarch64 is almost complete -- from a completeness perspective, stack checks are the main remaining piece. There isn't anything fundamentally challenging about them; however, I've been testing the correctness of the aarch64 implementation by running Winch against each of the tests in the spec testsuite and fixing bugs as I find them. Once all of the suite is passing I'm planning to tackle the stack-checks piece, and with that the implementation will be complete for Core Wasm 1.0. I decided to go this route rather than implementing stack checks first because, with all tests passing, it'll be easier to verify the correctness of the stack-check implementation.
This is the canonical board for Winch's progress, which tracks all backends + all of the standard Wasm proposals https://github.com/orgs/bytecodealliance/projects/12/views/1
Also, last I checked in there was some concern about the implementation possibly being too willing to panic on bad inputs.
I'm not entirely sure what you mean by this. Is it https://github.com/bytecodealliance/wasmtime/issues/9566?
Even though the aarch64 backend is almost complete, our goal is to offer support for other backends as well (e.g., riscv64) and to add support for Wasm proposals as they are standardized. So I'd say there's plenty of room for contribution if any of those pieces are of interest.
they might be in the future, but in the near term I think we'd be focused on just aarch64 + x64
would you _like_ me to poke around at adding stack_max / check_stack / PatchableAdd to aarch64? it looks straightforward enough, besides needing to decide how to encode add-immediate for immediates larger than 12 bits (I guess one or more mov-immediates plus an add, but the way it's set up I don't think there's a chance to decide that late, so we'd have to just commit to a sequence up front)
(unless, oh, hm, maybe also some unwind stuff..)
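concretely, I'm imagining something shaped like this -- a rough sketch with invented names, raw A64 encodings, and a plain Vec standing in for the real MachBuffer; since the immediate gets patched in after emission, the sequence commits to a fixed length up front:

// Always emit MOVZ + 3x MOVK + ADD (20 bytes) so the patch site has a
// known size; the patcher later rewrites the imm16 fields in place.
fn emit_patchable_add(buf: &mut Vec<u8>, dst: u32, scratch: u32) {
    let mut put4 = |word: u32| buf.extend_from_slice(&word.to_le_bytes());
    // MOVZ scratch, #0, LSL #0 -- the imm16 field lives at bits [20:5].
    put4(0xD280_0000 | scratch);
    for hw in 1u32..4 {
        // MOVK scratch, #0, LSL #(16 * hw) -- also patched later.
        put4(0xF280_0000 | (hw << 21) | scratch);
    }
    // ADD dst, dst, scratch (shifted-register form, shift amount 0).
    put4(0x8B00_0000 | (scratch << 16) | (dst << 5) | dst);
}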
Yeah, for sure, feel free to take a look. I haven't materially started on it, aside from briefly thinking about how to go about the immediate encoding. And now that I think about it a bit more, you should be able to test the correctness of the implementation locally, since the aarch64 implementation fully passes the call spec test suite. If you end up playing with the implementation, you can run it with: cargo run -- wast -Ccompiler=winch tests/spec_testsuite/call.wast
ok!
I was actually poking around at this bit today for completely unrelated reasons, and I posted https://github.com/bytecodealliance/wasmtime/pull/10738, which adds an allow-list of known failures as well as a smaller list of known crashes/nondeterministic tests/etc. The hope is that those lists serve as the remaining burn-down TODO lists
Ah nice, thanks for this. I was about to update the issue with the tests that are expected to fail to keep the list more granular, but this is way better. If I'm understanding correctly, I think should_fail should also include call.wast? The test passes if the assert_exhaustion calls are ignored, but otherwise it crashes with a stack overflow.
aha, it looks like the feature baseline for the main test suite includes simd, and the simd feature is currently listed as "known to panic the compiler" for winch. If I remove the feature from "known panicking features" I get the expected test failure, as it's not listed anywhere.
Given your refactoring I think I can remove the "known panicking" part for simd, since it no longer panics and instead fails with an error, so that should update the lists
If I remember correctly, the last time I poked at this code I realized that the baseline also includes reference types (https://github.com/bytecodealliance/wasmtime/blob/main/crates/test-util/src/wast.rs#L182), so we could probably remove ref types as panicking, given that it shouldn't crash anymore.
implementation choice question: there are two obvious ways to do check_stack / PatchableAdd.
any preference?
(this also gets back to an earlier question about panics which I didn't answer: it seems to me that there is some amount of error-handling-by-panic in Winch, e.g. in this path we have a u32 max stack that's just converted to an i32 with try_from().unwrap(), and .. I'm curious whether there's an intent or desire to go through and find all the panic paths for rare-but-plausible bad inputs and turn them into Results. or perhaps all of this is impossible due to earlier implementation limits imposed by the parser/validator?)
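(for concreteness, the kind of change I mean, sketched generically -- the function name and error message here are mine, not anything in the codebase:)

use anyhow::{bail, Result};

// Instead of `i32::try_from(max_stack).unwrap()`, surface the failure
// as a recoverable error the embedder can handle.
fn checked_max_stack(max_stack: u32) -> Result<i32> {
    match i32::try_from(max_stack) {
        Ok(v) => Ok(v),
        Err(_) => bail!("max stack size {max_stack} doesn't fit in an i32"),
    }
}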
A few thoughts on this:
I think that option 1 is probably the simplest, but it is limiting. I'm leaning toward 2, mostly because, aside from the overflow semantics defined by Wasm, we'd want to ensure that the stack checks are fully compliant with the Config::max_wasm_stack configuration, which currently accepts a usize.
For the additional register that's required here: luckily, in Winch's default ABI for aarch64 we have two scratch registers available (x16, x17), so I was thinking that we could take advantage of that to simplify this situation.
Regarding panics, I think there are still some places in which panics _could_ happen; I know there are a couple of places where we're still performing unwraps on numeric conversions. In some cases those unwraps are shielded by the validator, but I can't guarantee that this is true for all of them. So yes, I think there's value in ensuring that rare-but-plausible panics are converted into recoverable errors, mainly for the ones that don't have a direct correlation with Wasm validation.
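(For reference, the embedder-side knob in question -- this much is standard Wasmtime API:)

fn main() {
    let mut config = wasmtime::Config::new();
    // max_wasm_stack takes a usize, which is why a fully compliant
    // stack check can't assume the limit fits in 32 bits.
    config.max_wasm_stack(2 * 1024 * 1024); // 2 MiB of guest stack
}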
I was poking a bit more at this code, since it's been a while. More concretely, regarding the panic scenario you described above, a tangential thought I just had is that we could also consider making sp_offset / max_sp u64 to better align with the configuration knob (which is a pointer-sized value when loaded from the current instance's context).
ok, next question (I think I'll come back to the u64-ness later) -- I notice the main instruction-emit path goes through the logic about possibly forming islands, while this one special PatchableAddToReg mostly doesn't use the main instruction-emit path and instead forms the instruction mostly by itself. I can't quite tell if this is because it wants to avoid the risk of an island in this critical place, or because it wants to narrow the patchable bits to a subset of the instruction and so needs to mark the sub-instruction offset, or what.
in other words: in the aarch64 case -- where I'm not going to be slicing the bytes anyway, since the field to patch is at a weird 5-bit offset in the instruction so I'll be rewriting the instruction in full when it's patched -- I can't tell whether I ought to manually assemble the instructions I want with bit patterns or reuse the instruction-encoding functions that cranelift provides. just as a style thing; both options work, but I don't want to waste your time reviewing something that's the opposite of how you'd want it.
like I can add a helper function Assembler::mov_imm that is more general-purpose and calls through to Assembler::emit(Inst::MovWide{...}) and so on; or I can just do buffer.put4(some bytes I just calculated). the bit pattern isn't exactly hard to write down.
(and the existing code seems to prefer the latter, again for reasons I'm not 100% certain on)
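to make the contrast concrete (all the types here are toy stand-ins, not Winch's real Assembler / Inst definitions):

struct Assembler {
    buffer: Vec<u8>,
}

enum Inst {
    MovWide { rd: u32, imm: u16 },
}

impl Assembler {
    fn put4(&mut self, word: u32) {
        self.buffer.extend_from_slice(&word.to_le_bytes());
    }

    fn emit(&mut self, inst: Inst) {
        match inst {
            // MOVZ Xd, #imm16 -- same bits either way.
            Inst::MovWide { rd, imm } => self.put4(0xD280_0000 | ((imm as u32) << 5) | rd),
        }
    }

    // style (a): a general-purpose helper funneling through the shared
    // instruction-emit path.
    fn mov_imm(&mut self, rd: u32, imm: u16) {
        self.emit(Inst::MovWide { rd, imm });
    }

    // style (b): just write the word directly.
    fn mov_imm_raw(&mut self, rd: u32, imm: u16) {
        self.put4(0xD280_0000 | ((imm as u32) << 5) | rd);
    }
}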
or, hm, perhaps I should split the difference and use enc_mov_wide / enc_movk. that seems like it'll work best.
eh, except the add still doesn't really like being done that way. how bad is "just slam some bits in here"?
ok, that's .. not too awful. I replicated one opcode's bit pattern: https://github.com/bytecodealliance/wasmtime/pull/10763
oof, I guess that doesn't pass a lot of tests though!
probably there is a contributing document I need to go read
As pointed out by Alex, I think all the failures were due to the disassembly changes, since this addition introduces a new code sequence in all of them. I'll take a look at your PR today, thanks!
I'm actually somewhat struggling to get this "done": I can't tell if I am getting the cmp backwards, or the codegen is wrong on certain tests, or what, but I still get lots of wast failures. is there an easy way to get a dump of the instructions generated for a wast test?
or like if I wanted to (I know, wild and crazy stuff) break inside the JIT'ed code in a debugger to take a look around, is that within the realm of possibility?
the wasmtime objdump subcommand can help you explore generated code and supports various options for filters/etc. and what to display, although you'll have to copy a module out of the wast file itself, as it doesn't work natively on a wast file
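something like this flow, I believe (I haven't double-checked the exact flags; wasmtime compile writes a .cwasm next to the input by default):

wasmtime compile -C compiler=winch module.wat
wasmtime objdump module.cwasm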
For debugging, this is the line where we enter cranelift code, so one option is to break there and then step instruction-at-a-time.
Combined with https://github.com/bytecodealliance/wasmtime/pull/10780 you might be able to enable debuginfo generation and set a breakpoint on a line number in a wast file, perhaps, but historically I've single-stepped to where I needed to go. The downside of single-stepping, though, is that it requires knowing when you're in/out of a trampoline, which is not easy :(
I decided to approach this the crude way and just splat a BRK instruction in the code where I wanted the debugger to stop. easy to look around then!
I got confused both by the cmp encodings (which aarch64 is very fussy about: sp only works as the first operand; also, Rn and Rm are the first and second operands, even though m comes before n alphabetically) and by the stack direction -- I thought the stack grew upward in your JIT code for some reason, but it grows downward like normal, so I had the comparison reversed.
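for my own notes, the invariant I was trying to encode (a logic model only, not the emitted code; assumes the limit is the one loaded from the instance context):

// Stack grows downward: the frame overflows when the decremented sp
// would fall below the stack limit.
fn stack_would_overflow(sp: u64, frame_size: u64, stack_limit: u64) -> bool {
    match sp.checked_sub(frame_size) {
        Some(new_sp) => new_sp < stack_limit,
        None => true, // subtraction wrapped past zero: definitely overflow
    }
}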
so many footguns!
I think I have things sorted out now
shall I include this helper?
/// Emit a BRK instruction: a hardware breakpoint, handy for stopping a
/// debugger inside generated code.
pub fn brk(&mut self) {
    self.emit(Inst::Brk);
}
I've added your PR to the merge queue, thanks!
Another approach that could come in handy is the wasmtime explore command, which IMO can give you very useful insight into the code emitted per Wasm instruction.
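(usage is roughly wasmtime explore module.wat, I believe -- it writes an HTML page showing the Wasm side-by-side with the generated code)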
shall I include this helper?
We could leave it out for now, I believe; the underlying code (e.g., self.emit(Inst::Brk)) is small enough that it could be inserted manually where needed.
Circling back to this thread. Given this announcement from August 2025, is it correct to say that Winch is potentially production-ready "for core wasm 1.0 instructions / excluding further extensions" on x86_64 and aarch64?
One small clarification: Winch is production ready (Tier 1) for x86_64 for Core Wasm 1.0 and near production ready (Tier 2) for aarch64 for Core Wasm 1.0 (see Wasmtime's Tiers of Support for more details on each tier)
Saúl Cabrera said:
One small clarification: Winch is production ready (Tier 1) for x86_64 for Core Wasm 1.0 and near production ready (Tier 2) for aarch64 for Core Wasm 1.0 (see Wasmtime's Tiers of Support for more details on each tier)
This is helpful. It might be worth updating these docs to say as much.
I forgot to come back to this: I've updated the strategy docs to remove the "experimental" wording.