Stream: winch

Topic: current status


view this post on Zulip Graydon Hoare (May 06 2025 at 22:49):

Hi! I was wondering if I could get a quick summary of the current state of winch's general production-readiness, at least for core wasm 1.0 instructions / excluding further extensions.

I see the aarch64 backend is still listed as lacking "complete implementation" in the tiers-of-support page -- though in tracking bugs like https://github.com/bytecodealliance/wasmtime/issues/8321 it looks like it's _pretty close_ to done? also last I checked in on it there was some concern about the implementation possibly being too willing to panic on bad inputs. is this still a fair characterization of the status? is there any other place I ought to look for tracking status or to find things I could lend a hand with? (I'll likely have some time to contribute soon)

Winch's support for Aarch64 is minimal. The objective of this issue to track the missing items in order to bring the Aarch64 MacroAssembler to parity with the x64 implementation. If you're interest...

view this post on Zulip Chris Fallin (May 06 2025 at 23:06):

cc @Saúl Cabrera

view this post on Zulip Chris Fallin (May 06 2025 at 23:07):

("pretty close to done" accords with my outside-observer view but Saúl would be able to say authoritatively!)

view this post on Zulip Graydon Hoare (May 06 2025 at 23:08):

looks like aarch64 check_stack has a real TODO on it in the code so that matches the task list :)

view this post on Zulip Graydon Hoare (May 06 2025 at 23:32):

which in turn looks a bit like it's waiting on something shaped like PatchableAddToReg for aarch64, hmm. that would be fun to write :)

view this post on Zulip Saúl Cabrera (May 07 2025 at 00:20):

Hi @Graydon Hoare , for Core Wasm 1.0 without any extensions:

x86_64 is ready, continuously fuzzed and heavily tested with real production workloads (for our Wasm use-case at Shopify).

aarch64 is almost complete -- from a completeness perspective, stack checks are the main remaining piece. There isn't anything fundamentally challenging about them; however, I've been testing the correctness of the aarch64 implementation by running Winch against each of the tests in the spec testsuite and fixing bugs as I find them. Once all of the suite is passing I'm planning to tackle the stack_checks piece, and with that the implementation will be complete for Core Wasm 1.0. I decided to go this route rather than implementing stack checks first, since with all tests passing it'll be easier to verify the correctness of the stack checks implementation.

This is the canonical board for Winch's progress, which tracks all backends + all of the standard Wasm proposals https://github.com/orgs/bytecodealliance/projects/12/views/1

Graydon Hoare said:

also last I checked in on it there was also some concern about the implementation possibly being too willing to panic on bad inputs.

I'm not entirely sure what you mean by this. Is it https://github.com/bytecodealliance/wasmtime/issues/9566?

Tracking board for Winch
I've done some recent refactoring over the past month or so of the tests/wast.rs test suite. The general idea is that it should not only assert that passing tests pass but it should additionally as...

view this post on Zulip Saúl Cabrera (May 07 2025 at 00:23):

Even though the aarch64 backend is almost complete, our goal is to offer support for other backends as well (e.g., riscv64), and to add support for Wasm proposals as they are standardized. So I'd say there's plenty of room for contribution if any of those pieces are of interest.

view this post on Zulip Graydon Hoare (May 07 2025 at 00:28):

they might be in the future but for near term I think we'd be focused on just aarch64 + x64

view this post on Zulip Graydon Hoare (May 07 2025 at 00:31):

would you _like_ me to poke around at adding stack_max / check_stack / PatchableAdd to aarch64? it looks straightforward enough besides "needing to decide how to encode add-immediate for immediates larger than 12 bits" (I guess 1 or more mov imms + add but then I think the way it's set up there's no choice to decide that late so we'd have to just commit to it up front)

view this post on Zulip Graydon Hoare (May 07 2025 at 00:32):

(unless, oh, hm, maybe also some unwind stuff..)

view this post on Zulip Saúl Cabrera (May 07 2025 at 00:39):

Yeah for sure, feel free to take a look. I haven't materially started on it, aside from briefly thinking about how to go about the immediate encoding. And now that I think about it a bit more, you should be able to test the correctness of the implementation at least locally since the aarch64 implementation fully passes the call spec test suite. If you end up playing with the implementation, you could try to run it with cargo run -- wast -Ccompiler=winch tests/spec_testsuite/call.wast

view this post on Zulip Graydon Hoare (May 07 2025 at 00:43):

ok!

view this post on Zulip Alex Crichton (May 07 2025 at 00:49):

I was actually poking around at this bit today for completely unrelated reasons, and I posted https://github.com/bytecodealliance/wasmtime/pull/10738 which adds an allow-list of known failures as well as a smaller list of known crashes/nondeterministic tests/etc. The hope is that those lists are the burn-down TODO lists remaining

Most tests pass (yay!), some tests fail, and some tests crash and/or have nondeterministic results. A new WastTest::should_skip_entirely helper is added to avoid running crashing or nondeterministi...

view this post on Zulip Saúl Cabrera (May 07 2025 at 01:02):

Ah nice, thanks for this. I was about to update the issue with the tests that are expected to fail to keep the list more granular, but this is way better. If I'm understanding correctly, I think should_fail should also include call.wast? The test passes if the assert_exhaustion calls are ignored, but otherwise it'll crash with a stack overflow.

view this post on Zulip Alex Crichton (May 07 2025 at 01:06):

aha it looks like the feature baseline for the main test suite includes simd and the simd feature is currently listed as "known to panic the compiler" for winch. If I remove the feature from "known panicking features" I get the test failure expected as it's not listed anywhere.

Given your refactoring I think I can remove the "known panicking" part for simd since it no longer panics and fails with an error instead, so that should update the lists

view this post on Zulip Saúl Cabrera (May 07 2025 at 01:17):

If I remember correctly, last time I poked at this code I realized that the baseline also includes reference types https://github.com/bytecodealliance/wasmtime/blob/main/crates/test-util/src/wast.rs#L182, so we could probably remove ref types as panicking given that it shouldn't crash anymore.

A lightweight WebAssembly runtime that is fast, secure, and standards-compliant - bytecodealliance/wasmtime

view this post on Zulip Graydon Hoare (May 07 2025 at 06:46):

implementation choice question: there are two obvious ways to do check_stack / PatchableAdd.

  1. use the add-with-immediate aarch64 instruction, which has a variant that's precise for immediates up to 12 bits (4k stack frames) and also a variant that shifts the 12-bit immediate left by 12 (i.e. multiples of 4k, reaching 24 bits = 16MiB stack frames, but with coarser resolution). this would probably be fine for any real code, but would (a) have to overestimate large stack frames by as much as 4k and (b) not be able to handle truly gigantic stack frames between 2^24 and 2^32 bytes in size
  2. use a more general sequence like movz+movk+add that loads 2 16-bit immediates and adds them. this makes a longer prologue and also chews up another temporary register during the prologue which I'm not sure we're in a particularly comfortable position to do?
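To make the trade-off in option 1 concrete, here is a minimal Rust sketch of how a frame size could be classified against the two add-immediate forms. The names `FrameImm` and `classify_frame_size` are hypothetical and are not part of Winch or Cranelift; the sketch assumes sizes are rounded up (never down) so that a stack check stays conservative.

```rust
/// Hypothetical classification of a frame size for an aarch64 add-immediate.
#[derive(Debug, PartialEq, Eq)]
pub enum FrameImm {
    /// Fits the unshifted 12-bit immediate exactly (0..=4095).
    Imm12(u32),
    /// Encoded as `imm12 << 12`; `rounded` is the value actually added.
    Imm12Shifted { imm12: u32, rounded: u32 },
    /// Too large for a single add-immediate; a movz/movk sequence is needed.
    NeedsMovSequence(u32),
}

pub fn classify_frame_size(size: u32) -> FrameImm {
    const IMM12_MAX: u32 = (1 << 12) - 1;
    if size <= IMM12_MAX {
        return FrameImm::Imm12(size);
    }
    // Round up to the next multiple of 4096 so the frame is never underestimated.
    // The arithmetic is done in u64 so that sizes near u32::MAX cannot overflow.
    let units = (size as u64 + IMM12_MAX as u64) >> 12;
    if units <= IMM12_MAX as u64 {
        let imm12 = units as u32;
        FrameImm::Imm12Shifted { imm12, rounded: imm12 << 12 }
    } else {
        FrameImm::NeedsMovSequence(size)
    }
}
```

Under these assumptions a 4097-byte frame is checked as if it were 8192 bytes, and anything above 4095 * 4096 bytes falls out of reach of a single add-immediate.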

view this post on Zulip Graydon Hoare (May 07 2025 at 06:47):

any preference?

view this post on Zulip Graydon Hoare (May 07 2025 at 06:57):

(this also gets back to an earlier question about panics which I didn't answer: it seems to me that there is some amount of error-handling-by-panic in winch, eg. in this path we have a u32 max stack that's just converted to an i32 with try_from().unwrap(), and .. I am curious if there's an intent or desire to go through and find all the panic paths for rare-but-plausible bad inputs, and turn them into results. or perhaps all of this is impossible due to earlier implementation limits imposed by the parser/validator?)
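As an illustration of the panic-versus-result question, the following Rust sketch shows the shape of a fallible conversion that could replace a `try_from().unwrap()`. The error type `FrameTooLarge` and the function `max_stack_as_i32` are hypothetical names chosen for this example and do not reflect Winch's actual error handling.

```rust
use std::fmt;

/// Hypothetical error for a stack size that does not fit in an i32.
#[derive(Debug, PartialEq, Eq)]
pub struct FrameTooLarge {
    pub requested: u32,
}

impl fmt::Display for FrameTooLarge {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "stack frame of {} bytes exceeds i32::MAX", self.requested)
    }
}

impl std::error::Error for FrameTooLarge {}

/// Convert a u32 maximum stack offset to i32, returning an error instead of panicking.
pub fn max_stack_as_i32(max_sp_offset: u32) -> Result<i32, FrameTooLarge> {
    i32::try_from(max_sp_offset).map_err(|_| FrameTooLarge {
        requested: max_sp_offset,
    })
}
```

A caller can then propagate the error with `?` so that an oversized frame surfaces as a compilation error rather than a process abort.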

view this post on Zulip Saúl Cabrera (May 07 2025 at 14:22):

A few thoughts on this:

I think that option 1 is probably the simplest, but it is limiting. I'm leaning toward 2, mostly because, aside from the overflow semantics defined by Wasm, we'd want to ensure that the stack checks are fully compliant with the Config::max_wasm_stack configuration, which currently accepts a usize.

For the additional register that's required here, luckily for aarch64 in Winch's default ABI we have two scratch registers available (x16, x17), so I was thinking that we could take advantage of that to simplify this situation.

Regarding panics, I think there are still some places in which panics _could_ happen; I know there are a couple of places in which we're still performing unwraps on numeric conversions. In some cases though, those unwraps are shielded by the validator, but I can't guarantee that this is true for all of them. So yes, I think there's value in ensuring that rare-but-plausible panics are converted into recoverable errors, mainly for the ones that don't have a direct correlation with the Wasm validation.
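For reference, option 2 with a scratch register could look roughly like the Rust sketch below, which encodes a movz + movk pair that loads a 32-bit value into x16. The helper names (`movz_x`, `movk_x`, `load_u32`) are made up for this example and are not Cranelift's encoding functions; the bit patterns follow the A64 64-bit MOVZ and MOVK encodings, where the 16-bit immediate sits at bit 5.

```rust
/// Register number of x16, assumed here to be free as a scratch register.
pub const X16: u32 = 16;

/// Encode `movz xd, #imm16, lsl #(16 * hw)` (64-bit variant).
pub fn movz_x(rd: u32, imm16: u16, hw: u32) -> u32 {
    assert!(rd < 32 && hw < 4);
    0xD280_0000 | (hw << 21) | ((imm16 as u32) << 5) | rd
}

/// Encode `movk xd, #imm16, lsl #(16 * hw)` (64-bit variant).
pub fn movk_x(rd: u32, imm16: u16, hw: u32) -> u32 {
    assert!(rd < 32 && hw < 4);
    0xF280_0000 | (hw << 21) | ((imm16 as u32) << 5) | rd
}

/// Two-instruction sequence that leaves `value` (zero-extended) in `rd`.
pub fn load_u32(rd: u32, value: u32) -> [u32; 2] {
    [
        // movz clears the rest of the register and sets bits 0..16.
        movz_x(rd, value as u16, 0),
        // movk keeps the other bits and sets bits 16..32.
        movk_x(rd, (value >> 16) as u16, 1),
    ]
}
```

A register-register add or sub against the stack pointer would follow the pair; that instruction is omitted here because its exact form depends on how sp is handled.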

view this post on Zulip Saúl Cabrera (May 07 2025 at 15:09):

I was poking a bit more at this code, since it's been a while. More concretely regarding the panic scenario you described above, a tangential thought that I just had is that we could as well consider making sp_offset / max_sp u64 to better align with the configuration knob (which is a pointer-sized value when loaded from the current instance's context).

view this post on Zulip Graydon Hoare (May 09 2025 at 21:15):

ok, next question (I think I'll come back to the u64-ness later) -- I notice the main instruction-emit path goes through the logic about possibly forming islands, and this one special PatchableAddToReg mostly doesn't use the main instruction-emit path but forms the instruction mostly by itself. I can't quite tell if this is because it wants to avoid the risk of an island in this critical place, or it wants to narrow the bit of the instruction that's patchable to a subset of the instruction so needs to mark the sub-instruction offset, or what.

in other words: I can't tell if in the aarch64 case -- where I'm not going to be slicing the bytes anyways, the field to patch is at a weird 5-bit offset in the instruction so I'm going to be rewriting the instruction in full when it's patched anyways -- I ought to manually assemble the instructions I want with bit patterns or reuse the instruction encoding functions that cranelift provides. just like as a style thing; both options work but I don't want to waste your time reviewing something that's the opposite to how you'd want it.

view this post on Zulip Graydon Hoare (May 09 2025 at 21:18):

like I can add a helper function Assembler::mov_imm that is more general-purpose and calls through to Assembler::emit(Inst::MovWide{...}) and so on; or I can just do buffer.put4(some bytes I just calculated). the bit pattern isn't exactly hard to write down.
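The "rewrite the instruction in full when it's patched" idea can be sketched in Rust as follows. This is an illustration under assumptions, not the code from Winch: `PatchableLoad`, `emit_placeholder` and `finalize` are hypothetical names, a plain `Vec<u8>` stands in for the real machine buffer, and a movz/movk pair stands in for whatever sequence ends up being patched.

```rust
/// Hypothetical record of where a patchable movz/movk pair sits in the code buffer.
pub struct PatchableLoad {
    offset: usize,
    rd: u32,
}

/// Encode a 64-bit move-wide instruction; `base` selects movz or movk.
fn mov_wide(base: u32, rd: u32, imm16: u16, hw: u32) -> u32 {
    base | (hw << 21) | ((imm16 as u32) << 5) | rd
}

const MOVZ_X: u32 = 0xD280_0000;
const MOVK_X: u32 = 0xF280_0000;

/// Emit a zero-valued movz/movk pair and remember its position for later patching.
pub fn emit_placeholder(buf: &mut Vec<u8>, rd: u32) -> PatchableLoad {
    let offset = buf.len();
    for word in [mov_wide(MOVZ_X, rd, 0, 0), mov_wide(MOVK_X, rd, 0, 1)] {
        // aarch64 instructions are stored as little-endian 32-bit words.
        buf.extend_from_slice(&word.to_le_bytes());
    }
    PatchableLoad { offset, rd }
}

impl PatchableLoad {
    /// Overwrite both placeholder instructions in full once `value` is known.
    pub fn finalize(self, buf: &mut [u8], value: u32) {
        let words = [
            mov_wide(MOVZ_X, self.rd, value as u16, 0),
            mov_wide(MOVK_X, self.rd, (value >> 16) as u16, 1),
        ];
        for (i, word) in words.iter().enumerate() {
            let at = self.offset + 4 * i;
            buf[at..at + 4].copy_from_slice(&word.to_le_bytes());
        }
    }
}
```

Because the placeholder always occupies two instructions, the choice of sequence is fixed when the prologue is emitted, which matches the earlier observation that the decision cannot be deferred.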

view this post on Zulip Graydon Hoare (May 09 2025 at 21:18):

(and the existing code seems to prefer the latter, again for reasons I'm not 100% certain on)

view this post on Zulip Graydon Hoare (May 09 2025 at 21:32):

or, hm, perhaps I should split the difference and use enc_mov_wide / enc_movk. that seems like it'll work best.

view this post on Zulip Graydon Hoare (May 09 2025 at 21:47):

eh, except the add still doesn't really like being done that way. how bad is "just slam some bits in here"?

view this post on Zulip Graydon Hoare (May 10 2025 at 02:34):

ok, that's .. not too awful. I replicated one opcode's bit pattern: https://github.com/bytecodealliance/wasmtime/pull/10763

This is my first attempt at contributing to winch or cranelift at all -- wide open to any feedback/suggestions. I was just poking around looking for things I could do to help push winch over the fi...

view this post on Zulip Graydon Hoare (May 10 2025 at 03:00):

oof, I guess that doesn't pass a lot of tests though!

view this post on Zulip Graydon Hoare (May 10 2025 at 03:00):

probably there is a contributing document I need to go read

view this post on Zulip Saúl Cabrera (May 12 2025 at 13:28):

As pointed out by Alex, I think all the failures were due to the disassembly changes, since this addition will introduce a new code sequence to all of them. I'll take a look at your PR today, thanks!

view this post on Zulip Graydon Hoare (May 15 2025 at 03:33):

I'm actually somewhat struggling to get this "done": I can't tell if I am getting the cmp backwards, or the codegen is wrong on certain tests, or what, but I still get lots of wast failures. is there an easy way to get a dump of the instructions generated for a wast test?

view this post on Zulip Graydon Hoare (May 15 2025 at 03:49):

or like if I wanted to (I know, wild and crazy stuff) break inside the JIT'ed code in a debugger to take a look around, is that within the realm of possibility?

view this post on Zulip Alex Crichton (May 15 2025 at 05:48):

the wasmtime objdump subcommand can help you explore generated code and supports various options for filters/etc and what to display, although you'll have to copy out a module from the wast file itself as it doesn't work natively on a wast file

view this post on Zulip Alex Crichton (May 15 2025 at 05:52):

For debugging this is the line where we enter cranelift code, so one option is to break there and then step instruction-at-a-time.

Combined with https://github.com/bytecodealliance/wasmtime/pull/10780 you might be able to enable debuginfo generation to set a breakpoint on a line number in a wast file perhaps, but historically I've single-stepped into where I needed to go. The downside of single-stepping though is that it requires knowing when you're in/out of a trampoline which is not easy :(

This commit adds support for the wast subcommand to generate DWARF debugging information for the text-to-binary translation that happens. This leans on the support within the wast crate already and...

view this post on Zulip Graydon Hoare (May 15 2025 at 06:40):

I decided to approach this the crude way and just splat a BRK instruction in the code where I wanted the debugger to stop. easy to look around then!

view this post on Zulip Graydon Hoare (May 15 2025 at 06:42):

I got confused about both the cmp encodings (which aarch64 is very fussy about, sp only works in the first operand; also Rn and Rm are the first and second operands even though m comes before n alphabetically) and also I thought the stack grew up in your JIT code for some reason, but it grows down like normal, so I had the comparison reversed.

so many footguns!
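The direction of the comparison can be written down in plain Rust as a sanity check, independent of any instruction encoding. This is only a model of the logic: the function name `would_overflow` is hypothetical, and whether the real check treats landing exactly on the limit as an overflow is an assumption made here (it does not).

```rust
/// Model of a stack check for a stack that grows down: the frame overflows
/// when the stack pointer after allocation would fall below `stack_limit`.
pub fn would_overflow(sp: u64, frame_size: u64, stack_limit: u64) -> bool {
    match sp.checked_sub(frame_size) {
        // Lower addresses are "deeper", so the new sp must stay at or above the limit.
        Some(new_sp) => new_sp < stack_limit,
        // The subtraction wrapped around, so the frame cannot possibly fit.
        None => true,
    }
}
```

Reversing the comparison (as if the stack grew up) would report an overflow for almost every ordinary frame, which is consistent with the widespread wast failures described above.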

view this post on Zulip Graydon Hoare (May 15 2025 at 06:43):

I think I have things sorted out now

shall I include this helper?

    pub fn brk(&mut self) {
        self.emit(Inst::Brk);
    }

view this post on Zulip Saúl Cabrera (May 15 2025 at 13:29):

I've added your PR to the merge queue, thanks!

Another approach that could come in handy is to use the wasmtime explore command, which IMO could give you very useful insights into the code emitted per Wasm instruction.

Graydon Hoare said:

shall I include this helper?

We could leave it out for now I believe; the underlying code (e.g., self.emit(Inst::Brk)) is small enough that I believe it could be inserted manually where needed.

view this post on Zulip David Fuelling (Oct 07 2025 at 20:12):

Circling back to this thread. Given this announcement from August 2025, is it correct to say that Winch is potentially production-ready "for core wasm 1.0 instructions / excluding further extensions" on x86_64 and aarch64?

Wasmtime is a fast, secure, standards-compliant and lightweight WebAssembly (Wasm) runtime. As of Wasmtime 35, Winch supports AArch64 for Core Wasm proposals, along with additional Wasm proposals like the Component Model and Custom Page Sizes.

view this post on Zulip Saúl Cabrera (Oct 07 2025 at 20:19):

One small clarification: Winch is production ready (Tier 1) for x86_64 for Core Wasm 1.0 and near production ready (Tier 2) for aarch64 for Core Wasm 1.0 (see Wasmtime's Tiers of Support for more details on each tier)

view this post on Zulip David Fuelling (Oct 21 2025 at 17:56):

Saúl Cabrera said:

One small clarification: Winch is production ready (Tier 1) for x86_64 for Core Wasm 1.0 and near production ready (Tier 2) for aarch64 for Core Wasm 1.0 (see Wasmtime's Tiers of Support for more details on each tier)

This is helpful. It might be worth updating these docs to say as much.

view this post on Zulip Saúl Cabrera (Nov 01 2025 at 17:35):

I forgot to come back to this: I've updated the docs in the strategy to remove the "experimental" side of it.


Last updated: Dec 06 2025 at 07:03 UTC