Stream: git-wasmtime

Topic: wasmtime / PR #11592 Replace setjmp/longjmp usage in Wasm...


view this post on Zulip Wasmtime GitHub notifications bot (Sep 02 2025 at 19:28):

alexcrichton opened PR #11592 from alexcrichton:trampoline-try-call to bytecodealliance:main:

Since Wasmtime's inception it's used the setjmp and longjmp
functions in C to implement handling of traps. While this solution was
easy to implement, relatively portable, and performant enough, there are
a number of downsides that have evolved over time to make this an
unattractive approach in the long run:

With all of that in mind the purpose of this commit is to replace the
setjmp/longjmp mechanism of handling traps with the recently implemented
support for exceptions in Cranelift. That is intended to resolve all of
the above points in one swoop.

One point in particular though that's nice about setjmp/longjmp is that
unwinding the stack on a trap is an O(1) operation. For situations such
as stack overflow that's a particularly nice property to have as we can
guarantee embedders that traps are a constant time (albeit somewhat
expensive with signals) operation. Exceptions naively require unwinding
the entire stack, and although frame pointers mean we're just traversing
a linked list I wanted to preserve the O(1) property here nonetheless.
To achieve this a solution is implemented where the array-to-wasm
(host-to-wasm) trampolines setup state in VMStoreContext so looking up
the current trap handler frame is an O(1) operation. Namely the sp/fp/pc
values for a Handler are stored inline.

Implementing this feature required supporting
relocations-to-offsets-in-functions which was not previously supported
by Wasmtime. This required Cranelift refactorings such as https://github.com/bytecodealliance/wasmtime/pull/11570, https://github.com/bytecodealliance/wasmtime/pull/11585,
and https://github.com/bytecodealliance/wasmtime/pull/11576. This then additionally required some more refactoring in
this commit which was difficult to split out as it otherwise wouldn't be
tested.

Apart from the relocation-related business much of this change is about
updating the platform signal handlers to use exceptions instead of
longjmp to return. For example on Unix this means updating the
ucontext_t with register values that the handler specifies. Windows
involves updating similar contexts, and macOS mach ports ended up not
needing too many changes.

In terms of overall performance the relevant benchmark from this
repository, compared to before this commit, is:

sync/no-hook/core - host-to-wasm - typed - nop
            time:   [10.552 ns 10.561 ns 10.571 ns]
            change: [−7.5238% −7.4011% −7.2786%] (p = 0.00 < 0.05)
            Performance has improved.

Closes https://github.com/bytecodealliance/wasmtime/issues/3927
cc https://github.com/bytecodealliance/wasmtime/issues/10923

view this post on Zulip Wasmtime GitHub notifications bot (Sep 02 2025 at 19:28):

alexcrichton requested abrown for a review on PR #11592.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 02 2025 at 19:28):

alexcrichton requested wasmtime-compiler-reviewers for a review on PR #11592.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 02 2025 at 19:28):

alexcrichton requested dicej for a review on PR #11592.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 02 2025 at 19:28):

alexcrichton requested wasmtime-core-reviewers for a review on PR #11592.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 02 2025 at 19:28):

alexcrichton requested wasmtime-default-reviewers for a review on PR #11592.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 02 2025 at 19:28):

alexcrichton commented on PR #11592:

Procedurally this is stacked on https://github.com/bytecodealliance/wasmtime/pull/11585, https://github.com/bytecodealliance/wasmtime/pull/11577, and https://github.com/bytecodealliance/wasmtime/pull/11576 at this time. I expect this to have a bit of a gauntlet on CI, however, so I wanted to get working on that sooner rather than later.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 02 2025 at 19:54):

alexcrichton updated PR #11592.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 02 2025 at 22:16):

alexcrichton updated PR #11592.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 02 2025 at 22:56):

alexcrichton updated PR #11592.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 02 2025 at 23:04):

alexcrichton updated PR #11592.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 03 2025 at 14:34):

alexcrichton updated PR #11592.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 03 2025 at 14:38):

alexcrichton updated PR #11592.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 03 2025 at 16:06):

alexcrichton updated PR #11592.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 03 2025 at 17:45):

alexcrichton updated PR #11592.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 03 2025 at 17:45):

alexcrichton edited a comment on PR #11592:

Procedurally this is stacked on https://github.com/bytecodealliance/wasmtime/pull/11585, https://github.com/bytecodealliance/wasmtime/pull/11577, and https://github.com/bytecodealliance/wasmtime/pull/11576 at this time. I expect this to have a bit of a gauntlet on CI, however, so I wanted to get working on that sooner rather than later.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 03 2025 at 20:59):

alexcrichton updated PR #11592.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 03 2025 at 20:59):

alexcrichton edited a comment on PR #11592:

Procedurally this is stacked on https://github.com/bytecodealliance/wasmtime/pull/11585, https://github.com/bytecodealliance/wasmtime/pull/11577, and https://github.com/bytecodealliance/wasmtime/pull/11576 at this time. I expect this to have a bit of a gauntlet on CI, however, so I wanted to get working on that sooner rather than later.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 03 2025 at 21:09):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 03 2025 at 21:09):

alexcrichton created PR review comment:

For review this is particularly noteworthy, I had to enable this because when we compile for Winch now the entry trampolines (system ABI) are calling a Winch-defined function (winch ABI). The try_call being done requires that the callee ABI (here: Winch) needs to support exceptions.

I don't fully understand the implications of this flag, but I updated the aarch64/x64 backends to explicitly say that exceptions + winch clobbers all registers (like tail + winch). I'm not sure if there's more that needs be done, however.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 03 2025 at 21:09):

alexcrichton created PR review comment:

This is an example of trampolines getting significantly larger than before. It might worthwhile to invest some effort in making a new set of trampolines where we have something like all static entrypoints have their normal trampoline signature they have today, but internally they dispatch to a signature-specific trampoline which takes the function-to-call as a function pointer. That way we'd effectively deduplicate trampolines at least per-signature as we used to do.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 03 2025 at 21:09):

alexcrichton created PR review comment:

For review this is me from 3 years ago causing pain for myself. Years ago we switched from the winapi crate to the windows-sys crate. During that transition I found that the windows-sys crate, at the time, did not have EXCEPTION_CONTINUE_{SEARCH,EXECUTION} defined. It did, however, have ExceptionContinue{Search,Execution}. Being the naive little flower I am I assumed that this was some sort of mistake in the bindings and the values were all the same. Turns out this has always been wrong but it hasn't mattered since we practically never used ExceptionContinueExecution (only used for embedding-handled signals, which folks do sometimes on Linux but basically never on Windows).

Turns out that ExceptionContinue{Search,Execution} have different values than EXCEPTION_CONTINUE_{SEARCH,EXECUTION} and they're both interpreted as "continue search". It's basically a bit of a miracle this never broke before now. In this PR though we're actually using EXCEPTION_CONTINUE_EXECUTION which required this change.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 03 2025 at 21:09):

alexcrichton created PR review comment:

This is another consequence of this PR, the single-pass register allocator effectively no longer works with Wasmtime because it has issues with exceptions and Wasmtime unconditionally uses try_call for trampolines. This manifested in tests as crashes so I've disabled the usage in various tests.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 03 2025 at 21:09):

alexcrichton created PR review comment:

This test is to double-check that we don't accidentally ever run exception handlers on traps now that we're almost implementing traps with exceptions.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 03 2025 at 21:11):

alexcrichton updated PR #11592.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 03 2025 at 21:23):

alexcrichton updated PR #11592.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 03 2025 at 23:44):

github-actions[bot] commented on PR #11592:

Label Messager: wasmtime:config

It looks like you are changing Wasmtime's configuration options. Make sure to
complete this check list:

[fuzzing-config]: https://github.com/bytecodealliance/wasmtime/blob/ca0e8d0a1d8cefc0496dba2f77a670571d8fdcab/crates/fuzzing/src/generators.rs#L182-L194
[fuzzing-docs]: https://docs.wasmtime.dev/contributing-fuzzing.html


<details>

To modify this label's message, edit the <code>.github/label-messager/wasmtime-config.md</code> file.

To add new label messages or remove existing label messages, edit the
<code>.github/label-messager.json</code> configuration file.

Learn more.

</details>

view this post on Zulip Wasmtime GitHub notifications bot (Sep 04 2025 at 02:29):

alexcrichton updated PR #11592.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 04 2025 at 02:34):

alexcrichton has marked PR #11592 as ready for review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 04 2025 at 02:35):

alexcrichton requested fitzgen for a review on PR #11592.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 04 2025 at 02:35):

alexcrichton requested cfallin for a review on PR #11592.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 04 2025 at 21:00):

cfallin submitted PR review:

This is fantastic -- thanks for pushing toward the "unwinds only within Cranelift frames" goal!

Overall this all looks fine, with a few exceptions (no pun intended) around the handling of trap-handler addresses/relocations and the relocation hacks here. It's workable (and I left some comments on the MachBuffer changes below if we end up going this way) but I suspect there might be a simpler approach...

First: I think the handler address is always within the current function (there's an assert saying so), right? Then it seems like we shouldn't need to wait until we're using the text-section builder and working across functions to resolve the reference, channeling through the relocations mechanism; rather, it seems like we should be able to take the address of an exception handler target directly, and lower to an instruction/pair of instructions with the right LabelUses, never creating Relocs.

What I'm thinking is that an exception_handler_address CLIF operator, taking a block entity, could directly give us what we need. It's a property of the try_call lowering that we directly pass through the label and use that label to get an offset, so there is no "magic glue" between the try_call and the block -- in the lowering we can just get the block address directly by using the appropriate LabelUse(s).

That would remove the slightly brittle "take the one and only exception handler in this trampoline" logic, the special relocation and special FuncKey, the magic late-resolved labels and offset/addend stuff, and make things generally much cleaner I think.

What do you think?

view this post on Zulip Wasmtime GitHub notifications bot (Sep 04 2025 at 21:00):

cfallin created PR review comment:

The code in buffer.rs is a little idiosyncratic in that it has an embedded informal correctness proof, but at the risk of being a slightly annoying reviewer, I think it's worth extending the correctness arguments here to really make sure we get it right...

I was starting to write out a sketch of an argument that we can say "invariants don't matter when called from finish() but that can invoke forced veneer creation that then calls into much of the rest of the mechanisms here so I'm not super-confident actually. I think it's fine because optimize_branches is the step that makes the invariants load-bearing, and the text-section builder doesn't invoke that after finish like the VCode pipeline does here, but that's a fairly distributed and brittle argument. I think the fact that in practice the labels you bind late are not at the very end of the buffer may also be load-bearing here.

I think what I'd rather see here is that bind_label becomes bind_label_at_offset with its whole body intact (and add a surface bind_label that delegates), and we do the labels-at-tail updates and optimize-branches step only if offset == self.cur_offset(). What do you think?

view this post on Zulip Wasmtime GitHub notifications bot (Sep 04 2025 at 21:00):

cfallin created PR review comment:

This is only used in the verifier, so I think this is fine practically. Semantically, we more or less have defined exception throws in the Winch ABI by virtue of adding the clobber and payload definitions -- so we now "support" it, even though the Winch-the-compiler doesn't yet have lowerings for any of the opcodes.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 04 2025 at 23:24):

alexcrichton commented on PR #11592:

What I'm thinking is that an exception_handler_address CLIF operator

I definitely agree this would be nicer, but can you sketch out more fully what you're thinking? For example this would need to come before a try_call and I'm not aware of anything else in CLIF where you'd want to refer to a label as a value (e.g. like label_addr block0 or something). I've been under the impression that adding something a bit more first-class in Cranelift would be a pretty big lift and not necessarily worth it for this case where a "special reloc" worked well enough (you're right though that there's no need to defer until the end of the MachBuffer building, I think it could be done sooner)

view this post on Zulip Wasmtime GitHub notifications bot (Sep 04 2025 at 23:29):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 04 2025 at 23:29):

alexcrichton created PR review comment:

To make sure I understand, you're saying revert this PR's changes, add an offset parameter to bind_label and only apply the optimizations if it's cur_offset, and then add a new method bind_label_at_current_offset as a helper for setting the offset parameter to self.cur_offset (or similarly bikeshedded)?

That all sounds reasonable to me yeah, I'd also be fine asserting that the labels are all empty since I think during text section creation we never hit those paths. If you're confident that running the optimizations is safe to only do conditionally though I'm happy to change to that too

view this post on Zulip Wasmtime GitHub notifications bot (Sep 04 2025 at 23:34):

cfallin commented on PR #11592:

Right, it would more or less be that -- an operator that takes a Block as an argument; and we define its semantics to be: return the address of the handler that would be placed in exception-table metadata if this block were an exception handler. I suppose it also needs an ABI for that exception transfer, and we check the signature as we do for try_call. So something like

v1 = exception_handler_address winch block1

could appear anywhere in the function, and would lower to the ADRP/ADR, or LEA, or whatever is appropriate on the target.

I'm happy to whip this up tomorrow or late tonight if you like (about to disappear for the day right now)...

view this post on Zulip Wasmtime GitHub notifications bot (Sep 04 2025 at 23:41):

alexcrichton updated PR #11592.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 05 2025 at 03:14):

alexcrichton commented on PR #11592:

Is it reasonable to expect Cranelift to always provide the guarantee that an exceptional entrypoint can be purely based on a block name alone? I would have expected that what we really want here is an identification of an edge, or a try_call plus block pair, as opposed to just the block. Otherwise if we're comfortable going just with blocks, how come exception_handler_address would be needed vs a bare v1 = block_addr block2?

I suppose put another way, if you wouldn't mind sketching this out I'd appreciate that. I'm having a bit of a block imagining what this would look like in CLIF but I fear I'm shackled by my current understanding of CLIF. This isn't that urgent though so don't feel the need to drop everything to sketch this out.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 05 2025 at 05:32):

cfallin commented on PR #11592:

Is it reasonable to expect Cranelift to always provide the guarantee that an exceptional entrypoint can be purely based on a block name alone? I would have expected that what we really want here is an identification of an edge, or a try_call plus block pair, as opposed to just the block. Otherwise if we're comfortable going just with blocks, how come exception_handler_address would be needed vs a bare v1 = block_addr block2?

I suppose put another way, if you wouldn't mind sketching this out I'd appreciate that. I'm having a bit of a block imagining what this would look like in CLIF but I fear I'm shackled by my current understanding of CLIF. This isn't that urgent though so don't feel the need to drop everything to sketch this out.

So I dove down the surprisingly deep rabbithole here, at least for x64 (other backends left as an exercise for the reader...).

You're right that one actually needs to identify the edge, not the handler block; I ended up spec'ing the instruction to take a raw block entity reference to the callsite block (the one that ends with try_call) and an immediate indicating which out-edge. Then given that, the idea is it gives exactly the PC that the exception table gives for that handler edge.

Now that I wrote it out I'm not sure if I like it any better than the relocation-with-fake-external-FuncKey hack that you have here, though. It is more direct at least. I'll flesh it out for the other four backends and put up a PR if you'd like. Or if not, the one thing I would say about this PR is that, for avoidance of potential for subtle correctness bugs or unexpected coupling later, I'd prefer that we resolve the reloc eagerly after function compilation rather than add the deferred-label-resolution mechanism to the MachBuffer -- that's the part that sketches me out the most honestly. Thoughts?

view this post on Zulip Wasmtime GitHub notifications bot (Sep 05 2025 at 14:21):

alexcrichton commented on PR #11592:

I'd prefer that we resolve the reloc eagerly after function compilation

For this I'd defintely be ok switching to this behavior, but I'm not sure how to implement this. From Cranelift's perspective it just sees a symbol_addr with some opaque symbol and it's later embedder-specific knowlege "oh yeah that symbol is in the function here". Given that the function is completely finished by the time it's learned what the label is bound to, so no matter what the MachBuffer would need to support "bind this label at this offset which isn't necessarily the current offset".

Were you thinking something else though? Or did you have an idea of how this would get plumbed to the MachBuffer during construction?

view this post on Zulip Wasmtime GitHub notifications bot (Sep 05 2025 at 14:30):

bjorn3 commented on PR #11592:

v1 = exception_handler_address winch block1

You could emulate this in the frontend as v1 = symbol_value gv1 where gv1 gets a relocation applied immediately after compilation based on the exception table Cranelift emits, right?

view this post on Zulip Wasmtime GitHub notifications bot (Sep 05 2025 at 15:24):

alexcrichton commented on PR #11592:

I believe that's more-or-less what this PR does already though. Supporting that requires a bind-this-label-at-this-offset mechanism in MachBuffer which it seems @cfallin would prefer to avoid

view this post on Zulip Wasmtime GitHub notifications bot (Sep 05 2025 at 16:34):

cfallin commented on PR #11592:

Hmm, yeah, if we don't adopt the get_exception_handler_address approach from my branch above, I was imagining a more direct approach where we process the relocs directly as they come out of Cranelift -- the key is not using the deferred-label mechanism, because I am not actually sure that it is correct in all cases (it violates the invariants at least on paper, but right at the end of processing, but there are edge cases that could still possibly happen wrt island insertion).

Basically rather than converting the reloc to a label we would apply the reloc directly into the slice of function body. Given that we know the offset of the start of the function and the offset of the exception handler at that point we should have everything we need I think.

That said, I think it's probably best I finish other backends on my branch and put up a PR for the first-class operator instead -- it's a more straightforward design (even if a slightly heavy lift to add the new instruction format). Will do that shortly!

view this post on Zulip Wasmtime GitHub notifications bot (Sep 07 2025 at 22:13):

alexcrichton updated PR #11592.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 07 2025 at 22:15):

alexcrichton commented on PR #11592:

Updated now to incorporate the work from https://github.com/bytecodealliance/wasmtime/pull/11629

view this post on Zulip Wasmtime GitHub notifications bot (Sep 07 2025 at 23:54):

cfallin submitted PR review:

Looks great -- thanks for the patience!

view this post on Zulip Wasmtime GitHub notifications bot (Sep 08 2025 at 14:57):

alexcrichton merged PR #11592.


Last updated: Dec 06 2025 at 07:03 UTC