alexcrichton commented on PR #36:
Chris, Nick, and I talked about this in-person at the CG meeting over dinner yesterday so I wanted to drop some notes here from what we talked about so we don't forget:
- We probably don't actually want to do the "set flags on exception" ABI. That means it's (a) a new ABI to maintain and (b) we can't ever turn it on by default. In some sense "wasted work" but also it's a lot more to maintain over time.
- To assist with unwinding we say that an "invoke" will always spill all registers (everything is caller-saved) initially. That way we don't have to recover registers during unwinding.
- We won't remove setjmp/longjmp yet, so that'll stay for traps (although it would be good to do it with instructions in Cranelift instead one day)
- As a first stepping stone it might be interesting to add a mode where we just trap on throws. Non-exceptional programs could then be run/benchmarked which might help some early users dogfood.
@dhil is it okay if I take over this RFC to get it across the finish line?
Yes absolutely!
fitzgen updated PR #36.
Just pushed a pretty major overhaul to this RFC, incorporating changes from this discussion and others:
- `try_call` now has multiple exception paths, like `br_table`
- no more calling convention stuff, just side tables
- and more
Please take another look!
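As a rough mental model, and not Cranelift's actual data structures, a `try_call` with several exception paths behaves like a `br_table` keyed on the thrown tag. It has one normal-return successor plus an ordered list of `(tag, block)` exception edges. A minimal Rust sketch follows. `Block`, `Tag`, `Outcome`, `TryCall`, and `successor_for` are all hypothetical names:

```rust
/// Hypothetical block identifier.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Block(u32);

/// Hypothetical exception tag identifier.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Tag(u32);

/// How control left the callee.
enum Outcome {
    Returned,
    Threw(Tag),
}

/// A try_call's successors: one normal-return edge plus a table of
/// exception edges, much like the targets of a br_table.
struct TryCall {
    normal: Block,
    handlers: Vec<(Tag, Block)>,
}

impl TryCall {
    /// Successor block for an outcome, or `None` when no handler matches
    /// and the exception keeps unwinding up the stack.
    fn successor_for(&self, outcome: &Outcome) -> Option<Block> {
        match outcome {
            Outcome::Returned => Some(self.normal),
            Outcome::Threw(tag) => self
                .handlers
                .iter()
                .find(|(t, _)| t == tag)
                .map(|&(_, b)| b),
        }
    }
}

fn main() {
    let call = TryCall {
        normal: Block(1),
        handlers: vec![(Tag(10), Block(2)), (Tag(20), Block(3))],
    };
    assert_eq!(call.successor_for(&Outcome::Returned), Some(Block(1)));
    assert_eq!(call.successor_for(&Outcome::Threw(Tag(20))), Some(Block(3)));
    // No handler for tag 30: the exception propagates to the caller.
    assert_eq!(call.successor_for(&Outcome::Threw(Tag(30))), None);
}
```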
programmerjake submitted PR review.
programmerjake created PR review comment:
"this just not" -- typo?
alexcrichton submitted PR review:
Thanks for taking this on @fitzgen, it's looking very good to me :+1:
alexcrichton created PR review comment:
For the option (3) vs (4) category I'd vote for (4), and would say that we can alleviate this problem because we pretty much have a single location that converts `Result<T>` into `T`, since as the host we also have to do things like `catch_unwind`, which we're doing there as well. Given that, I'd vote for (4) as it's less churn in general.
alexcrichton created PR review comment:
To confirm, the thinking here is that "longjmp" is a stand-in for "restore the stack and jump to a pc", probably via an assembly stub, right? Basically (a) not `longjmp` itself, (b) not a Cranelift-synthesized stub, and (c) simple enough that we can maintain handwritten assembly.
cfallin submitted PR review:
Thanks for putting this together; the details all basically look good to me! Just one uncertainty/question below.
cfallin created PR review comment:
(I'm assuming the two-phase exceptions here refer to this family of ideas)
Is the transfer itself happening, or only the stack-walker's visit of the metadata at this site? In other words, does the catch itself happen multiple times? This is surprising to me (maybe I need to read more if so!).
If the actual execution path is taken more than once, this is workable with our all-regs-clobbered approach, but it's worth noting that it will complicate regalloc otherwise: parallel moves on edges (implementing blockparams) are free to assume that they will execute only once, and swap/rotate as needed. Or in other words, a true "multiple execution" here is actually implying another (back)-edge that we're not encoding, which is problematic. Hopefully I'm reading too much into this, is all!
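Here is a concrete version of the regalloc concern. A parallel move that exchanges two registers is typically lowered as a swap. That is correct when the edge executes exactly once, but the swap undoes itself if the same edge is replayed. The toy Rust model below treats registers as array slots. `Regs` and `apply_edge_moves` are hypothetical names, not regalloc2's API:

```rust
/// Toy register file: slots r0 and r1.
type Regs = [u64; 2];

/// Lowering of the parallel move `(r0, r1) <- (r1, r0)` on a CFG edge.
/// A cycle like this is commonly resolved as a single swap, which is
/// only correct if the edge is traversed once per incoming control flow.
fn apply_edge_moves(regs: &mut Regs) {
    regs.swap(0, 1);
}

fn main() {
    let mut regs: Regs = [1, 2];
    apply_edge_moves(&mut regs);
    // Intended post-edge state: values exchanged.
    assert_eq!(regs, [2, 1]);
    // If the "same" edge were taken a second time (e.g. a landing pad
    // entered twice), the swap would silently undo itself.
    apply_edge_moves(&mut regs);
    assert_eq!(regs, [1, 2]);
}
```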
cfallin submitted PR review:
A few more thoughts on another read-through. Two things to clarify / add more detail on; no issues with the approach at all.
cfallin created PR review comment:
One thing I've just realized is not discussed explicitly here, and would be good to write down, is how the decision to attach the `try` primitive to calls interacts with the fact that `try`-blocks are arbitrary subregions in the Wasm structured control flow. This is a well-supported design, to be clear, not least because it has the existence proof of LLVM's approach; I only think we should write down a little bit here. Something like:
- Because throws are not a CLIF instruction, they are necessarily a call out to some host primitive;
- Thus, all exception-raises happen when active CLIF functions on the stack are at callsites (either directly to that primitive, or indirectly to some other function that eventually reaches that primitive);
- Thus, it is sufficient to attach handler information only to callsites.
The compilation strategy for a Wasm function body is then to track the active handlers for the `try` block(s) containing any given callsite as we translate to CLIF, and attach that list of handlers to each callsite as we translate it (with inner blocks for the same tag shadowing outer blocks as appropriate). We use an `invoke` whenever we're inside at least one `try` block, and a `call` otherwise (then the unwinder passes over us).
On that last point: we should also note explicitly that the unwinder will pass over frames at callsites with ordinary `call` (or `call_indirect` or `return_call[_indirect]`) instructions; there is no need to always use `invoke` when an exception may be thrown somewhere down the stack. (This may follow straightforwardly from ordinary exception semantics for those in-the-know but isn't obvious at first, IMHO!) This is what permits a function-local compilation strategy to continue to work without, e.g., exceptions in signatures; and it is possible to implement in Cranelift with our clobber-save scheme because we don't care about saving regs of a frame we won't restore to (when any frame above it that catches will have all of its regs saved).
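The translation strategy described above could look roughly like the following. It keeps a stack of enclosing `try` blocks and attaches the merged handler list to every callsite, with inner blocks shadowing outer ones for the same tag. Outside any `try` it falls back to a plain `call`. This is a hedged sketch: `HandlerStack`, `CallKind`, and the `u32` tag/block ids are hypothetical, not Wasmtime's actual translator types:

```rust
/// Hypothetical ids for Wasm tags and CLIF blocks.
type Tag = u32;
type Block = u32;

/// What the translator emits for a given callsite.
#[derive(Debug, PartialEq)]
enum CallKind {
    /// Plain call: the unwinder passes over this frame.
    Call,
    /// Call with handlers attached, innermost first.
    TryCall(Vec<(Tag, Block)>),
}

/// Stack of enclosing `try` blocks, outermost first.
#[derive(Default)]
struct HandlerStack {
    scopes: Vec<Vec<(Tag, Block)>>,
}

impl HandlerStack {
    fn enter_try(&mut self, handlers: Vec<(Tag, Block)>) {
        self.scopes.push(handlers);
    }

    fn exit_try(&mut self) {
        self.scopes.pop();
    }

    /// Handler list for a callsite: walk scopes innermost-first and keep
    /// only the first handler seen for each tag, so inner `try` blocks
    /// shadow outer ones.
    fn callsite(&self) -> CallKind {
        if self.scopes.is_empty() {
            return CallKind::Call;
        }
        let mut out: Vec<(Tag, Block)> = Vec::new();
        for scope in self.scopes.iter().rev() {
            for &(tag, block) in scope {
                if !out.iter().any(|&(t, _)| t == tag) {
                    out.push((tag, block));
                }
            }
        }
        CallKind::TryCall(out)
    }
}

fn main() {
    let mut hs = HandlerStack::default();
    assert_eq!(hs.callsite(), CallKind::Call);

    hs.enter_try(vec![(1, 10), (2, 20)]); // outer try
    hs.enter_try(vec![(1, 30)]); // inner try also catches tag 1
    // Tag 1 goes to the inner handler; tag 2 falls through to the outer.
    assert_eq!(hs.callsite(), CallKind::TryCall(vec![(1, 30), (2, 20)]));

    hs.exit_try();
    hs.exit_try();
    assert_eq!(hs.callsite(), CallKind::Call);
}
```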
cfallin created PR review comment:
It's probably worth noting here that we explicitly do not have `try_` variants of the tail-call instructions (`return_call[_indirect]`), because they remove the current frame from the stack, so that frame certainly can't expect to catch anything (!). This pretty clearly follows from that, but it's not obvious at first if one is only looking at the different kinds of calls.
programmerjake submitted PR review.
programmerjake created PR review comment:
so, for translation from wasm exceptions, do all exceptions translate to the same tag? if not, how do you handle the `catch_all` or `catch_all_ref` instructions?
cfallin edited PR review comment.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
To comment on this specifically, we talked about this in the Cranelift meeting today and I filed https://github.com/bytecodealliance/wasmtime/issues/10336, which is certainly extra work but I think is "The Answer" for how to recover something that's `dwarfdump`-like with our own custom format.
fitzgen submitted PR review.
fitzgen created PR review comment:
C++ exceptions (and I assume Rust panics as well?) are two-phase. That link is a bit tree- rather than forest-focused, but I think this gives a decent, concise overview of C++'s two-phase implementation: https://nicolasbrailo.github.io/blog/2013/0326_Cexceptionsunderthehood8twophasehandling.html
If the first phase (search) always happened purely in the runtime based on the metadata in the side tables and only the second phase (cleanup) actually jumped to the landing pads, then we would only execute them once and I think we would be fine. That may be incompatible with reusing the existing system unwinder in `cg_clif`, but that might be the situation anyways, I'm not sure. cc @bjorn3
But yeah, I suspect this is probably why LLVM denotes landing pads as their own kind of block, rather than making them uniform with regular blocks: landing pads are essentially weird secondary function entry points whose transitive closure can't overwrite any (live) stack slots from the main function because they might be used upon resumption (catch) or read in the next invocation of the landing pad and all that stuff you mentioned above, and therefore it makes sense to flag them as different because they require special handling.
shrug
Kind of feel like maybe we should just ignore all this for now...
fitzgen submitted PR review.
fitzgen created PR review comment:
> we pretty much have a single location that converts `Result<T>` into `T`
I meant guest code, not VM code. All this discussion is in the context of, e.g., what kind of result should we return from `Func::call`. While our VM code is fixed and we can make claims like "there is one place that cares about X", guest code is an open world where we don't know how often people want to check for, handle, or automatically propagate exception results from `Func::call`. If we knew that answer, then choosing between (3) and (4) would be straightforward. But we don't yet, and it is also somewhat future-looking, so I am not sure how to proceed.
fitzgen submitted PR review.
fitzgen created PR review comment:
No, sorry if this isn't clear: this sentence is referring to the existing `longjmp`s we have for panics and traps that skip over Wasm code. That `longjmp` path is mentioned just as a path we would _skip_ if there is an exception (instead of a panic/trap that we need to propagate over Wasm frames).
Eventually we want to have inline Cranelift stubs to replace that, but that is orthogonal to this RFC, as mentioned in the footnote.
fitzgen submitted PR review.
fitzgen created PR review comment:
We should be able to reserve a sentinel value for "all".
fitzgen edited PR review comment.
cfallin submitted PR review.
cfallin created PR review comment:
Hmm, yeah, OK, I had been assuming that the search phase would happen purely during the unwinder's stack-walk and table lookups, and that there would not be any invocation of Cranelift-compiled code until we find the handling frame and unwind to that point / jump to the handler. I was worried that the Wasm-specific discussion might imply a mechanism that explicitly exposes the phases in the future; but the current proposal doesn't have anything like that, it seems.
I've looked at the LLVM EH reference where it says "LLVM landing pads are conceptually alternative function entry points where an exception structure reference and a type info index are passed in as arguments." I kind of wish they had gone into more depth on why they need to handle things this way, rather than have an ordinary control-flow edge out of the `invoke`...
Anyway, I agree: let's design to Wasm exceptions. We don't need multiple-shot resume here, so let's assume that the search phase can be done fully in the unwinder/runtime based on tables. If we need multiple-shot resume, that's weird enough that we would really need a full audit of the compiler/regalloc and a lot more careful thought than is in scope here IMHO!
cfallin edited PR review comment.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Some more background on the LLVM side of things might be SEH on Windows. Circa 2015, IIRC, LLVM only supported `invoke` and DWARF-based unwinding, and then things changed quite a lot once SEH got implemented, so much so that brand-new constructs were added to LLVM IR for SEH. Personally I know very little about SEH, as I suspect is true of most others too, but I know it's different enough from DWARF unwinding that there's probably not a one-size-fits-all low-level abstraction.
Anyway, that's just possible background on LLVM; I don't mean to detract from the conclusion here of focusing on Wasm exceptions, which I personally very much agree with :+1:
programmerjake submitted PR review.
programmerjake created PR review comment:
ok, so then cranelift most likely will need to support a wildcard exception tag; this should probably be added to the rfc
fitzgen submitted PR review.
fitzgen created PR review comment:
This shouldn't require anything from Cranelift: the semantics of individual `ir::ExceptionTag`s, and which one is the wildcard sentinel, can all be handled at the Wasmtime level. The fact that a particular `ir::ExceptionTag` is treated as a wildcard by Wasmtime should not have any effect on what Cranelift does.
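Below is a minimal sketch of that split. It assumes the compiler treats tags as opaque integers and only the embedder knows which value means "catch everything". `ExceptionTag` here is a stand-in newtype, not the real `ir::ExceptionTag`. `CATCH_ALL`, `handler_matches`, and `find_handler` are hypothetical:

```rust
/// Stand-in for an opaque tag: to the compiler it is just a number.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ExceptionTag(u32);

/// Embedder-chosen sentinel meaning "catch any exception"
/// (what Wasm's catch_all / catch_all_ref would map to).
const CATCH_ALL: ExceptionTag = ExceptionTag(u32::MAX);

/// Embedder-side matching; the compiler never interprets tags itself.
fn handler_matches(handler: ExceptionTag, thrown: ExceptionTag) -> bool {
    handler == CATCH_ALL || handler == thrown
}

/// Target of the first handler in the table that matches the thrown tag.
fn find_handler(table: &[(ExceptionTag, usize)], thrown: ExceptionTag) -> Option<usize> {
    table
        .iter()
        .find(|&&(tag, _)| handler_matches(tag, thrown))
        .map(|&(_, target)| target)
}

fn main() {
    let table = [(ExceptionTag(7), 100), (CATCH_ALL, 200)];
    assert_eq!(find_handler(&table, ExceptionTag(7)), Some(100));
    // Any other tag falls through to the catch-all entry.
    assert_eq!(find_handler(&table, ExceptionTag(3)), Some(200));
}
```

Because matching lives entirely in `handler_matches`, the compiler-side tables need no notion of a wildcard.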
bjorn3 submitted PR review.
bjorn3 created PR review comment:
For C++ two-phase exceptions, the search phase does not run landingpads. Only once the frame that catches the exception has been determined does the stack get unwound and landingpads executed. During the search phase the personality function does run, but the personality function is called by the unwinding runtime like any other regular function. And the personality function normally doesn't need to touch any registers or the stack during the search phase (unless the compiler the personality function is associated with requires this, but neither GCC nor LLVM requires this, and Cranelift should probably mirror this).
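The two-phase flow described above can be modeled in a few lines. Phase 1 only consults per-frame metadata, which is the personality function's job, to locate a catching frame. Phase 2 then unwinds, running each intermediate frame's cleanup landingpad once and jumping to the handler once. This is a toy simulation with hypothetical types (`Frame`, `search`, `raise`), not the Itanium ABI's actual unwinder interface:

```rust
/// Toy stack frame: which tags it catches and whether it has cleanups
/// (e.g. destructors) that must run while unwinding through it.
struct Frame {
    name: &'static str,
    catches: Vec<u32>,
    has_cleanup: bool,
}

/// Phase 1 (search): inspect metadata only, innermost frame first.
/// `stack[0]` is the outermost frame. No landingpad runs here.
fn search(stack: &[Frame], tag: u32) -> Option<usize> {
    stack.iter().rposition(|f| f.catches.contains(&tag))
}

/// Returns a log of the landingpads executed, in order.
fn raise(stack: &[Frame], tag: u32) -> Vec<String> {
    let mut log = Vec::new();
    // Phase 1 decides up front whether anyone will catch this.
    let Some(catcher) = search(stack, tag) else {
        log.push("uncaught".to_string());
        return log;
    };
    // Phase 2 (cleanup): unwind innermost-first down to the catcher,
    // running each landingpad exactly once.
    for frame in stack[catcher + 1..].iter().rev() {
        if frame.has_cleanup {
            log.push(format!("cleanup:{}", frame.name));
        }
    }
    log.push(format!("catch:{}", stack[catcher].name));
    log
}

fn main() {
    let stack = [
        Frame { name: "main", catches: vec![1], has_cleanup: false },
        Frame { name: "mid", catches: vec![], has_cleanup: true },
        Frame { name: "leaf", catches: vec![], has_cleanup: true },
    ];
    assert_eq!(raise(&stack, 1), ["cleanup:leaf", "cleanup:mid", "catch:main"]);
    // No catcher: phase 1 fails and nothing is unwound.
    assert_eq!(raise(&stack, 2), ["uncaught"]);
}
```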
bjorn3 edited PR review comment.
fitzgen submitted PR review.
fitzgen created PR review comment:
Thanks, fixed.
fitzgen updated PR #36.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
> But yeah, I suspect this is probably why LLVM denotes landing pads as their own kind of block, rather than making them uniform with regular blocks: landing pads are essentially weird secondary function entry points whose transitive closure can't overwrite any (live) stack slots from the main function because they might be used upon resumption (catch) or read in the next invocation of the landing pad and all that stuff you mentioned above, and therefore it makes sense to flag them as different because they require special handling.
I think it is just because a landingpad often itself has arguments to indicate what exception was thrown. The landingpad arguments generally have different types from the regular return value. An invoke can't call a landingpad twice.
fitzgen updated PR #36.
fitzgen edited PR review comment.
programmerjake submitted PR review.
programmerjake created PR review comment:
ah, ok. I thought the unwinder would be a cranelift thing.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
Guest profiling doesn't need stack unwinding, only stack walking, right? The difference between the two is that stack unwinding will run compiled code for each stack frame (and generally destroy the stack frame in the process) while stack walking just iterates over all stack frames and gets the instruction pointer. Stack walking can be done with either frame pointers or unwind tables, while stack unwinding needs more compiler cooperation. Cranelift already supports stack walking.
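To illustrate the walking/unwinding distinction, consider a frame-pointer walk. It just follows the saved-FP chain and records return addresses, without running any code in those frames or tearing them down. Below, stack memory is simulated as a map from addresses to words. The frame layout (saved FP at `[fp]`, return address at `[fp + 8]`, as on x86-64) is an assumption for illustration, and `walk_frames` is a hypothetical name:

```rust
use std::collections::HashMap;

/// Walk a frame-pointer chain and collect each frame's return address.
/// `mem` simulates stack memory; a saved FP of 0 terminates the chain.
fn walk_frames(mem: &HashMap<u64, u64>, mut fp: u64) -> Vec<u64> {
    let mut pcs = Vec::new();
    while fp != 0 {
        // Assumed layout: [fp] = caller's saved FP, [fp + 8] = return address.
        let (Some(&saved_fp), Some(&ret_addr)) = (mem.get(&fp), mem.get(&(fp + 8))) else {
            break; // unmapped slot: stop instead of reading garbage
        };
        pcs.push(ret_addr);
        // The stack grows down, so callers live at higher addresses; a
        // non-increasing FP means a corrupt chain, so stop (avoids cycles).
        if saved_fp != 0 && saved_fp <= fp {
            break;
        }
        fp = saved_fp;
    }
    pcs
}

fn main() {
    // Innermost frame at 0x1000; its caller's frame at 0x2000.
    let mem: HashMap<u64, u64> = HashMap::from([
        (0x1000, 0x2000), (0x1008, 0xaaaa),
        (0x2000, 0),      (0x2008, 0xbbbb),
    ]);
    assert_eq!(walk_frames(&mem, 0x1000), [0xaaaa_u64, 0xbbbb_u64]);
}
```

Nothing here restores registers or transfers control, which is why walking needs far less compiler cooperation than unwinding.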
bjorn3 edited PR review comment.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
SEH and `.eh_frame` use fundamentally different approaches that I don't think are easy to unify at the IR level. `.eh_frame` uses landingpads, where execution continues in the context of the stack frame that you unwind into. This way you can, for example, return from the exception handler like normal. SEH, however, uses funclets, which are functions that get the stack pointer of the stack frame as an argument. Funclets can't return from the stack frame on which they get invoked. Catching exceptions requires cooperation with the filter function.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
Two-phase exceptions don't run landing pads multiple times, only the personality function.
bjorn3 edited PR review comment.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
s/three/two?
bjorn3 submitted PR review.
bjorn3 created PR review comment:
> and where their respective handlers are located.
The format for this is not specified by DWARF. Each language runtime is expected to define their own format. The personality function exists to interpret this format. Cranelift should probably emit the information necessary for consumers to emit the side table in whatever format they need.
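Here is one possible shape for such consumer-agnostic information, sketched under assumptions. Each function gets entries keyed by the callsite's return-address offset, and each entry lists its (tag, handler offset) pairs. The entries are sorted, so a runtime can binary-search on the return address it finds while walking the stack. `CallSite` and `lookup` are hypothetical names; the concrete format would be up to each consumer:

```rust
/// One try-callsite: return-address offset within the function and
/// the handlers attached to it, as (tag, handler code offset) pairs.
#[derive(Debug)]
struct CallSite {
    ret_offset: u32,
    handlers: Vec<(u32, u32)>,
}

/// Find the handler offset for `tag` at the callsite whose return
/// address is `ret_offset`. `sites` must be sorted by `ret_offset`.
fn lookup(sites: &[CallSite], ret_offset: u32, tag: u32) -> Option<u32> {
    let idx = sites
        .binary_search_by_key(&ret_offset, |s| s.ret_offset)
        .ok()?;
    sites[idx]
        .handlers
        .iter()
        .find(|&&(t, _)| t == tag)
        .map(|&(_, off)| off)
}

fn main() {
    let sites = vec![
        CallSite { ret_offset: 0x10, handlers: vec![(1, 0x80)] },
        CallSite { ret_offset: 0x24, handlers: vec![(1, 0x90), (2, 0xa0)] },
    ];
    assert_eq!(lookup(&sites, 0x24, 2), Some(0xa0));
    // A return address with no entry is an ordinary call: keep unwinding.
    assert_eq!(lookup(&sites, 0x30, 1), None);
}
```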
bjorn3 submitted PR review.
bjorn3 created PR review comment:
*associated
bjorn3 submitted PR review.
bjorn3 created PR review comment:
I feel like wasm-opt or such should get a mode which turns all exception throws into traps. That way other wasm engines without exception support would also be able to handle these modules.
fitzgen submitted PR review.
fitzgen created PR review comment:
Good clarification :+1:
fitzgen submitted PR review.
fitzgen created PR review comment:
Right, I was referring to the personality function here as DWARF's (really `.eh_frame`'s) mechanism for determining where the respective handlers are located, since each FDE _can_ have its own personality function, and calling the function is the mechanism for figuring out if this frame is handling the exception or not. Of course, the personality functions are free to be implemented however they want, such as using a single personality routine for every FDE that interprets a custom format (as you mention) or using a bespoke personality function for every single landing pad with that landing pad's logic (that would otherwise have been in that custom format table) inlined into each FDE's personality function. But yeah, I think this is all a bit in the weeds and these details are not really relevant to the RFC itself.
fitzgen submitted PR review.
fitzgen created PR review comment:
I agree that would be nice, but of course I don't maintain `wasm-opt` :)
We can always remove this knob if people don't use it or don't find it useful. But it will be nice as an incremental milestone either way. And it would always be useful if `wasm-opt` didn't support some proposed Wasm feature that Wasmtime did, and therefore you couldn't use `wasm-opt` to do this as a wasm-to-wasm transform because it would balk at the unimplemented feature usage.
fitzgen updated PR #36.
As we seem to have reached pretty broad consensus, and feedback has moved more into the realm of typo fixes and minor clarifications, I'd like to propose that we merge this RFC!
Motion to Finalize
Disposition: Merge
As always, details on the RFC process can be found here: https://github.com/bytecodealliance/rfcs/blob/main/accepted/rfc-process.md#making-a-decision-merge-or-close
alexcrichton commented on PR #36:
I'd second the motion to merge :+1:
cfallin submitted PR review:
:+1:
At the risk of going to pun-jail, I have to note that I find this RFC... exceptional. Seconding!
As there has been signoff from representatives of two different BA stakeholder organizations, this RFC is now entering its 10-day Final Comment Period, and the last day to raise concerns before this RFC merges is 2025-03-21.
Thanks everyone!
At the risk of slightly re-opening Pandora's box, I have a... final comment... during the Final Comment Period.
I'm ~1.5 days into implementing the Cranelift side of exception handling right now, and I have been realizing in a fairly deep way how the block-parameter strategy we have in this RFC may be suboptimal. In brief, the problem is that breaking the invariant that the block-call arguments in the target match up with the parameters on that block is leading to a lot of refactoring and gross hacks throughout the compiler. I don't doubt that I could continue to bulldoze through, but I'm worried that this is a strong signal about design coherence that will lead to future pain and bugs as we work with the IR, so I took a step back to think a bit (and talk with @fitzgen offline earlier today).
We originally decided above that we would write
`try_call fn0(args...), block1(args1), [ tag1: block2(args2), tag2: block3(args3), ... ]`
to mean that after return, control reaches one of `block1` (no exception thrown), `block2` or `block3` (or unwinds further up the stack) with the following block parameters: (i) the returns of `fn0` concatenated with `args1` for `block1`; (ii) a single pointer-width exception payload argument and `args2` for `block2`; and (iii) likewise for `block3`. In other words, the `BlockCall`s have `N`, `M` and `P` args, but the blocks themselves have `N + len(fn0.rets)`, `M + 1`, and `P + 1` params. This doesn't happen anywhere else in the IR, and leads to a lot of awkwardness in the parser, the verifier, the phi-removal pass, lowering, and regalloc's view of blockparams (so far). We anticipated that this aspect might be a little weird, but it seemed tenable at a high level.
We took this approach because we wanted to achieve a few goals: successor blocks should not be a special category or specially restricted; the IR should encode the invariant that function returns are only valid on normal return, and an exception payload is only valid on catching an exception; and we shouldn't do anything weird with SSA, like define return values and immediately use them in the `try_call`.
I believe I have a better solution now, though, that still preserves these properties and avoids the higher-than-anticipated impedance mismatches. Namely:
- We don't actually need an exception payload parameter in the IR, we think. In Wasmtime, we always have `vmctx`, and we can convey exceptional state through a field there (or perhaps in the `Store`, since unwinding can cross instances).
- Return values can be defined by the `try_call` itself. They must already be defined at this point from a regalloc point of view, as a sort of "early def", because the register state will be filled in before control returns to the caller and transfers across the edge to the normal-return target.
The latter point means that the normal-return values are defined even on exceptional returns. This may seem like a semantic lie. I have what I think is a reasonable answer that technically avoids adding UB to the IR: we could provide in the unwind metadata information about where return values go (registers/stack locations), and specify that it is up to the embedder to either fill those values in, or guarantee that it does not use the values on exceptional paths. (In effect, the values are always "returned", it's just that the unwinder decides the return values on unwinding; and Wasmtime is free to decide that the current value in `rax` at time of throw is the return value, because it generated IR that doesn't care, or it could zero them as a defense-in-depth thing.)
Anyway, I'd be curious what others think, and I'm happy to fill in more about the pain-points discovered.
(Slightly tangential but worth saying, IMHO: I personally think we should always at least begin implementation or prototype before fully merging an RFC, and this is why -- there are some things we just can't know fully until we work through all the details!)
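The arity mismatch in the original design can be modeled in a small example. Each successor block's parameter count exceeds its block-call argument count by an amount that depends on the edge kind. On the normal edge it is the callee's return count, and on each exception edge it is one payload. Any generic "args match params" check therefore needs a special case. The counts follow the description above, while `Edge`, `expected_params`, and `ordinary_edge_ok` are hypothetical names:

```rust
/// Which edge of a try_call we are looking at.
enum Edge {
    Normal,
    Exceptional,
}

/// Under the original design: how many params the target block must
/// have, given how many explicit args its BlockCall carries.
fn expected_params(edge: &Edge, blockcall_args: usize, callee_rets: usize) -> usize {
    match edge {
        // N args + the callee's return values.
        Edge::Normal => blockcall_args + callee_rets,
        // M args + one pointer-width exception payload.
        Edge::Exceptional => blockcall_args + 1,
    }
}

/// The ordinary invariant used on every other kind of edge.
fn ordinary_edge_ok(blockcall_args: usize, block_params: usize) -> bool {
    blockcall_args == block_params
}

fn main() {
    // block1(args1) with 2 args, fn0 returns 2 values: block1 has 4 params.
    let params = expected_params(&Edge::Normal, 2, 2);
    assert_eq!(params, 4);
    // ...which the ordinary check would reject, hence the special cases.
    assert!(!ordinary_edge_ok(2, params));
    assert_eq!(expected_params(&Edge::Exceptional, 1, 2), 2);
}
```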
> We don't actually need an exception payload parameter in the IR, we think
cg_clif absolutely needs it.
Hmm, that is good to know. And there isn't a way to tunnel the state through, e.g., a TLS slot? I suppose the issue here is that you don't control the stdlib so we basically need the same capability as LLVM for the native-code case...
I think one way to express what I'm finding in implementation is that there has to be something weird and different about exceptional edges, because they do introduce this payload. That payload has to come out of thin air somehow. The ways we have are:
- No it doesn't -- no payload here! (the new suggestion)
- Extra magical block args that are not in the block call, making the block calls special and different and requiring contortions when viewing those edges
- Extra magical operators at the start of the target blocks (i.e., landingpad instructions), making the blocks special and different and requiring contortions when editing them (and making sharing them harder)
I suppose the last option (one I don't like, but just to name it) is to say that both the normal returns and exceptional returns arise as normal defs, so the instruction results for a `try_call` are the concatenation of the call's return values and the exception payload. (If you squint, this is Go's `result, err := f(...)`...) But this means that we have extra magical logic when squaring function signatures with instruction results (which will come up in, e.g., ABI code).
So we either punt on the problem, or we have awkward logic on edges, or awkward logic on blocks, or awkward logic on calls. The native-code case apparently means we can't punt on the problem.
I'm not sure what to make of this -- we can discuss more in the Cranelift meeting tomorrow I suppose.
> And there isn't a way to tunnel the state through, e.g., a TLS slot? I suppose the issue here is that you don't control the stdlib so we basically need the same capability as LLVM for the native-code case...
Yeah, rust_eh_personality only passes it as a landingpad argument.
Did you see how I did the landingpad arguments in my branch, by the way? While it adds some complexity, it didn't seem like it added all that much.
We discussed this in the Cranelift meeting this morning and came to a relatively reasonable design point, which I'm actually fairly happy about: we decided to reify the "arising in the middle of an edge" values, the return values and exception payloads, as a new kind of "placeholder argument" in the block calls. There are some sketches of this above, but basically it looks like:
`try_call fn0(...), block1(ret0, ret1, v1, v2), [ tag1: block2(exn0, v3), ... ]`
In other words, we allow `retN` and `exnN` values in block calls on a `try_call[_indirect]` instruction.
Importantly, the args themselves are a sum type of "normal value" or "placeholder" (try-call return, try-call exception payload, other in the future?). We don't define a new kind of value; these args are not values.
This is nice for a few reasons:
- It avoids the problem of mismatching block-call argument list and block parameter list, which started this whole discussion and would be a recurring footgun/wart in the design otherwise.
- It requires only local validation; we can know if the block calls are correct (referring to valid return values and exception payloads) by looking only at the function signature of the called function.
- It feels closest to the reality of where the values are defined: not by the `try_call` for all dominated instructions in both paths; not in successors (with landingpads or equivalent); but actually on the edge. Code that processes block-call arguments will need to handle this case, but it is front-and-center (and exposable as a new type) rather than an implicit thing.
- It uses syntactic restrictions rather than cross-instruction or cross-block invariants to ensure well-formedness of the construct.
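The adopted design can be sketched as a sum type for block-call arguments plus a purely local validity check that needs only the callee's signature. This is a hedged illustration of the idea, not Cranelift's implementation. `Value`, `BlockArg`, `Signature`, `Successor`, and `validate` are hypothetical names, and the per-edge payload count is an assumed field:

```rust
/// Ordinary SSA value id (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Value(u32);

/// A block-call argument: either a real value, or a placeholder for
/// something that arises on the edge itself. Placeholders are not Values.
#[derive(Clone, Copy, Debug, PartialEq)]
enum BlockArg {
    Value(Value),
    /// `retN`: the callee's Nth return value (normal edge only).
    TryCallRet(u32),
    /// `exnN`: the Nth exception payload (exceptional edges only).
    TryCallExn(u32),
}

/// Just what validation needs to know about the callee.
struct Signature {
    num_rets: u32,
    num_exn_payloads: u32,
}

enum Successor {
    Normal,
    Exceptional,
}

/// Local check: each placeholder must fit its edge kind and be in range.
fn validate(sig: &Signature, succ: &Successor, args: &[BlockArg]) -> Result<(), String> {
    for arg in args {
        match (succ, arg) {
            (_, BlockArg::Value(_)) => {}
            (Successor::Normal, BlockArg::TryCallRet(i)) if *i < sig.num_rets => {}
            (Successor::Exceptional, BlockArg::TryCallExn(i)) if *i < sig.num_exn_payloads => {}
            _ => return Err(format!("invalid block arg {arg:?}")),
        }
    }
    Ok(())
}

fn main() {
    let sig = Signature { num_rets: 2, num_exn_payloads: 1 };
    // block1(ret0, ret1, v1, v2)
    let normal = [
        BlockArg::TryCallRet(0),
        BlockArg::TryCallRet(1),
        BlockArg::Value(Value(1)),
        BlockArg::Value(Value(2)),
    ];
    assert!(validate(&sig, &Successor::Normal, &normal).is_ok());
    // block2(exn0, v3)
    let exn = [BlockArg::TryCallExn(0), BlockArg::Value(Value(3))];
    assert!(validate(&sig, &Successor::Exceptional, &exn).is_ok());
    // A return value is not available on an exceptional edge.
    assert!(validate(&sig, &Successor::Exceptional, &[BlockArg::TryCallRet(0)]).is_err());
}
```

The check never looks beyond the instruction and its callee signature, which is the "only local validation" property listed above.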
I'll continue building this and will say more if there are other unanticipated issues, but I don't expect there to be at the moment!
Last updated: Apr 14 2025 at 09:04 UTC