I've been trying to implement exception support over at https://github.com/bjorn3/wasmtime/tree/eh_cleanup. I'm not sure how to handle callee-saved registers though. AFAIK they are trashed on the cleanup path, but for good performance they shouldn't need to be restored on a regular return. Caller-saved registers are also somewhat annoying, but should be doable. The basic design I'm trying to implement is:
function %foo(i32) system_v {
    ; eh_personality rust_eh_personality
    ; list of alternative targets for the invoke outside of the control of Cranelift.
    jt0 = jump_table [block2]
    sig0 = () system_v
    sig1 = (i64) system_v
    fn0 = %bar sig0
    fn1 = %_Unwind_Resume sig1

block0(v0: i32):
    invoke fn0, block1, jt0

block1:
    return

; All registers specified as callee-saved by the base ABI are restored, as well as scratch registers
; %rdi, %rsi, %rdx, %rcx (see below). Except for those exceptions, scratch (or caller-saved) registers
; are not preserved, and their contents are undefined on transfer.
; up to four args passed in %rdi, %rsi, %rdx, %rcx
; panic_unwind passes the exception_object pointer as the first arg and no additional arguments
block2(v1: i64) eh_landing_pad: ; eh_landing_pad is the "block entry abi"
    ; do cleanup work. all values accessible at the invoke are also accessible here
    tail_call fn1(v1)
}
This should work fine for DWARF unwinding, while also being flexible enough to enable non-exception use cases, such as on-stack replacement, in the future.
cc https://github.com/bytecodealliance/wasmtime/issues/1677, https://github.com/bytecodealliance/wasmtime/issues/2049, https://github.com/bytecodealliance/wasmtime/issues/3427
https://itanium-cxx-abi.github.io/cxx-abi/abi-eh.html
@Chris Fallin This was my question during the cranelift bi-weekly.
Ah, I think this should "just work" in the way you're hoping for, if I understand correctly. Basically the regalloc implicitly reloads caller-saved values when needed, but this is a consequence of spill/reload mechanisms and the regalloc-metadata model of the call instruction (clobbers all caller-saves) rather than some explicitly-coded behavior
so if values in caller-saves are not needed on the landingpad path then they simply won't be reloaded
the "some additional registers are also preserved" part is interesting, and right now we don't have a way of representing "conditional clobbers" like this; in principle it might be possible to model but it's a lot more complex, so I'd prefer not to if it's not clearly needed for adequate performance
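To make the point about implicit reloads concrete, here is a toy model in Rust. It is not Cranelift or regalloc code: `Value` and `reloads_on_edge` are hypothetical names invented for this sketch. It assumes that every value held in a caller-saved register across the call has been spilled, and that a reload only appears on an outgoing edge if the value is live on that edge.

```rust
use std::collections::BTreeSet;

/// Hypothetical value id, standing in for a CLIF value such as `v0`.
type Value = u32;

/// Values that have to be reloaded on one outgoing edge of a call.
///
/// `spilled` holds the values that were in caller-saved registers across the
/// call (the call clobbers all of them). `live_in` holds the values that are
/// actually used on the edge's target. Only the overlap needs a reload.
fn reloads_on_edge(spilled: &BTreeSet<Value>, live_in: &BTreeSet<Value>) -> Vec<Value> {
    spilled.intersection(live_in).copied().collect()
}
```

In this model a landing pad that uses none of the spilled values gets an empty reload list, which matches the "they simply won't be reloaded" behaviour described above.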
Forgot about that comment that callee-saved registers are restored. I wrote that file months ago.
Chris Fallin said:
the "some additional registers are also preserved" part is interesting, and right now we don't have a way of representing "conditional clobbers" like this; in principle it might be possible to model but it's a lot more complex, so I'd prefer not to if it's not clearly needed for adequate performance
If you are referring to %rdi, %rsi, %rdx and %rcx, they are set by the personality function as a kind of extra arguments. I modeled them as block parameters.
In any case I think it would work once I implement the right exception tables to restore callee-saved registers, thanks!
ah ok cool -- so if you need access to those then the right way to do it is I think to put a pseudoinstruction at the top of the unwind path that defs parameters with fixed-reg constraints (e.g. def v123 fixed in %rdi); then it will pick up the values automatically
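A rough sketch of what "defs with fixed-reg constraints" could look like as data, in self-contained Rust. The types `PReg`, `VReg`, `FixedDef` and the function `landing_pad_defs` are hypothetical and exist only for this illustration; they are not the actual Cranelift or regalloc types. The register order %rdi, %rsi, %rdx, %rcx is taken from the comment in the CLIF example above.

```rust
/// Hypothetical physical registers that may carry landing pad arguments.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PReg {
    Rdi,
    Rsi,
    Rdx,
    Rcx,
}

/// Hypothetical virtual register id.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct VReg(u32);

/// A definition of `vreg` that the allocator must place in `preg`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FixedDef {
    vreg: VReg,
    preg: PReg,
}

/// Builds the operand list for a pseudoinstruction at the top of the unwind
/// path: one fixed-register def per landing pad block parameter.
/// Returns `None` if there are more parameters than argument registers.
fn landing_pad_defs(params: &[VReg]) -> Option<Vec<FixedDef>> {
    const REGS: [PReg; 4] = [PReg::Rdi, PReg::Rsi, PReg::Rdx, PReg::Rcx];
    if params.len() > REGS.len() {
        return None;
    }
    Some(
        params
            .iter()
            .zip(REGS.iter())
            .map(|(&vreg, &preg)| FixedDef { vreg, preg })
            .collect(),
    )
}
```

For the example above, the single block parameter of block2 would map to one def fixed in %rdi.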
That was basically what I wanted to do.
@Chris Fallin Is there a way to introduce an extra block during lowering to MachInsts? Basically I need to turn invoke into a sequence of load inputs, call function, jump to temp block, and then in the temp block store the return values and jump to whichever block is the destination of the invoke. I believe the call needs to be a terminator at the MachInst level and not just at the CLIF IR level, as it can return to more than one place.
@bjorn3 not currently, no; the invariant is that lowering does not introduce additional control flow that is visible to the register allocator
though various sequences do emit local branches within a single pseudoinstruction; that's fine as long as it's single-in, single-out (e.g. trap-if)
it's somewhat surprising to me that we would need additional control-flow expansion during lowering (i.e. that the invoke with landingpad edge is not enough to capture the control flow) but I'm happy to think more about this if that's really the case
The problem is that invoke needs to lower to argument stores after the call, but the landingpad edge is right after the call instruction.
OK, perhaps something could be done in the landingpad itself then? I'm not entirely sure, and I don't have cycles to think deeply about this right now, but there should be a way to work around the invariant; we really really do not want to introduce the complexity of patching in new blocks here if we can help it
Basically I need:
abi.emit_stack_pre_adjust(ctx);
assert_eq!(inputs.len(), abi.num_args());
for i in abi.get_copy_to_arg_order() {
    let input = inputs[i];
    let arg_regs = put_input_in_regs(ctx, input);
    abi.emit_copy_regs_to_arg(ctx, i, arg_regs);
}
abi.emit_invoke(ctx, temp_block, alternatives);
// Switch to temp_block
for (i, output) in outputs.iter().enumerate() {
    let retval_regs = get_output_reg(ctx, *output);
    abi.emit_copy_retval_to_regs(ctx, i, retval_regs);
}
abi.emit_stack_post_adjust(ctx);
Chris Fallin said:
OK, perhaps something could be done in the landingpad itself then?
No, those stores are for when the landingpad is not hit.
OK, we'll need to find another way then; not possible to introduce a block in that context
By the way, how do I tell everything what successors invoke has?
Which function needs an extra branch?
Found it: analyze_branch.
@Chris Fallin Would it be possible to require that the successor block for a normal return of the invoke instruction has a single predecessor and then during lowering of the invoke instruction add the return value store instructions to the successor block? Or is it not possible to partially fill a block from the start either?
@bjorn3 no, I don't think so, it breaks a number of assumptions for an instruction in one block to put lowered instructions in another.
Can we put a pseudoinstruction in the successor that does these stores? Both at the CLIF level, and then when lowered as well; at the MachInst level the call (or invoke specifically) writes whatever RealRegs, and the instruction at the head of the successor reads them, and regalloc will do the right thing reserving those liveranges
Those stores must not be performed for the case where it unwinds and the respective output values likely shouldn't be considered defined either.
OK, but a "conditional definition" is not possible with current regalloc, and in general that's really difficult and error-prone; my suggestion above will work properly, I think
if the stores occur as part of the successor inst, then this makes at least those conditional
I guess I will try to put the store instructions in the invoke machinst pseudoinstruction after the call and before the jump.
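As a sketch of that last idea, the following self-contained Rust models the ordering inside a single invoke pseudoinstruction. Everything here (`Step`, `invoke_sequence`, `executed`) is hypothetical and made up for illustration; it is not the Cranelift ABI code, and the stack adjustment steps from the snippet above are left out. The assumption is that the unwind edge leaves directly from the call, so the return value copies and the jump only run on a normal return.

```rust
/// Hypothetical steps emitted inside one invoke pseudoinstruction.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Step {
    /// Copy input `i` into its argument location.
    CopyArg(usize),
    /// The call itself; the landing pad edge leaves from here.
    Call,
    /// Copy return value `i` out of its ABI register.
    CopyRetval(usize),
    /// Jump to the normal-return successor block (by block number).
    Jump(u32),
}

/// Orders the steps: argument copies, call, return value copies, jump.
fn invoke_sequence(num_args: usize, num_rets: usize, normal_target: u32) -> Vec<Step> {
    let mut seq: Vec<Step> = (0..num_args).map(Step::CopyArg).collect();
    seq.push(Step::Call);
    seq.extend((0..num_rets).map(Step::CopyRetval));
    seq.push(Step::Jump(normal_target));
    seq
}

/// Steps that actually run. When the callee unwinds, control leaves at the
/// call, so nothing after it (retval copies, jump) is executed.
fn executed(seq: &[Step], unwinds: bool) -> Vec<Step> {
    if !unwinds {
        return seq.to_vec();
    }
    match seq.iter().position(|s| *s == Step::Call) {
        Some(i) => seq[..=i].to_vec(),
        None => seq.to_vec(),
    }
}
```

In this model the return value copies stay conditional on a normal return without introducing a new block, because they sit between the call and the jump inside the same pseudoinstruction.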
Last updated: Dec 23 2024 at 12:05 UTC