Stream: git-wasmtime

Topic: wasmtime / issue #10488 Cranelift: remove multiple return...


view this post on Zulip Wasmtime GitHub notifications bot (Mar 29 2025 at 22:08):

cfallin opened issue #10488:

I am currently implementing exception handling for Cranelift (WIP branch) and I believe I have hit a complexity wall that merits further discussion about Cranelift's IR design.

Currently, we permit signatures to have arbitrary numbers of return values, and we lower these to use stack slots. This adds a nontrivial bit of complexity to ABI handling, but we manage. It means that after a true call instruction, we may have loads that are nominally part of the callsite but are separate VCode instructions.

However, adding try_call instructions, which define return values as block-call args for their successors, has led me into a difficult spot when these extra "return-value instructions" exist -- I have tried all of the below approaches:

In summary: I've explored I believe every branch of the possibility tree, and the constraints are infeasible. We're doing too much in too few passes and/or supporting too-general input. We need to relax something:

Thoughts?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 29 2025 at 22:10):

cfallin commented on issue #10488:

We could go back to atomic-VCode-pseudoinst with loads as part of the try_call, "any" constraints, and allocate a temp reg as another def, and just live with the poor codegen.

I'll note that this just occurred to me when writing up the issue and is a late addition; it may be a way out. I still wanted to document the design-space exploration so far and see if anyone has any other thoughts here.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 30 2025 at 06:07):

cfallin commented on issue #10488:

I have WIP implementation of the temp-reg approach now in this branch -- this removes retval insts completely.

However, the next "final boss" that appears is that by having a potentially unbounded number of loads of retvals inside the one pseudo-inst, we exceed the max emission size so we'll need a mechanism similar to what we do for branch-table data to handle islanding properly.

In the process of writing a test and having to set options to get spillslots to come out in retvals, I discovered a reference to #9510 -- we actually discussed removing the "implicit loads from a return-area" behavior there too. I think this is more evidence that this aspect of our ABI handling is far too complex and has painted us into a corner. I suspect that we need to tackle #9510 (and thus resolve this issue by removing multiple-returns beyond a handful of registers) as a hard prerequisite to finishing exception-handling support.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 30 2025 at 09:29):

bjorn3 commented on issue #10488:

Have you seen https://github.com/bytecodealliance/wasmtime/issues/9510? Cranelift will need to remain support for multiple return values for ABI compat, but I don't see a reason why it needs to support more return values than fit in return registers. When you have more return values it will have to return them through a return area pointer, which you can implement in a fully abi-compatible way from the frontend. And in fact cg_clif already does this in most cases.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 31 2025 at 18:43):

alexcrichton commented on issue #10488:

I'm not familiar enough with all the complications of reval handling enough in our backends to comment much on possible avenues here, but at a high level I would ideally like to preserve the ability to return multiple values all in registers at the machine code level. AFAIK that basically requires modeling multi-return to some degree in CLIF. If this means though that the number of return values is limited by the ABI that seems like a reasonable limitation to me, though.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 31 2025 at 19:02):

cfallin commented on issue #10488:

Have you seen #9510

Indeed, rediscovered this while writing my most recent comment -- seems there are a number of reasons to dislike the complexity of the current situation!

preserve the ability to return multiple values all in registers

I agree; I think a limit somewhere between 2 (x86-64 SysV, rax + rdx) and 8 (aarch64, x0 to x7) makes sense. That's basically what the option that @bjorn3 added in #9511 is limiting us to anyway, in a default Cranelift config. (wasmtime-cranelift enables enable_multi_ret_implicit_sret so it does still rely on the arbitrary-returns-in-memory functionality, which is the part we'd have to migrate to the CLIF generation as discussed in #9510.) Open question whether we should pick a number that all ABIs can support (2?) or allow this limit to be platform-specific -- the latter would mean that we have one more dimension of platform-specificity in CLIF beyond pointer size today, but maybe that's fine.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 31 2025 at 19:07):

bjorn3 commented on issue #10488:

The limit inside Cranelift will have to be platform and calling convention specific. On x86_64 cg_clif sometimes needs 3 registers for the Rust ABI, while at most two registers are allowed on windows.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 31 2025 at 20:43):

cfallin commented on issue #10488:

I tried pushing this forward just a little bit more (commit) by emitting islands inline prior to retval insts when needed on aarch64, riscv64, pulley. I'm still hitting a number of regalloc panics and other issues that confirm for me that resolving this issue first is, indeed, a hard requirement for exception support.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 01 2025 at 05:30):

cfallin commented on issue #10488:

To add just a bit more complexity here: when lifting the return-area-pointer functionality into Wasmtime we can almost get away with inventing our own details here, except that the return-area-pointer support in the ABI code is also load-bearing for the Cranelift Winch ABI implementation, which is used in our trampoline generation for Winch code. So we'll need to either implement two return-area-pointer ABIs by hand in the CLIF generation, or (perhaps) standardize on the Winch one. Unfortunately Winch's ABI is a little suboptimal register-usage-wise -- it passes only one retval (the last) in a register, and the rest on the stack.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 01 2025 at 08:11):

cfallin commented on issue #10488:

Final update: after realizing how deeply enmeshed multi-return usage is, I pushed on this further, and I've got a branch here that now passes all tests -- I bulldozed through the issues mentioned above (emitting islands; regalloc panics turned out to be bytecodealliance/regalloc2#196 plus trying to allocate a temp with all regs already used as rets or clobbered in tail callconv; a few other minor issues). I'll clean this up and submit a PR tomorrow so as to unblock exception handling and then the broader refactor can happen off the critical path.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 01 2025 at 08:12):

cfallin edited a comment on issue #10488:

Final update: after realizing how deeply enmeshed multi-return usage is, I pushed on this further, and I've got a branch here that now passes all tests with return-value instructions completely removed/integrated into the call inst -- I bulldozed through the issues mentioned above (emitting islands; regalloc panics turned out to be bytecodealliance/regalloc2#196 plus trying to allocate a temp with all regs already used as rets or clobbered in tail callconv; a few other minor issues). I'll clean this up and submit a PR tomorrow so as to unblock exception handling and then the broader refactor can happen off the critical path.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 05 2025 at 18:29):

cfallin closed issue #10488:

I am currently implementing exception handling for Cranelift (WIP branch) and I believe I have hit a complexity wall that merits further discussion about Cranelift's IR design.

Currently, we permit signatures to have arbitrary numbers of return values, and we lower these to use stack slots. This adds a nontrivial bit of complexity to ABI handling, but we manage. It means that after a true call instruction, we may have loads that are nominally part of the callsite but are separate VCode instructions.

However, adding try_call instructions, which define return values as block-call args for their successors, has led me into a difficult spot when these extra "return-value instructions" exist -- I have tried all of the below approaches:

In summary: I've explored I believe every branch of the possibility tree, and the constraints are infeasible. We're doing too much in too few passes and/or supporting too-general input. We need to relax something:

Thoughts?

view this post on Zulip Wasmtime GitHub notifications bot (Apr 05 2025 at 18:29):

cfallin commented on issue #10488:

Since #10502 removed return-value instructions, I believe this issue is, well, no longer an issue. We discussed whether we should do something about arbitrary return-value signatures and the integrated handling of stack returns in the ABI code during the most recent Cranelift meeting, and concluded that maybe it's fine for now. So I'm going to go ahead and close this. Thanks all for the discussion!


Last updated: Apr 17 2025 at 21:03 UTC