Stream: git-wasmtime

Topic: wasmtime / PR #10502 Cranelift: remove return-value instr...


view this post on Zulip Wasmtime GitHub notifications bot (Apr 01 2025 at 18:01):

cfallin opened PR #10502 from cfallin:new-retvals-same-great-flavor-now-with-less-instructions to bytecodealliance:main:

This PR addresses the issues described in #10488 in a more head-on way: it removes the use of separate "return-value instructions" that load return values from the stack, instead folding these loads into the semantics of the call VCode instruction.

This is a prerequisite for exception-handling: we need calls to be workable as terminators, meaning that we cannot require any other (VCode) instructions after the call to define the return values.

In principle, this PR starts simply enough: the return-locations list on the CallInfo that each backend uses to provide regalloc metadata is updated to support a notion of "register or stack address" as the source of each return value, and this list is now used for both kinds of returns, not just returns in registers. Shared code is defined in machinst::abi used by all backends to perform the requisite loads.

In order to make this work with more defined values than fit in registers, however, this PR also had to add support for "any"-constrained registers to Cranelift, and handling allocations that may be spillslots. This has always been supported by RA2, but this is the first time that Cranelift uses them directly (previously they were used only internally in RA2 as lowerings from other kinds of constraints like safepoints). This requires encoding a spillslot index in our Reg type.

There is a little bit of complexity around handling the loads/defs as well: if we have a return value on-stack, and we need to put it in a spillslot, we cannot do a memory-to-memory move directly, so we need a temporary register. Earlier versions of this PR allocated another temp as a vreg on the call, but this doesn't work with all calling conventions (too many clobbers). For simplicity I picked a particular register that is (i) clobbered by calls and (ii) not used for return values for each architecture (x86-64's tailcall needed to lose one return-in-register slot to make this work).

This removes retval insts from the shared ABI infra completely. s390x is different, still, because it handles callsite lowering from ISLE; we will need to address that separately for exception support there.

<!--
Please make sure you include the following information:

Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.html

Please ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->

view this post on Zulip Wasmtime GitHub notifications bot (Apr 01 2025 at 18:01):

cfallin requested abrown for a review on PR #10502.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 01 2025 at 18:01):

cfallin requested wasmtime-compiler-reviewers for a review on PR #10502.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 01 2025 at 18:01):

cfallin requested alexcrichton for a review on PR #10502.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 01 2025 at 18:01):

cfallin requested wasmtime-core-reviewers for a review on PR #10502.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 01 2025 at 18:02):

cfallin commented on PR #10502:

I'll note that this appears to pessimize codegen somewhat for many-return-values (more than fit in registers, so more than 8? on aarch64/x86-64), since this does a manual load-from-stack/store-to-spillslot rather than letting regalloc handle it; that seems acceptable given the complexity reduction and unblocking exception support, IMHO.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 01 2025 at 18:17):

alexcrichton requested fitzgen for a review on PR #10502.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 01 2025 at 18:17):

alexcrichton submitted PR review:

Overall this looks reasonable to me, and I have a passing thought, but I'm going to defer to @fitzgen for review rather than myself as I think he's been chatting with you more about this

view this post on Zulip Wasmtime GitHub notifications bot (Apr 01 2025 at 18:17):

alexcrichton created PR review comment:

Could this case assert that vreg was not allocated to temp?

view this post on Zulip Wasmtime GitHub notifications bot (Apr 01 2025 at 18:17):

alexcrichton created PR review comment:

Also, is it possible to somewhat easily assert that temp is callee-saved? (or caller-save, I forget which would be appropriate here)

view this post on Zulip Wasmtime GitHub notifications bot (Apr 01 2025 at 19:44):

github-actions[bot] commented on PR #10502:

Subscribe to Label Action

cc @cfallin, @fitzgen

<details>
This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:area:machinst", "cranelift:area:x64", "isle"

Thus the following users have been cc'd because of the following labels:

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

Learn more.
</details>

view this post on Zulip Wasmtime GitHub notifications bot (Apr 01 2025 at 22:36):

cfallin updated PR #10502.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 01 2025 at 22:38):

cfallin updated PR #10502.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 01 2025 at 22:39):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 01 2025 at 22:39):

cfallin created PR review comment:

Done (to assert), good idea. It's a little trickier to assert anything about whether this is a caller-saved (volatile) register we can use: this helper is called after we've thrown away all the ABI info except for what's in the CallInfo, so we'd have to somehow preserve that.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 01 2025 at 22:41):

cfallin commented on PR #10502:

Just pushed a fix for aarch64 -- it turns out our optimization to avoid unnecessary clobber-saves for the lower half of float registers (is_included_in_clobbers) can no longer apply if we have other defs (the stack-carried values) that allocate into other registers; those other registers are indeed clobbered by our function body! The pessimization there is, again, only on very-many-returned-values cases; the happy path of returns-that-fit-in-regs is unaffected.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 02 2025 at 14:31):

alexcrichton commented on PR #10502:

this helper is called after we've thrown away all the ABI info

Could the temp parameter to CallInfo::emit_retval_loads turn into an associated const and/or function on ABIMachineSpec? That way it wouldn't have to be passed as a parameter and it'd also be available for assertions if necessary elsewhere? No need really to bend over backwards for this, but I do think it'd be good to have a debug assertion that it's in the right set of caller or callee

view this post on Zulip Wasmtime GitHub notifications bot (Apr 02 2025 at 16:44):

cfallin updated PR #10502.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 02 2025 at 16:45):

cfallin updated PR #10502.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 02 2025 at 16:45):

cfallin commented on PR #10502:

Ah, yes, that's a much better idea, thanks! Done.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 04 2025 at 13:02):

uweigand commented on PR #10502:

Hi @cfallin I've had a quick look at this, and it should be relatively straightforward to implement this for s390x as well; I'd be happy to do that (either after this has landed, or else I could provide a patch ahead of time?).

However, I do have one comment about the new has_non_abi_defs mechanism. Not only is this further complicating the is_included_in_clobbers mechanism, it also introduces performance regressions.

The original reason for is_included_in_clobbers was that for partially callee-saved registers (e.g. on s390x, only the top half of some vector registers, which overlap the old floating-point registers, is callee-saved), in a sequence of A calling B calling C, we have to conservatively assume that the call B->C clobbers them, while the call A->B must preserve them, and therefore B will have to save/restore them all. Of course, as long as the A->B calling convention is the same as the B->C calling convention, we can avoid this, which is implemented via is_included_in_clobbers.

Now, assuming the call B->C needs another def as it has to load a return value into a callee-saved register. We have to consider this, which is what the new has_non_abi_defs mechanism achieves. However, that will then also have the side effect of once again forcing B to save/restore all floating-point registers - there's really no good reason for this.

Quite a while ago, I made the suggestion that is_included_in_clobbers should really only apply to the clobber list, not the def list. If we were to make that change, then the extra defs for this case would just happen to work transparently without any new has_non_abi_defs mechanism, and it would not introduce the above regression.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 04 2025 at 23:23):

cfallin commented on PR #10502:

@uweigand thanks, that is a great suggestion -- much better than what I did, and I will update as suggested!

view this post on Zulip Wasmtime GitHub notifications bot (Apr 04 2025 at 23:43):

cfallin updated PR #10502.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 05 2025 at 05:00):

fitzgen submitted PR review:

LGTM!

view this post on Zulip Wasmtime GitHub notifications bot (Apr 05 2025 at 05:22):

fitzgen merged PR #10502.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 06 2025 at 00:04):

alexcrichton commented on PR #10502:

A number of fuzz bugs have this in the regression range and I suspect might be related, would you be up for taking a look at these when you're back on Monday?

view this post on Zulip Wasmtime GitHub notifications bot (Apr 06 2025 at 00:15):

cfallin commented on PR #10502:

Yes, just bisected and commented on one of them -- will take a look!

view this post on Zulip Wasmtime GitHub notifications bot (Apr 06 2025 at 03:08):

cfallin commented on PR #10502:

Fuzzbug fix in RA2 (https://github.com/bytecodealliance/regalloc2/pull/214) -- this PR creates more challenging constraints and the bundle-splitting logic had to get a little bit more refined.


Last updated: Apr 17 2025 at 10:03 UTC