Stream: git-wasmtime

Topic: wasmtime / Issue #2228 AArch64: Add a test case for vecto...


view this post on Zulip Wasmtime GitHub notifications bot (Sep 24 2020 at 21:20):

bjorn3 commented on Issue #2228:

The default calling convention for aarch64 is systemv, not aapcs. There are some differences between the two in float handling, at least on arm32; for example https://github.com/rust-lang/rust/blob/85fbf49ce0e2274d0acf798f6e703747674feec3/compiler/rustc_target/src/abi/call/arm.rs#L91. I don't know whether the same applies on aarch64 and, if so, whether this is one example of it.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 24 2020 at 21:26):

github-actions[bot] commented on Issue #2228:

Subscribe to Label Action

cc @bnjbvr

<details>
This issue or pull request has been labeled: "cranelift"

Thus the following users have been cc'd because of the following labels:

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

Learn more.
</details>

view this post on Zulip Wasmtime GitHub notifications bot (Sep 24 2020 at 21:44):

akirilov-arm commented on Issue #2228:

AArch32 is a completely different case, in which there are many ABI variants indeed.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 24 2020 at 21:48):

cfallin commented on Issue #2228:

Thanks @akirilov-arm -- it seems you're right that we could save half of our stack space and memory traffic for FP/vec clobber-saves. This should be a straightforward change in the ABI code; I want to verify first that this won't break SpiderMonkey (I think not, as every register is caller-save, IIRC) then I'll create a PR.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 24 2020 at 22:10):

akirilov-arm commented on Issue #2228:

@cfallin I might have misunderstood you, but let me rephrase: there are two issues, one on the caller side and one on the callee side. You are talking about the latter, but I am more concerned about the former. Consider the test case in the PR: the compiler is stashing the vectors (and they are full 128-bit vectors) in v8, v9, and v10 before the call to %g1, which is not going to work in the general case, assuming AAPCS64 compliance.

For comparison, here is what GCC and LLVM are doing in an equivalent situation.
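The caller-side constraint above can be sketched as follows. This is a hypothetical model, not Cranelift code: under AAPCS64 only the low 64 bits of v8-v15 are callee-saved, while v0-v7 and v16-v31 are fully caller-saved, so a full 128-bit value is never safe in a register across a call.

```rust
// Hypothetical sketch of the AAPCS64 vector-register preservation rule
// (function name and simplified register numbering are our own, not
// Cranelift's). Only the low 64 bits of v8-v15 survive a call.

/// Returns true if a value of `bits` width held in vector register
/// v`vreg` (0..=31) survives a call to an AAPCS64-compliant callee
/// without being spilled by the caller.
fn survives_call_aapcs64(vreg: u8, bits: u32) -> bool {
    // v8-v15: callee preserves the low 64 bits only.
    let callee_saved_low64 = (8u8..=15).contains(&vreg);
    callee_saved_low64 && bits <= 64
}

fn main() {
    // A 64-bit float in v8 is safe across the call...
    assert!(survives_call_aapcs64(8, 64));
    // ...but a full 128-bit vector in v8 is not: the callee may clobber
    // its upper half, which is exactly the bug in the test case above.
    assert!(!survives_call_aapcs64(8, 128));
    // Registers outside v8-v15 are never preserved at all.
    assert!(!survives_call_aapcs64(16, 64));
    println!("ok");
}
```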

view this post on Zulip Wasmtime GitHub notifications bot (Sep 26 2020 at 00:03):

cfallin commented on Issue #2228:

Ah, right, this is a bigger issue with respect to callsites; I had missed that half of it, sorry. So because we don't reason about half-clobbers currently (or overlapping registers), we need to treat all vector registers as clobbers on the caller side. I'll create a patch for this first thing Monday.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 29 2020 at 00:40):

cfallin commented on Issue #2228:

So on further consideration (and after doing the one-line patch suggested and studying its effect), the solution is actually more complex than I had suggested above:

The problem is how these interact: any function call at all now forces a bunch of clobber saves/restores in the caller's prologue and epilogue, because the half-clobbers from the call become full-clobbers (due to the imprecision of not reasoning about half-registers).

In essence, we need a way to "pass through" the half-clobbers from callsite to prologue/epilogue, as long as callee and caller have the same ABI; only true clobbers from the rest of the function body should alter the prologue/epilogue.

I can think of a few hacks, e.g. adding the full-clobbers at callsites (as defs) so that regalloc avoids those registers, while computing our own clobber set by scanning over every instruction except callsites; but this feels very error-prone and fragile compared to simply doing the right thing and reasoning about overlapping registers.
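The "scan everything except callsites" half of that hack could look roughly like this. A minimal sketch with invented types (`Reg`, `Inst`, `prologue_clobbers` are illustrative names, not Cranelift's):

```rust
// Hypothetical sketch of computing the prologue/epilogue clobber set
// while excluding callsites, whose half-clobbers "pass through" to the
// caller's own callers when both sides share the same ABI.
use std::collections::HashSet;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct Reg(u8);

enum Inst {
    /// An ordinary instruction writing the given registers.
    Plain { defs: Vec<Reg> },
    /// A call; its defs model the full-clobbers at the callsite, which
    /// still constrain register allocation but are skipped below.
    Call { defs: Vec<Reg> },
}

/// Clobber set used for the caller's own prologue/epilogue saves:
/// only registers written by the function body itself count.
fn prologue_clobbers(body: &[Inst]) -> HashSet<Reg> {
    body.iter()
        .filter_map(|inst| match inst {
            Inst::Plain { defs } => Some(defs),
            Inst::Call { .. } => None, // excluded: pass-through clobbers
        })
        .flatten()
        .copied()
        .collect()
}

fn main() {
    let body = vec![
        Inst::Plain { defs: vec![Reg(8)] },
        // Callsite treated as full-clobbering v8-v15.
        Inst::Call { defs: (8u8..=15).map(Reg).collect() },
    ];
    let clobbers = prologue_clobbers(&body);
    // Only v8, written by the body itself, needs a prologue save; the
    // callsite's full-clobbers are not propagated to the prologue.
    assert!(clobbers.contains(&Reg(8)));
    assert!(!clobbers.contains(&Reg(9)));
    println!("ok");
}
```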

@julian-seward1, thoughts on this re: regalloc and how difficult it would be to support half-defs / half-clobbers?

view this post on Zulip Wasmtime GitHub notifications bot (Sep 29 2020 at 23:52):

akirilov-arm commented on Issue #2228:

@cfallin I realized that I should expand my changes and add test cases with f32 and f64 values, so do not merge anything, please.

In essence, we need a way to "pass through" the half-clobbers from callsite to prologue/epilogue, as long as callee and caller have the same ABI...

Do you mean that the caller lowering should somehow pass the clobber information to the callee lowering? I thought that lowering passes for different functions were more or less independent, and indeed happened in different worker threads...

Unfortunately I can't say anything about the rest of your comments because I am missing some information, in particular your quick fix and its effects (expanding the test cases should help with the latter).

view this post on Zulip Wasmtime GitHub notifications bot (Sep 29 2020 at 23:58):

cfallin commented on Issue #2228:

Do you mean that the caller lowering should somehow pass the clobber information to the callee lowering? I thought that lowering passes for different functions were more or less independent, and indeed happened in different worker threads...

No, what I mean is more conceptual: if the callee clobbers some set of registers C, then by calling it, the caller clobbers at least C as well (that's what I meant by "pass through"; it's something that intrinsically happens, not something we are doing; sorry, it was unclear). So if the caller and callee have the same ABI, then I think we can effectively hack around this limitation by:

So this will require a small change to regalloc.rs, namely a single bit per instruction to indicate "exclude mods/defs from clobbers". @julian-seward1 and @bnjbvr, thoughts on this?
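The proposed per-instruction bit might be modeled like this; the struct and field names here are hypothetical, not the actual regalloc.rs API:

```rust
// Hypothetical sketch of the regalloc.rs change floated above: one extra
// bit per instruction telling the clobber computation to ignore that
// instruction's defs (set on callsites, clear everywhere else).

struct InstRegUses {
    /// Registers written by this instruction (simplified to plain numbers).
    defs: Vec<u8>,
    /// When set (e.g. on callsites), these defs still constrain register
    /// allocation but are excluded from the function's clobber set.
    exclude_from_clobbers: bool,
}

/// Collect the clobber set for prologue/epilogue saves, honoring the
/// per-instruction exclusion bit.
fn clobber_set(insts: &[InstRegUses]) -> Vec<u8> {
    let mut clobbers: Vec<u8> = insts
        .iter()
        .filter(|i| !i.exclude_from_clobbers)
        .flat_map(|i| i.defs.iter().copied())
        .collect();
    clobbers.sort_unstable();
    clobbers.dedup();
    clobbers
}

fn main() {
    let body = [
        InstRegUses { defs: vec![8, 9], exclude_from_clobbers: false },
        // A callsite: its defs constrain regalloc but are excluded here.
        InstRegUses { defs: vec![10, 11], exclude_from_clobbers: true },
    ];
    assert_eq!(clobber_set(&body), vec![8, 9]);
    println!("ok");
}
```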


Last updated: Dec 23 2024 at 13:07 UTC