Stream: git-wasmtime

Topic: wasmtime / PR #2678 x86-64 Windows fastcall ABI support.


view this post on Zulip Wasmtime GitHub notifications bot (Feb 23 2021 at 04:31):

cfallin requested abrown and bnjbvr for a review on PR #2678.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 23 2021 at 04:31):

cfallin requested abrown and bnjbvr for a review on PR #2678.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 23 2021 at 04:31):

cfallin opened PR #2678 from x64-fastcall to main:

This adds support for the "fastcall" ABI, which is the native C/C++ ABI
on Windows platforms on x86-64. It is similar to but not exactly like
System V; primarily, its argument register assignments are different,
and it requires stack shadow space.

Note that this also adjusts the handling of multi-register values in the
shared ABI implementation, and with this change, adjusts handling of
i128s on both Fastcall/x64 and SysV/x64 platforms. This was done
to align with actual behavior by the "rustc ABI" on both platforms, as
mapped out experimentally (Compiler Explorer link in comments).

Note also that this does not add x64 unwind info on Windows. That will
come in a future PR (but is planned!).

Finally, note that this does not add any tests that actually execute
with the fastcall ABI on Windows, because we are not testing the new
backend on Windows on our CI setup; test coverage is via filetests for
now. We could possibly Windows-with-new-backend CI if we wanted to,
though I don't think the Wasmtime-internal use cases will make use of
Fastcall.

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

view this post on Zulip Wasmtime GitHub notifications bot (Feb 23 2021 at 04:33):

cfallin updated PR #2678 from x64-fastcall to main.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 23 2021 at 04:34):

cfallin updated PR #2678 from x64-fastcall to main.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 23 2021 at 07:04):

bjorn3 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 23 2021 at 07:04):

bjorn3 created PR Review Comment:

This feature doesn't exist.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 23 2021 at 07:58):

cfallin updated PR #2678 from x64-fastcall to main.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 23 2021 at 07:59):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 23 2021 at 07:59):

cfallin created PR Review Comment:

Hmm, the changes in parser.rs were unintended, sorry about that -- part of my experiments in removing the old backend altogether. (I'll put together a draft PR for that soon-ish.) Reverted this file.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 23 2021 at 18:56):

cfallin updated PR #2678 from x64-fastcall to main.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 23 2021 at 19:14):

cfallin updated PR #2678 from x64-fastcall to main.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 26 2021 at 22:25):

abrown submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 26 2021 at 22:25):

abrown submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 26 2021 at 22:25):

abrown created PR Review Comment:

Weird... this _is_ used by SseOpcode, right?

view this post on Zulip Wasmtime GitHub notifications bot (Feb 26 2021 at 22:25):

abrown created PR Review Comment:

documentation: not immediately obvious why this is needed.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 26 2021 at 22:25):

abrown created PR Review Comment:

Can we add an issue URL here?

view this post on Zulip Wasmtime GitHub notifications bot (Feb 26 2021 at 22:25):

abrown created PR Review Comment:

Not sure I understand why these should change... The default ABI should still be System V and that hasn't really changed, right?

view this post on Zulip Wasmtime GitHub notifications bot (Feb 26 2021 at 22:25):

abrown created PR Review Comment:

It seems like we should be throwing/panicking before an ABIArgPart::None is ever allowed.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 26 2021 at 22:25):

abrown created PR Review Comment:

It sort of feels like the Type should be attached to ABIArg, not ABIArgPart--how do we know what ABIArg's original type is?

view this post on Zulip Wasmtime GitHub notifications bot (Feb 26 2021 at 22:25):

abrown created PR Review Comment:

I wouldn't mind including links like this one in some of the docs.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 26 2021 at 22:25):

abrown created PR Review Comment:

Perhaps we should debug_assert or something here that we are still aligned?

view this post on Zulip Wasmtime GitHub notifications bot (Feb 26 2021 at 22:25):

abrown created PR Review Comment:

I think we need to reconcile this with the example from the Microsoft docs; their __m128 is being returned in XMM0.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 26 2021 at 22:25):

abrown created PR Review Comment:

BTW, what is happening with multi-return? Is some code somewhere else panicking to avoid this case or should we be panicking here?

view this post on Zulip Wasmtime GitHub notifications bot (Feb 26 2021 at 22:25):

abrown created PR Review Comment:

We don't have a "store on the stack" Inst?

view this post on Zulip Wasmtime GitHub notifications bot (Feb 26 2021 at 22:25):

abrown created PR Review Comment:

I've seen this type of thing enough that we may want to create an inline function align somewhere.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 26 2021 at 22:25):

abrown created PR Review Comment:

More detail? panic!("unable to save fastcall callee register for the register class: {}", r.get_class())

view this post on Zulip Wasmtime GitHub notifications bot (Feb 26 2021 at 22:25):

abrown created PR Review Comment:

I guess the same applies to purpose: what happens if self.parts()[1] has a different purpose than self.parts()[0]?

view this post on Zulip Wasmtime GitHub notifications bot (Feb 26 2021 at 22:33):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 26 2021 at 22:33):

cfallin created PR Review Comment:

This is used by ValueRegs's packed encoding -- it doesn't have a length field, instead using "invalid" sentinels for parts beyond the needed part-count. Perhaps my reuse of ValueRegs rather than something simpler like Vec<ABIArgPart> is the real issue -- in hindsight it's probably better to just do that!

view this post on Zulip Wasmtime GitHub notifications bot (Feb 26 2021 at 22:34):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 26 2021 at 22:34):

cfallin created PR Review Comment:

The latter case violates an intended invariant (if that's not asserted I'll add it -- will check). The Type is actually specific to the ABIArgPart -- we use it to generate loads/stores just for that part.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 26 2021 at 22:36):

cfallin created PR Review Comment:

Our prior i128 impl for SysV ABI was actually wrong w.r.t. Rust's ABI; this PR aligns them for SysV as well. (I'm actually surprised we didn't catch this earlier, but I suppose all of the cg_clif tests so far haven't had an i128 arg that straddles the 6th and 7th machine words, i.e. the half-reg-and-half-stack case, which is what causes the difference.)

view this post on Zulip Wasmtime GitHub notifications bot (Feb 26 2021 at 22:36):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 26 2021 at 22:39):

cfallin created PR Review Comment:

So this is actually the most interesting unresolved question here, I think. It appears that rustc behaves for ABI purposes as if i128 is just two i64s; this affects returns (rax/rdx) and also the way that args are assigned (lower half can be in reg and upper half on stack).

This raises the question whether cg_clif needs to change or whether we need to document that our i128s are basically lowered to i64 pairs for ABI purposes and are not treated as true 128-bit values. The former is possible (and then we can just punt on i128s in fastcall) but requires changes on @bjorn3's end -- thoughts?

view this post on Zulip Wasmtime GitHub notifications bot (Feb 26 2021 at 22:39):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 27 2021 at 07:07):

bjorn3 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 27 2021 at 07:07):

bjorn3 created PR Review Comment:

Rustc forces all scalars > 8 bytes to be passed by reference: https://github.com/rust-lang/rust/blob/9fa580b1175018b0a276b0bc68f9827a106f7260/compiler/rustc_target/src/abi/call/x86_win64.rs#L23 This means that i128 is never used as argument type by rustc for WindowsFastcall.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 28 2021 at 03:10):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 28 2021 at 03:10):

cfallin created PR Review Comment:

Interesting -- that seems to be inconsistent with what the actual compiler output shows (or at least, I may be misunderstanding in which cases the "scalars > 8 bytes" by-ref rule applies): in this example, the i128 args are being passed as two i64s, by value. Does cg_clif perform this decomposition as well (i.e. use i64s in the CLIF)?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 01 2021 at 00:51):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 01 2021 at 00:51):

cfallin created PR Review Comment:

@bjorn3 reading a bit more on the cg_clif side: I see at least here a libcall invocation that uses I128 types directly, and in the general ABI mapping I see some stuff related to "spread" args (used for tuples it seems?) but no explicit handling for i128s. (Does rustc's front/middle-end say that i128s are "spread" across i64s?)

A possible solution: we could invent an ABI variant, essentially, that adds "rustc-compatible i128 handling" on top of SysV or Fastcall. Perhaps CallConv::SystemVRust and CallConv::WindowsFastcallRust. Then we can keep the i128-becomes-two-i64s handling on the Cranelift side, and everything should basically work. When in vanilla SystemV or WindowsFastcall, I would then propose that we actually panic if we see an i128, unless/until we later implement this in a strictly spec-compliant way.

Thoughts?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 01 2021 at 07:16):

bjorn3 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 01 2021 at 07:16):

bjorn3 created PR Review Comment:

I see at least here a libcall invocation that uses I128 types directly

In the Windows support PR these arguments are passed by reference when comliling for Windows: https://github.com/bjorn3/rustc_codegen_cranelift/pull/1145/files#diff-0c7a9de394376b33448baed91218bb8167a6d7cac581590442cfb81b9f6e63f7R39

Does rustc's front/middle-end say that i128s are "spread" across i64s?

No, it just directly passes it to LLVM for SystemV and forces it to be passed Indirect on Windows.

A possible solution: we could invent an ABI variant, essentially, that adds "rustc-compatible i128 handling" on top of SysV or Fastcall.

The SystemV abi defines how __int128 needs to be passed.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 01 2021 at 12:50):

bjorn3 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 01 2021 at 12:50):

bjorn3 created PR Review Comment:

Interesting -- that seems to be inconsistent with what the actual compiler output shows (or at least, I may be misunderstanding in which cases the "scalars > 8 bytes" by-ref rule applies): in this example, the i128 args are being passed as two i64s, by value. Does cg_clif perform this decomposition as well (i.e. use i64s in the CLIF)?

In that example the "Rust" abi is used instead of the "C" abi. The "Rust" abi has the extra rule that any value that fits into two registers is given the Cast(CastTarget::from(Reg { kind: RegKind::Integer, size: 128 })) pass mode. This is independent of the real type of the value. It even works for structs. https://github.com/rust-lang/rust/blob/d2731d8e9338d8fe844e19d3fbb39617753e65f4/compiler/rustc_middle/src/ty/layout.rs#L2894-L2897

view this post on Zulip Wasmtime GitHub notifications bot (Mar 01 2021 at 17:03):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 01 2021 at 17:03):

cfallin created PR Review Comment:

Hmm, indeed. I hadn't realized that the Rust ABI would differ here (my naive understanding was that it had mostly to do with struct layout details, eg allowing fields to be reordered, but I see that's not the only thing!).

So, my earlier thoughts stemmed from that Compiler Explorer example and my resulting assumption that rustc was doing something weird and not-quite-standards-compliant with i128s, but that we would have to handle this in Cranelift in a function that specifies standard ABI. But, indeed, if I add extern "C" to the signature, I see that rustc does the Fastcall-compliant thing with by-ref args.

I see now that (correct me if I am wrong here) it appears that cg_clif will do the appropriate translation to SysV/Fastcall-compliant details for a Rust ABI signature; so we don't need to special-case this, and instead just need to implement the standard and nothing more.

The part I was hoping to avoid ("panic if we see an i128") was having to implement the by-ref semantics as well for i128, as this adds extra complexity and another case. The only time that we would see this would be when interfacing with a C-ABI function that uses i128; either as an extern "C" in rustc + cg_clif, or with an __int128 in a hypothetical future C/C++ frontend to Cranelift (or some other language that has 128-bit types). But, at least the first case (extern "C" with i128 with with cg_clif) is possible to trigger, so we should handle it.

I'll (i) implement by-ref for i128s, (ii) rip out the Rust ABI-specific details on Fastcall, and then (iii) add a whole bunch of tests for these dark corners.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 01 2021 at 17:06):

cfallin edited PR Review Comment.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 01 2021 at 17:07):

cfallin edited PR Review Comment.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 01 2021 at 17:18):

bjorn3 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 01 2021 at 17:18):

bjorn3 created PR Review Comment:

rip out the Rust ABI-specific details on Fastcall

If you are talking about passing i128 in two registers, that is still necessary for extern "Rust" as the abi of cg_clif and cg_llvm should match.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 01 2021 at 17:23):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 01 2021 at 17:23):

cfallin created PR Review Comment:

Hmm, now I am confused again. I had meant the code that implements "i128 at CLIF level becomes two i64s in fastcall signature", because that is not Fastcall-compliant. If a frontend (cg_clif or any other) does this split, though, then that would still work as any other i64 args would.

So to clarify: are you saying that cg_clif will pass i128 args for extern "Rust" functions? (My understanding is no...)

view this post on Zulip Wasmtime GitHub notifications bot (Mar 01 2021 at 17:25):

bjorn3 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 01 2021 at 17:25):

bjorn3 created PR Review Comment:

For extern "C" functions it will pass it by reference. For extern "Rust" functions it will pass i128 args.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 01 2021 at 17:30):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 01 2021 at 17:30):

cfallin created PR Review Comment:

For extern "Rust" functions it will pass i128 args.

OK, so then for Rust ABI compatibility, this requires functionality in Cranelift that goes beyond standard Fastcall, right? If we implement fastcall-compliant handling of i128s, then Cranelift will generate by-ref arg passing code, and this will not be compatible with other Rust code (which splits into two i64 registers), as shown in the Compiler Explorer example. This was the reason I had suggested Rust-specific variants of CallConv above. Does that make sense?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 01 2021 at 17:42):

bjorn3 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 01 2021 at 17:42):

bjorn3 created PR Review Comment:

Makes sense, though I think it is more of an LLVM-specific variant, as it is an LLVM extension to support i128 passed in two registers. LLVM also for example supports passing three return values in registers, unlike SystemV which only allows two. This extension is supported by the old x86 backend already, but not in the new backend. cg_clif doesn't use more than two return registers, so that extension doesn't matter.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 03:45):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 03:45):

cfallin created PR Review Comment:

The BMI2 variant is never constructed and was triggering a warning, so I've moved the pragma there, though I can remove this drive-by warning fixup and we can do a separate warning cleanup (the new backend build isn't fully warning-clean yet) if you'd prefer!

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 03:45):

cfallin updated PR #2678 from x64-fastcall to main.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 03:45):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 03:45):

cfallin created PR Review Comment:

Done!

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 03:46):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 03:46):

cfallin created PR Review Comment:

Changed -- this has now been refactored so that we have a Vec of ABIArgSlots (old "parts") and don't overload ValueRegs.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 03:47):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 03:47):

cfallin created PR Review Comment:

Fixed this in refactor -- purpose is part of toplevel ABIArg::Slots variant which contains vec of slot descriptions, so there is no duplication.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 03:48):

cfallin updated PR #2678 from x64-fastcall to main.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 03:48):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 03:48):

cfallin created PR Review Comment:

Done!

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 03:48):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 03:48):

cfallin created PR Review Comment:

Fixed via refactor; not using ValueRegs anymore so no None sentinel variant is needed.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 03:48):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 03:48):

cfallin created PR Review Comment:

Did one better; just used alignment helper.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 03:49):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 03:49):

cfallin created PR Review Comment:

Resolved in latest update -- this behavior is now under the enable_llvm_abi_extensions flag as discussed in the comments below.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 03:49):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 03:49):

cfallin created PR Review Comment:

Done! align_to() now used throughout this ABI code (and corresponding aarch64 code too).

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 03:50):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 03:50):

cfallin created PR Review Comment:

Done!

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 03:51):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 03:51):

cfallin created PR Review Comment:

With more return values, we use the return-area pointer, similar to what happens under SysV. This is, properly speaking, an extension (for both ABIs); but it's necessary for e.g. Wasm multi-value support.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 03:53):

cfallin updated PR #2678 from x64-fastcall to main.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 03:54):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 03:54):

cfallin created PR Review Comment:

Switched to the usual store helper; not sure why the old code didn't use this.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 03:56):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 04:25):

cfallin updated PR #2678 from x64-fastcall to main.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 05:55):

cfallin updated PR #2678 from x64-fastcall to main.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 05:57):

cfallin updated PR #2678 from x64-fastcall to main.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 06:08):

cfallin updated PR #2678 from x64-fastcall to main.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 17:36):

cfallin updated PR #2678 from x64-fastcall to main.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 05 2021 at 17:23):

abrown submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 05 2021 at 17:59):

cfallin updated PR #2678 from x64-fastcall to main.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 05 2021 at 18:46):

cfallin merged PR #2678.


Last updated: Nov 22 2024 at 16:03 UTC