cfallin requested abrown and bnjbvr for a review on PR #2678.
cfallin requested abrown and bnjbvr for a review on PR #2678.
cfallin opened PR #2678 from x64-fastcall
to main
:
This adds support for the "fastcall" ABI, which is the native C/C++ ABI
on Windows platforms on x86-64. It is similar to but not exactly like
System V; primarily, its argument register assignments are different,
and it requires stack shadow space.Note that this also adjusts the handling of multi-register values in the
shared ABI implementation, and with this change, adjusts handling of
i128
s on both Fastcall/x64 and SysV/x64 platforms. This was done
to align with actual behavior by the "rustc ABI" on both platforms, as
mapped out experimentally (Compiler Explorer link in comments).Note also that this does not add x64 unwind info on Windows. That will
come in a future PR (but is planned!).Finally, note that this does not add any tests that actually execute
with the fastcall ABI on Windows, because we are not testing the new
backend on Windows on our CI setup; test coverage is via filetests for
now. We could possibly Windows-with-new-backend CI if we wanted to,
though I don't think the Wasmtime-internal use cases will make use of
Fastcall.<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin updated PR #2678 from x64-fastcall
to main
.
cfallin updated PR #2678 from x64-fastcall
to main
.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
This feature doesn't exist.
cfallin updated PR #2678 from x64-fastcall
to main
.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Hmm, the changes in
parser.rs
were unintended, sorry about that -- part of my experiments in removing the old backend altogether. (I'll put together a draft PR for that soon-ish.) Reverted this file.
cfallin updated PR #2678 from x64-fastcall
to main
.
cfallin updated PR #2678 from x64-fastcall
to main
.
abrown submitted PR Review.
abrown submitted PR Review.
abrown created PR Review Comment:
Weird... this _is_ used by
SseOpcode
, right?
abrown created PR Review Comment:
documentation: not immediately obvious why this is needed.
abrown created PR Review Comment:
Can we add an issue URL here?
abrown created PR Review Comment:
Not sure I understand why these should change... The default ABI should still be System V and that hasn't really changed, right?
abrown created PR Review Comment:
It seems like we should be throwing/panicking before an
ABIArgPart::None
is ever allowed.
abrown created PR Review Comment:
It sort of feels like the
Type
should be attached toABIArg
, notABIArgPart
--how do we know whatABIArg
's original type is?
abrown created PR Review Comment:
I wouldn't mind including links like this one in some of the docs.
abrown created PR Review Comment:
Perhaps we should
debug_assert
or something here that we are still aligned?
abrown created PR Review Comment:
I think we need to reconcile this with the example from the Microsoft docs; their
__m128
is being returned in XMM0.
abrown created PR Review Comment:
BTW, what is happening with multi-return? Is some code somewhere else panicking to avoid this case or should we be panicking here?
abrown created PR Review Comment:
We don't have a "store on the stack"
Inst
?
abrown created PR Review Comment:
I've seen this type of thing enough that we may want to create an inline function
align
somewhere.
abrown created PR Review Comment:
More detail?
panic!("unable to save fastcall callee register for the register class: {}", r.get_class())
abrown created PR Review Comment:
I guess the same applies to
purpose
: what happens ifself.parts()[1]
has a different purpose thanself.parts()[0]
?
cfallin submitted PR Review.
cfallin created PR Review Comment:
This is used by
ValueRegs
's packed encoding -- it doesn't have a length field, instead using "invalid" sentinels for parts beyond the needed part-count. Perhaps my reuse ofValueRegs
rather than something simpler likeVec<ABIArgPart>
is the real issue -- in hindsight it's probably better to just do that!
cfallin submitted PR Review.
cfallin created PR Review Comment:
The latter case violates an intended invariant (if that's not asserted I'll add it -- will check). The
Type
is actually specific to theABIArgPart
-- we use it to generate loads/stores just for that part.
cfallin created PR Review Comment:
Our prior
i128
impl for SysV ABI was actually wrong w.r.t. Rust's ABI; this PR aligns them for SysV as well. (I'm actually surprised we didn't catch this earlier, but I suppose all of the cg_clif tests so far haven't had ani128
arg that straddles the 6th and 7th machine words, i.e. the half-reg-and-half-stack case, which is what causes the difference.)
cfallin submitted PR Review.
cfallin created PR Review Comment:
So this is actually the most interesting unresolved question here, I think. It appears that
rustc
behaves for ABI purposes as ifi128
is just twoi64
s; this affects returns (rax/rdx) and also the way that args are assigned (lower half can be in reg and upper half on stack).This raises the question whether cg_clif needs to change or whether we need to document that our
i128
s are basically lowered toi64
pairs for ABI purposes and are not treated as true 128-bit values. The former is possible (and then we can just punt oni128
s in fastcall) but requires changes on @bjorn3's end -- thoughts?
cfallin submitted PR Review.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
Rustc forces all scalars > 8 bytes to be passed by reference: https://github.com/rust-lang/rust/blob/9fa580b1175018b0a276b0bc68f9827a106f7260/compiler/rustc_target/src/abi/call/x86_win64.rs#L23 This means that i128 is never used as argument type by rustc for WindowsFastcall.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Interesting -- that seems to be inconsistent with what the actual compiler output shows (or at least, I may be misunderstanding in which cases the "scalars > 8 bytes" by-ref rule applies): in this example, the
i128
args are being passed as twoi64
s, by value. Does cg_clif perform this decomposition as well (i.e. usei64
s in the CLIF)?
cfallin submitted PR Review.
cfallin created PR Review Comment:
@bjorn3 reading a bit more on the cg_clif side: I see at least here a libcall invocation that uses
I128
types directly, and in the general ABI mapping I see some stuff related to "spread" args (used for tuples it seems?) but no explicit handling fori128
s. (Does rustc's front/middle-end say thati128
s are "spread" acrossi64
s?)A possible solution: we could invent an ABI variant, essentially, that adds "rustc-compatible
i128
handling" on top of SysV or Fastcall. PerhapsCallConv::SystemVRust
andCallConv::WindowsFastcallRust
. Then we can keep thei128
-becomes-two-i64
s handling on the Cranelift side, and everything should basically work. When in vanillaSystemV
orWindowsFastcall
, I would then propose that we actually panic if we see ani128
, unless/until we later implement this in a strictly spec-compliant way.Thoughts?
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
I see at least here a libcall invocation that uses I128 types directly
In the Windows support PR these arguments are passed by reference when comliling for Windows: https://github.com/bjorn3/rustc_codegen_cranelift/pull/1145/files#diff-0c7a9de394376b33448baed91218bb8167a6d7cac581590442cfb81b9f6e63f7R39
Does rustc's front/middle-end say that i128s are "spread" across i64s?
No, it just directly passes it to LLVM for SystemV and forces it to be passed Indirect on Windows.
A possible solution: we could invent an ABI variant, essentially, that adds "rustc-compatible i128 handling" on top of SysV or Fastcall.
The SystemV abi defines how
__int128
needs to be passed.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
Interesting -- that seems to be inconsistent with what the actual compiler output shows (or at least, I may be misunderstanding in which cases the "scalars > 8 bytes" by-ref rule applies): in this example, the i128 args are being passed as two i64s, by value. Does cg_clif perform this decomposition as well (i.e. use i64s in the CLIF)?
In that example the "Rust" abi is used instead of the "C" abi. The "Rust" abi has the extra rule that any value that fits into two registers is given the
Cast(CastTarget::from(Reg { kind: RegKind::Integer, size: 128 }))
pass mode. This is independent of the real type of the value. It even works for structs. https://github.com/rust-lang/rust/blob/d2731d8e9338d8fe844e19d3fbb39617753e65f4/compiler/rustc_middle/src/ty/layout.rs#L2894-L2897
cfallin submitted PR Review.
cfallin created PR Review Comment:
Hmm, indeed. I hadn't realized that the Rust ABI would differ here (my naive understanding was that it had mostly to do with struct layout details, eg allowing fields to be reordered, but I see that's not the only thing!).
So, my earlier thoughts stemmed from that Compiler Explorer example and my resulting assumption that rustc was doing something weird and not-quite-standards-compliant with
i128
s, but that we would have to handle this in Cranelift in a function that specifies standard ABI. But, indeed, if I addextern "C"
to the signature, I see that rustc does the Fastcall-compliant thing with by-ref args.I see now that (correct me if I am wrong here) it appears that cg_clif will do the appropriate translation to SysV/Fastcall-compliant details for a Rust ABI signature; so we don't need to special-case this, and instead just need to implement the standard and nothing more.
The part I was hoping to avoid ("panic if we see an i128") was having to implement the by-ref semantics as well for
i128
, as this adds extra complexity and another case. The only time that we would see this would be when interfacing with a C-ABI function that usesi128
; either as anextern "C"
in rustc + cg_clif, or with an__int128
in a hypothetical future C/C++ frontend to Cranelift (or some other language that has 128-bit types). But, at least the first case (extern "C" with i128 with with cg_clif
) is possible to trigger, so we should handle it.I'll (i) implement by-ref for i128s, (ii) rip out the Rust ABI-specific details on Fastcall, and then (iii) add a whole bunch of tests for these dark corners.
cfallin edited PR Review Comment.
cfallin edited PR Review Comment.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
rip out the Rust ABI-specific details on Fastcall
If you are talking about passing i128 in two registers, that is still necessary for
extern "Rust"
as the abi of cg_clif and cg_llvm should match.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Hmm, now I am confused again. I had meant the code that implements "
i128
at CLIF level becomes twoi64
s in fastcall signature", because that is not Fastcall-compliant. If a frontend (cg_clif or any other) does this split, though, then that would still work as any otheri64
args would.So to clarify: are you saying that
cg_clif
will passi128
args forextern "Rust"
functions? (My understanding is no...)
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
For
extern "C"
functions it will pass it by reference. Forextern "Rust"
functions it will passi128
args.
cfallin submitted PR Review.
cfallin created PR Review Comment:
For extern "Rust" functions it will pass i128 args.
OK, so then for Rust ABI compatibility, this requires functionality in Cranelift that goes beyond standard Fastcall, right? If we implement fastcall-compliant handling of
i128
s, then Cranelift will generate by-ref arg passing code, and this will not be compatible with other Rust code (which splits into twoi64
registers), as shown in the Compiler Explorer example. This was the reason I had suggested Rust-specific variants ofCallConv
above. Does that make sense?
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
Makes sense, though I think it is more of an LLVM-specific variant, as it is an LLVM extension to support i128 passed in two registers. LLVM also for example supports passing three return values in registers, unlike SystemV which only allows two. This extension is supported by the old x86 backend already, but not in the new backend. cg_clif doesn't use more than two return registers, so that extension doesn't matter.
cfallin submitted PR Review.
cfallin created PR Review Comment:
The
BMI2
variant is never constructed and was triggering a warning, so I've moved the pragma there, though I can remove this drive-by warning fixup and we can do a separate warning cleanup (the new backend build isn't fully warning-clean yet) if you'd prefer!
cfallin updated PR #2678 from x64-fastcall
to main
.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Done!
cfallin submitted PR Review.
cfallin created PR Review Comment:
Changed -- this has now been refactored so that we have a
Vec
ofABIArgSlot
s (old "parts") and don't overloadValueRegs
.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Fixed this in refactor --
purpose
is part of toplevelABIArg::Slots
variant which contains vec of slot descriptions, so there is no duplication.
cfallin updated PR #2678 from x64-fastcall
to main
.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Done!
cfallin submitted PR Review.
cfallin created PR Review Comment:
Fixed via refactor; not using
ValueRegs
anymore so noNone
sentinel variant is needed.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Did one better; just used alignment helper.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Resolved in latest update -- this behavior is now under the
enable_llvm_abi_extensions
flag as discussed in the comments below.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Done!
align_to()
now used throughout this ABI code (and corresponding aarch64 code too).
cfallin submitted PR Review.
cfallin created PR Review Comment:
Done!
cfallin submitted PR Review.
cfallin created PR Review Comment:
With more return values, we use the return-area pointer, similar to what happens under SysV. This is, properly speaking, an extension (for both ABIs); but it's necessary for e.g. Wasm multi-value support.
cfallin updated PR #2678 from x64-fastcall
to main
.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Switched to the usual
store
helper; not sure why the old code didn't use this.
cfallin submitted PR Review.
cfallin updated PR #2678 from x64-fastcall
to main
.
cfallin updated PR #2678 from x64-fastcall
to main
.
cfallin updated PR #2678 from x64-fastcall
to main
.
cfallin updated PR #2678 from x64-fastcall
to main
.
cfallin updated PR #2678 from x64-fastcall
to main
.
abrown submitted PR Review.
cfallin updated PR #2678 from x64-fastcall
to main
.
cfallin merged PR #2678.
Last updated: Nov 22 2024 at 16:03 UTC