cfallin opened PR #2142 from machinst-abi-x64
to main
:
Previously, in #2128, we factored out a common "vanilla 64-bit ABI"
implementation from the AArch64 ABI code, with the idea that this should
be largely compatible with x64. This PR alters the new x64 backend to
make use of the shared infrastructure, removing the duplication that
existed previously. The generated code is nearly (not exactly) the same;
the only difference relates to how the clobber-save region is padded in
the prologue.<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin requested bnjbvr for a review on PR #2142.
cfallin updated PR #2142 from machinst-abi-x64
to main
:
Previously, in #2128, we factored out a common "vanilla 64-bit ABI"
implementation from the AArch64 ABI code, with the idea that this should
be largely compatible with x64. This PR alters the new x64 backend to
make use of the shared infrastructure, removing the duplication that
existed previously. The generated code is nearly (not exactly) the same;
the only difference relates to how the clobber-save region is padded in
the prologue.<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
bjorn3 submitted PR Review.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
*for the
cfallin updated PR #2142 from machinst-abi-x64
to main
:
Previously, in #2128, we factored out a common "vanilla 64-bit ABI"
implementation from the AArch64 ABI code, with the idea that this should
be largely compatible with x64. This PR alters the new x64 backend to
make use of the shared infrastructure, removing the duplication that
existed previously. The generated code is nearly (not exactly) the same;
the only difference relates to how the clobber-save region is padded in
the prologue.<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin submitted PR Review.
cfallin created PR Review Comment:
Thanks!
cfallin updated PR #2142 from machinst-abi-x64
to main
:
Previously, in #2128, we factored out a common "vanilla 64-bit ABI"
implementation from the AArch64 ABI code, with the idea that this should
be largely compatible with x64. This PR alters the new x64 backend to
make use of the shared infrastructure, removing the duplication that
existed previously. The generated code is nearly (not exactly) the same;
the only difference relates to how the clobber-save region is padded in
the prologue.<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
bnjbvr submitted PR Review.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
I wasn't there when the trait was designed, but: would it make sense to pass a context, or the consumer of these insts directly here? (I remember some cases where it wasn't possible to do so, because it was used in both the lowering and code emission contexts, so maybe that's irrelevant.)
This would avoid the smallvec allocation entirely here.
bnjbvr created PR Review Comment:
nit (here and below many times): can you use
pub(crate)
instead?
bnjbvr created PR Review Comment:
Please expand GV here.
bnjbvr created PR Review Comment:
Same question above wrt i32 vs i64.
bnjbvr created PR Review Comment:
Is it possible that the imm is actually large enough to trigger this? If so, would it make sense to materialize the immediate into a temp register, and use a reg/reg add below? If the imm can't be this large, could the trait function accept an
u32
instead of anu64
?
bnjbvr created PR Review Comment:
same question for i32 vs i64 in signature
bnjbvr created PR Review Comment:
Just reading this doesn't help understanding why it's fine to do so, which makes me think the trait functions should be redesigned so the temp reg is also returned from
gen_add_imm
, or something like this.
bnjbvr created PR Review Comment:
same remark about smallvec
bnjbvr created PR Review Comment:
nit: this comment can be removed, i think
bnjbvr created PR Review Comment:
This is a bit feeble to use a bool in this case, could it be its own enum, to enhance readability?
bnjbvr created PR Review Comment:
ditto
bnjbvr created PR Review Comment:
nit: could you use
expect
instead ofunwrap
, here and below? it likely adds cases where we have a very big offset, and it just wouldn't be implemented here.
bnjbvr created PR Review Comment:
Something I've meaning to change here, just to avoid dummy memory reallocations: can you use a
vec!
declaration here, instead of multiple pushes? This method could even return a plain&[]
, since in both cases (baldrdash/not baldrdash), these arrays are statically known.
bnjbvr created PR Review Comment:
We can use an unstable sort here, not that it matters...
bnjbvr created PR Review Comment:
Maybe we could rename this method
get_spillslot_byte_size
, then?
bnjbvr created PR Review Comment:
I don't know if this has an importance in terms of speed of gen'd code, but Spidermonkey does do a single sp+imm operation here, then use stores to sp+offset for each saved register. It would also avoid the weird
push 0
to keep the stack 16-bytes aligned.
cfallin submitted PR Review.
cfallin created PR Review Comment:
I thought about this, but (as you suspected) this is called in contexts where we have no
LowerCtx
or other direct consumer of callbacks. We could statically monomorphize on a closure argument that receives instructions, but that seems unnecessarily complex and would duplicate code in the binary; I made sure to size theSmallVec
s so they should fit x64 and aarch64 cases in their inline storage, so the cost is (just) copying a fewInst
s instead. Happy to go the other way though if you'd prefer!
cfallin submitted PR Review.
cfallin created PR Review Comment:
u32
instead is much cleaner, thanks!
cfallin submitted PR Review.
cfallin created PR Review Comment:
Thanks for pointing this out; yeah, it is a bit convoluted. I've updated the documentation on the trait definition to clearly describe the requirements placed on the machine backend's implementation and register choices, and I've renamed
get_fixed_tmp_reg()
toget_stacklimit_reg()
to make its purpose more explicit.I thought for a bit if there might be a better way to do this, but I haven't managed to find one, unless we push the whole "compute this
GlobalValue
using only caller-saves" logic into each machine backend.gen_add_imm()
could return info about what it clobbers, but that still doesn't help the machine-independent implementation choose another register on its own. Really the machine-backend author has to choose fixed scratch registers and a corresponding add-with-large-immediate lowering (on RISCs at least) and just provide them to the machine-independent part.On x64, anyway, we can always have a full 32-bit immediate on an add, so the more tricky requirements are moot; all we need is some arbitrary caller-save register here. Hopefully simple enough but I'm open to other ideas :-)
cfallin submitted PR Review.
cfallin created PR Review Comment:
Changed all of these to
i32
.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Good idea!
cfallin submitted PR Review.
cfallin created PR Review Comment:
The interface back to the lowering code reasons in terms of number of spillslots, so I'd rather not convert to/from bytes internally; but I renamed this to
get_number_of_spillslots_for_value()
, which is hopefully more clear!
cfallin submitted PR Review.
cfallin created PR Review Comment:
Done! Returning a
'static
slice would require refactoring some other things to be const functions, I think, including in regalloc.rs (Writable::from_reg
), or else adding lazy-inits, so I'm not sure if that's completely feasible at the moment, but perhaps we could think about this more...
cfallin submitted PR Review.
cfallin created PR Review Comment:
Good point! (I had initially thought we wanted a stable sort to ensure deterministic code output, but of course the registers will be unique, so it doesn't matter...)
cfallin submitted PR Review.
cfallin created PR Review Comment:
Done!
cfallin submitted PR Review.
cfallin created PR Review Comment:
Good point! Done.
cfallin updated PR #2142 from machinst-abi-x64
to main
:
Previously, in #2128, we factored out a common "vanilla 64-bit ABI"
implementation from the AArch64 ABI code, with the idea that this should
be largely compatible with x64. This PR alters the new x64 backend to
make use of the shared infrastructure, removing the duplication that
existed previously. The generated code is nearly (not exactly) the same;
the only difference relates to how the clobber-save region is padded in
the prologue.<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin updated PR #2142 from machinst-abi-x64
to main
:
Previously, in #2128, we factored out a common "vanilla 64-bit ABI"
implementation from the AArch64 ABI code, with the idea that this should
be largely compatible with x64. This PR alters the new x64 backend to
make use of the shared infrastructure, removing the duplication that
existed previously. The generated code is nearly (not exactly) the same;
the only difference relates to how the clobber-save region is padded in
the prologue.<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
bnjbvr requested bnjbvr for a review on PR #2142.
bnjbvr submitted PR Review.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
Should we add "not yet implemented" to this error message? This may happen in real-world programs. Alternatively, should we report an
Err
Result
here, with an impl limit exceeded error?
bnjbvr created PR Review Comment:
Can we add an assertion that this is caller-save? (This depends on the calling convention, not sure if it's available from here.)
bnjbvr created PR Review Comment:
Sorry if this is not the right place/time for this suggestion, but: I am a bit confused when it comes to distinguish
Body
fromCall
, by just looking at the names. The comments really help, for that matter. Alternatively, what do you think of renamingABIBodyImpl
toCalleeABIImpl
, andABICallImpl
toCallerABIImpl
? (I just realize the consecutivei
make it a bit hard to decipher, with any casing... so maybe not such a great idea.)
bnjbvr created PR Review Comment:
(Another belated suggestion)
Impl
in a name (here, inABIMachineImpl
) makes me think that we have a concrete type, while it's actually a trait. Is there any other name that could avoid this confusion?ABIMachineSpec
for instance?
bnjbvr created PR Review Comment:
Real world programs could have large offsets (here and below) right? Should this message also contain "not implemented yet", or should the function return a
Result
, and here it'd be anErr
with impl limit exceeded?
cfallin submitted PR Review.
cfallin created PR Review Comment:
The stack offset here is limited by the stack-frame size, which we clamp to 128 MB in
compute_arg_locs()
, so I don't think we need to bubble up aResult
(since it should not be reachable code for any input); but I've clarified the error message. Thanks!
cfallin submitted PR Review.
cfallin created PR Review Comment:
Done; just asserting that it's a caller-save in systemv and baldrdash.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Analogous to above, shouldn't happen due to stack-frame size limit; clarified messages.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Good idea!
cfallin submitted PR Review.
cfallin updated PR #2142 from machinst-abi-x64
to main
:
Previously, in #2128, we factored out a common "vanilla 64-bit ABI"
implementation from the AArch64 ABI code, with the idea that this should
be largely compatible with x64. This PR alters the new x64 backend to
make use of the shared infrastructure, removing the duplication that
existed previously. The generated code is nearly (not exactly) the same;
the only difference relates to how the clobber-save region is padded in
the prologue.<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin created PR Review Comment:
Ah, yes, I like the callee/caller distinction better than "body" / "call"; it's much more intuitive. I've gone ahead with these names (I think the
...ABIImpl
is fine IMHO.) Thanks!
cfallin updated PR #2142 from machinst-abi-x64
to main
:
Previously, in #2128, we factored out a common "vanilla 64-bit ABI"
implementation from the AArch64 ABI code, with the idea that this should
be largely compatible with x64. This PR alters the new x64 backend to
make use of the shared infrastructure, removing the duplication that
existed previously. The generated code is nearly (not exactly) the same;
the only difference relates to how the clobber-save region is padded in
the prologue.<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin merged PR #2142.
Last updated: Dec 23 2024 at 12:05 UTC