cfallin opened PR #2267 from fix-aarch64-abi
to main
:
This PR updates the AArch64 ABI implementation so that it (i) properly
respects that v8-v15 inclusive have callee-save lower halves, and
caller-save upper halves, by conservatively approximating (to full
registers) in the appropriate directions when generating prologue
caller-saves and when informing the regalloc of clobbered regs across
callsites.In order to prevent saving all of these vector registers in the prologue
of every non-leaf function due to the above approximation, this also
makes use of a new regalloc.rs feature to exclude call instructions'
writes from the clobber set returned by register allocation. This is
safe whenever the caller and callee have the same ABI (because anything
the callee could clobber, the caller is allowed to clobber as well
without saving it in the prologue).Requires bytecodealliance/regalloc.rs#100; this PR refers to my fork of
regalloc.rs until that PR is reviewed, merged and included in a new
release.Fixes #2254.
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin requested julian-seward1 for a review on PR #2267.
bnjbvr submitted PR Review.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
uber-nit: can you join the two cases, please?
&Inst::Call { ref info } | &Inst::CallInd { ref info }
bnjbvr created PR Review Comment:
This rule about caller- vs callee- saved is actually generic over the targets, so we could also apply it for x64, right?
bnjbvr created PR Review Comment:
Maybe worth adding a back reference to the other comment (that references this one) around callee-saved in call sites?
julian-seward1 submitted PR Review.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
Similar naming concern here. Also, for good performance, this needs to be inlined, no?
julian-seward1 created PR Review Comment:
This all is reasonable (and as we discussed and agreed), but it has the side effect that I am not entirely happy with the fact that the name of this function doesn't make clear that it now only pertains to call sites. Would you be prepared to consider the following changes (or something modulo them, so as to increase clarity for the reader) ?
change name of function to
is_reg_clobbered_by_a_call_instruction
change name of formal param from
call_conv
tocall_conv_of_callee
(since it's kinda ambiguous at least from the name)
cfallin submitted PR Review.
cfallin created PR Review Comment:
Unfortunately they're different types (
CallInfo
vs.CallIndInfo
) so we can't unify the patterns here :-/
cfallin submitted PR Review.
cfallin created PR Review Comment:
Done!
cfallin submitted PR Review.
cfallin created PR Review Comment:
Indeed it is, although we get less benefit (only the compile-time improvement, no generated-code change) because x64 doesn't have half-caller/half-callee saves. Since it adds overhead to
Inst::CallKnown
/Inst::CallUnknown
(we don't out-of-line the fields yet) and there are issues with the size ofInst
, I'm inclined not to go down this path, at least at first. Can certainly look at this in a subsequent PR though!
cfallin submitted PR Review.
cfallin created PR Review Comment:
Verified that inlining occurs; rustc seems to be doing the right thing here. (It's statically monomorphized so I expect it always would as long as someone does not inflate the body to some huge size...).
cfallin submitted PR Review.
cfallin created PR Review Comment:
Re: naming: were you thinking something more specific to calls, as above, or something else?
I put a bit of thought into this and I'm not sure that I like including "call instruction", as the concept is more general. (In principle the
MachInst
layer doesn't even know about the concept of a call instruction.) The concept really is just "should we aggregate this instruction's effects into the summary of clobbered registers". Any other suggestions?Or if we do decide that this is specific to calls, I think I'd prefer
is_call_instruction_with_ignored_clobbers()
. I could go with this; it just seems a bit non-orthogonal, is all.
cfallin updated PR #2267 from fix-aarch64-abi
to main
:
This PR updates the AArch64 ABI implementation so that it (i) properly
respects that v8-v15 inclusive have callee-save lower halves, and
caller-save upper halves, by conservatively approximating (to full
registers) in the appropriate directions when generating prologue
caller-saves and when informing the regalloc of clobbered regs across
callsites.In order to prevent saving all of these vector registers in the prologue
of every non-leaf function due to the above approximation, this also
makes use of a new regalloc.rs feature to exclude call instructions'
writes from the clobber set returned by register allocation. This is
safe whenever the caller and callee have the same ABI (because anything
the callee could clobber, the caller is allowed to clobber as well
without saving it in the prologue).Requires bytecodealliance/regalloc.rs#100; this PR refers to my fork of
regalloc.rs until that PR is reviewed, merged and included in a new
release.Fixes #2254.
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin submitted PR Review.
cfallin created PR Review Comment:
Ah, I like this much better! The
ABIMachineSpec
now hasget_regs_clobbered_by_call()
, and aarch64 hasis_reg_saved_in_prologue()
,get_regs_saved_in_prologue()
, andis_reg_clobbered_by_call()
.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
Ah, leave it as it is then. That the other names have been improved is good enough for me. Thanks for doing that.
julian-seward1 submitted PR Review.
cfallin updated PR #2267 from fix-aarch64-abi
to main
:
This PR updates the AArch64 ABI implementation so that it (i) properly
respects that v8-v15 inclusive have callee-save lower halves, and
caller-save upper halves, by conservatively approximating (to full
registers) in the appropriate directions when generating prologue
caller-saves and when informing the regalloc of clobbered regs across
callsites.In order to prevent saving all of these vector registers in the prologue
of every non-leaf function due to the above approximation, this also
makes use of a new regalloc.rs feature to exclude call instructions'
writes from the clobber set returned by register allocation. This is
safe whenever the caller and callee have the same ABI (because anything
the callee could clobber, the caller is allowed to clobber as well
without saving it in the prologue).Requires bytecodealliance/regalloc.rs#100; this PR refers to my fork of
regalloc.rs until that PR is reviewed, merged and included in a new
release.Fixes #2254.
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin edited PR #2267 from fix-aarch64-abi
to main
:
This PR updates the AArch64 ABI implementation so that it (i) properly
respects that v8-v15 inclusive have callee-save lower halves, and
caller-save upper halves, by conservatively approximating (to full
registers) in the appropriate directions when generating prologue
caller-saves and when informing the regalloc of clobbered regs across
callsites.In order to prevent saving all of these vector registers in the prologue
of every non-leaf function due to the above approximation, this also
makes use of a new regalloc.rs feature to exclude call instructions'
writes from the clobber set returned by register allocation. This is
safe whenever the caller and callee have the same ABI (because anything
the callee could clobber, the caller is allowed to clobber as well
without saving it in the prologue).Fixes #2254.
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin merged PR #2267.
Last updated: Dec 23 2024 at 12:05 UTC