peterhuene opened PR #1734 from fix-saved-fprs
to master
:
This PR fixes both how FPR callee-saved registers are saved and how the
shadow space allocation occurs when laying out the stack for Windows x64
calling convention.Importantly, this PR removes the compiler limitation of stack size for
Windows x64 that was imposed because FPR saves previously couldn't always be
represented in the unwind information.The FPR saves are now performed without using stack slots, much like how the
callee-saved GPRs are saved. The total CSR space is given tolayout_stack
so
that it is included in the frame size and to offset the layout of spills and
explicit slots.The FPR saves are now done via an RSP offset (post adjustment) and they always
follow the GPR saves on the stack. A simpler calculation can now be made to
determine the proper offsets of the FPR saves for representing the unwind
information.Additionally, the shadow space is no longer treated as an incoming argument,
but an explicit stack slot that gets laid out at the lowest address possible in
the local frame. This preventslayout_stack
from putting a spill or explicit
slot in this reserved space. In the future,layout_stack
should take
advantage of the caller-provided shadow space for spills, but this PR does
not attempt to address that.The shadow space is now omitted from the local frame for leaf functions.
Fixes #1728.
Fixes #1587.
Fixes #1475.
peterhuene requested bnjbvr and iximeow for a review on PR #1734.
peterhuene requested bnjbvr and iximeow for a review on PR #1734.
iximeow submitted PR Review.
iximeow submitted PR Review.
iximeow created PR Review Comment:
To be correct for cases where
isa.pointer_bytes() != 8
:csr_stack_size = (csr_stack_size + 15) & !15;
iximeow created PR Review Comment:
Being
winx64.rs
, I think the assumption that a GPR is half the size of an FPR is workable, but do you think that's worth noting in this comment?
iximeow created PR Review Comment:
since
local_stack_size
ought to be 16-byte aligned, shouldn'tstack_size & isa.pointer_bytes()
always be zero? It seems like we could lift this up to an assert on function entry, if so.
iximeow created PR Review Comment:
Any particular reason to check the rest of the function? Seems independent of making sure there's an
ss0
and that this compiles.
iximeow created PR Review Comment:
Likewise, I'm not sure what the checks on this function body are intending to test past the
ss0
line?
iximeow created PR Review Comment:
:tada: :tada:
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
stack_size
(i.e. the adjustment to RSP) is not 16-byte aligned as it might be aligning RSP back to a 16-byte alignment.Take for instance a prologue in a non-leaf function that pushes the frame pointer and then saves a single GPR. The push of the frame pointer realigned the stack from the push of return address, but the push of the GPR causes a misalignment. The local stack allocation will be +0x8 to account for that, which means
stack_size
itself is not a multiple of 16, but 8.This lines cause causes the offset to be +0x8 when
stack_size
itself is +0x8 so that it gets subtracted out from the final offset from the post-adjusted RSP. The end result is that the offset should always be a multiple of 16 from a post-adjusted 16-byte aligned RSP.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
That said, I'm totally open to calculating this in a clearer way if you have any suggestions!
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I think that's clearer and congruent with other places where we round up for alignment. I'll change.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I can add a comment.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I wanted to check the prologues and epilogues for correctness in each function. The intermediate instructions I'm not as concerned about, although I personally don't mind them being checked unless we think there's a likelihood that this will cause a maintenance headache for these tests.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
See reply above. Do you think it would be best if we simply limited it to just the stack layout and prologue/epilogue instructions?
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
It turns out asserting things in tests is good :smile:.
peterhuene edited PR Review Comment.
iximeow submitted PR Review.
iximeow created PR Review Comment:
That example helps, thanks. How's
% types::F64X2.bytes()
instead of& isa.pointer_bytes()
sound to you? That picks an aligned offset for the type to store, ought to work forisa.pointer_bytes() == 4
, and makes extension for ymm preservation (if we get to that in the future) clearer (they don't _need_ 32-byte alignment, but iiuc are somewhat penalized for misaligned access)
iximeow submitted PR Review.
iximeow created PR Review Comment:
Yeah I'm just thinking of maintenance for these. But I've convinced myself it's actually good to extend these checks here: I don't think we explicitly tested fastcall gpr preservation works as expected anywhere, and these aren't really liable to change in a surprising way.
iximeow submitted PR Review.
iximeow created PR Review Comment:
(as above, now convinced that these are Good Actually)
peterhuene edited PR Review Comment.
peterhuene edited PR Review Comment.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Works for me. I'll get that changed!
peterhuene updated PR #1734 from fix-saved-fprs
to master
:
This PR fixes both how FPR callee-saved registers are saved and how the
shadow space allocation occurs when laying out the stack for Windows x64
calling convention.Importantly, this PR removes the compiler limitation of stack size for
Windows x64 that was imposed because FPR saves previously couldn't always be
represented in the unwind information.The FPR saves are now performed without using stack slots, much like how the
callee-saved GPRs are saved. The total CSR space is given tolayout_stack
so
that it is included in the frame size and to offset the layout of spills and
explicit slots.The FPR saves are now done via an RSP offset (post adjustment) and they always
follow the GPR saves on the stack. A simpler calculation can now be made to
determine the proper offsets of the FPR saves for representing the unwind
information.Additionally, the shadow space is no longer treated as an incoming argument,
but an explicit stack slot that gets laid out at the lowest address possible in
the local frame. This preventslayout_stack
from putting a spill or explicit
slot in this reserved space. In the future,layout_stack
should take
advantage of the caller-provided shadow space for spills, but this PR does
not attempt to address that.The shadow space is now omitted from the local frame for leaf functions.
Fixes #1728.
Fixes #1587.
Fixes #1475.
peterhuene updated PR #1734 from fix-saved-fprs
to master
:
This PR fixes both how FPR callee-saved registers are saved and how the
shadow space allocation occurs when laying out the stack for Windows x64
calling convention.Importantly, this PR removes the compiler limitation of stack size for
Windows x64 that was imposed because FPR saves previously couldn't always be
represented in the unwind information.The FPR saves are now performed without using stack slots, much like how the
callee-saved GPRs are saved. The total CSR space is given tolayout_stack
so
that it is included in the frame size and to offset the layout of spills and
explicit slots.The FPR saves are now done via an RSP offset (post adjustment) and they always
follow the GPR saves on the stack. A simpler calculation can now be made to
determine the proper offsets of the FPR saves for representing the unwind
information.Additionally, the shadow space is no longer treated as an incoming argument,
but an explicit stack slot that gets laid out at the lowest address possible in
the local frame. This preventslayout_stack
from putting a spill or explicit
slot in this reserved space. In the future,layout_stack
should take
advantage of the caller-provided shadow space for spills, but this PR does
not attempt to address that.The shadow space is now omitted from the local frame for leaf functions.
Fixes #1728.
Fixes #1587.
Fixes #1475.
iximeow submitted PR Review.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
This should actually check if a frame pointer is being used, even though we currently always use one.
If a frame pointer is not used, then the offsets for the FPRs saves should not be altered as they already have the correct offset from RSP.
I'll quickly fix this.
peterhuene updated PR #1734 from fix-saved-fprs
to master
:
This PR fixes both how FPR callee-saved registers are saved and how the
shadow space allocation occurs when laying out the stack for Windows x64
calling convention.Importantly, this PR removes the compiler limitation of stack size for
Windows x64 that was imposed because FPR saves previously couldn't always be
represented in the unwind information.The FPR saves are now performed without using stack slots, much like how the
callee-saved GPRs are saved. The total CSR space is given tolayout_stack
so
that it is included in the frame size and to offset the layout of spills and
explicit slots.The FPR saves are now done via an RSP offset (post adjustment) and they always
follow the GPR saves on the stack. A simpler calculation can now be made to
determine the proper offsets of the FPR saves for representing the unwind
information.Additionally, the shadow space is no longer treated as an incoming argument,
but an explicit stack slot that gets laid out at the lowest address possible in
the local frame. This preventslayout_stack
from putting a spill or explicit
slot in this reserved space. In the future,layout_stack
should take
advantage of the caller-provided shadow space for spills, but this PR does
not attempt to address that.The shadow space is now omitted from the local frame for leaf functions.
Fixes #1728.
Fixes #1587.
Fixes #1475.
peterhuene merged PR #1734.
Last updated: Jan 24 2025 at 00:11 UTC