Stream: git-wasmtime

Topic: wasmtime / PR #1734 Cranelift: Fix FPR saving and shadow ...


view this post on Zulip Wasmtime GitHub notifications bot (May 20 2020 at 22:39):

peterhuene opened PR #1734 from fix-saved-fprs to master:

This PR fixes both how FPR callee-saved registers are saved and how the
shadow space allocation occurs when laying out the stack for Windows x64
calling convention.

Importantly, this PR removes the compiler limitation of stack size for
Windows x64 that was imposed because FPR saves previously couldn't always be
represented in the unwind information.

The FPR saves are now performed without using stack slots, much like how the
callee-saved GPRs are saved. The total CSR space is given to layout_stack so
that it is included in the frame size and to offset the layout of spills and
explicit slots.

The FPR saves are now done via an RSP offset (post adjustment) and they always
follow the GPR saves on the stack. A simpler calculation can now be made to
determine the proper offsets of the FPR saves for representing the unwind
information.

Additionally, the shadow space is no longer treated as an incoming argument,
but an explicit stack slot that gets laid out at the lowest address possible in
the local frame. This prevents layout_stack from putting a spill or explicit
slot in this reserved space. In the future, layout_stack should take
advantage of the caller-provided shadow space for spills, but this PR does
not attempt to address that.

The shadow space is now omitted from the local frame for leaf functions.

Fixes #1728.
Fixes #1587.
Fixes #1475.

view this post on Zulip Wasmtime GitHub notifications bot (May 20 2020 at 22:39):

peterhuene requested bnjbvr and iximeow for a review on PR #1734.

view this post on Zulip Wasmtime GitHub notifications bot (May 20 2020 at 22:39):

peterhuene requested bnjbvr and iximeow for a review on PR #1734.

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 20:58):

iximeow submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 20:58):

iximeow submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 20:58):

iximeow created PR Review Comment:

To be correct for cases where isa.pointer_bytes() != 8:

        csr_stack_size = (csr_stack_size + 15) & !15;

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 20:58):

iximeow created PR Review Comment:

Being winx64.rs, I think the assumption that a GPR is half the size of an FPR is workable, but do you think that's worth noting in this comment?

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 20:58):

iximeow created PR Review Comment:

since local_stack_size ought to be 16-byte aligned, shouldn't stack_size & isa.pointer_bytes() always be zero? It seems like we could lift this up to an assert on function entry, if so.

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 20:58):

iximeow created PR Review Comment:

Any particular reason to check the rest of the function? Seems independent of making sure there's an ss0 and that this compiles.

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 20:58):

iximeow created PR Review Comment:

Likewise, I'm not sure what the checks on this function body are intending to test past the ss0 line?

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 20:58):

iximeow created PR Review Comment:

:tada: :tada:

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 21:12):

peterhuene submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 21:12):

peterhuene created PR Review Comment:

stack_size (i.e. the adjustment to RSP) is not 16-byte aligned as it might be aligning RSP back to a 16-byte alignment.

Take for instance a prologue in a non-leaf function that pushes the frame pointer and then saves a single GPR. The push of the frame pointer realigned the stack from the push of return address, but the push of the GPR causes a misalignment. The local stack allocation will be +0x8 to account for that, which means stack_size itself is not a multiple of 16, but 8.

This lines cause causes the offset to be +0x8 when stack_size itself is +0x8 so that it gets subtracted out from the final offset from the post-adjusted RSP. The end result is that the offset should always be a multiple of 16 from a post-adjusted 16-byte aligned RSP.

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 21:16):

peterhuene submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 21:16):

peterhuene created PR Review Comment:

That said, I'm totally open to calculating this in a clearer way if you have any suggestions!

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 21:17):

peterhuene submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 21:17):

peterhuene created PR Review Comment:

I think that's clearer and congruent with other places where we round up for alignment. I'll change.

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 21:19):

peterhuene submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 21:19):

peterhuene created PR Review Comment:

I can add a comment.

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 21:21):

peterhuene submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 21:21):

peterhuene created PR Review Comment:

I wanted to check the prologues and epilogues for correctness in each function. The intermediate instructions I'm not as concerned about, although I personally don't mind them being checked unless we think there's a likelihood that this will cause a maintenance headache for these tests.

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 21:22):

peterhuene submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 21:22):

peterhuene created PR Review Comment:

See reply above. Do you think it would be best if we simply limited it to just the stack layout and prologue/epilogue instructions?

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 21:22):

peterhuene submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 21:22):

peterhuene created PR Review Comment:

It turns out asserting things in tests is good :smile:.

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 21:24):

peterhuene edited PR Review Comment.

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 21:42):

iximeow submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 21:42):

iximeow created PR Review Comment:

That example helps, thanks. How's % types::F64X2.bytes() instead of & isa.pointer_bytes() sound to you? That picks an aligned offset for the type to store, ought to work for isa.pointer_bytes() == 4, and makes extension for ymm preservation (if we get to that in the future) clearer (they don't _need_ 32-byte alignment, but iiuc are somewhat penalized for misaligned access)

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 21:45):

iximeow submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 21:45):

iximeow created PR Review Comment:

Yeah I'm just thinking of maintenance for these. But I've convinced myself it's actually good to extend these checks here: I don't think we explicitly tested fastcall gpr preservation works as expected anywhere, and these aren't really liable to change in a surprising way.

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 21:45):

iximeow submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 21:45):

iximeow created PR Review Comment:

(as above, now convinced that these are Good Actually)

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 22:31):

peterhuene edited PR Review Comment.

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 22:31):

peterhuene edited PR Review Comment.

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 22:33):

peterhuene submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 22:33):

peterhuene created PR Review Comment:

Works for me. I'll get that changed!

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 22:55):

peterhuene updated PR #1734 from fix-saved-fprs to master:

This PR fixes both how FPR callee-saved registers are saved and how the
shadow space allocation occurs when laying out the stack for Windows x64
calling convention.

Importantly, this PR removes the compiler limitation of stack size for
Windows x64 that was imposed because FPR saves previously couldn't always be
represented in the unwind information.

The FPR saves are now performed without using stack slots, much like how the
callee-saved GPRs are saved. The total CSR space is given to layout_stack so
that it is included in the frame size and to offset the layout of spills and
explicit slots.

The FPR saves are now done via an RSP offset (post adjustment) and they always
follow the GPR saves on the stack. A simpler calculation can now be made to
determine the proper offsets of the FPR saves for representing the unwind
information.

Additionally, the shadow space is no longer treated as an incoming argument,
but an explicit stack slot that gets laid out at the lowest address possible in
the local frame. This prevents layout_stack from putting a spill or explicit
slot in this reserved space. In the future, layout_stack should take
advantage of the caller-provided shadow space for spills, but this PR does
not attempt to address that.

The shadow space is now omitted from the local frame for leaf functions.

Fixes #1728.
Fixes #1587.
Fixes #1475.

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 22:57):

peterhuene updated PR #1734 from fix-saved-fprs to master:

This PR fixes both how FPR callee-saved registers are saved and how the
shadow space allocation occurs when laying out the stack for Windows x64
calling convention.

Importantly, this PR removes the compiler limitation of stack size for
Windows x64 that was imposed because FPR saves previously couldn't always be
represented in the unwind information.

The FPR saves are now performed without using stack slots, much like how the
callee-saved GPRs are saved. The total CSR space is given to layout_stack so
that it is included in the frame size and to offset the layout of spills and
explicit slots.

The FPR saves are now done via an RSP offset (post adjustment) and they always
follow the GPR saves on the stack. A simpler calculation can now be made to
determine the proper offsets of the FPR saves for representing the unwind
information.

Additionally, the shadow space is no longer treated as an incoming argument,
but an explicit stack slot that gets laid out at the lowest address possible in
the local frame. This prevents layout_stack from putting a spill or explicit
slot in this reserved space. In the future, layout_stack should take
advantage of the caller-provided shadow space for spills, but this PR does
not attempt to address that.

The shadow space is now omitted from the local frame for leaf functions.

Fixes #1728.
Fixes #1587.
Fixes #1475.

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 23:05):

iximeow submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 23:30):

peterhuene submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 23:30):

peterhuene created PR Review Comment:

This should actually check if a frame pointer is being used, even though we currently always use one.

If a frame pointer is not used, then the offsets for the FPRs saves should not be altered as they already have the correct offset from RSP.

I'll quickly fix this.

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2020 at 23:48):

peterhuene updated PR #1734 from fix-saved-fprs to master:

This PR fixes both how FPR callee-saved registers are saved and how the
shadow space allocation occurs when laying out the stack for Windows x64
calling convention.

Importantly, this PR removes the compiler limitation of stack size for
Windows x64 that was imposed because FPR saves previously couldn't always be
represented in the unwind information.

The FPR saves are now performed without using stack slots, much like how the
callee-saved GPRs are saved. The total CSR space is given to layout_stack so
that it is included in the frame size and to offset the layout of spills and
explicit slots.

The FPR saves are now done via an RSP offset (post adjustment) and they always
follow the GPR saves on the stack. A simpler calculation can now be made to
determine the proper offsets of the FPR saves for representing the unwind
information.

Additionally, the shadow space is no longer treated as an incoming argument,
but an explicit stack slot that gets laid out at the lowest address possible in
the local frame. This prevents layout_stack from putting a spill or explicit
slot in this reserved space. In the future, layout_stack should take
advantage of the caller-provided shadow space for spills, but this PR does
not attempt to address that.

The shadow space is now omitted from the local frame for leaf functions.

Fixes #1728.
Fixes #1587.
Fixes #1475.

view this post on Zulip Wasmtime GitHub notifications bot (May 22 2020 at 19:06):

peterhuene merged PR #1734.


Last updated: Jan 24 2025 at 00:11 UTC