wasmtime / PR #2710 Rework/simplify unwind infrastructure... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #2710 Rework/simplify unwind infrastructure...

Wasmtime GitHub notifications bot (Mar 06 2021 at 23:49):

cfallin requested peterhuene and abrown for a review on PR #2710.

Wasmtime GitHub notifications bot (Mar 06 2021 at 23:49):

cfallin opened PR #2710 from x64-fastcall-unwind to main:

Our previous implementation of unwind infrastructure was somewhat
complex and brittle: it parsed generated instructions in order to
reverse-engineer unwind info from prologues. It also relied on some
fragile linkage to communicate instruction-layout information that VCode
was not designed to provide.

A much simpler, more reliable, and easier-to-reason-about approach is to
embed unwind directives as pseudo-instructions in the prologue as we
generate it. That way, we can say what we mean and just emit it
directly.

The usual reasoning that leads to the reverse-engineering approach is
that metadata is hard to keep in sync across optimization passes; but
here, (i) prologues are generated at the very end of the pipeline, and
(ii) if we ever do a post-prologue-gen optimization, we can treat unwind
directives as black boxes with unknown side-effects, just as we do for
some other pseudo-instructions today.

It turns out that it was easier to just build this for both x64 and
aarch64 (since they share a factored-out ABI implementation), and wire
up the platform-specific unwind-info generation for Windows and SystemV.
Now we have simpler unwind on all platforms and we can delete the old
unwind infra as soon as we remove the old backend.

here were a few consequences to supporting Fastcall unwind in
particular that led to a refactor of the common ABI. Windows only
supports naming clobbered-register save locations within 240 bytes of
the frame-pointer register, whatever one chooses that to be (RSP or
RBP). We had previously saved clobbers below the fixed frame (and below
nominal-SP). The 240-byte range has to include the old RBP too, so we're
forced to place clobbers at the top of the frame, just below saved
RBP/RIP. This is fine; we always keep a frame pointer anyway because we
use it to refer to stack args. It does mean that offsets of fixed-frame
slots (spillslots, stackslots) from RBP are no longer known before we do
regalloc, so if we ever want to index these off of RBP rather than
nominal-SP because we add support for alloca (dynamic frame growth),
then we'll need a "nominal-BP" mode that is resolved after regalloc and
clobber-save code is generated. I added a comment to this effect in
abi_impl.rs.

The above refactor touched both x64 and aarch64 because of shared code.
This had a further effect in that the old aarch64 prologue generation
subtracted from sp once to allocate space, then used stores to [sp, offset] to save clobbers. Unfortunately the offset only has 7-bit
range, so if there are enough clobbered registers (and there can be --
aarch64 has 384 bytes of registers; at least one unit test hits this)
the stores/loads will be out-of-range. I really don't want to synthesize
large-offset sequences here; better to go back to the simpler
pre-index/post-index stp r1, r2, [sp, #-16] form that works just like
a "push". It's likely not much worse microarchitecturally (dependence
chain on SP, but oh well) and it actually saves an instruction if
there's no other frame to allocate. As a further advantage, it's much
simpler to understand; simpler is usually better.

This PR adds the new backend on Windows to CI as well.

Wasmtime GitHub notifications bot (Mar 06 2021 at 23:49):

cfallin requested peterhuene and abrown for a review on PR #2710.

Wasmtime GitHub notifications bot (Mar 06 2021 at 23:50):

cfallin edited PR #2710 from x64-fastcall-unwind to main:

Our previous implementation of unwind infrastructure was somewhat
complex and brittle: it parsed generated instructions in order to
reverse-engineer unwind info from prologues. It also relied on some
fragile linkage to communicate instruction-layout information that VCode
was not designed to provide.

A much simpler, more reliable, and easier-to-reason-about approach is to
embed unwind directives as pseudo-instructions in the prologue as we
generate it. That way, we can say what we mean and just emit it
directly.

The usual reasoning that leads to the reverse-engineering approach is
that metadata is hard to keep in sync across optimization passes; but
here, (i) prologues are generated at the very end of the pipeline, and
(ii) if we ever do a post-prologue-gen optimization, we can treat unwind
directives as black boxes with unknown side-effects, just as we do for
some other pseudo-instructions today.

It turns out that it was easier to just build this for both x64 and
aarch64 (since they share a factored-out ABI implementation), and wire
up the platform-specific unwind-info generation for Windows and SystemV.
Now we have simpler unwind on all platforms and we can delete the old
unwind infra as soon as we remove the old backend.

here were a few consequences to supporting Fastcall unwind in
particular that led to a refactor of the common ABI. Windows only
supports naming clobbered-register save locations within 240 bytes of
the frame-pointer register, whatever one chooses that to be (RSP or
RBP). We had previously saved clobbers below the fixed frame (and below
nominal-SP). The 240-byte range has to include the old RBP too, so we're
forced to place clobbers at the top of the frame, just below saved
RBP/RIP. This is fine; we always keep a frame pointer anyway because we
use it to refer to stack args. It does mean that offsets of fixed-frame
slots (spillslots, stackslots) from RBP are no longer known before we do
regalloc, so if we ever want to index these off of RBP rather than
nominal-SP because we add support for alloca (dynamic frame growth),
then we'll need a "nominal-BP" mode that is resolved after regalloc and
clobber-save code is generated. I added a comment to this effect in
abi_impl.rs.

The above refactor touched both x64 and aarch64 because of shared code.
This had a further effect in that the old aarch64 prologue generation
subtracted from sp once to allocate space, then used stores to [sp, offset]
to save clobbers. Unfortunately the offset only has 7-bit
range, so if there are enough clobbered registers (and there can be --
aarch64 has 384 bytes of registers; at least one unit test hits this)
the stores/loads will be out-of-range. I really don't want to synthesize
large-offset sequences here; better to go back to the simpler
pre-index/post-index stp r1, r2, [sp, #-16] form that works just like
a "push". It's likely not much worse microarchitecturally (dependence
chain on SP, but oh well) and it actually saves an instruction if
there's no other frame to allocate. As a further advantage, it's much
simpler to understand; simpler is usually better.

This PR adds the new backend on Windows to CI as well.

Wasmtime GitHub notifications bot (Mar 06 2021 at 23:52):

cfallin edited PR #2710 from x64-fastcall-unwind to main:

Our previous implementation of unwind infrastructure was somewhat
complex and brittle: it parsed generated instructions in order to
reverse-engineer unwind info from prologues. It also relied on some
fragile linkage to communicate instruction-layout information that VCode
was not designed to provide.

A much simpler, more reliable, and easier-to-reason-about approach is to
embed unwind directives as pseudo-instructions in the prologue as we
generate it. That way, we can say what we mean and just emit it
directly.

The usual reasoning that leads to the reverse-engineering approach is
that metadata is hard to keep in sync across optimization passes; but
here, (i) prologues are generated at the very end of the pipeline, and
(ii) if we ever do a post-prologue-gen optimization, we can treat unwind
directives as black boxes with unknown side-effects, just as we do for
some other pseudo-instructions today.

It turns out that it was easier to just build this for both x64 and
aarch64 (since they share a factored-out ABI implementation), and wire
up the platform-specific unwind-info generation for Windows and SystemV.
Now we have simpler unwind on all platforms and we can delete the old
unwind infra as soon as we remove the old backend.

There were a few consequences to supporting Fastcall unwind in
particular that led to a refactor of the common ABI. Windows only
supports naming clobbered-register save locations within 240 bytes of
the frame-pointer register, whatever one chooses that to be (RSP or
RBP). We had previously saved clobbers below the fixed frame (and below
nominal-SP). The 240-byte range has to include the old RBP too, so we're
forced to place clobbers at the top of the frame, just below saved
RBP/RIP. This is fine; we always keep a frame pointer anyway because we
use it to refer to stack args. It does mean that offsets of fixed-frame
slots (spillslots, stackslots) from RBP are no longer known before we do
regalloc, so if we ever want to index these off of RBP rather than
nominal-SP because we add support for alloca (dynamic frame growth),
then we'll need a "nominal-BP" mode that is resolved after regalloc and
clobber-save code is generated. I added a comment to this effect in
abi_impl.rs.

The above refactor touched both x64 and aarch64 because of shared code.
This had a further effect in that the old aarch64 prologue generation
subtracted from sp once to allocate space, then used stores to [sp, offset]
to save clobbers. Unfortunately the offset only has 7-bit
range, so if there are enough clobbered registers (and there can be --
aarch64 has 384 bytes of registers; at least one unit test hits this)
the stores/loads will be out-of-range. I really don't want to synthesize
large-offset sequences here; better to go back to the simpler
pre-index/post-index stp r1, r2, [sp, #-16] form that works just like
a "push". It's likely not much worse microarchitecturally (dependence
chain on SP, but oh well) and it actually saves an instruction if
there's no other frame to allocate. As a further advantage, it's much
simpler to understand; simpler is usually better.

This PR adds the new backend on Windows to CI as well.

Wasmtime GitHub notifications bot (Mar 06 2021 at 23:53):

cfallin edited PR #2710 from x64-fastcall-unwind to main:

Our previous implementation of unwind infrastructure was somewhat
complex and brittle: it parsed generated instructions in order to
reverse-engineer unwind info from prologues. It also relied on some
fragile linkage to communicate instruction-layout information that VCode
was not designed to provide.

A much simpler, more reliable, and easier-to-reason-about approach is to
embed unwind directives as pseudo-instructions in the prologue as we
generate it. That way, we can say what we mean and just emit it
directly.

The usual reasoning that leads to the reverse-engineering approach is
that metadata is hard to keep in sync across optimization passes; but
here, (i) prologues are generated at the very end of the pipeline, and
(ii) if we ever do a post-prologue-gen optimization, we can treat unwind
directives as black boxes with unknown side-effects, just as we do for
some other pseudo-instructions today.

It turns out that it was easier to just build this for both x64 and
aarch64 (since they share a factored-out ABI implementation), and wire
up the platform-specific unwind-info generation for Windows and SystemV.
Now we have simpler unwind on all platforms and we can delete the old
unwind infra as soon as we remove the old backend.

There were a few consequences to supporting Fastcall unwind in
particular that led to a refactor of the common ABI. Windows only
supports naming clobbered-register save locations within 240 bytes of
the frame-pointer register, whatever one chooses that to be (RSP or
RBP). We had previously saved clobbers below the fixed frame (and below
nominal-SP). The 240-byte range has to include the old RBP too, so we're
forced to place clobbers at the top of the frame, just below saved
RBP/RIP. This is fine; we always keep a frame pointer anyway because we
use it to refer to stack args. It does mean that offsets of fixed-frame
slots (spillslots, stackslots) from RBP are no longer known before we do
regalloc, so if we ever want to index these off of RBP rather than
nominal-SP because we add support for alloca (dynamic frame growth),
then we'll need a "nominal-BP" mode that is resolved after regalloc and
clobber-save code is generated. I added a comment to this effect in
abi_impl.rs.

The above refactor touched both x64 and aarch64 because of shared code.
This had a further effect in that the old aarch64 prologue generation
subtracted from sp once to allocate space, then used stores to [sp, offset]
to save clobbers. Unfortunately the offset only has 7-bit
range, so if there are enough clobbered registers (and there can be --
aarch64 has 768 bytes of registers; at least one unit test hits this)
the stores/loads will be out-of-range. I really don't want to synthesize
large-offset sequences here; better to go back to the simpler
pre-index/post-index stp r1, r2, [sp, #-16] form that works just like
a "push". It's likely not much worse microarchitecturally (dependence
chain on SP, but oh well) and it actually saves an instruction if
there's no other frame to allocate. As a further advantage, it's much
simpler to understand; simpler is usually better.

This PR adds the new backend on Windows to CI as well.

Wasmtime GitHub notifications bot (Mar 06 2021 at 23:58):

cfallin updated PR #2710 from x64-fastcall-unwind to main.

Wasmtime GitHub notifications bot (Mar 08 2021 at 15:12):

alexcrichton submitted PR Review.

Wasmtime GitHub notifications bot (Mar 08 2021 at 15:12):

alexcrichton created PR Review Comment:

One thing I noticed while reading this is that this might be a bit easier to read if it was framed in the positive ("unwind_info") rather than the negative (enabling the disabling switch). Wasmtime would then enable this by default, but it'd be up to other consumers of cranelift whether or not they'd wish to enable this.

Wasmtime GitHub notifications bot (Mar 08 2021 at 23:06):

cfallin updated PR #2710 from x64-fastcall-unwind to main.

Wasmtime GitHub notifications bot (Mar 08 2021 at 23:08):

cfallin created PR Review Comment:

Done, thanks!

Wasmtime GitHub notifications bot (Mar 08 2021 at 23:08):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 08 2021 at 23:22):

cfallin updated PR #2710 from x64-fastcall-unwind to main.

Wasmtime GitHub notifications bot (Mar 08 2021 at 23:26):

cfallin updated PR #2710 from x64-fastcall-unwind to main.

Wasmtime GitHub notifications bot (Mar 09 2021 at 07:05):

cfallin updated PR #2710 from x64-fastcall-unwind to main.

Wasmtime GitHub notifications bot (Mar 09 2021 at 07:18):

cfallin updated PR #2710 from x64-fastcall-unwind to main.

Wasmtime GitHub notifications bot (Mar 09 2021 at 10:04):

bjorn3 submitted PR Review.

Wasmtime GitHub notifications bot (Mar 09 2021 at 10:04):

bjorn3 created PR Review Comment:

Could you restore these tests for the actual unwind info being emitted?

Wasmtime GitHub notifications bot (Mar 09 2021 at 19:40):

cfallin updated PR #2710 from x64-fastcall-unwind to main.

Wasmtime GitHub notifications bot (Mar 09 2021 at 20:06):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 09 2021 at 20:06):

cfallin created PR Review Comment:

Sorry, not sure why I removed that along with the dead code. Added back.

Wasmtime GitHub notifications bot (Mar 11 2021 at 02:46):

cfallin updated PR #2710 from x64-fastcall-unwind to main.

Wasmtime GitHub notifications bot (Mar 11 2021 at 02:49):

cfallin updated PR #2710 from x64-fastcall-unwind to main.

Wasmtime GitHub notifications bot (Mar 11 2021 at 17:22):

cfallin requested alexcrichton, peterhuene and abrown for a review on PR #2710.

Wasmtime GitHub notifications bot (Mar 11 2021 at 19:18):

alexcrichton submitted PR Review.

Wasmtime GitHub notifications bot (Mar 11 2021 at 19:18):

alexcrichton submitted PR Review.

Wasmtime GitHub notifications bot (Mar 11 2021 at 19:18):

alexcrichton created PR Review Comment:

FWIW since this is so common in the test suite I think it'd be reasonable to just either have this as the global default for cranelift or either just the default for the test suite.

Wasmtime GitHub notifications bot (Mar 11 2021 at 19:18):

alexcrichton created PR Review Comment:

I'm personally finding this a bit confusing and hard to follow. I have yet to read all the ABI stuff but I'm particularly confused about what the offset is here.

In the example above this directive comes after mov rbp, rsp, so I'm not sure how the fp_offset plays into what's going on here since there's no frame adjustments at that point. I'm also somewhat confused since this is a u8 and we're presumably supporting frames with more than 256 bytes of stack space, so I don't think fp_offset is the size of the stack frame (but the docs say "base of the frame" which makes me think it is supposed to encompass the entire stack frame?)

I may just need to read more to learn about this...

Wasmtime GitHub notifications bot (Mar 11 2021 at 19:18):

alexcrichton created PR Review Comment:

temporary allow?

Wasmtime GitHub notifications bot (Mar 11 2021 at 19:18):

alexcrichton created PR Review Comment:

FWIW I'm visiting this location after visiting the definition of UnwindInst, but this seems like a hidden detail that may want to be documented? This seems to indicate that PushOldFp for architectures with lr (e.g. AArch64) indicates that two registers were pushed, both fp and lr?

Or I guess put another way this PushOldFp is intended to encompass architectures where the "frame pointer" is actually two separate registers (sorta)

Wasmtime GitHub notifications bot (Mar 11 2021 at 19:18):

alexcrichton created PR Review Comment:

Could these variants explicitly list the value for frame_offset from the variant below?

Wasmtime GitHub notifications bot (Mar 11 2021 at 19:18):

alexcrichton created PR Review Comment:

s/RegUnit/Reg/

Wasmtime GitHub notifications bot (Mar 11 2021 at 19:18):

alexcrichton created PR Review Comment:

Could this include the sample value for fp_offset for this variant?

Wasmtime GitHub notifications bot (Mar 11 2021 at 19:18):

alexcrichton created PR Review Comment:

Temporary during development?

Wasmtime GitHub notifications bot (Mar 11 2021 at 19:18):

alexcrichton created PR Review Comment:

I'm surprised here that this can make an assumption about this 16 bytes. While that may be true for x86_64/aarch64 today it seems like it should be factored out perhaps?

Wasmtime GitHub notifications bot (Mar 11 2021 at 19:18):

alexcrichton created PR Review Comment:

Could this exhaustively match and then have a comment for why other cases are ignored?

Wasmtime GitHub notifications bot (Mar 11 2021 at 19:18):

alexcrichton created PR Review Comment:

I realize that this code existed before you touched it, but I'm finding myself getting a bit lost in how many offsets from how many things there are with unwinding. Since all variants here have an offset: u8, which I presume is the code offset in the function, could that be factored out separately from the UnwindCode itself? (e.g. stored somewhere as (u8, UnwindCode) instead of just UnwindCode

Wasmtime GitHub notifications bot (Mar 11 2021 at 19:18):

alexcrichton created PR Review Comment:

Could this use u32::from or .into() to show it's not losing any bits?

Wasmtime GitHub notifications bot (Mar 11 2021 at 19:18):

alexcrichton created PR Review Comment:

I'm a bit confused that this is happening as part of the CreateFPFrame unwind pseudo-instruction here. Given the original example it looked like this pseudo instruction happened as part of the mov %rbp, %rsp portion, but wouldn't the "stack alloc" here happen as part of the subtraction from %rsp? Should the CreateFPFrame opcode be split into two?

Wasmtime GitHub notifications bot (Mar 11 2021 at 19:18):

alexcrichton created PR Review Comment:

Technically this is adjusting the stack pointer a lot more than just for the clobbers I think? This is account for the whole frame, clobbers and statically known space needed?

Wasmtime GitHub notifications bot (Mar 11 2021 at 19:18):

alexcrichton created PR Review Comment:

Could this be preceded with let off = cur_offset; and then cur_offset is adjusted in the match?

Wasmtime GitHub notifications bot (Mar 11 2021 at 19:18):

alexcrichton created PR Review Comment:

Could this have a comment indicating why this is 16? (presumably that we're skipping the 8 bytes we just pushed plus the 8 bytes pushed by the call instruction)

Wasmtime GitHub notifications bot (Mar 11 2021 at 19:18):

alexcrichton created PR Review Comment:

Technically this restriction is just on Windows x86_64, right? (even if we add AArch64 support for Windows that may have not have the same retrictions?)

Wasmtime GitHub notifications bot (Mar 11 2021 at 19:18):

alexcrichton created PR Review Comment:

This mentions SImm7 but below it uses SImm7Scaled which seems to have a range of (63 * 8) to -(64 * 8). Is that a bug in the documentation or am I missing the usage of SImm7? More-or-less I'm wondering if given a 504 byte range we could do everything as a positive offset relative to sp?

Wasmtime GitHub notifications bot (Mar 11 2021 at 19:18):

alexcrichton created PR Review Comment:

Since this and PopOldFP aren't actually used anywhere today, should they be removed for now?

I'd imagine that unwind info for the epilogue would also need to record that the saved registers no longer need to restored, but I don't think there's any emission for that just yet.

Wasmtime GitHub notifications bot (Mar 11 2021 at 19:18):

alexcrichton created PR Review Comment:

Ah so after further reading this isn't necessarily the entire call frame size, but just the size of the register saves we'll be doing?

Wasmtime GitHub notifications bot (Mar 11 2021 at 19:18):

alexcrichton created PR Review Comment:

Ok so after reading the various ABI implementations I think I understand a little better, but not entirely. It seems like this is primarily around for Windows and calculating adjustments on Unix, right?

For the Unix case (which I think I understand better) this seems like it should be something like DeclareCobberFrameSize { size: u8 } where it's basically just declaring that all saves to come later are relative offsets within a frame of size size, allocated at the top of the stack just after the frame pointer was pushed.

For Windows I'm much more unfamiliar with the unwinding stuff. I'm a bit confused, though, on why everything is u8 here. It seems to be because of Windows but the exception codes all have short/far variants which seems like it's trying to account for offsets which are larger than u8 perhaps, but I may be misunderstanding something. For Windows though it seems like this should be two instructions, where one signifies the mov %rsp, %rbp happened and the other happens after the stack decrement.

I'll admit though I'm pretty confused on Windows, and I probably just need to sit down and read through their documentation some more. I'm not sure if we're doing RSP-relative addressing or whatever the second mode is mentioned at the top of this where something is FP relative. I always seem to get lost in the details thinking about stuff like this.

Wasmtime GitHub notifications bot (Mar 11 2021 at 19:18):

alexcrichton created PR Review Comment:

Hm ok well I was thinking that maybe fp_offset is the size of the clobber are and should be renamed, but I think I'm still just very confused on the definition of what this instruction is, where it goes, and what this offset is.

Wasmtime GitHub notifications bot (Mar 11 2021 at 20:07):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 11 2021 at 20:07):

cfallin created PR Review Comment:

Yes, that's true, but it would be significantly more complex to make the factored-out ABI implementation conditional on platform and do a different thing on aarch64. (It's a tradeoff for sure -- we could fork aarch64's code, as we used to have, but then that doubles the maintenance and update cost in other cases.)

Wasmtime GitHub notifications bot (Mar 11 2021 at 20:14):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 11 2021 at 20:14):

cfallin created PR Review Comment:

So WIndows forces our hand in a few ways here:

The way that clobbers are located is as a positive offset from the "frame", wherever we choose to define the frame. Since we place RBP/FP just below the return address so that we have known offset addressing to stack args, and we place clobbers below RBP, we have to either define the frame in terms of RSP or as some offset below RBP. The former is difficult for several reasons but basically our nominal-SP tracking would not interact well with unwind info due to the order we compute and finalize things. So we define the frame in terms of RBP, but RBP is at the top of it.

This is what fp_offset means (it comes directly from the "frame register offset" field in UNWIND_INFO, which is also the source of the u8-sized limit): when we declare the frame, its base is at RBP - fp_offset. Equivalently, this is saying that the frame pointer is at the "frame pointer offset" into the frame.

Note that on SysV platforms, the frame is defined by the "CFA" (call-frame address) which is spec-defined to be the SP value at entry, so instead fp_offset is used to adjust the offsets for clobber-saves.

The fp_offset size limit (actually 240 bytes) is the reason that clobbers had to be moved to just below RBP as well; otherwise we could not define a frame far enough away from RBP to straddle our other fixed-frame slots.

Wasmtime GitHub notifications bot (Mar 11 2021 at 20:15):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 11 2021 at 20:15):

cfallin created PR Review Comment:

Regarding fp_offset in particular, I'll go over the doc comments some more and see if I can add detail. Perhaps what we really need is just a small tutorial on unwind formats on Windows and SysV and how frames are computed in both cases :-)

Wasmtime GitHub notifications bot (Mar 11 2021 at 20:17):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 11 2021 at 20:17):

cfallin created PR Review Comment:

Yes, I'm thinking now perhaps we just give two offsets to describe FP: its offset above the start of clobbers, and its offset below SP at entry. These are the two values needed to produce either Windows or SysV info so it should make everything else completely platform-agnostic.

Wasmtime GitHub notifications bot (Mar 11 2021 at 20:22):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 11 2021 at 20:22):

cfallin created PR Review Comment:

Perhaps it should just be called CreateFrame -- it was meant to be a single macro-op that captured anything a specific unwind format needs for the prologue. The close mapping between sub %rsp and StackAlloc was the sort of thing I was hoping to avoid in this simplification :-) Given that we pin the frame to RBP with the SetFPReg, the StackAlloc serves only to let the unwinder know that more stack space is being used; so as long as it comes after the SetFPReg and before any uses of the clobber space, it shouldn't matter exactly where it occurs, I think.

Wasmtime GitHub notifications bot (Mar 11 2021 at 20:23):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 11 2021 at 20:23):

cfallin created PR Review Comment:

Yep, that's a stale comment, thanks!

Wasmtime GitHub notifications bot (Mar 11 2021 at 20:49):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 11 2021 at 20:49):

cfallin created PR Review Comment:

Yes, my mistake, it is actually a scaled offset; I'll update the comment, thanks!

That said, going back to an sp-offset scheme requires two assumptions: (i) the subset of registers that are callee-save fits within 512 bytes; this is true for SysV but may be a subtle footgun in the future; and (ii) that we are alright with a two-stage thing, where we decrement SP once and save clobbers, then decrement again for the frame, unless we special-case the "everything is in range" case and collapse those two into one. I'd rather not have that complexity if we can avoid it!

Basically my high-level bit here is: try to find a common set of simple lowerings and shared layouts/schemes that does not have a lot of special cases for different ABIs, different architectures, different in-range/out-of-range cases; bugs and unmaintainable complexity lie that way. The optimized all-in-one sub sp stuff that this PR removes was already very hard for me to re-internalize when I had to change it for this work. So I'd rather standardize on a simple "push the things one at a time" lowering, and have the same frame layout everywhere, to keeps us all sane :-)

Wasmtime GitHub notifications bot (Mar 11 2021 at 20:51):

alexcrichton submitted PR Review.

Wasmtime GitHub notifications bot (Mar 11 2021 at 20:51):

alexcrichton created PR Review Comment:

Ok thanks for the info! This helps dispel some of my confusion. I'm familiar with the CFA in dwarf but the "cfa equivalent" was actually in the other direction on Windows and I was getting confused how these two things map to each other. I think for now some extra documentation for how the field is expected to be interpreted on Windows and Unix should be fine myself.

But for the u8-ness, to make sure I understand, we're declaring to Windows "all register saves, which occur at positive offsets, are relative to XX" where XX is RBP-thing and thing is limited to 8 bits? We then have an additional requirement later on that when we save a register it can be saved at most 240 bytes away? (I couldn't actually find where that second restriction arose myself)

Wasmtime GitHub notifications bot (Mar 11 2021 at 20:51):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 11 2021 at 20:51):

cfallin created PR Review Comment:

Yep, this is left over from my intermediate PR state where I tried to handle epilogues too. Happy to remove!

Wasmtime GitHub notifications bot (Mar 11 2021 at 20:52):

alexcrichton submitted PR Review.

Wasmtime GitHub notifications bot (Mar 11 2021 at 20:52):

alexcrichton created PR Review Comment:

Ok I think that's fine in the meantime anyway. I don't think it's required we get this 100% right but it might just be worth in the documentation to mention that it's sort of a macro-op for now.

Wasmtime GitHub notifications bot (Mar 11 2021 at 20:52):

alexcrichton created PR Review Comment:

Nah this is definitely fine to have, just wanted to double-check my understanding!

Wasmtime GitHub notifications bot (Mar 11 2021 at 20:52):

alexcrichton submitted PR Review.

Wasmtime GitHub notifications bot (Mar 11 2021 at 20:56):

alexcrichton submitted PR Review.

Wasmtime GitHub notifications bot (Mar 11 2021 at 20:56):

alexcrichton created PR Review Comment:

Nah that makes sense, and I think it's fine to keep as-is. I definitely agree having a known-correct thing is the highest priority. If it turns out that we want to change this in the future for perf reasons or something like that we can cross that bridge when we get there.

Wasmtime GitHub notifications bot (Mar 11 2021 at 21:03):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 11 2021 at 21:03):

cfallin created PR Review Comment:

Re: offset restrictions, only the former (RBP - offset); once we establish the frame we can declare clobber-saves at any 32-bit offset. But combining this with the fact that we place RBP above clobber-saves gives us the limit on clobber offsets.

In hindsight, perhaps it's better to have explicit asserts that all offsets are in-range for the unwind info we're generating, and give these offset fields a u32 or so; that would document these limits more explicitly I think!

Wasmtime GitHub notifications bot (Mar 11 2021 at 21:04):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 11 2021 at 21:04):

cfallin created PR Review Comment:

(Err, "any positive 32-bit offset" above.)

Wasmtime GitHub notifications bot (Mar 11 2021 at 21:19):

peterhuene submitted PR Review.

Wasmtime GitHub notifications bot (Mar 11 2021 at 21:19):

peterhuene created PR Review Comment:

Alternatively we could stop establishing a frame pointer until such a time we actually need one on Windows (the save offsets encoded in the unwind information would then be positive offsets from RSP rather than FP-[0..240], thereby eliminating the "saves have to be near enough to the frame pointer establishment" problem). That's a whole can of worms, though, I imagine.

Wasmtime GitHub notifications bot (Mar 11 2021 at 21:42):

peterhuene created PR Review Comment:

In fact, the old unwind info emitted for Windows explicitly didn't encode establishing a frame pointer (despite the prologue doing so) to side step the 240 offset limitation, thereby encoding all of the register saves as positive offsets from the stack pointer. This would obviously break when the frame pointer is required, such as dynamically-sized frames.

Wasmtime GitHub notifications bot (Mar 11 2021 at 21:42):

peterhuene submitted PR Review.

Wasmtime GitHub notifications bot (Mar 11 2021 at 22:10):

alexcrichton submitted PR Review.

Wasmtime GitHub notifications bot (Mar 11 2021 at 22:10):

alexcrichton created PR Review Comment:

Aha ok that makes a lot of sense now, where the 240 comes from. I do think there's definitely room for improvement here, but there's no need to happen in this PR I think. I'd be happy with some more comments myself, and then seems reasonable to open tracking issues if necessary.

Wasmtime GitHub notifications bot (Mar 11 2021 at 23:48):

cfallin updated PR #2710 from x64-fastcall-unwind to main.

Wasmtime GitHub notifications bot (Mar 11 2021 at 23:54):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 11 2021 at 23:54):

cfallin created PR Review Comment:

Done; changed just for filetests (I think it's too surprising to change the default overall given that it's needed for backtraces and correct operation generally).

Wasmtime GitHub notifications bot (Mar 11 2021 at 23:55):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 11 2021 at 23:55):

cfallin created PR Review Comment:

Ah, yes, I was trying to get rid of some unused-code warnings in the new-backend PR and this percolated backward during some rebasing magic. Removed!

Wasmtime GitHub notifications bot (Mar 11 2021 at 23:55):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 11 2021 at 23:55):

cfallin created PR Review Comment:

Fixed.

Wasmtime GitHub notifications bot (Mar 11 2021 at 23:55):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 11 2021 at 23:55):

cfallin created PR Review Comment:

Removed, thanks!

Wasmtime GitHub notifications bot (Mar 11 2021 at 23:55):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 11 2021 at 23:55):

cfallin created PR Review Comment:

Updated along with the doc and field-name clarity changes noted below.

Wasmtime GitHub notifications bot (Mar 11 2021 at 23:57):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 11 2021 at 23:57):

cfallin created PR Review Comment:

So I've revamped the doc-comment for this enum and added the "offset upward / offset downward" dual fields for the CreateNewFrame pseudo-inst. Hopefully this is a bit more clear now!

Agreed that we could eventually look into -fomit-frame-pointer codegen, though that's a deeper rabbithole in the ABI code (as is alloca support); one bridge at a time :-)

Wasmtime GitHub notifications bot (Mar 11 2021 at 23:57):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 11 2021 at 23:57):

cfallin created PR Review Comment:

Updated as noted -- two offsets given now.

Wasmtime GitHub notifications bot (Mar 11 2021 at 23:57):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 11 2021 at 23:57):

cfallin created PR Review Comment:

Done!

Wasmtime GitHub notifications bot (Mar 12 2021 at 00:00):

cfallin updated PR #2710 from x64-fastcall-unwind to main.

Wasmtime GitHub notifications bot (Mar 12 2021 at 00:01):

cfallin updated PR #2710 from x64-fastcall-unwind to main.

Wasmtime GitHub notifications bot (Mar 12 2021 at 00:02):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 12 2021 at 00:02):

cfallin created PR Review Comment:

I've changed this to instruction_offset everywhere now. The tuple refactor is a bit more intrusive but I can do that instead if you'd prefer!

Wasmtime GitHub notifications bot (Mar 12 2021 at 00:02):

cfallin created PR Review Comment:

I did one better and just removed the StackAlloc; it doesn't seem to actually be needed once we pin the frame to RBP.

Wasmtime GitHub notifications bot (Mar 12 2021 at 00:02):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 12 2021 at 00:03):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 12 2021 at 00:03):

cfallin created PR Review Comment:

Done.

Wasmtime GitHub notifications bot (Mar 12 2021 at 00:03):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 12 2021 at 00:03):

cfallin created PR Review Comment:

Fixed!

Wasmtime GitHub notifications bot (Mar 12 2021 at 00:03):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 12 2021 at 00:03):

cfallin created PR Review Comment:

Ah, yeah, that's a silly bit of verbosity; fixed.

Wasmtime GitHub notifications bot (Mar 12 2021 at 00:04):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 12 2021 at 00:04):

cfallin created PR Review Comment:

Updated comment here -- yes, you're right, the range is actually 504 bytes. I've written out the reasoning here w.r.t. simplicity and avoiding a two-stage frame setup with big frames.

Wasmtime GitHub notifications bot (Mar 12 2021 at 00:05):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 12 2021 at 00:05):

cfallin created PR Review Comment:

Updated comment.

Wasmtime GitHub notifications bot (Mar 12 2021 at 00:05):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 12 2021 at 00:05):

cfallin created PR Review Comment:

Yep, I've removed all traces of the epilogue attempt now.

Wasmtime GitHub notifications bot (Mar 12 2021 at 00:05):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 12 2021 at 00:05):

cfallin created PR Review Comment:

Added to the doc-comment! I've also removed this to PushFrameRegs rather than PushOldFP.

Wasmtime GitHub notifications bot (Mar 12 2021 at 00:06):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 12 2021 at 00:06):

cfallin created PR Review Comment:

Updated.

Wasmtime GitHub notifications bot (Mar 12 2021 at 00:06):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 12 2021 at 00:06):

cfallin created PR Review Comment:

Updated.

Wasmtime GitHub notifications bot (Mar 12 2021 at 00:10):

cfallin updated PR #2710 from x64-fastcall-unwind to main.

Wasmtime GitHub notifications bot (Mar 12 2021 at 00:11):

cfallin updated PR #2710 from x64-fastcall-unwind to main.

Wasmtime GitHub notifications bot (Mar 12 2021 at 00:59):

cfallin updated PR #2710 from x64-fastcall-unwind to main.

Wasmtime GitHub notifications bot (Mar 12 2021 at 01:18):

cfallin updated PR #2710 from x64-fastcall-unwind to main.

Wasmtime GitHub notifications bot (Mar 12 2021 at 01:30):

alexcrichton submitted PR Review.

Wasmtime GitHub notifications bot (Mar 12 2021 at 01:31):

peterhuene submitted PR Review.

Wasmtime GitHub notifications bot (Mar 12 2021 at 01:31):

peterhuene submitted PR Review.

Wasmtime GitHub notifications bot (Mar 12 2021 at 01:31):

peterhuene created PR Review Comment:

Minor nit: perhaps we should probably mention here that describing epilogues is not required by some platforms for correct operation. For example, Windows x64 unwind information doesn't describe epilogues as the epilogues are required to be a particular shape and the OS can "virtually run them forward" to unwind the frame when inside an epilogue.

Wasmtime GitHub notifications bot (Mar 12 2021 at 01:31):

peterhuene created PR Review Comment:

Is this match (and others like it) intentionally truncated?

Wasmtime GitHub notifications bot (Mar 12 2021 at 01:31):

peterhuene created PR Review Comment:

//! 240 bytes of RBP. Finally, it is not allowed to access memory

Wasmtime GitHub notifications bot (Mar 12 2021 at 04:04):

cfallin updated PR #2710 from x64-fastcall-unwind to main.

Wasmtime GitHub notifications bot (Mar 12 2021 at 04:04):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 12 2021 at 04:04):

cfallin created PR Review Comment:

Good point, done!

Wasmtime GitHub notifications bot (Mar 12 2021 at 04:06):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 12 2021 at 04:06):

cfallin created PR Review Comment:

Ah, no, I had meant to fill these in after capturing the registers (the Debug-printed versions use regalloc indices); thanks!

Wasmtime GitHub notifications bot (Mar 12 2021 at 04:07):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Mar 12 2021 at 04:07):

cfallin created PR Review Comment:

Done.

Wasmtime GitHub notifications bot (Mar 12 2021 at 04:47):

cfallin merged PR #2710.

Last updated: Apr 17 2025 at 17:03 UTC