cfallin requested peterhuene and abrown for a review on PR #2710.
cfallin opened PR #2710 from x64-fastcall-unwind
to main
:
Our previous implementation of unwind infrastructure was somewhat
complex and brittle: it parsed generated instructions in order to
reverse-engineer unwind info from prologues. It also relied on some
fragile linkage to communicate instruction-layout information that VCode
was not designed to provide.A much simpler, more reliable, and easier-to-reason-about approach is to
embed unwind directives as pseudo-instructions in the prologue as we
generate it. That way, we can say what we mean and just emit it
directly.The usual reasoning that leads to the reverse-engineering approach is
that metadata is hard to keep in sync across optimization passes; but
here, (i) prologues are generated at the very end of the pipeline, and
(ii) if we ever do a post-prologue-gen optimization, we can treat unwind
directives as black boxes with unknown side-effects, just as we do for
some other pseudo-instructions today.It turns out that it was easier to just build this for both x64 and
aarch64 (since they share a factored-out ABI implementation), and wire
up the platform-specific unwind-info generation for Windows and SystemV.
Now we have simpler unwind on all platforms and we can delete the old
unwind infra as soon as we remove the old backend.here were a few consequences to supporting Fastcall unwind in
particular that led to a refactor of the common ABI. Windows only
supports naming clobbered-register save locations within 240 bytes of
the frame-pointer register, whatever one chooses that to be (RSP or
RBP). We had previously saved clobbers below the fixed frame (and below
nominal-SP). The 240-byte range has to include the old RBP too, so we're
forced to place clobbers at the top of the frame, just below saved
RBP/RIP. This is fine; we always keep a frame pointer anyway because we
use it to refer to stack args. It does mean that offsets of fixed-frame
slots (spillslots, stackslots) from RBP are no longer known before we do
regalloc, so if we ever want to index these off of RBP rather than
nominal-SP because we add support foralloca
(dynamic frame growth),
then we'll need a "nominal-BP" mode that is resolved after regalloc and
clobber-save code is generated. I added a comment to this effect in
abi_impl.rs
.The above refactor touched both x64 and aarch64 because of shared code.
This had a further effect in that the old aarch64 prologue generation
subtracted fromsp
once to allocate space, then used stores to[sp, offset]
to save clobbers. Unfortunately the offset only has 7-bit
range, so if there are enough clobbered registers (and there can be --
aarch64 has 384 bytes of registers; at least one unit test hits this)
the stores/loads will be out-of-range. I really don't want to synthesize
large-offset sequences here; better to go back to the simpler
pre-index/post-indexstp r1, r2, [sp, #-16]
form that works just like
a "push". It's likely not much worse microarchitecturally (dependence
chain on SP, but oh well) and it actually saves an instruction if
there's no other frame to allocate. As a further advantage, it's much
simpler to understand; simpler is usually better.This PR adds the new backend on Windows to CI as well.
cfallin requested peterhuene and abrown for a review on PR #2710.
cfallin edited PR #2710 from x64-fastcall-unwind
to main
:
Our previous implementation of unwind infrastructure was somewhat
complex and brittle: it parsed generated instructions in order to
reverse-engineer unwind info from prologues. It also relied on some
fragile linkage to communicate instruction-layout information that VCode
was not designed to provide.A much simpler, more reliable, and easier-to-reason-about approach is to
embed unwind directives as pseudo-instructions in the prologue as we
generate it. That way, we can say what we mean and just emit it
directly.The usual reasoning that leads to the reverse-engineering approach is
that metadata is hard to keep in sync across optimization passes; but
here, (i) prologues are generated at the very end of the pipeline, and
(ii) if we ever do a post-prologue-gen optimization, we can treat unwind
directives as black boxes with unknown side-effects, just as we do for
some other pseudo-instructions today.It turns out that it was easier to just build this for both x64 and
aarch64 (since they share a factored-out ABI implementation), and wire
up the platform-specific unwind-info generation for Windows and SystemV.
Now we have simpler unwind on all platforms and we can delete the old
unwind infra as soon as we remove the old backend.here were a few consequences to supporting Fastcall unwind in
particular that led to a refactor of the common ABI. Windows only
supports naming clobbered-register save locations within 240 bytes of
the frame-pointer register, whatever one chooses that to be (RSP or
RBP). We had previously saved clobbers below the fixed frame (and below
nominal-SP). The 240-byte range has to include the old RBP too, so we're
forced to place clobbers at the top of the frame, just below saved
RBP/RIP. This is fine; we always keep a frame pointer anyway because we
use it to refer to stack args. It does mean that offsets of fixed-frame
slots (spillslots, stackslots) from RBP are no longer known before we do
regalloc, so if we ever want to index these off of RBP rather than
nominal-SP because we add support foralloca
(dynamic frame growth),
then we'll need a "nominal-BP" mode that is resolved after regalloc and
clobber-save code is generated. I added a comment to this effect in
abi_impl.rs
.The above refactor touched both x64 and aarch64 because of shared code.
This had a further effect in that the old aarch64 prologue generation
subtracted fromsp
once to allocate space, then used stores to[sp, offset]
to save clobbers. Unfortunately the offset only has 7-bit
range, so if there are enough clobbered registers (and there can be --
aarch64 has 384 bytes of registers; at least one unit test hits this)
the stores/loads will be out-of-range. I really don't want to synthesize
large-offset sequences here; better to go back to the simpler
pre-index/post-indexstp r1, r2, [sp, #-16]
form that works just like
a "push". It's likely not much worse microarchitecturally (dependence
chain on SP, but oh well) and it actually saves an instruction if
there's no other frame to allocate. As a further advantage, it's much
simpler to understand; simpler is usually better.This PR adds the new backend on Windows to CI as well.
cfallin edited PR #2710 from x64-fastcall-unwind
to main
:
Our previous implementation of unwind infrastructure was somewhat
complex and brittle: it parsed generated instructions in order to
reverse-engineer unwind info from prologues. It also relied on some
fragile linkage to communicate instruction-layout information that VCode
was not designed to provide.A much simpler, more reliable, and easier-to-reason-about approach is to
embed unwind directives as pseudo-instructions in the prologue as we
generate it. That way, we can say what we mean and just emit it
directly.The usual reasoning that leads to the reverse-engineering approach is
that metadata is hard to keep in sync across optimization passes; but
here, (i) prologues are generated at the very end of the pipeline, and
(ii) if we ever do a post-prologue-gen optimization, we can treat unwind
directives as black boxes with unknown side-effects, just as we do for
some other pseudo-instructions today.It turns out that it was easier to just build this for both x64 and
aarch64 (since they share a factored-out ABI implementation), and wire
up the platform-specific unwind-info generation for Windows and SystemV.
Now we have simpler unwind on all platforms and we can delete the old
unwind infra as soon as we remove the old backend.There were a few consequences to supporting Fastcall unwind in
particular that led to a refactor of the common ABI. Windows only
supports naming clobbered-register save locations within 240 bytes of
the frame-pointer register, whatever one chooses that to be (RSP or
RBP). We had previously saved clobbers below the fixed frame (and below
nominal-SP). The 240-byte range has to include the old RBP too, so we're
forced to place clobbers at the top of the frame, just below saved
RBP/RIP. This is fine; we always keep a frame pointer anyway because we
use it to refer to stack args. It does mean that offsets of fixed-frame
slots (spillslots, stackslots) from RBP are no longer known before we do
regalloc, so if we ever want to index these off of RBP rather than
nominal-SP because we add support foralloca
(dynamic frame growth),
then we'll need a "nominal-BP" mode that is resolved after regalloc and
clobber-save code is generated. I added a comment to this effect in
abi_impl.rs
.The above refactor touched both x64 and aarch64 because of shared code.
This had a further effect in that the old aarch64 prologue generation
subtracted fromsp
once to allocate space, then used stores to[sp, offset]
to save clobbers. Unfortunately the offset only has 7-bit
range, so if there are enough clobbered registers (and there can be --
aarch64 has 384 bytes of registers; at least one unit test hits this)
the stores/loads will be out-of-range. I really don't want to synthesize
large-offset sequences here; better to go back to the simpler
pre-index/post-indexstp r1, r2, [sp, #-16]
form that works just like
a "push". It's likely not much worse microarchitecturally (dependence
chain on SP, but oh well) and it actually saves an instruction if
there's no other frame to allocate. As a further advantage, it's much
simpler to understand; simpler is usually better.This PR adds the new backend on Windows to CI as well.
cfallin edited PR #2710 from x64-fastcall-unwind
to main
:
Our previous implementation of unwind infrastructure was somewhat
complex and brittle: it parsed generated instructions in order to
reverse-engineer unwind info from prologues. It also relied on some
fragile linkage to communicate instruction-layout information that VCode
was not designed to provide.A much simpler, more reliable, and easier-to-reason-about approach is to
embed unwind directives as pseudo-instructions in the prologue as we
generate it. That way, we can say what we mean and just emit it
directly.The usual reasoning that leads to the reverse-engineering approach is
that metadata is hard to keep in sync across optimization passes; but
here, (i) prologues are generated at the very end of the pipeline, and
(ii) if we ever do a post-prologue-gen optimization, we can treat unwind
directives as black boxes with unknown side-effects, just as we do for
some other pseudo-instructions today.It turns out that it was easier to just build this for both x64 and
aarch64 (since they share a factored-out ABI implementation), and wire
up the platform-specific unwind-info generation for Windows and SystemV.
Now we have simpler unwind on all platforms and we can delete the old
unwind infra as soon as we remove the old backend.There were a few consequences to supporting Fastcall unwind in
particular that led to a refactor of the common ABI. Windows only
supports naming clobbered-register save locations within 240 bytes of
the frame-pointer register, whatever one chooses that to be (RSP or
RBP). We had previously saved clobbers below the fixed frame (and below
nominal-SP). The 240-byte range has to include the old RBP too, so we're
forced to place clobbers at the top of the frame, just below saved
RBP/RIP. This is fine; we always keep a frame pointer anyway because we
use it to refer to stack args. It does mean that offsets of fixed-frame
slots (spillslots, stackslots) from RBP are no longer known before we do
regalloc, so if we ever want to index these off of RBP rather than
nominal-SP because we add support foralloca
(dynamic frame growth),
then we'll need a "nominal-BP" mode that is resolved after regalloc and
clobber-save code is generated. I added a comment to this effect in
abi_impl.rs
.The above refactor touched both x64 and aarch64 because of shared code.
This had a further effect in that the old aarch64 prologue generation
subtracted fromsp
once to allocate space, then used stores to[sp, offset]
to save clobbers. Unfortunately the offset only has 7-bit
range, so if there are enough clobbered registers (and there can be --
aarch64 has 768 bytes of registers; at least one unit test hits this)
the stores/loads will be out-of-range. I really don't want to synthesize
large-offset sequences here; better to go back to the simpler
pre-index/post-indexstp r1, r2, [sp, #-16]
form that works just like
a "push". It's likely not much worse microarchitecturally (dependence
chain on SP, but oh well) and it actually saves an instruction if
there's no other frame to allocate. As a further advantage, it's much
simpler to understand; simpler is usually better.This PR adds the new backend on Windows to CI as well.
cfallin updated PR #2710 from x64-fastcall-unwind
to main
.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
One thing I noticed while reading this is that this might be a bit easier to read if it was framed in the positive ("unwind_info") rather than the negative (enabling the disabling switch). Wasmtime would then enable this by default, but it'd be up to other consumers of cranelift whether or not they'd wish to enable this.
cfallin updated PR #2710 from x64-fastcall-unwind
to main
.
cfallin created PR Review Comment:
Done, thanks!
cfallin submitted PR Review.
cfallin updated PR #2710 from x64-fastcall-unwind
to main
.
cfallin updated PR #2710 from x64-fastcall-unwind
to main
.
cfallin updated PR #2710 from x64-fastcall-unwind
to main
.
cfallin updated PR #2710 from x64-fastcall-unwind
to main
.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
Could you restore these tests for the actual unwind info being emitted?
cfallin updated PR #2710 from x64-fastcall-unwind
to main
.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Sorry, not sure why I removed that along with the dead code. Added back.
cfallin updated PR #2710 from x64-fastcall-unwind
to main
.
cfallin updated PR #2710 from x64-fastcall-unwind
to main
.
cfallin requested alexcrichton, peterhuene and abrown for a review on PR #2710.
alexcrichton submitted PR Review.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
FWIW since this is so common in the test suite I think it'd be reasonable to just either have this as the global default for cranelift or either just the default for the test suite.
alexcrichton created PR Review Comment:
I'm personally finding this a bit confusing and hard to follow. I have yet to read all the ABI stuff but I'm particularly confused about what the offset is here.
In the example above this directive comes after
mov rbp, rsp
, so I'm not sure how thefp_offset
plays into what's going on here since there's no frame adjustments at that point. I'm also somewhat confused since this is au8
and we're presumably supporting frames with more than 256 bytes of stack space, so I don't thinkfp_offset
is the size of the stack frame (but the docs say "base of the frame" which makes me think it is supposed to encompass the entire stack frame?)I may just need to read more to learn about this...
alexcrichton created PR Review Comment:
temporary allow?
alexcrichton created PR Review Comment:
FWIW I'm visiting this location after visiting the definition of
UnwindInst
, but this seems like a hidden detail that may want to be documented? This seems to indicate thatPushOldFp
for architectures withlr
(e.g. AArch64) indicates that two registers were pushed, bothfp
andlr
?Or I guess put another way this
PushOldFp
is intended to encompass architectures where the "frame pointer" is actually two separate registers (sorta)
alexcrichton created PR Review Comment:
Could these variants explicitly list the value for
frame_offset
from the variant below?
alexcrichton created PR Review Comment:
s/RegUnit/Reg/
alexcrichton created PR Review Comment:
Could this include the sample value for
fp_offset
for this variant?
alexcrichton created PR Review Comment:
Temporary during development?
alexcrichton created PR Review Comment:
I'm surprised here that this can make an assumption about this 16 bytes. While that may be true for x86_64/aarch64 today it seems like it should be factored out perhaps?
alexcrichton created PR Review Comment:
Could this exhaustively match and then have a comment for why other cases are ignored?
alexcrichton created PR Review Comment:
I realize that this code existed before you touched it, but I'm finding myself getting a bit lost in how many offsets from how many things there are with unwinding. Since all variants here have an
offset: u8
, which I presume is the code offset in the function, could that be factored out separately from theUnwindCode
itself? (e.g. stored somewhere as(u8, UnwindCode)
instead of justUnwindCode
alexcrichton created PR Review Comment:
Could this use
u32::from
or.into()
to show it's not losing any bits?
alexcrichton created PR Review Comment:
I'm a bit confused that this is happening as part of the
CreateFPFrame
unwind pseudo-instruction here. Given the original example it looked like this pseudo instruction happened as part of themov %rbp, %rsp
portion, but wouldn't the "stack alloc" here happen as part of the subtraction from%rsp
? Should theCreateFPFrame
opcode be split into two?
alexcrichton created PR Review Comment:
Technically this is adjusting the stack pointer a lot more than just for the clobbers I think? This is account for the whole frame, clobbers and statically known space needed?
alexcrichton created PR Review Comment:
Could this be preceded with
let off = cur_offset;
and thencur_offset
is adjusted in thematch
?
alexcrichton created PR Review Comment:
Could this have a comment indicating why this is 16? (presumably that we're skipping the 8 bytes we just pushed plus the 8 bytes pushed by the
call
instruction)
alexcrichton created PR Review Comment:
Technically this restriction is just on Windows x86_64, right? (even if we add AArch64 support for Windows that may have not have the same retrictions?)
alexcrichton created PR Review Comment:
This mentions
SImm7
but below it usesSImm7Scaled
which seems to have a range of (63 * 8) to -(64 * 8). Is that a bug in the documentation or am I missing the usage ofSImm7
? More-or-less I'm wondering if given a 504 byte range we could do everything as a positive offset relative tosp
?
alexcrichton created PR Review Comment:
Since this and
PopOldFP
aren't actually used anywhere today, should they be removed for now?I'd imagine that unwind info for the epilogue would also need to record that the saved registers no longer need to restored, but I don't think there's any emission for that just yet.
alexcrichton created PR Review Comment:
Ah so after further reading this isn't necessarily the entire call frame size, but just the size of the register saves we'll be doing?
alexcrichton created PR Review Comment:
Ok so after reading the various ABI implementations I think I understand a little better, but not entirely. It seems like this is primarily around for Windows and calculating adjustments on Unix, right?
For the Unix case (which I think I understand better) this seems like it should be something like
DeclareCobberFrameSize { size: u8 }
where it's basically just declaring that all saves to come later are relative offsets within a frame of sizesize
, allocated at the top of the stack just after the frame pointer was pushed.For Windows I'm much more unfamiliar with the unwinding stuff. I'm a bit confused, though, on why everything is
u8
here. It seems to be because of Windows but the exception codes all have short/far variants which seems like it's trying to account for offsets which are larger thanu8
perhaps, but I may be misunderstanding something. For Windows though it seems like this should be two instructions, where one signifies themov %rsp, %rbp
happened and the other happens after the stack decrement.I'll admit though I'm pretty confused on Windows, and I probably just need to sit down and read through their documentation some more. I'm not sure if we're doing RSP-relative addressing or whatever the second mode is mentioned at the top of this where something is FP relative. I always seem to get lost in the details thinking about stuff like this.
alexcrichton created PR Review Comment:
Hm ok well I was thinking that maybe
fp_offset
is the size of the clobber are and should be renamed, but I think I'm still just very confused on the definition of what this instruction is, where it goes, and what this offset is.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Yes, that's true, but it would be significantly more complex to make the factored-out ABI implementation conditional on platform and do a different thing on aarch64. (It's a tradeoff for sure -- we could fork aarch64's code, as we used to have, but then that doubles the maintenance and update cost in other cases.)
cfallin submitted PR Review.
cfallin created PR Review Comment:
So WIndows forces our hand in a few ways here:
- The way that clobbers are located is as a positive offset from the "frame", wherever we choose to define the frame. Since we place
RBP
/FP
just below the return address so that we have known offset addressing to stack args, and we place clobbers belowRBP
, we have to either define the frame in terms ofRSP
or as some offset belowRBP
. The former is difficult for several reasons but basically our nominal-SP tracking would not interact well with unwind info due to the order we compute and finalize things. So we define the frame in terms ofRBP
, butRBP
is at the top of it.- This is what
fp_offset
means (it comes directly from the "frame register offset" field inUNWIND_INFO
, which is also the source of theu8
-sized limit): when we declare the frame, its base is atRBP - fp_offset
. Equivalently, this is saying that the frame pointer is at the "frame pointer offset" into the frame.- Note that on SysV platforms, the frame is defined by the "CFA" (call-frame address) which is spec-defined to be the SP value at entry, so instead
fp_offset
is used to adjust the offsets for clobber-saves.- The
fp_offset
size limit (actually 240 bytes) is the reason that clobbers had to be moved to just belowRBP
as well; otherwise we could not define a frame far enough away fromRBP
to straddle our other fixed-frame slots.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Regarding
fp_offset
in particular, I'll go over the doc comments some more and see if I can add detail. Perhaps what we really need is just a small tutorial on unwind formats on Windows and SysV and how frames are computed in both cases :-)
cfallin submitted PR Review.
cfallin created PR Review Comment:
Yes, I'm thinking now perhaps we just give two offsets to describe FP: its offset above the start of clobbers, and its offset below SP at entry. These are the two values needed to produce either Windows or SysV info so it should make everything else completely platform-agnostic.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Perhaps it should just be called
CreateFrame
-- it was meant to be a single macro-op that captured anything a specific unwind format needs for the prologue. The close mapping betweensub %rsp
andStackAlloc
was the sort of thing I was hoping to avoid in this simplification :-) Given that we pin the frame toRBP
with theSetFPReg
, theStackAlloc
serves only to let the unwinder know that more stack space is being used; so as long as it comes after theSetFPReg
and before any uses of the clobber space, it shouldn't matter exactly where it occurs, I think.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Yep, that's a stale comment, thanks!
cfallin submitted PR Review.
cfallin created PR Review Comment:
Yes, my mistake, it is actually a scaled offset; I'll update the comment, thanks!
That said, going back to an sp-offset scheme requires two assumptions: (i) the subset of registers that are callee-save fits within 512 bytes; this is true for SysV but may be a subtle footgun in the future; and (ii) that we are alright with a two-stage thing, where we decrement SP once and save clobbers, then decrement again for the frame, unless we special-case the "everything is in range" case and collapse those two into one. I'd rather not have that complexity if we can avoid it!
Basically my high-level bit here is: try to find a common set of simple lowerings and shared layouts/schemes that does not have a lot of special cases for different ABIs, different architectures, different in-range/out-of-range cases; bugs and unmaintainable complexity lie that way. The optimized all-in-one
sub sp
stuff that this PR removes was already very hard for me to re-internalize when I had to change it for this work. So I'd rather standardize on a simple "push the things one at a time" lowering, and have the same frame layout everywhere, to keeps us all sane :-)
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Ok thanks for the info! This helps dispel some of my confusion. I'm familiar with the CFA in dwarf but the "cfa equivalent" was actually in the other direction on Windows and I was getting confused how these two things map to each other. I think for now some extra documentation for how the field is expected to be interpreted on Windows and Unix should be fine myself.
But for the
u8
-ness, to make sure I understand, we're declaring to Windows "all register saves, which occur at positive offsets, are relative to XX" where XX isRBP-thing
andthing
is limited to 8 bits? We then have an additional requirement later on that when we save a register it can be saved at most 240 bytes away? (I couldn't actually find where that second restriction arose myself)
cfallin submitted PR Review.
cfallin created PR Review Comment:
Yep, this is left over from my intermediate PR state where I tried to handle epilogues too. Happy to remove!
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Ok I think that's fine in the meantime anyway. I don't think it's required we get this 100% right but it might just be worth in the documentation to mention that it's sort of a macro-op for now.
alexcrichton created PR Review Comment:
Nah this is definitely fine to have, just wanted to double-check my understanding!
alexcrichton submitted PR Review.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Nah that makes sense, and I think it's fine to keep as-is. I definitely agree having a known-correct thing is the highest priority. If it turns out that we want to change this in the future for perf reasons or something like that we can cross that bridge when we get there.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Re: offset restrictions, only the former (
RBP - offset
); once we establish the frame we can declare clobber-saves at any 32-bit offset. But combining this with the fact that we placeRBP
above clobber-saves gives us the limit on clobber offsets.In hindsight, perhaps it's better to have explicit asserts that all offsets are in-range for the unwind info we're generating, and give these offset fields a
u32
or so; that would document these limits more explicitly I think!
cfallin submitted PR Review.
cfallin created PR Review Comment:
(Err, "any positive 32-bit offset" above.)
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Alternatively we could stop establishing a frame pointer until such a time we actually need one on Windows (the save offsets encoded in the unwind information would then be positive offsets from RSP rather than FP-[0..240], thereby eliminating the "saves have to be near enough to the frame pointer establishment" problem). That's a whole can of worms, though, I imagine.
peterhuene created PR Review Comment:
In fact, the old unwind info emitted for Windows explicitly didn't encode establishing a frame pointer (despite the prologue doing so) to side step the 240 offset limitation, thereby encoding all of the register saves as positive offsets from the stack pointer. This would obviously break when the frame pointer is required, such as dynamically-sized frames.
peterhuene submitted PR Review.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Aha ok that makes a lot of sense now, where the 240 comes from. I do think there's definitely room for improvement here, but there's no need to happen in this PR I think. I'd be happy with some more comments myself, and then seems reasonable to open tracking issues if necessary.
cfallin updated PR #2710 from x64-fastcall-unwind
to main
.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Done; changed just for filetests (I think it's too surprising to change the default overall given that it's needed for backtraces and correct operation generally).
cfallin submitted PR Review.
cfallin created PR Review Comment:
Ah, yes, I was trying to get rid of some unused-code warnings in the new-backend PR and this percolated backward during some rebasing magic. Removed!
cfallin submitted PR Review.
cfallin created PR Review Comment:
Fixed.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Removed, thanks!
cfallin submitted PR Review.
cfallin created PR Review Comment:
Updated along with the doc and field-name clarity changes noted below.
cfallin submitted PR Review.
cfallin created PR Review Comment:
So I've revamped the doc-comment for this enum and added the "offset upward / offset downward" dual fields for the CreateNewFrame pseudo-inst. Hopefully this is a bit more clear now!
Agreed that we could eventually look into
-fomit-frame-pointer
codegen, though that's a deeper rabbithole in the ABI code (as isalloca
support); one bridge at a time :-)
cfallin submitted PR Review.
cfallin created PR Review Comment:
Updated as noted -- two offsets given now.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Done!
cfallin updated PR #2710 from x64-fastcall-unwind
to main
.
cfallin updated PR #2710 from x64-fastcall-unwind
to main
.
cfallin submitted PR Review.
cfallin created PR Review Comment:
I've changed this to
instruction_offset
everywhere now. The tuple refactor is a bit more intrusive but I can do that instead if you'd prefer!
cfallin created PR Review Comment:
I did one better and just removed the
StackAlloc
; it doesn't seem to actually be needed once we pin the frame toRBP
.
cfallin submitted PR Review.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Done.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Fixed!
cfallin submitted PR Review.
cfallin created PR Review Comment:
Ah, yeah, that's a silly bit of verbosity; fixed.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Updated comment here -- yes, you're right, the range is actually 504 bytes. I've written out the reasoning here w.r.t. simplicity and avoiding a two-stage frame setup with big frames.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Updated comment.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Yep, I've removed all traces of the epilogue attempt now.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Added to the doc-comment! I've also removed this to
PushFrameRegs
rather thanPushOldFP
.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Updated.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Updated.
cfallin updated PR #2710 from x64-fastcall-unwind
to main
.
cfallin updated PR #2710 from x64-fastcall-unwind
to main
.
cfallin updated PR #2710 from x64-fastcall-unwind
to main
.
cfallin updated PR #2710 from x64-fastcall-unwind
to main
.
alexcrichton submitted PR Review.
peterhuene submitted PR Review.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Minor nit: perhaps we should probably mention here that describing epilogues is not required by some platforms for correct operation. For example, Windows x64 unwind information doesn't describe epilogues as the epilogues are required to be a particular shape and the OS can "virtually run them forward" to unwind the frame when inside an epilogue.
peterhuene created PR Review Comment:
Is this match (and others like it) intentionally truncated?
peterhuene created PR Review Comment:
//! 240 bytes of RBP. Finally, it is not allowed to access memory
cfallin updated PR #2710 from x64-fastcall-unwind
to main
.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Good point, done!
cfallin submitted PR Review.
cfallin created PR Review Comment:
Ah, no, I had meant to fill these in after capturing the registers (the
Debug
-printed versions use regalloc indices); thanks!
cfallin submitted PR Review.
cfallin created PR Review Comment:
Done.
cfallin merged PR #2710.
Last updated: Dec 23 2024 at 13:07 UTC