cfallin commented on issue #3864:
Hmm, that didn't work on Windows-2022; a generic "stack overrun" crash. This will have to go to my back burner for a bit until I have a workable Windows machine to debug on; in the meantime if anyone else wants to poke at this, please feel free!
alexcrichton commented on issue #3864:
I was poking around at this again today. For a trivially recursive function
(func $foo call $foo)
I think that this is the machine code we generate with this PR0x0: pushq %rbp 0x1: unwind PushFrameRegs { offset_upward_to_caller_sp: 16 } 0x1: movq %rsp, %rbp 0x4: unwind SetFrameReg 0x4: movq 0(%rdi), %r10 0x7: movq 0(%r10), %r10 0xa: cmpq %rsp, %r10 0xd: jbe ; ud2 stk_ovf ; 0x15: unwind DefineNewFrame { offset_upward_to_caller_sp: 16, offset_downward_to_clobbers: 0 } block0: 0x15: movq %rdi, %rsi 0x18: call User { namespace: 0, index: 0 } 0x1d: jmp label1 block1: 0x1d: movq %rbp, %rsp 0x20: popq %rbp 0x21: ret
where the prefixed hex number is the offset of that instruction within the
MachBuffer
The
UnwindInfo
for windows generated by this PR is:UnwindInfo { flags: 0, prologue_size: 21, frame_register: Some( 5, ), frame_register_offset: 0, unwind_codes: [ PushRegister { instruction_offset: 1, reg: 5, }, SetFPReg { instruction_offset: 4, }, ], }
The diff from the unwind info from
main
is:@@ -11,7 +11,7 @@ reg: 5, }, SetFPReg { - instruction_offset: 21, + instruction_offset: 4, }, ], }
which is to be expected.
So the only other bit of unwind info we could really modify here seems to be
prologue_size: 21
, which does seem fishy because we're technically unwinding from within the prologue itself.@peterhuene you may know more about windows unwinding here, does windows disallow unwinding from within the prologue itself?
peterhuene commented on issue #3864:
Unwinding from within the prologue is permitted on windows.
It is why the unwind codes are expected to be sorted in descending order from prologue offset: the unwinder simply walks the codes, skipping over any where the prologue offset is greater than the function offset of the exception address.
So if we were originally emitting the frame pointer adjustment unwind code with a prologue offset equal to the size of the prologue, then any exception from within the prologue would have skipped undoing that operation.
peterhuene edited a comment on issue #3864:
Unwinding from within the prologue is permitted on windows.
It is why the unwind codes are expected to be sorted in descending order from prologue offset: the unwinder simply walks the codes, skipping over any where the prologue offset is greater than the function offset of the exception address.
So if we were originally emitting the frame pointer establishment unwind code with a prologue offset equal to the size of the prologue, then any exception from within the prologue would have skipped undoing that operation.
alexcrichton commented on issue #3864:
Hm ok so the unwinding logs I have from this PR look like:
... 2022-05-09T17:24:20.1618041Z ip=0x7ffc7462b592 sp=0xb9a2d7b1e0 2022-05-09T17:24:20.1618120Z ip=0x7ffc745e2022 sp=0xb9a2d7b280 2022-05-09T17:24:20.1618188Z ip=0x7ffc74652e1e sp=0xb9a2d7b4c0 2022-05-09T17:24:20.1618271Z ip=0x24c8b5b1013 sp=0xb9a2d7c2a0 2022-05-09T17:24:20.1618403Z ip=0xb9a2d7c2b0 sp=0xb9a2d7c2a8 2022-05-09T17:24:20.1618484Z ip=0x24c8b5b101d sp=0xb9a2d7c2b0 2022-05-09T17:24:20.1618561Z ip=0xb9a2d7c2c0 sp=0xb9a2d7c2b8 2022-05-09T17:24:20.1618639Z ip=0x24c8b5b101d sp=0xb9a2d7c2c0 2022-05-09T17:24:20.1618707Z ip=0xb9a2d7c2d0 sp=0xb9a2d7c2c8 2022-05-09T17:24:20.1618783Z ip=0x24c8b5b101d sp=0xb9a2d7c2d0 ...
where I believe the first wasm pc is
ip=0x24c8b5b1013 sp=0xb9a2d7c2a0
. Since text sections are page-aligned I think this corresponds with the address at 0x13 above which is in:0xd: jbe ; ud2 stk_ovf ;
which is indeed the
ud2
at 0x13.The "second wasm frame", however is bogus
ip=0xb9a2d7c2b0 sp=0xb9a2d7c2a8
. The problem here is that theip
is actually a stack pointer and thesp
is not 16-byte aligned.This looks like the unwinder did not actually unwind the
pushq %rbp
instruction which is in our prologue. The unwinder then popped the return address of thecall
, which got the previous value of%rbp
and misinterpreted it as an instruction pointer. Then everything goes awry b/c presumably everything is busted at that point.I'm not sure why the
PushRegister
instruction isn't being unwound...
cfallin commented on issue #3864:
Thanks @alexcrichton for digging into this more!
For my own potential debugging later: are you able to test this locally on a desktop Windows install (and which version if so)? I do have a Win10 VM I can dig up, which I suspect corresponds to
windows-2019
(or earlier); I wonder if I would have to upgrade to Win11 to see this behavior...
alexcrichton commented on issue #3864:
Nah I'm just throwing things at CI and reading output from github actions, basically vicariously printf-debugging through github heh
cfallin commented on issue #3864:
(Pushed rebase onto current
main
)I'm at a bit of a loss here: I re-read all of the documentation on fastcall prologues and unwind info, and nothing seems amiss.
I will list all of the assumptions that we're making to see if any might be wrong:
- Unwind entries are emitted in reverse order (highest offset first);
- Unwind entries are associated with the offset of the following instruction (e.g. above, the
push rbp
is at offset 0 and is a 1-byte instruction, so is emitted with prologue offset 1);- The unwinder, if unwinding from a function offset in the middle of the prologue, will revert the effects of unwind ops at that offset or prior;
- To revert a
UWOP_PUSH_NONVOL
(opcode 0), the unwinder reads oldrbp
from*rsp
and incrementsrsp
by 8;- We are allowed to have other instructions in the prologue (within the first "prologue size" bytes of the functino body), such as the stack check.
Any others? Does all of this sound reasonable? The most dubious is IMHO the last. However (i) if the unwinder claims to work by only reading the array of unwind ops, this shouldn't be a problem; and (ii) we need to make it so, because if there were clobber-saves, they would come after the stack limit check, and these are part of the prologue.
I suppose a high-level alternative design branch would be to do the stack limit check outside of the prologue. However this requires a bunch of re-engineering, and divergence from the current shared ABI code:
- It implies two checks: "enough stack limit to save clobbers" before every call (this becomes the responsibility of the caller) and then the usual one, just after the prologue;
- When the host calls Wasm, it implies a stack limit check analogous to the above (as part of the trampoline maybe?);
- It implies that we decrement rsp twice, once to create space for clobber-saves and once to allocate the rest of the fixed frame (currently we collapse all this overhead into one
sub
instruction).So that doesn't seem terribly tenable to me either.
It's really unfortunate that I don't have a way of reproducing this locally, and that the unwinder is so opaque. I'm not sure what to do currently and have no remaining strategies for solving this bug other than "ask Microsoft what changed in the Windows-2022 unwinder"; does anyone else have better ideas?
cfallin edited a comment on issue #3864:
(Pushed rebase onto current
main
)I'm at a bit of a loss here: I re-read all of the documentation on fastcall prologues and unwind info, and nothing seems amiss.
I will list all of the assumptions that we're making to see if any might be wrong:
- Unwind entries are emitted in reverse order (highest offset first);
- Unwind entries are associated with the offset of the following instruction (e.g. above, the
push rbp
is at offset 0 and is a 1-byte instruction, so is emitted with prologue offset 1);- The unwinder, if unwinding from a function offset in the middle of the prologue, will revert the effects of unwind ops at that offset or prior;
- To revert a
UWOP_PUSH_NONVOL
(opcode 0), the unwinder reads oldrbp
from*rsp
and incrementsrsp
by 8;- We are allowed to have other instructions in the prologue (within the first "prologue size" bytes of the function body), such as the stack check.
Any others? Does all of this sound reasonable? The most dubious is IMHO the last. However (i) if the unwinder claims to work by only reading the array of unwind ops, this shouldn't be a problem; and (ii) we need to make it so, because if there were clobber-saves, they would come after the stack limit check, and these are part of the prologue.
I suppose a high-level alternative design branch would be to do the stack limit check outside of the prologue. However this requires a bunch of re-engineering, and divergence from the current shared ABI code:
- It implies two checks: "enough stack limit to save clobbers" before every call (this becomes the responsibility of the caller) and then the usual one, just after the prologue;
- When the host calls Wasm, it implies a stack limit check analogous to the above (as part of the trampoline maybe?);
- It implies that we decrement rsp twice, once to create space for clobber-saves and once to allocate the rest of the fixed frame (currently we collapse all this overhead into one
sub
instruction).So that doesn't seem terribly tenable to me either.
It's really unfortunate that I don't have a way of reproducing this locally, and that the unwinder is so opaque. I'm not sure what to do currently and have no remaining strategies for solving this bug other than "ask Microsoft what changed in the Windows-2022 unwinder"; does anyone else have better ideas?
alexcrichton commented on issue #3864:
I did a "final" poking around at this last night as I was inspired again. I double-checked everything down to the binary encoding and API alls we're calling into Windows. Everything checked out and made sense except for the fact that this doesn't work. It definitely feels odd that the exact same code works on 2019 and doesn't work on 2022. At this point I think we'd need to enlist a Windows expert to figure this out.
cfallin commented on issue #3864:
I'm cleaning up old PRs and closing this now: since we no longer rely on unwind metadata for stack walking on Windows (now that #4431 is merged), this is no longer an issue.
bjorn3 commented on issue #3864:
Isn't this still necessary for debuggers?
cfallin commented on issue #3864:
Yes, I suppose it would still be an issue for debuggers on Windows versions with the bug. However we weren't able to find the issue after extensive debugging (see whole thread above!) and this doesn't seem to solve the problem, so for that case I would say I'm closing the PR simply because it doesn't work...
At this point to get Windows' native unwinder to fully work in windows-2022 I think we need someone who is a deep expert in SEH and/or is willing to single-instruction-step through Windows internals. That's definitely not me!
Last updated: Nov 22 2024 at 16:03 UTC