wasmtime / issue #3864 Unwind: separate the "set frame re... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / issue #3864 Unwind: separate the "set frame re...

Wasmtime GitHub notifications bot (Mar 01 2022 at 00:04):

Hmm, that didn't work on Windows-2022; a generic "stack overrun" crash. This will have to go to my back burner for a bit until I have a workable Windows machine to debug on; in the meantime if anyone else wants to poke at this, please feel free!

Wasmtime GitHub notifications bot (May 09 2022 at 17:30):

alexcrichton commented on issue #3864:

I was poking around at this again today. For a trivially recursive function (func $foo call $foo) I think that this is the machine code we generate with this PR

  0x0:  pushq   %rbp
  0x1:  unwind PushFrameRegs { offset_upward_to_caller_sp: 16 }
  0x1:  movq    %rsp, %rbp
  0x4:  unwind SetFrameReg
  0x4:  movq    0(%rdi), %r10
  0x7:  movq    0(%r10), %r10
  0xa:  cmpq    %rsp, %r10
  0xd:  jbe ; ud2 stk_ovf ;
  0x15:  unwind DefineNewFrame { offset_upward_to_caller_sp: 16, offset_downward_to_clobbers: 0 }
block0:
  0x15:  movq    %rdi, %rsi
  0x18:  call    User { namespace: 0, index: 0 }
  0x1d:  jmp     label1
block1:
  0x1d:  movq    %rbp, %rsp
  0x20:  popq    %rbp
  0x21:  ret

where the prefixed hex number is the offset of that instruction within the MachBuffer

The UnwindInfo for windows generated by this PR is:

UnwindInfo {
    flags: 0,
    prologue_size: 21,
    frame_register: Some(
        5,
    ),
    frame_register_offset: 0,
    unwind_codes: [
        PushRegister {
            instruction_offset: 1,
            reg: 5,
        },
        SetFPReg {
            instruction_offset: 4,
        },
    ],
}

The diff from the unwind info from main is:

@@ -11,7 +11,7 @@
             reg: 5,
         },
         SetFPReg {
-            instruction_offset: 21,
+            instruction_offset: 4,
         },
     ],
 }

which is to be expected.

So the only other bit of unwind info we could really modify here seems to be prologue_size: 21, which does seem fishy because we're technically unwinding from within the prologue itself.

@peterhuene you may know more about windows unwinding here, does windows disallow unwinding from within the prologue itself?

Wasmtime GitHub notifications bot (May 09 2022 at 17:44):

peterhuene commented on issue #3864:

Unwinding from within the prologue is permitted on windows.

It is why the unwind codes are expected to be sorted in descending order from prologue offset: the unwinder simply walks the codes, skipping over any where the prologue offset is greater than the function offset of the exception address.

So if we were originally emitting the frame pointer adjustment unwind code with a prologue offset equal to the size of the prologue, then any exception from within the prologue would have skipped undoing that operation.

Wasmtime GitHub notifications bot (May 09 2022 at 17:51):

peterhuene edited a comment on issue #3864:

Unwinding from within the prologue is permitted on windows.

It is why the unwind codes are expected to be sorted in descending order from prologue offset: the unwinder simply walks the codes, skipping over any where the prologue offset is greater than the function offset of the exception address.

So if we were originally emitting the frame pointer establishment unwind code with a prologue offset equal to the size of the prologue, then any exception from within the prologue would have skipped undoing that operation.

Wasmtime GitHub notifications bot (May 09 2022 at 18:22):

alexcrichton commented on issue #3864:

Hm ok so the unwinding logs I have from this PR look like:
...
2022-05-09T17:24:20.1618041Z ip=0x7ffc7462b592 sp=0xb9a2d7b1e0
2022-05-09T17:24:20.1618120Z ip=0x7ffc745e2022 sp=0xb9a2d7b280
2022-05-09T17:24:20.1618188Z ip=0x7ffc74652e1e sp=0xb9a2d7b4c0
2022-05-09T17:24:20.1618271Z ip=0x24c8b5b1013 sp=0xb9a2d7c2a0
2022-05-09T17:24:20.1618403Z ip=0xb9a2d7c2b0 sp=0xb9a2d7c2a8
2022-05-09T17:24:20.1618484Z ip=0x24c8b5b101d sp=0xb9a2d7c2b0
2022-05-09T17:24:20.1618561Z ip=0xb9a2d7c2c0 sp=0xb9a2d7c2b8
2022-05-09T17:24:20.1618639Z ip=0x24c8b5b101d sp=0xb9a2d7c2c0
2022-05-09T17:24:20.1618707Z ip=0xb9a2d7c2d0 sp=0xb9a2d7c2c8
2022-05-09T17:24:20.1618783Z ip=0x24c8b5b101d sp=0xb9a2d7c2d0
...
where I believe the first wasm pc is ip=0x24c8b5b1013 sp=0xb9a2d7c2a0. Since text sections are page-aligned I think this corresponds with the address at 0x13 above which is in:
  0xd:  jbe ; ud2 stk_ovf ;
which is indeed the ud2 at 0x13.

The "second wasm frame", however is bogus ip=0xb9a2d7c2b0 sp=0xb9a2d7c2a8. The problem here is that the ip is actually a stack pointer and the sp is not 16-byte aligned.

This looks like the unwinder did not actually unwind the pushq %rbp instruction which is in our prologue. The unwinder then popped the return address of the call, which got the previous value of %rbp and misinterpreted it as an instruction pointer. Then everything goes awry b/c presumably everything is busted at that point.

I'm not sure why the PushRegister instruction isn't being unwound...

Wasmtime GitHub notifications bot (May 09 2022 at 18:25):

cfallin commented on issue #3864:

Thanks @alexcrichton for digging into this more!

For my own potential debugging later: are you able to test this locally on a desktop Windows install (and which version if so)? I do have a Win10 VM I can dig up, which I suspect corresponds to windows-2019 (or earlier); I wonder if I would have to upgrade to Win11 to see this behavior...

Wasmtime GitHub notifications bot (May 09 2022 at 18:28):

alexcrichton commented on issue #3864:

Nah I'm just throwing things at CI and reading output from github actions, basically vicariously printf-debugging through github heh

Wasmtime GitHub notifications bot (May 10 2022 at 00:14):

cfallin commented on issue #3864:

(Pushed rebase onto current main)

I'm at a bit of a loss here: I re-read all of the documentation on fastcall prologues and unwind info, and nothing seems amiss.

I will list all of the assumptions that we're making to see if any might be wrong:

Unwind entries are emitted in reverse order (highest offset first);

Unwind entries are associated with the offset of the following instruction (e.g. above, the push rbp is at offset 0 and is a 1-byte instruction, so is emitted with prologue offset 1);

The unwinder, if unwinding from a function offset in the middle of the prologue, will revert the effects of unwind ops at that offset or prior;

To revert a UWOP_PUSH_NONVOL (opcode 0), the unwinder reads old rbp from *rsp and increments rsp by 8;

We are allowed to have other instructions in the prologue (within the first "prologue size" bytes of the functino body), such as the stack check.

Any others? Does all of this sound reasonable? The most dubious is IMHO the last. However (i) if the unwinder claims to work by only reading the array of unwind ops, this shouldn't be a problem; and (ii) we need to make it so, because if there were clobber-saves, they would come after the stack limit check, and these are part of the prologue.

I suppose a high-level alternative design branch would be to do the stack limit check outside of the prologue. However this requires a bunch of re-engineering, and divergence from the current shared ABI code:

It implies two checks: "enough stack limit to save clobbers" before every call (this becomes the responsibility of the caller) and then the usual one, just after the prologue;

When the host calls Wasm, it implies a stack limit check analogous to the above (as part of the trampoline maybe?);

It implies that we decrement rsp twice, once to create space for clobber-saves and once to allocate the rest of the fixed frame (currently we collapse all this overhead into one sub instruction).

So that doesn't seem terribly tenable to me either.

It's really unfortunate that I don't have a way of reproducing this locally, and that the unwinder is so opaque. I'm not sure what to do currently and have no remaining strategies for solving this bug other than "ask Microsoft what changed in the Windows-2022 unwinder"; does anyone else have better ideas?

Wasmtime GitHub notifications bot (May 10 2022 at 00:15):

cfallin edited a comment on issue #3864:

(Pushed rebase onto current main)

I'm at a bit of a loss here: I re-read all of the documentation on fastcall prologues and unwind info, and nothing seems amiss.

I will list all of the assumptions that we're making to see if any might be wrong:

Unwind entries are emitted in reverse order (highest offset first);

Unwind entries are associated with the offset of the following instruction (e.g. above, the push rbp is at offset 0 and is a 1-byte instruction, so is emitted with prologue offset 1);

The unwinder, if unwinding from a function offset in the middle of the prologue, will revert the effects of unwind ops at that offset or prior;

To revert a UWOP_PUSH_NONVOL (opcode 0), the unwinder reads old rbp from *rsp and increments rsp by 8;

We are allowed to have other instructions in the prologue (within the first "prologue size" bytes of the function body), such as the stack check.

Any others? Does all of this sound reasonable? The most dubious is IMHO the last. However (i) if the unwinder claims to work by only reading the array of unwind ops, this shouldn't be a problem; and (ii) we need to make it so, because if there were clobber-saves, they would come after the stack limit check, and these are part of the prologue.

I suppose a high-level alternative design branch would be to do the stack limit check outside of the prologue. However this requires a bunch of re-engineering, and divergence from the current shared ABI code:

It implies two checks: "enough stack limit to save clobbers" before every call (this becomes the responsibility of the caller) and then the usual one, just after the prologue;

When the host calls Wasm, it implies a stack limit check analogous to the above (as part of the trampoline maybe?);

It implies that we decrement rsp twice, once to create space for clobber-saves and once to allocate the rest of the fixed frame (currently we collapse all this overhead into one sub instruction).

So that doesn't seem terribly tenable to me either.

It's really unfortunate that I don't have a way of reproducing this locally, and that the unwinder is so opaque. I'm not sure what to do currently and have no remaining strategies for solving this bug other than "ask Microsoft what changed in the Windows-2022 unwinder"; does anyone else have better ideas?

Wasmtime GitHub notifications bot (Jun 24 2022 at 14:11):

alexcrichton commented on issue #3864:

I did a "final" poking around at this last night as I was inspired again. I double-checked everything down to the binary encoding and API alls we're calling into Windows. Everything checked out and made sense except for the fact that this doesn't work. It definitely feels odd that the exact same code works on 2019 and doesn't work on 2022. At this point I think we'd need to enlist a Windows expert to figure this out.

Wasmtime GitHub notifications bot (Aug 02 2022 at 07:03):

cfallin commented on issue #3864:

I'm cleaning up old PRs and closing this now: since we no longer rely on unwind metadata for stack walking on Windows (now that #4431 is merged), this is no longer an issue.

Wasmtime GitHub notifications bot (Aug 02 2022 at 13:21):

bjorn3 commented on issue #3864:

Isn't this still necessary for debuggers?

Wasmtime GitHub notifications bot (Aug 02 2022 at 16:52):

cfallin commented on issue #3864:

Yes, I suppose it would still be an issue for debuggers on Windows versions with the bug. However we weren't able to find the issue after extensive debugging (see whole thread above!) and this doesn't seem to solve the problem, so for that case I would say I'm closing the PR simply because it doesn't work...

At this point to get Windows' native unwinder to fully work in windows-2022 I think we need someone who is a deep expert in SEH and/or is willing to single-instruction-step through Windows internals. That's definitely not me!

Last updated: Apr 17 2025 at 22:03 UTC