Stream: git-wasmtime

Topic: wasmtime / Issue #1967 STATUS_BAD_STACK on Windows on trap


view this post on Zulip Wasmtime GitHub notifications bot (Jul 02 2020 at 15:40):

alexcrichton opened Issue #1967:

Originally reported at https://github.com/bytecodealliance/wasmtime-py/issues/36 I'm able to reproduce this pretty easily with just this wasm file. Downloading that locally I can reproduce on Windows with:

$ gunzip yosys.wasm.gz
$ cargo run --release -- run yosys.wasm -- -V
...
    Finished release [optimized] target(s) in 3.84s
     Running `target\release\wasmtime.exe run yosys.wasm -- -V`
error: process didn't exit successfully: `target\release\wasmtime.exe run yosys.wasm -- -V` (exit code: 0xc0000028)

Something funny is going on here though because --opt-level 0 does not reproduce the issue (--opt-level 1 does though).

I'm trying to run this right now through wasm-reduce from Binaryen but so far it's only shaved off 3MB, which isn't a whole lot given the size of this binary...

@peterhuene I'm having trouble just opening things up in a debugger, would you be able to help look into this?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 02 2020 at 15:40):

alexcrichton labeled Issue #1967:

Originally reported at https://github.com/bytecodealliance/wasmtime-py/issues/36 I'm able to reproduce this pretty easily with just this wasm file. Downloading that locally I can reproduce on Windows with:

$ gunzip yosys.wasm.gz
$ cargo run --release -- run yosys.wasm -- -V
...
    Finished release [optimized] target(s) in 3.84s
     Running `target\release\wasmtime.exe run yosys.wasm -- -V`
error: process didn't exit successfully: `target\release\wasmtime.exe run yosys.wasm -- -V` (exit code: 0xc0000028)

Something funny is going on here though because --opt-level 0 does not reproduce the issue (--opt-level 1 does though).

I'm trying to run this right now through wasm-reduce from Binaryen but so far it's only shaved off 3MB, which isn't a whole lot given the size of this binary...

@peterhuene I'm having trouble just opening things up in a debugger, would you be able to help look into this?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 02 2020 at 15:42):

peterhuene commented on Issue #1967:

I'll take a look.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 02 2020 at 15:42):

peterhuene assigned Issue #1967 (assigned to peterhuene):

Originally reported at https://github.com/bytecodealliance/wasmtime-py/issues/36 I'm able to reproduce this pretty easily with just this wasm file. Downloading that locally I can reproduce on Windows with:

$ gunzip yosys.wasm.gz
$ cargo run --release -- run yosys.wasm -- -V
...
    Finished release [optimized] target(s) in 3.84s
     Running `target\release\wasmtime.exe run yosys.wasm -- -V`
error: process didn't exit successfully: `target\release\wasmtime.exe run yosys.wasm -- -V` (exit code: 0xc0000028)

Something funny is going on here though because --opt-level 0 does not reproduce the issue (--opt-level 1 does though).

I'm trying to run this right now through wasm-reduce from Binaryen but so far it's only shaved off 3MB, which isn't a whole lot given the size of this binary...

@peterhuene I'm having trouble just opening things up in a debugger, would you be able to help look into this?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 02 2020 at 23:58):

peterhuene commented on Issue #1967:

I've been digging into this most of today with not much to report yet.

The unwind information seems correct, as well as the stack aligning done in the prologues.

I haven't been able to form a minimal repro yet either.

The unwind is coming from the trap returned from WASI's proc_exit.

I'll keep at it.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 03 2020 at 04:30):

peterhuene commented on Issue #1967:

It appears the system unwinder can't get past the frame for 12746 for some yet-to-be-determined reason.

The Wasm frames that show up in the debugger prior to the STATUS_BAD_STACK:

The unwind stops there.

Prologue JIT for 12746:

0000023CCCEAB200 48 8B 01             mov         rax,qword ptr [rcx]
0000023CCCEAB203 48 8B 00             mov         rax,qword ptr [rax]
0000023CCCEAB206 48 81 C0 68 06 00 00 add         rax,668h
0000023CCCEAB20D 48 39 E0             cmp         rax,rsp
0000023CCCEAB210 72 02                jb          0000023CCCEAB214
0000023CCCEAB212 0F 0B                ud2
0000023CCCEAB214 40 55                push        rbp
0000023CCCEAB216 48 89 E5             mov         rbp,rsp
0000023CCCEAB219 40 53                push        rbx
0000023CCCEAB21B 40 56                push        rsi
0000023CCCEAB21D 40 57                push        rdi
0000023CCCEAB21F 41 54                push        r12
0000023CCCEAB221 41 55                push        r13
0000023CCCEAB223 41 56                push        r14
0000023CCCEAB225 41 57                push        r15
0000023CCCEAB227 48 81 EC 68 06 00 00 sub         rsp,668h
0000023CCCEAB22E 44 0F 11 BC 24 50 06 00 00 movups      xmmword ptr [rsp+650h],xmm15

Function 12746 has the following encoded unwind information:

```
01 37 0d 55 37 f8 00 00
2e 01 cd 00 27 f0 25 e0
23 d0 21 c0 1f 70 1d 60
1b 30 19 03 16 50 00 00

Decoded:

Version: 1
Prologue size: 55 bytes
Unwind codes: 13
Register frame: RBP
Register frame offset: 80
Codes:

- Prologue offset: 55
Op: SaveXmm128 (note: operation is two unwind codes in length)
Reg: XMM15
Stack offset: 0

- Prologue offset: 46
Op: LargeStackAlloc (note: operation is two unwind codes in length)
Size: 668 bytes

- Prologue offset: 39
Op: PushNonvolatileRegister
Reg: R15

- Prologue offset: 37
Op: PushNonvolatileRegister
Reg: R14

- Prologue offset: 35
Op: PushNonvolatileRegister
Reg: R13

- Prologue offset: 33
Op: PushNonvolatileRegister
Reg: R12

- Prologue offset: 31
Op: PushNonvolatileRegister
Reg: RDI

- Prologue offset: 29
Op: PushNonvolatileRegister
Reg: RSI

- Prologue offset: 27
Op: PushNonvolatileRegister
Reg: RBX

- Prologue offset: 25
Op: SetFramePointer

- Prologue offset: 22
Op: PushNonvolatileRegister
Reg: RBP
Alignment padding: 2 bytes

This is correct for the generated prologue.  I've verified that the correct offset to the unwind information was registered with Windows for this function.

The stack allocation for the frame is correctly aligned: 1712 bytes, starting with call-pushed return address.

Note: there appears to be a bug in the stack check code as it's not taking into account the space needed for GPR saves (FPR saves are part of the stack allocation size).

Still digging.

~~~

view this post on Zulip Wasmtime GitHub notifications bot (Jul 03 2020 at 04:33):

peterhuene edited a comment on Issue #1967:

It appears the system unwinder can't get past the frame for 12746 for some yet-to-be-determined reason.

The Wasm frames that show up in the debugger prior to the STATUS_BAD_STACK:

The unwind stops there.

Prologue JIT for 12746:

0000023CCCEAB200 48 8B 01             mov         rax,qword ptr [rcx]
0000023CCCEAB203 48 8B 00             mov         rax,qword ptr [rax]
0000023CCCEAB206 48 81 C0 68 06 00 00 add         rax,668h
0000023CCCEAB20D 48 39 E0             cmp         rax,rsp
0000023CCCEAB210 72 02                jb          0000023CCCEAB214
0000023CCCEAB212 0F 0B                ud2
0000023CCCEAB214 40 55                push        rbp
0000023CCCEAB216 48 89 E5             mov         rbp,rsp
0000023CCCEAB219 40 53                push        rbx
0000023CCCEAB21B 40 56                push        rsi
0000023CCCEAB21D 40 57                push        rdi
0000023CCCEAB21F 41 54                push        r12
0000023CCCEAB221 41 55                push        r13
0000023CCCEAB223 41 56                push        r14
0000023CCCEAB225 41 57                push        r15
0000023CCCEAB227 48 81 EC 68 06 00 00 sub         rsp,668h
0000023CCCEAB22E 44 0F 11 BC 24 50 06 00 00 movups      xmmword ptr [rsp+650h],xmm15

Function 12746 has the following encoded unwind information:

```
01 37 0d 55 37 f8 00 00
2e 01 cd 00 27 f0 25 e0
23 d0 21 c0 1f 70 1d 60
1b 30 19 03 16 50 00 00

Decoded:

Version: 1
Prologue size: 55 bytes
Unwind codes: 13
Register frame: RBP
Register frame offset: 80
Codes:

- Prologue offset: 55
Op: SaveXmm128 (note: operation is two unwind codes in length)
Reg: XMM15
Stack offset: 0

- Prologue offset: 46
Op: LargeStackAlloc (note: operation is two unwind codes in length)
Size: 668 bytes

- Prologue offset: 39
Op: PushNonvolatileRegister
Reg: R15

- Prologue offset: 37
Op: PushNonvolatileRegister
Reg: R14

- Prologue offset: 35
Op: PushNonvolatileRegister
Reg: R13

- Prologue offset: 33
Op: PushNonvolatileRegister
Reg: R12

- Prologue offset: 31
Op: PushNonvolatileRegister
Reg: RDI

- Prologue offset: 29
Op: PushNonvolatileRegister
Reg: RSI

- Prologue offset: 27
Op: PushNonvolatileRegister
Reg: RBX

- Prologue offset: 25
Op: SetFramePointer

- Prologue offset: 22
Op: PushNonvolatileRegister
Reg: RBP
Alignment padding: 2 bytes

This is correct for the generated prologue.  I've verified that the correct offset to the unwind information was registered with Windows for this function.

The stack allocation for the frame is correctly aligned: 1712 bytes, starting with call-pushed return address.

Note: there appears to be a bug in the stack check code as it's not taking into account the space needed for GPR saves (FPR saves are part of the stack allocation size).  That bug is unrelated and would only cause a missed stack overflow trap in the unlikely case that the stack allocation size would fit, but the GPR saves push it over the limit).

Still digging.

~~~

view this post on Zulip Wasmtime GitHub notifications bot (Jul 03 2020 at 04:40):

peterhuene edited a comment on Issue #1967:

It appears the system unwinder can't get past the frame for 12746 for some yet-to-be-determined reason.

The Wasm frames that show up in the debugger prior to the STATUS_BAD_STACK:

The unwind stops there.

Prologue JIT for 12746:

0000023CCCEAB200 48 8B 01             mov         rax,qword ptr [rcx]
0000023CCCEAB203 48 8B 00             mov         rax,qword ptr [rax]
0000023CCCEAB206 48 81 C0 68 06 00 00 add         rax,668h
0000023CCCEAB20D 48 39 E0             cmp         rax,rsp
0000023CCCEAB210 72 02                jb          0000023CCCEAB214
0000023CCCEAB212 0F 0B                ud2
0000023CCCEAB214 40 55                push        rbp
0000023CCCEAB216 48 89 E5             mov         rbp,rsp
0000023CCCEAB219 40 53                push        rbx
0000023CCCEAB21B 40 56                push        rsi
0000023CCCEAB21D 40 57                push        rdi
0000023CCCEAB21F 41 54                push        r12
0000023CCCEAB221 41 55                push        r13
0000023CCCEAB223 41 56                push        r14
0000023CCCEAB225 41 57                push        r15
0000023CCCEAB227 48 81 EC 68 06 00 00 sub         rsp,668h
0000023CCCEAB22E 44 0F 11 BC 24 50 06 00 00 movups      xmmword ptr [rsp+650h],xmm15

Function 12746 has the following encoded unwind information:

```
01 37 0d 55 37 f8 00 00
2e 01 cd 00 27 f0 25 e0
23 d0 21 c0 1f 70 1d 60
1b 30 19 03 16 50 00 00

Decoded:

Version: 1
Prologue size: 55 bytes
Unwind codes: 13
Register frame: RBP
Register frame offset: 80
Codes:

- Prologue offset: 55
Op: SaveXmm128 (note: operation is two unwind codes in length)
Reg: XMM15
Stack offset: 0

- Prologue offset: 46
Op: LargeStackAlloc (note: operation is two unwind codes in length)
Size: 668 bytes

- Prologue offset: 39
Op: PushNonvolatileRegister
Reg: R15

- Prologue offset: 37
Op: PushNonvolatileRegister
Reg: R14

- Prologue offset: 35
Op: PushNonvolatileRegister
Reg: R13

- Prologue offset: 33
Op: PushNonvolatileRegister
Reg: R12

- Prologue offset: 31
Op: PushNonvolatileRegister
Reg: RDI

- Prologue offset: 29
Op: PushNonvolatileRegister
Reg: RSI

- Prologue offset: 27
Op: PushNonvolatileRegister
Reg: RBX

- Prologue offset: 25
Op: SetFramePointer

- Prologue offset: 22
Op: PushNonvolatileRegister
Reg: RBP
Alignment padding: 2 bytes

This is correct for the generated prologue.  I've verified that the correct offset to the unwind information was registered with Windows for this function.

The stack allocation for the frame is correctly aligned: 1712 bytes, starting with call-pushed return address.

Note: there appears to be a bug in the stack check code as it's not taking into account the space needed for GPR saves (FPR saves are part of the stack pointer adjustment).  That bug is unrelated and would only cause a missed stack overflow trap in the unlikely case that the stack allocation size would fit, but the GPR saves push it over the limit).

Still digging.

~~~

view this post on Zulip Wasmtime GitHub notifications bot (Jul 03 2020 at 07:16):

peterhuene edited a comment on Issue #1967:

It appears the system unwinder can't get past the frame for 12746 for some yet-to-be-determined reason.

The Wasm frames that show up in the debugger prior to the STATUS_BAD_STACK:

The unwind stops there.

Prologue JIT for 12746:

0000023CCCEAB200 48 8B 01             mov         rax,qword ptr [rcx]
0000023CCCEAB203 48 8B 00             mov         rax,qword ptr [rax]
0000023CCCEAB206 48 81 C0 68 06 00 00 add         rax,668h
0000023CCCEAB20D 48 39 E0             cmp         rax,rsp
0000023CCCEAB210 72 02                jb          0000023CCCEAB214
0000023CCCEAB212 0F 0B                ud2
0000023CCCEAB214 40 55                push        rbp
0000023CCCEAB216 48 89 E5             mov         rbp,rsp
0000023CCCEAB219 40 53                push        rbx
0000023CCCEAB21B 40 56                push        rsi
0000023CCCEAB21D 40 57                push        rdi
0000023CCCEAB21F 41 54                push        r12
0000023CCCEAB221 41 55                push        r13
0000023CCCEAB223 41 56                push        r14
0000023CCCEAB225 41 57                push        r15
0000023CCCEAB227 48 81 EC 68 06 00 00 sub         rsp,668h
0000023CCCEAB22E 44 0F 11 BC 24 50 06 00 00 movups      xmmword ptr [rsp+650h],xmm15

Function 12746 has the following encoded unwind information:

```
01 37 0d 55 37 f8 00 00
2e 01 cd 00 27 f0 25 e0
23 d0 21 c0 1f 70 1d 60
1b 30 19 03 16 50 00 00

Decoded:

Version: 1
Prologue size: 55 bytes
Unwind codes: 13
Frame pointer register: RBP
Frame pointer offset: 80
Codes:

- Prologue offset: 55
Op: SaveXmm128 (note: operation is two unwind codes in length)
Reg: XMM15
Stack offset: 0

- Prologue offset: 46
Op: LargeStackAlloc (note: operation is two unwind codes in length)
Size: 668 bytes

- Prologue offset: 39
Op: PushNonvolatileRegister
Reg: R15

- Prologue offset: 37
Op: PushNonvolatileRegister
Reg: R14

- Prologue offset: 35
Op: PushNonvolatileRegister
Reg: R13

- Prologue offset: 33
Op: PushNonvolatileRegister
Reg: R12

- Prologue offset: 31
Op: PushNonvolatileRegister
Reg: RDI

- Prologue offset: 29
Op: PushNonvolatileRegister
Reg: RSI

- Prologue offset: 27
Op: PushNonvolatileRegister
Reg: RBX

- Prologue offset: 25
Op: SetFramePointer

- Prologue offset: 22
Op: PushNonvolatileRegister
Reg: RBP
Alignment padding: 2 bytes

This is correct for the generated prologue.  I've verified that the correct offset to the unwind information was registered with Windows for this function.

The stack allocation for the frame is correctly aligned: 1712 bytes, starting with call-pushed return address.

Note: there appears to be a bug in the stack check code as it's not taking into account the space needed for GPR saves (FPR saves are part of the stack pointer adjustment).  That bug is unrelated and would only cause a missed stack overflow trap in the unlikely case that the stack allocation size would fit, but the GPR saves push it over the limit).

Still digging.

~~~

view this post on Zulip Wasmtime GitHub notifications bot (Jul 03 2020 at 07:48):

peterhuene commented on Issue #1967:

I've determined that the unwind information does not accurately describe the prologue after all. As soon as I instruction step over establishing the frame pointer (e.g. mov rbp,rsp), stack walking fails.

The problem boils down to the way the unwind information is describing the frame pointer. Wasmtime establishes a frame pointer like one would in the x86 days: it points at the push of the previous frame pointer and thus marks the start (highest stack address) of the local frame.

However, traditional frame pointers are rarely needed on Windows x64. A "frame pointer" is really only established to mark a part of the frame as "statically-sized". Positive offsets from this "frame pointer" would actually be in this static region of the local frame and not immediately the return address and caller's frame. This is really important for functions that adjust the stack pointer in a way that can't be described in static unwind information (e.g. a function that calls alloca).

The bug is a result of my confusion regarding the Windows unwind information nomenclature of "frame pointer". What the unwind information really intends to describe is the "base pointer to the static part of the local frame".

The fix should be to stop describing a "frame pointer" in the unwind information. The frame pointer information in the unwind information will be changed to 0 and the SetFramePointer unwind code will not be emitted. The offsets for the FPR saves will be changed to positive offsets from a post-adjusted RSP.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 03 2020 at 07:56):

peterhuene edited a comment on Issue #1967:

I've determined that the unwind information does not accurately describe the prologue after all. As soon as I instruction step over establishing the frame pointer (e.g. mov rbp,rsp), stack walking fails.

The problem boils down to the way the unwind information is describing the frame pointer. Wasmtime establishes a frame pointer like one would in the x86 days: it points at the push of the previous frame pointer and thus marks the start (highest stack address) of the local frame.

However, traditional frame pointers are rarely needed on Windows x64. A "frame pointer" is really only established to mark a part of the frame as "statically-sized". Positive offsets from this "frame pointer" would actually be in this static region of the local frame and not immediately the return address and caller's frame. This is really important for functions that adjust the stack pointer in a way that can't be described in static unwind information (e.g. a function that calls alloca).

The bug is a result of my confusion regarding the Windows unwind information nomenclature of "frame pointer". What the unwind information really intends to describe is the "base pointer to the static part of the local frame". As such, Windows really is expecting an actual offset from RSP when establishing the frame pointer (e.g. lea rbp, [rsp-0x50] in this case) which we're obviously not doing.

The fix should be to stop describing a "frame pointer" in the unwind information. The frame pointer information in the unwind information will be changed to 0 and the SetFramePointer unwind code will not be emitted. The offsets for the FPR saves will be changed to positive offsets from a post-adjusted RSP.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 03 2020 at 08:00):

peterhuene edited a comment on Issue #1967:

I've determined that the unwind information does not accurately describe the prologue after all. As soon as I instruction step over establishing the frame pointer (e.g. mov rbp,rsp), stack walking fails.

The problem boils down to the way the unwind information is describing the frame pointer for frames with FPR saves. Wasmtime establishes a frame pointer like one would in the x86 days: it points at the push of the previous frame pointer and thus marks the start (highest stack address) of the local frame.

However, traditional frame pointers are rarely needed on Windows x64. A "frame pointer" is really only established to mark a part of the frame as "statically-sized". Positive offsets from this "frame pointer" would actually be in this static region of the local frame and not immediately the return address and caller's frame. This is really important for functions that adjust the stack pointer in a way that can't be described in static unwind information (e.g. a function that calls alloca).

The bug is a result of my confusion regarding the Windows unwind information nomenclature of "frame pointer". What the unwind information really intends to describe is the "base pointer to the static part of the local frame". As such, Windows really is expecting an actual offset from RSP when establishing the frame pointer (e.g. lea rbp, [rsp-0x50] in this case) which we're obviously not doing.

The fix should be to stop describing a "frame pointer" in the unwind information. The frame pointer information in the unwind information will be changed to 0 and the SetFramePointer unwind code will not be emitted. The offsets for the FPR saves will be changed to positive offsets from a post-adjusted RSP.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 03 2020 at 08:04):

peterhuene edited a comment on Issue #1967:

I've determined that the unwind information does not accurately describe the prologue after all. As soon as I instruction step over establishing the frame pointer (e.g. mov rbp,rsp), stack walking fails.

The problem boils down to the way the unwind information is describing the frame pointer for frames with FPR saves. Cranelift establishes a frame pointer like one would in the x86 days: it points at the push of the previous frame pointer and thus marks the start (highest stack address) of the local frame.

However, traditional frame pointers are rarely needed on Windows x64. A "frame pointer" is really only established to mark a part of the frame as "statically-sized". Positive offsets from this "frame pointer" would actually be in this static region of the local frame and not immediately the return address and caller's frame. This is really important for functions that adjust the stack pointer in a way that can't be described in static unwind information (e.g. a function that calls alloca).

The bug is a result of my confusion regarding the Windows unwind information nomenclature of "frame pointer". What the unwind information really intends to describe is the "base pointer to the static part of the local frame". As such, Windows really is expecting an actual offset from RSP when establishing the frame pointer (e.g. lea rbp, [rsp-0x50] in this case) which we're obviously not doing.

The fix should be to stop describing a "frame pointer" in the unwind information. The frame pointer information in the unwind information will be changed to 0 and the SetFramePointer unwind code will not be emitted. The offsets for the FPR saves will be changed to positive offsets from a post-adjusted RSP.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 04 2020 at 22:05):

peterhuene commented on Issue #1967:

I have a fix available, but I need to add a test case where we have a trap with frames that have FPR saves so that we don't regress this.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 06 2020 at 18:55):

peterhuene labeled Issue #1967 (assigned to peterhuene):

Originally reported at https://github.com/bytecodealliance/wasmtime-py/issues/36 I'm able to reproduce this pretty easily with just this wasm file. Downloading that locally I can reproduce on Windows with:

$ gunzip yosys.wasm.gz
$ cargo run --release -- run yosys.wasm -- -V
...
    Finished release [optimized] target(s) in 3.84s
     Running `target\release\wasmtime.exe run yosys.wasm -- -V`
error: process didn't exit successfully: `target\release\wasmtime.exe run yosys.wasm -- -V` (exit code: 0xc0000028)

Something funny is going on here though because --opt-level 0 does not reproduce the issue (--opt-level 1 does though).

I'm trying to run this right now through wasm-reduce from Binaryen but so far it's only shaved off 3MB, which isn't a whole lot given the size of this binary...

@peterhuene I'm having trouble just opening things up in a debugger, would you be able to help look into this?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 07 2020 at 05:26):

peterhuene closed Issue #1967 (assigned to peterhuene):

Originally reported at https://github.com/bytecodealliance/wasmtime-py/issues/36 I'm able to reproduce this pretty easily with just this wasm file. Downloading that locally I can reproduce on Windows with:

$ gunzip yosys.wasm.gz
$ cargo run --release -- run yosys.wasm -- -V
...
    Finished release [optimized] target(s) in 3.84s
     Running `target\release\wasmtime.exe run yosys.wasm -- -V`
error: process didn't exit successfully: `target\release\wasmtime.exe run yosys.wasm -- -V` (exit code: 0xc0000028)

Something funny is going on here though because --opt-level 0 does not reproduce the issue (--opt-level 1 does though).

I'm trying to run this right now through wasm-reduce from Binaryen but so far it's only shaved off 3MB, which isn't a whole lot given the size of this binary...

@peterhuene I'm having trouble just opening things up in a debugger, would you be able to help look into this?


Last updated: Nov 22 2024 at 17:03 UTC