alexcrichton opened Issue #1967:
Originally reported at https://github.com/bytecodealliance/wasmtime-py/issues/36 I'm able to reproduce this pretty easily with just this wasm file. Downloading that locally I can reproduce on Windows with:
$ gunzip yosys.wasm.gz $ cargo run --release -- run yosys.wasm -- -V ... Finished release [optimized] target(s) in 3.84s Running `target\release\wasmtime.exe run yosys.wasm -- -V` error: process didn't exit successfully: `target\release\wasmtime.exe run yosys.wasm -- -V` (exit code: 0xc0000028)
Something funny is going on here though because
--opt-level 0
does not reproduce the issue (--opt-level 1
does though).I'm trying to run this right now through
wasm-reduce
from Binaryen but so far it's only shaved off 3MB, which isn't a whole lot given the size of this binary...@peterhuene I'm having trouble just opening things up in a debugger, would you be able to help look into this?
alexcrichton labeled Issue #1967:
Originally reported at https://github.com/bytecodealliance/wasmtime-py/issues/36 I'm able to reproduce this pretty easily with just this wasm file. Downloading that locally I can reproduce on Windows with:
$ gunzip yosys.wasm.gz $ cargo run --release -- run yosys.wasm -- -V ... Finished release [optimized] target(s) in 3.84s Running `target\release\wasmtime.exe run yosys.wasm -- -V` error: process didn't exit successfully: `target\release\wasmtime.exe run yosys.wasm -- -V` (exit code: 0xc0000028)
Something funny is going on here though because
--opt-level 0
does not reproduce the issue (--opt-level 1
does though).I'm trying to run this right now through
wasm-reduce
from Binaryen but so far it's only shaved off 3MB, which isn't a whole lot given the size of this binary...@peterhuene I'm having trouble just opening things up in a debugger, would you be able to help look into this?
peterhuene commented on Issue #1967:
I'll take a look.
peterhuene assigned Issue #1967 (assigned to peterhuene):
Originally reported at https://github.com/bytecodealliance/wasmtime-py/issues/36 I'm able to reproduce this pretty easily with just this wasm file. Downloading that locally I can reproduce on Windows with:
$ gunzip yosys.wasm.gz $ cargo run --release -- run yosys.wasm -- -V ... Finished release [optimized] target(s) in 3.84s Running `target\release\wasmtime.exe run yosys.wasm -- -V` error: process didn't exit successfully: `target\release\wasmtime.exe run yosys.wasm -- -V` (exit code: 0xc0000028)
Something funny is going on here though because
--opt-level 0
does not reproduce the issue (--opt-level 1
does though).I'm trying to run this right now through
wasm-reduce
from Binaryen but so far it's only shaved off 3MB, which isn't a whole lot given the size of this binary...@peterhuene I'm having trouble just opening things up in a debugger, would you be able to help look into this?
peterhuene commented on Issue #1967:
I've been digging into this most of today with not much to report yet.
The unwind information seems correct, as well as the stack aligning done in the prologues.
I haven't been able to form a minimal repro yet either.
The unwind is coming from the trap returned from WASI's proc_exit.
I'll keep at it.
peterhuene commented on Issue #1967:
It appears the system unwinder can't get past the frame for
12746
for some yet-to-be-determined reason.The Wasm frames that show up in the debugger prior to the
STATUS_BAD_STACK
:
- 12469 (calls
wasi_snapshot_preview1.proc_exit
)- 12504
- 12746 (JIT'd function is 96059 bytes!)
The unwind stops there.
Prologue JIT for
12746
:0000023CCCEAB200 48 8B 01 mov rax,qword ptr [rcx] 0000023CCCEAB203 48 8B 00 mov rax,qword ptr [rax] 0000023CCCEAB206 48 81 C0 68 06 00 00 add rax,668h 0000023CCCEAB20D 48 39 E0 cmp rax,rsp 0000023CCCEAB210 72 02 jb 0000023CCCEAB214 0000023CCCEAB212 0F 0B ud2 0000023CCCEAB214 40 55 push rbp 0000023CCCEAB216 48 89 E5 mov rbp,rsp 0000023CCCEAB219 40 53 push rbx 0000023CCCEAB21B 40 56 push rsi 0000023CCCEAB21D 40 57 push rdi 0000023CCCEAB21F 41 54 push r12 0000023CCCEAB221 41 55 push r13 0000023CCCEAB223 41 56 push r14 0000023CCCEAB225 41 57 push r15 0000023CCCEAB227 48 81 EC 68 06 00 00 sub rsp,668h 0000023CCCEAB22E 44 0F 11 BC 24 50 06 00 00 movups xmmword ptr [rsp+650h],xmm15
Function
12746
has the following encoded unwind information:```
01 37 0d 55 37 f8 00 00
2e 01 cd 00 27 f0 25 e0
23 d0 21 c0 1f 70 1d 60
1b 30 19 03 16 50 00 00Decoded:
Version: 1
Prologue size: 55 bytes
Unwind codes: 13
Register frame: RBP
Register frame offset: 80
Codes:- Prologue offset: 55
Op: SaveXmm128 (note: operation is two unwind codes in length)
Reg: XMM15
Stack offset: 0- Prologue offset: 46
Op: LargeStackAlloc (note: operation is two unwind codes in length)
Size: 668 bytes- Prologue offset: 39
Op: PushNonvolatileRegister
Reg: R15- Prologue offset: 37
Op: PushNonvolatileRegister
Reg: R14- Prologue offset: 35
Op: PushNonvolatileRegister
Reg: R13- Prologue offset: 33
Op: PushNonvolatileRegister
Reg: R12- Prologue offset: 31
Op: PushNonvolatileRegister
Reg: RDI- Prologue offset: 29
Op: PushNonvolatileRegister
Reg: RSI- Prologue offset: 27
Op: PushNonvolatileRegister
Reg: RBX- Prologue offset: 25
Op: SetFramePointer- Prologue offset: 22
Op: PushNonvolatileRegister
Reg: RBP
Alignment padding: 2 bytesThis is correct for the generated prologue. I've verified that the correct offset to the unwind information was registered with Windows for this function. The stack allocation for the frame is correctly aligned: 1712 bytes, starting with call-pushed return address. Note: there appears to be a bug in the stack check code as it's not taking into account the space needed for GPR saves (FPR saves are part of the stack allocation size). Still digging. ~~~
peterhuene edited a comment on Issue #1967:
It appears the system unwinder can't get past the frame for
12746
for some yet-to-be-determined reason.The Wasm frames that show up in the debugger prior to the
STATUS_BAD_STACK
:
- 12469 (calls
wasi_snapshot_preview1.proc_exit
)- 12504
- 12746 (JIT'd function is 96059 bytes!)
The unwind stops there.
Prologue JIT for
12746
:0000023CCCEAB200 48 8B 01 mov rax,qword ptr [rcx] 0000023CCCEAB203 48 8B 00 mov rax,qword ptr [rax] 0000023CCCEAB206 48 81 C0 68 06 00 00 add rax,668h 0000023CCCEAB20D 48 39 E0 cmp rax,rsp 0000023CCCEAB210 72 02 jb 0000023CCCEAB214 0000023CCCEAB212 0F 0B ud2 0000023CCCEAB214 40 55 push rbp 0000023CCCEAB216 48 89 E5 mov rbp,rsp 0000023CCCEAB219 40 53 push rbx 0000023CCCEAB21B 40 56 push rsi 0000023CCCEAB21D 40 57 push rdi 0000023CCCEAB21F 41 54 push r12 0000023CCCEAB221 41 55 push r13 0000023CCCEAB223 41 56 push r14 0000023CCCEAB225 41 57 push r15 0000023CCCEAB227 48 81 EC 68 06 00 00 sub rsp,668h 0000023CCCEAB22E 44 0F 11 BC 24 50 06 00 00 movups xmmword ptr [rsp+650h],xmm15
Function
12746
has the following encoded unwind information:```
01 37 0d 55 37 f8 00 00
2e 01 cd 00 27 f0 25 e0
23 d0 21 c0 1f 70 1d 60
1b 30 19 03 16 50 00 00Decoded:
Version: 1
Prologue size: 55 bytes
Unwind codes: 13
Register frame: RBP
Register frame offset: 80
Codes:- Prologue offset: 55
Op: SaveXmm128 (note: operation is two unwind codes in length)
Reg: XMM15
Stack offset: 0- Prologue offset: 46
Op: LargeStackAlloc (note: operation is two unwind codes in length)
Size: 668 bytes- Prologue offset: 39
Op: PushNonvolatileRegister
Reg: R15- Prologue offset: 37
Op: PushNonvolatileRegister
Reg: R14- Prologue offset: 35
Op: PushNonvolatileRegister
Reg: R13- Prologue offset: 33
Op: PushNonvolatileRegister
Reg: R12- Prologue offset: 31
Op: PushNonvolatileRegister
Reg: RDI- Prologue offset: 29
Op: PushNonvolatileRegister
Reg: RSI- Prologue offset: 27
Op: PushNonvolatileRegister
Reg: RBX- Prologue offset: 25
Op: SetFramePointer- Prologue offset: 22
Op: PushNonvolatileRegister
Reg: RBP
Alignment padding: 2 bytesThis is correct for the generated prologue. I've verified that the correct offset to the unwind information was registered with Windows for this function. The stack allocation for the frame is correctly aligned: 1712 bytes, starting with call-pushed return address. Note: there appears to be a bug in the stack check code as it's not taking into account the space needed for GPR saves (FPR saves are part of the stack allocation size). That bug is unrelated and would only cause a missed stack overflow trap in the unlikely case that the stack allocation size would fit, but the GPR saves push it over the limit). Still digging. ~~~
peterhuene edited a comment on Issue #1967:
It appears the system unwinder can't get past the frame for
12746
for some yet-to-be-determined reason.The Wasm frames that show up in the debugger prior to the
STATUS_BAD_STACK
:
- 12469 (calls
wasi_snapshot_preview1.proc_exit
)- 12504
- 12746 (JIT'd function is 96059 bytes!)
The unwind stops there.
Prologue JIT for
12746
:0000023CCCEAB200 48 8B 01 mov rax,qword ptr [rcx] 0000023CCCEAB203 48 8B 00 mov rax,qword ptr [rax] 0000023CCCEAB206 48 81 C0 68 06 00 00 add rax,668h 0000023CCCEAB20D 48 39 E0 cmp rax,rsp 0000023CCCEAB210 72 02 jb 0000023CCCEAB214 0000023CCCEAB212 0F 0B ud2 0000023CCCEAB214 40 55 push rbp 0000023CCCEAB216 48 89 E5 mov rbp,rsp 0000023CCCEAB219 40 53 push rbx 0000023CCCEAB21B 40 56 push rsi 0000023CCCEAB21D 40 57 push rdi 0000023CCCEAB21F 41 54 push r12 0000023CCCEAB221 41 55 push r13 0000023CCCEAB223 41 56 push r14 0000023CCCEAB225 41 57 push r15 0000023CCCEAB227 48 81 EC 68 06 00 00 sub rsp,668h 0000023CCCEAB22E 44 0F 11 BC 24 50 06 00 00 movups xmmword ptr [rsp+650h],xmm15
Function
12746
has the following encoded unwind information:```
01 37 0d 55 37 f8 00 00
2e 01 cd 00 27 f0 25 e0
23 d0 21 c0 1f 70 1d 60
1b 30 19 03 16 50 00 00Decoded:
Version: 1
Prologue size: 55 bytes
Unwind codes: 13
Register frame: RBP
Register frame offset: 80
Codes:- Prologue offset: 55
Op: SaveXmm128 (note: operation is two unwind codes in length)
Reg: XMM15
Stack offset: 0- Prologue offset: 46
Op: LargeStackAlloc (note: operation is two unwind codes in length)
Size: 668 bytes- Prologue offset: 39
Op: PushNonvolatileRegister
Reg: R15- Prologue offset: 37
Op: PushNonvolatileRegister
Reg: R14- Prologue offset: 35
Op: PushNonvolatileRegister
Reg: R13- Prologue offset: 33
Op: PushNonvolatileRegister
Reg: R12- Prologue offset: 31
Op: PushNonvolatileRegister
Reg: RDI- Prologue offset: 29
Op: PushNonvolatileRegister
Reg: RSI- Prologue offset: 27
Op: PushNonvolatileRegister
Reg: RBX- Prologue offset: 25
Op: SetFramePointer- Prologue offset: 22
Op: PushNonvolatileRegister
Reg: RBP
Alignment padding: 2 bytesThis is correct for the generated prologue. I've verified that the correct offset to the unwind information was registered with Windows for this function. The stack allocation for the frame is correctly aligned: 1712 bytes, starting with call-pushed return address. Note: there appears to be a bug in the stack check code as it's not taking into account the space needed for GPR saves (FPR saves are part of the stack pointer adjustment). That bug is unrelated and would only cause a missed stack overflow trap in the unlikely case that the stack allocation size would fit, but the GPR saves push it over the limit). Still digging. ~~~
peterhuene edited a comment on Issue #1967:
It appears the system unwinder can't get past the frame for
12746
for some yet-to-be-determined reason.The Wasm frames that show up in the debugger prior to the
STATUS_BAD_STACK
:
- 12469 (calls
wasi_snapshot_preview1.proc_exit
)- 12504
- 12746 (JIT'd function is 96059 bytes!)
The unwind stops there.
Prologue JIT for
12746
:0000023CCCEAB200 48 8B 01 mov rax,qword ptr [rcx] 0000023CCCEAB203 48 8B 00 mov rax,qword ptr [rax] 0000023CCCEAB206 48 81 C0 68 06 00 00 add rax,668h 0000023CCCEAB20D 48 39 E0 cmp rax,rsp 0000023CCCEAB210 72 02 jb 0000023CCCEAB214 0000023CCCEAB212 0F 0B ud2 0000023CCCEAB214 40 55 push rbp 0000023CCCEAB216 48 89 E5 mov rbp,rsp 0000023CCCEAB219 40 53 push rbx 0000023CCCEAB21B 40 56 push rsi 0000023CCCEAB21D 40 57 push rdi 0000023CCCEAB21F 41 54 push r12 0000023CCCEAB221 41 55 push r13 0000023CCCEAB223 41 56 push r14 0000023CCCEAB225 41 57 push r15 0000023CCCEAB227 48 81 EC 68 06 00 00 sub rsp,668h 0000023CCCEAB22E 44 0F 11 BC 24 50 06 00 00 movups xmmword ptr [rsp+650h],xmm15
Function
12746
has the following encoded unwind information:```
01 37 0d 55 37 f8 00 00
2e 01 cd 00 27 f0 25 e0
23 d0 21 c0 1f 70 1d 60
1b 30 19 03 16 50 00 00Decoded:
Version: 1
Prologue size: 55 bytes
Unwind codes: 13
Frame pointer register: RBP
Frame pointer offset: 80
Codes:- Prologue offset: 55
Op: SaveXmm128 (note: operation is two unwind codes in length)
Reg: XMM15
Stack offset: 0- Prologue offset: 46
Op: LargeStackAlloc (note: operation is two unwind codes in length)
Size: 668 bytes- Prologue offset: 39
Op: PushNonvolatileRegister
Reg: R15- Prologue offset: 37
Op: PushNonvolatileRegister
Reg: R14- Prologue offset: 35
Op: PushNonvolatileRegister
Reg: R13- Prologue offset: 33
Op: PushNonvolatileRegister
Reg: R12- Prologue offset: 31
Op: PushNonvolatileRegister
Reg: RDI- Prologue offset: 29
Op: PushNonvolatileRegister
Reg: RSI- Prologue offset: 27
Op: PushNonvolatileRegister
Reg: RBX- Prologue offset: 25
Op: SetFramePointer- Prologue offset: 22
Op: PushNonvolatileRegister
Reg: RBP
Alignment padding: 2 bytesThis is correct for the generated prologue. I've verified that the correct offset to the unwind information was registered with Windows for this function. The stack allocation for the frame is correctly aligned: 1712 bytes, starting with call-pushed return address. Note: there appears to be a bug in the stack check code as it's not taking into account the space needed for GPR saves (FPR saves are part of the stack pointer adjustment). That bug is unrelated and would only cause a missed stack overflow trap in the unlikely case that the stack allocation size would fit, but the GPR saves push it over the limit). Still digging. ~~~
peterhuene commented on Issue #1967:
I've determined that the unwind information does not accurately describe the prologue after all. As soon as I instruction step over establishing the frame pointer (e.g.
mov rbp,rsp
), stack walking fails.The problem boils down to the way the unwind information is describing the frame pointer. Wasmtime establishes a frame pointer like one would in the x86 days: it points at the push of the previous frame pointer and thus marks the start (highest stack address) of the local frame.
However, traditional frame pointers are rarely needed on Windows x64. A "frame pointer" is really only established to mark a part of the frame as "statically-sized". Positive offsets from this "frame pointer" would actually be in this static region of the local frame and not immediately the return address and caller's frame. This is really important for functions that adjust the stack pointer in a way that can't be described in static unwind information (e.g. a function that calls
alloca
).The bug is a result of my confusion regarding the Windows unwind information nomenclature of "frame pointer". What the unwind information really intends to describe is the "base pointer to the static part of the local frame".
The fix should be to stop describing a "frame pointer" in the unwind information. The frame pointer information in the unwind information will be changed to
0
and theSetFramePointer
unwind code will not be emitted. The offsets for the FPR saves will be changed to positive offsets from a post-adjusted RSP.
peterhuene edited a comment on Issue #1967:
I've determined that the unwind information does not accurately describe the prologue after all. As soon as I instruction step over establishing the frame pointer (e.g.
mov rbp,rsp
), stack walking fails.The problem boils down to the way the unwind information is describing the frame pointer. Wasmtime establishes a frame pointer like one would in the x86 days: it points at the push of the previous frame pointer and thus marks the start (highest stack address) of the local frame.
However, traditional frame pointers are rarely needed on Windows x64. A "frame pointer" is really only established to mark a part of the frame as "statically-sized". Positive offsets from this "frame pointer" would actually be in this static region of the local frame and not immediately the return address and caller's frame. This is really important for functions that adjust the stack pointer in a way that can't be described in static unwind information (e.g. a function that calls
alloca
).The bug is a result of my confusion regarding the Windows unwind information nomenclature of "frame pointer". What the unwind information really intends to describe is the "base pointer to the static part of the local frame". As such, Windows really is expecting an actual offset from RSP when establishing the frame pointer (e.g.
lea rbp, [rsp-0x50]
in this case) which we're obviously not doing.The fix should be to stop describing a "frame pointer" in the unwind information. The frame pointer information in the unwind information will be changed to
0
and theSetFramePointer
unwind code will not be emitted. The offsets for the FPR saves will be changed to positive offsets from a post-adjusted RSP.
peterhuene edited a comment on Issue #1967:
I've determined that the unwind information does not accurately describe the prologue after all. As soon as I instruction step over establishing the frame pointer (e.g.
mov rbp,rsp
), stack walking fails.The problem boils down to the way the unwind information is describing the frame pointer for frames with FPR saves. Wasmtime establishes a frame pointer like one would in the x86 days: it points at the push of the previous frame pointer and thus marks the start (highest stack address) of the local frame.
However, traditional frame pointers are rarely needed on Windows x64. A "frame pointer" is really only established to mark a part of the frame as "statically-sized". Positive offsets from this "frame pointer" would actually be in this static region of the local frame and not immediately the return address and caller's frame. This is really important for functions that adjust the stack pointer in a way that can't be described in static unwind information (e.g. a function that calls
alloca
).The bug is a result of my confusion regarding the Windows unwind information nomenclature of "frame pointer". What the unwind information really intends to describe is the "base pointer to the static part of the local frame". As such, Windows really is expecting an actual offset from RSP when establishing the frame pointer (e.g.
lea rbp, [rsp-0x50]
in this case) which we're obviously not doing.The fix should be to stop describing a "frame pointer" in the unwind information. The frame pointer information in the unwind information will be changed to
0
and theSetFramePointer
unwind code will not be emitted. The offsets for the FPR saves will be changed to positive offsets from a post-adjusted RSP.
peterhuene edited a comment on Issue #1967:
I've determined that the unwind information does not accurately describe the prologue after all. As soon as I instruction step over establishing the frame pointer (e.g.
mov rbp,rsp
), stack walking fails.The problem boils down to the way the unwind information is describing the frame pointer for frames with FPR saves. Cranelift establishes a frame pointer like one would in the x86 days: it points at the push of the previous frame pointer and thus marks the start (highest stack address) of the local frame.
However, traditional frame pointers are rarely needed on Windows x64. A "frame pointer" is really only established to mark a part of the frame as "statically-sized". Positive offsets from this "frame pointer" would actually be in this static region of the local frame and not immediately the return address and caller's frame. This is really important for functions that adjust the stack pointer in a way that can't be described in static unwind information (e.g. a function that calls
alloca
).The bug is a result of my confusion regarding the Windows unwind information nomenclature of "frame pointer". What the unwind information really intends to describe is the "base pointer to the static part of the local frame". As such, Windows really is expecting an actual offset from RSP when establishing the frame pointer (e.g.
lea rbp, [rsp-0x50]
in this case) which we're obviously not doing.The fix should be to stop describing a "frame pointer" in the unwind information. The frame pointer information in the unwind information will be changed to
0
and theSetFramePointer
unwind code will not be emitted. The offsets for the FPR saves will be changed to positive offsets from a post-adjusted RSP.
peterhuene commented on Issue #1967:
I have a fix available, but I need to add a test case where we have a trap with frames that have FPR saves so that we don't regress this.
peterhuene labeled Issue #1967 (assigned to peterhuene):
Originally reported at https://github.com/bytecodealliance/wasmtime-py/issues/36 I'm able to reproduce this pretty easily with just this wasm file. Downloading that locally I can reproduce on Windows with:
$ gunzip yosys.wasm.gz $ cargo run --release -- run yosys.wasm -- -V ... Finished release [optimized] target(s) in 3.84s Running `target\release\wasmtime.exe run yosys.wasm -- -V` error: process didn't exit successfully: `target\release\wasmtime.exe run yosys.wasm -- -V` (exit code: 0xc0000028)
Something funny is going on here though because
--opt-level 0
does not reproduce the issue (--opt-level 1
does though).I'm trying to run this right now through
wasm-reduce
from Binaryen but so far it's only shaved off 3MB, which isn't a whole lot given the size of this binary...@peterhuene I'm having trouble just opening things up in a debugger, would you be able to help look into this?
peterhuene closed Issue #1967 (assigned to peterhuene):
Originally reported at https://github.com/bytecodealliance/wasmtime-py/issues/36 I'm able to reproduce this pretty easily with just this wasm file. Downloading that locally I can reproduce on Windows with:
$ gunzip yosys.wasm.gz $ cargo run --release -- run yosys.wasm -- -V ... Finished release [optimized] target(s) in 3.84s Running `target\release\wasmtime.exe run yosys.wasm -- -V` error: process didn't exit successfully: `target\release\wasmtime.exe run yosys.wasm -- -V` (exit code: 0xc0000028)
Something funny is going on here though because
--opt-level 0
does not reproduce the issue (--opt-level 1
does though).I'm trying to run this right now through
wasm-reduce
from Binaryen but so far it's only shaved off 3MB, which isn't a whole lot given the size of this binary...@peterhuene I'm having trouble just opening things up in a debugger, would you be able to help look into this?
Last updated: Nov 22 2024 at 17:03 UTC