alexcrichton opened issue #3859:
In a debugging session last Friday about Windows-specific CI failures we concluded that the failures were tied to GitHub Actions' recent upgrade of the `windows-latest` image from `windows-2019` to `windows-2022`. The PR to temporarily pin to `windows-2019` has landed and CI is fixed for now, but we also did some further investigation to figure out what about `windows-2022` was causing the breakage.

During the investigation on https://github.com/bytecodealliance/wasmtime/pull/3853 the conclusion was that one of the failing tests was `wast::Cranelift::misc::stack_overflow`; further digging showed that all stack-overflow tests were failing. Our first conclusion from that was https://github.com/bytecodealliance/wasmtime/issues/3857, which is separate from this issue, but we also realized that reproducing this issue required running tests on separate threads, at least with `--test-threads 2`.
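For reference, reproducing it looked something like the following (the `--test-threads 2` flag is from above; the exact `cargo test` invocation and test filter are assumptions about how the suite is usually run, not a command from the original report):

```sh
# Hypothetical reproduction: run the stack-overflow tests on (at least) two
# threads; single-threaded runs did not reproduce the bad backtraces.
cargo test -- --test-threads 2 stack_overflow
```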
The previous CI failures we were seeing were out-of-memory errors on the order of 5GiB allocations. It turns out that `Backtrace::new_unresolved` was the culprit: the stack trace was hitting an "infinite loop" where the stack wasn't actually infinite, but the stack unwinder was looping over frames forever. We attempted to collect all of these frames (https://github.com/bytecodealliance/wasmtime/issues/3858), which resulted in the OOM.

That's all basically a long-winded way of saying that the system-generated stack trace on `windows-2019` was correct while the stack trace on `windows-2022` was an "infinite" stack trace (despite the stack not actually being infinite). For a simple WebAssembly module:

```wat
(module
  (func call 0)
)
```
the cranelift machine IR emitted for this is:
```
VCode_ShowWithRRU {{
  Entry block: 0
Block 0:
  (original IR block: block0)
  (successor: Block 1)
  (instruction range: 0 .. 13)
  Inst 0:   pushq   %rbp
  Inst 1:   unwind PushFrameRegs { offset_upward_to_caller_sp: 16 }
  Inst 2:   movq    %rsp, %rbp
  Inst 3:   movq    0(%rdi), %r10
  Inst 4:   movq    0(%r10), %r10
  Inst 5:   cmpq    %rsp, %r10
  Inst 6:   jbe     ; ud2 stk_ovf ;
  Inst 7:   unwind DefineNewFrame { offset_upward_to_caller_sp: 16, offset_downward_to_clobbers: 0 }
  Inst 8:   movq    %rdi, %rax
  Inst 9:   movq    %rax, %rdi
  Inst 10:  movq    %rax, %rsi
  Inst 11:  call    User { namespace: 0, index: 0 }
  Inst 12:  jmp     label1
Block 1:
  (original IR block: block1)
  (instruction range: 13 .. 16)
  Inst 13:  movq    %rbp, %rsp
  Inst 14:  popq    %rbp
  Inst 15:  ret
}}
```
When wasm hits the fault at "Inst 6" above due to stack overflow, this is before the `DefineNewFrame` unwind pseudo-instruction, but that instruction should probably come just after "Inst 2" rather than after the portion of the prologue that checks the out-of-stack condition. The current suspicion is that this probably-incorrect location of `DefineNewFrame` is why the `windows-2022` stack unwinding hits an infinite loop.

It's worth noting that one of the debugging runs on https://github.com/bytecodealliance/wasmtime/pull/3853 printed out the ip/sp of each frame on the stack, and it looked like:
```
0:     ip=0x7ff62df77637 sp=0x7c55af8d20
1:     ip=0x7ff62df77637 sp=0x7c55af8d20
2:     ip=0x7ff62df77527 sp=0x7c55afabd0
3:     ip=0x7ff62df54c33 sp=0x7c55afac40
4:     ip=0x7ff62df55303 sp=0x7c55afae30
5:     ip=0x7ff62df635a7 sp=0x7c55afaea0
6:     ip=0x7ff62df55238 sp=0x7c55afaf10
7:     ip=0x7fff1896b592 sp=0x7c55afaf60
8:     ip=0x7fff18922022 sp=0x7c55afb000
9:     ip=0x7fff18992e1e sp=0x7c55afb240
10:    ip=0x1d76a8f1013 sp=0x7c55afc010
11:    ip=0x7c55afc020 sp=0x7c55afc018
12:    ip=0x1d76a8f1023 sp=0x7c55afc020
13:    ip=0x7c55afc030 sp=0x7c55afc028
14:    ip=0x1d76a8f1023 sp=0x7c55afc030
15:    ip=0x7c55afc040 sp=0x7c55afc038
16:    ip=0x1d76a8f1023 sp=0x7c55afc040
...
22446: ip=0x1d76a8f1023 sp=0x7c55b27d30
22447: ip=0x7c55b27d40 sp=0x7c55b27d38
22448: ip=0x1d76a8f1023 sp=0x7c55b27d40
22449: ip=0x7c55b27d50 sp=0x7c55b27d48
22450: ip=0x1d76a8f1023 sp=0x7c55b27d50
22451: ip=0x7c55b27d60 sp=0x7c55b27d58
22452: ip=0x1d76a8f1023 sp=0x7c55b27d60
22453: ip=0x7c55b27d70 sp=0x7c55b27d68
22454: ip=0x1d76a8f1023 sp=0x7c55b27d70
22455: ip=0x7c55b27d80 sp=0x7c55b27d78
22456: ip=0x1d76a8f1023 sp=0x7c55b27d80
22457: ip=0x7c55b27d90 sp=0x7c55b27d88
22458: ip=0x1d76a8f1023 sp=0x7c55b27d90
22459: ip=0x7c55b27da0 sp=0x7c55b27d98
```
Here the `ip` is seemingly correct, in that the test case was a mutually recursive set of functions, and the `sp` is indeed increasing, but there appears to be no limit to how far `sp` increases, and apparently no reads/writes of memory are being done, because presumably the unwinder would otherwise have segfaulted at this point!

In any case it appears that bad unwinding information on Windows is to blame here, so this issue is intended to track fixing that, and a fix for this issue should likely be accompanied by a revert of https://github.com/bytecodealliance/wasmtime/pull/3854 as well.
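To make the failure mode above a bit more concrete, here is a minimal sketch of why an unwinder that never reports end-of-stack turns into a multi-gigabyte allocation when every frame is eagerly collected. This is illustrative only: it is not the `backtrace` crate's actual implementation, and the cap shown is just one possible defensive measure, not what #3858 proposes.

```rust
/// Illustrative only: stand-in for whatever per-frame state a backtrace
/// collector stores (ip, sp, module info, ...).
struct Frame {
    ip: usize,
    sp: usize,
}

/// Collect frames from an unwinder callback. With correct unwind info the
/// callback eventually returns `None`; with the bad windows-2022 traces it
/// keeps yielding frames whose `sp` only creeps upward by 8-16 bytes forever,
/// so without some bound this `Vec` grows until the process OOMs (the ~5GiB
/// allocations seen on CI).
fn collect_frames(mut next_frame: impl FnMut() -> Option<Frame>) -> Vec<Frame> {
    // Hypothetical sanity bound; the real fix is correct unwind info, not a cap.
    const MAX_FRAMES: usize = 100_000;
    let mut frames = Vec::new();
    while let Some(frame) = next_frame() {
        frames.push(frame);
        if frames.len() >= MAX_FRAMES {
            break;
        }
    }
    frames
}
```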
alexcrichton commented on issue #3859:
cc @cfallin, @peterhuene
cfallin commented on issue #3859:
Some notes here: I've been paging in details on the unwind generation and ABI code, and I think this will require a bit of surgery to fix. In particular:
- The `DefineNewFrame` unwind pseudo-inst (defined here) has info about the clobber-frame size.
- This means that the generation of this pseudo-inst here (for x64, but similarly for others) needs to occur in the clobber-saving trait method, which is invoked after the initial prologue (push rbp / mov rbp, rsp).
- The info about clobbers is currently needed because, in the translation to both winx64 unwind and systemv unwind, we save info to be able to generate the correct offsets for the subsequent clobber saves.

I think the simplest approach is probably to split the function of `DefineNewFrame` into two parts: the actual setting of the FP register, and the metadata used to generate FP- or SP-relative offsets for clobber saves. I'll see what I can do about that.
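A rough sketch of what that split could look like (hypothetical names, not Cranelift's actual definitions; only `PushFrameRegs` and `DefineNewFrame` with the fields shown in the IR dump above come from the source):

```rust
/// Hypothetical unwind pseudo-instructions illustrating the proposed split.
enum UnwindInst {
    /// The caller's frame register (RBP) has been pushed; corresponds to
    /// `PushFrameRegs` in the listing above.
    PushFrameRegs { offset_upward_to_caller_sp: u32 },
    /// The frame pointer has just been established (`movq %rsp, %rbp`), so
    /// unwinders can switch to FP-relative frame recovery from here on.
    /// This is the part that would be emitted right after "Inst 2".
    SetFPReg,
    /// Metadata for computing FP-/SP-relative offsets of clobber saves;
    /// this part could be emitted later, once the clobber area is known.
    DefineClobberArea {
        offset_upward_to_caller_sp: u32,
        offset_downward_to_clobbers: u32,
    },
}
```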
cfallin commented on issue #3859:
Hmm, and as an added twist, when generating the `SetFPReg` Windows unwind record we need to know the size of the clobbers, so we actually can't do the split I suggested above; we need to process clobbers (or at least know their size) before we generate the unwind info for setting the FP, which itself happens before the stack check (and must be so, as Windows requires a fairly rigid prologue sequence). So the way in which we handle stack-check generation has to change overall. Will consider more!
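To spell out the ordering constraint being described, here is a small sketch that just encodes the desired prologue order as data. The step names are illustrative, not Cranelift's actual API; the ordering itself is what the comments above and the IR dump imply.

```rust
/// Hypothetical prologue steps, in the order the comment above implies
/// Windows wants them.
#[derive(Debug)]
enum PrologueStep {
    PushRbp,                              // Inst 0
    UnwindPushFrameRegs,                  // Inst 1
    MovRbpRsp,                            // Inst 2
    /// Windows expects the frame-pointer unwind record here as part of its
    /// rigid prologue sequence, but emitting it requires the clobber-area
    /// size, so clobbers must be known before the stack check is generated.
    UnwindSetFPReg { clobber_size: u32 },
    StackLimitCheck,                      // currently Insts 3..6, ahead of the record
    SaveClobbers,
}

fn desired_prologue_order(clobber_size: u32) -> Vec<PrologueStep> {
    vec![
        PrologueStep::PushRbp,
        PrologueStep::UnwindPushFrameRegs,
        PrologueStep::MovRbpRsp,
        PrologueStep::UnwindSetFPReg { clobber_size },
        PrologueStep::StackLimitCheck,
        PrologueStep::SaveClobbers,
    ]
}
```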
alexcrichton labeled issue #3859:
alexcrichton labeled issue #3859:
alexcrichton edited issue #3859: