Stream: git-wasmtime

Topic: wasmtime / issue #3859 Unwind info for Windows may not be...


view this post on Zulip Wasmtime GitHub notifications bot (Feb 28 2022 at 15:11):

alexcrichton opened issue #3859:

In a debugging session last Friday about Windows-specific CI failures we concluded that the failures were tied to GitHub Actions' recent upgrade of the windows-latest image from windows-2019 to windows-2022. The PR to temporarily pin to windows-2019 has landed and CI is fixed for now, but we also did some further investigation to figure out what about windows-2022 was causing breakage.

The investigation on https://github.com/bytecodealliance/wasmtime/pull/3853 concluded that one of the failing tests was wast::Cranelift::misc::stack_overflow, and further investigation showed that all stack-overflow tests were failing. Digging further, our first conclusion was https://github.com/bytecodealliance/wasmtime/issues/3857, which is separate from this issue, but we realized that reproducing this issue required running tests on separate threads, at least with --test-threads 2.

The previous CI failures we were seeing were out-of-memory errors to the tune of 5GiB allocations. It turns out that Backtrace::new_unresolved was the culprit: the stack trace was hitting an "infinite loop" where the stack wasn't actually infinite but the stack unwinder was looping over frames forever. We attempted to collect (https://github.com/bytecodealliance/wasmtime/issues/3858) all of these frames, which resulted in the OOM.
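
As an illustration of the failure mode, here is a rough sketch, using the backtrace crate directly rather than wasmtime's actual code, of how collecting every frame from a looping unwinder blows up, and how a hypothetical cap on the number of frames bounds the allocation:

// Sketch only (assumes the `backtrace` crate as a dependency); this is not
// wasmtime's code, but it walks frames the same way Backtrace::new_unresolved
// does, stopping after a fixed cap so a cyclic unwind cannot force an
// unbounded (multi-GiB) allocation.
fn capped_backtrace(max_frames: usize) -> Vec<(usize, usize)> {
    let mut frames = Vec::new();
    backtrace::trace(|frame| {
        // Record ip/sp, mirroring the debug output shown later in this issue.
        frames.push((frame.ip() as usize, frame.sp() as usize));
        // Returning false stops the walk; without a cap, a looping unwinder
        // (same ip, slowly increasing sp) keeps yielding "new" frames forever.
        frames.len() < max_frames
    });
    frames
}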

That's all basically a long-winded way of saying that the system-generated stack trace on windows-2019 was correct, while the stack trace on windows-2022 was an "infinite" stack trace (despite the stack not actually being infinite). For a simple WebAssembly module:

(module
  (func call 0)
)
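
As a point of reference, a minimal embedding sketch that reproduces the stack-overflow trap from the host side might look like the following. The exported name "f" is hypothetical (the module above exports nothing), and the exact wasmtime API signatures vary slightly between versions, so this only asserts that the call fails rather than matching a specific error type.

use wasmtime::{Engine, Instance, Module, Store};

fn main() {
    let engine = Engine::default();
    let mut store = Store::new(&engine, ());
    // Same shape as the module above, but with the function exported so the
    // host can call it; the call recurses until the stack limit is hit.
    let module = Module::new(&engine, r#"(module (func (export "f") call 0))"#).unwrap();
    let instance = Instance::new(&mut store, &module, &[]).unwrap();
    let f = instance.get_typed_func::<(), ()>(&mut store, "f").unwrap();
    // The call must fail with a stack-overflow trap; the faulting frame is
    // then what the unwinder walks when a backtrace is captured.
    assert!(f.call(&mut store, ()).is_err());
}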

The Cranelift machine IR emitted for this module is:

VCode_ShowWithRRU {{
  Entry block: 0
Block 0:
  (original IR block: block0)
  (successor: Block 1)
  (instruction range: 0 .. 13)
  Inst 0:   pushq   %rbp
  Inst 1:   unwind PushFrameRegs { offset_upward_to_caller_sp: 16 }
  Inst 2:   movq    %rsp, %rbp
  Inst 3:   movq    0(%rdi), %r10
  Inst 4:   movq    0(%r10), %r10
  Inst 5:   cmpq    %rsp, %r10
  Inst 6:   jbe ; ud2 stk_ovf ;
  Inst 7:   unwind DefineNewFrame { offset_upward_to_caller_sp: 16, offset_downward_to_clobbers: 0 }
  Inst 8:   movq    %rdi, %rax
  Inst 9:   movq    %rax, %rdi
  Inst 10:   movq    %rax, %rsi
  Inst 11:   call    User { namespace: 0, index: 0 }
  Inst 12:   jmp     label1
Block 1:
  (original IR block: block1)
  (instruction range: 13 .. 16)
  Inst 13:   movq    %rbp, %rsp
  Inst 14:   popq    %rbp
  Inst 15:   ret
}}

When wasm hits the fault at "Inst 6" above due to stack overflow, that point is before the DefineNewFrame unwind pseudo-instruction, but that instruction should probably come just after "Inst 2" instead of at the end of the prologue that performs the out-of-stack check. The current suspicion is that this probably-incorrect location of DefineNewFrame is why stack unwinding on windows-2022 hits an infinite loop.

It's worth noting that one of the debugging runs on https://github.com/bytecodealliance/wasmtime/pull/3853 printed out the ip/sp of each frame on the stack and it looked like:

0: ip=0x7ff62df77637 sp=0x7c55af8d20
1: ip=0x7ff62df77637 sp=0x7c55af8d20
2: ip=0x7ff62df77527 sp=0x7c55afabd0
3: ip=0x7ff62df54c33 sp=0x7c55afac40
4: ip=0x7ff62df55303 sp=0x7c55afae30
5: ip=0x7ff62df635a7 sp=0x7c55afaea0
6: ip=0x7ff62df55238 sp=0x7c55afaf10
7: ip=0x7fff1896b592 sp=0x7c55afaf60
8: ip=0x7fff18922022 sp=0x7c55afb000
9: ip=0x7fff18992e1e sp=0x7c55afb240
10: ip=0x1d76a8f1013 sp=0x7c55afc010
11: ip=0x7c55afc020 sp=0x7c55afc018
12: ip=0x1d76a8f1023 sp=0x7c55afc020
13: ip=0x7c55afc030 sp=0x7c55afc028
14: ip=0x1d76a8f1023 sp=0x7c55afc030
15: ip=0x7c55afc040 sp=0x7c55afc038
16: ip=0x1d76a8f1023 sp=0x7c55afc040
...
22446: ip=0x1d76a8f1023 sp=0x7c55b27d30
22447: ip=0x7c55b27d40 sp=0x7c55b27d38
22448: ip=0x1d76a8f1023 sp=0x7c55b27d40
22449: ip=0x7c55b27d50 sp=0x7c55b27d48
22450: ip=0x1d76a8f1023 sp=0x7c55b27d50
22451: ip=0x7c55b27d60 sp=0x7c55b27d58
22452: ip=0x1d76a8f1023 sp=0x7c55b27d60
22453: ip=0x7c55b27d70 sp=0x7c55b27d68
22454: ip=0x1d76a8f1023 sp=0x7c55b27d70
22455: ip=0x7c55b27d80 sp=0x7c55b27d78
22456: ip=0x1d76a8f1023 sp=0x7c55b27d80
22457: ip=0x7c55b27d90 sp=0x7c55b27d88
22458: ip=0x1d76a8f1023 sp=0x7c55b27d90
22459: ip=0x7c55b27da0 sp=0x7c55b27d98

Here the ip is seemingly correct, in that the test case involved a mutually recursive set of functions, and the sp is indeed increasing. However, there appears to be no limit to how far the sp increases, and apparently no reads or writes of memory are being performed, because presumably it would otherwise have segfaulted by this point!

In any case it appears that bad unwind information is to blame here on Windows, so this issue is intended to track fixing that, and a fix should likely be accompanied by a revert of https://github.com/bytecodealliance/wasmtime/pull/3854 as well.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 28 2022 at 16:12):

alexcrichton commented on issue #3859:

cc @cfallin, @peterhuene

view this post on Zulip Wasmtime GitHub notifications bot (Feb 28 2022 at 18:54):

cfallin commented on issue #3859:

Some notes here: I've been paging in the details of the unwind generation and ABI code, and I think this will require a bit of surgery to fix.

In particular, I think the simplest approach is probably to split the function of DefineNewFrame into two parts: the actual setting of the FP register, and the metadata used to generate FP- or SP-relative offsets for clobber saves. I'll see what I can do about that.
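
A rough sketch of what that split could look like, with hypothetical variant names (Cranelift's real UnwindInst enum has more variants and different details; this just illustrates the two roles that DefineNewFrame currently bundles):

// Sketch only: hypothetical shape of the split described above, not
// Cranelift's actual pseudo-instruction set.
enum UnwindInstSketch {
    // Emitted after `pushq %rbp` (corresponds to the existing PushFrameRegs).
    PushFrameRegs { offset_upward_to_caller_sp: u32 },
    // Part 1 of the split: record that `movq %rsp, %rbp` has established the
    // frame pointer, so it can sit immediately after that instruction and
    // before the stack-overflow check.
    SetFrameReg { offset_upward_to_caller_sp: u32 },
    // Part 2 of the split: metadata describing where clobber saves live
    // relative to the frame, which can stay later in the prologue.
    DefineClobberArea { offset_downward_to_clobbers: u32 },
}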

view this post on Zulip Wasmtime GitHub notifications bot (Feb 28 2022 at 19:09):

cfallin commented on issue #3859:

Hmm, and as an added twist, when generating the SetFPReg Windows unwind record we need to know the size of the clobbers, so we actually can't do the split I suggested above; we need to process clobbers (at least know their size) before we generate the unwind info for setting the FP, which itself happens before the stack check (and must be so, as Windows requires a fairly rigid prologue sequence). So the way in which we handle stack check generation has to change overall. Will consider more!
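
For background on why the clobber size matters up front: in the documented Windows x64 unwind format, frame-pointer establishment is a UWOP_SET_FPREG unwind code, and the RSP-relative offset at which the frame pointer was set is stored, scaled by 16, in the UNWIND_INFO header rather than in the code itself. A rough sketch of the relevant fields (based on the public format description, not Cranelift's emitter):

// UWOP_SET_FPREG means "frame pointer = RSP + 16 * FrameOffset", so the
// emitter must already know how far RSP will have moved (e.g. for clobber
// saves) before it can describe the instruction that sets the frame pointer.
const UWOP_SET_FPREG: u8 = 3;

#[repr(C)]
struct UnwindInfoHeader {
    version_and_flags: u8,    // low 3 bits: version; high 5 bits: flags
    size_of_prolog: u8,       // length of the prologue in bytes
    count_of_codes: u8,       // number of UNWIND_CODE slots that follow
    frame_reg_and_offset: u8, // low 4 bits: frame register;
                              // high 4 bits: FrameOffset, scaled by 16
}

fn frame_pointer_offset_from_rsp(frame_reg_and_offset: u8) -> u32 {
    // Offsets of 0..=240 bytes are representable in the 4-bit scaled field.
    ((frame_reg_and_offset >> 4) as u32) * 16
}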

view this post on Zulip Wasmtime GitHub notifications bot (Mar 23 2022 at 20:12):

alexcrichton labeled issue #3859

view this post on Zulip Wasmtime GitHub notifications bot (Mar 23 2022 at 20:12):

alexcrichton labeled issue #3859

view this post on Zulip Wasmtime GitHub notifications bot (May 09 2022 at 16:18):

alexcrichton edited issue #3859


Last updated: Nov 22 2024 at 16:03 UTC