Stream: git-wasmtime

Topic: wasmtime / issue #9900 [DWARF] Incorrect location descriptor


view this post on Zulip Wasmtime GitHub notifications bot (Dec 24 2024 at 18:18):

SingleAccretion opened issue #9900:

Reproduction steps:
1) Check out #9898.
2) Modify the added test to stop passing -Oopt-level=0 to wasmtime.
3) Run the test.

Expected result: the test passes or fails with 'expression unavailable'.

Actual result:

error: Execution was interrupted, reason: Exception 0xc0000005 encountered at address 0x7ff6fcf15f90: Access violation reading location 0x1ae198763ec.
The process has been returned to the state before expression evaluation.


* thread #1, name = 'main', stop reason = breakpoint 1.1
    frame #0: 0x0000028d720c15b8 JIT(0x28d72210080)`DerivedType::InstanceMethod(__this=(__ptr = <read memory from 0x28de65f5f0c failed (0 of 4 bytes read)>)) at generic.cpp:26:5
   23     long long DerivedValue = 2;
   24
   25     int InstanceMethod() {
-> 26       debug_break();
   27       return BaseValue + DerivedValue;
   28     }
   29
(lldb) v
(WasmtimeVMContext *) __vmctx = 0x0000028d72165f00
(WebAssemblyPtrWrapper<DerivedType>) __this = (__ptr = <read memory from 0x28de65f5f0c failed (0 of 4 bytes read)>)

It means the location descriptor for __this is incorrect.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 24 2024 at 20:24):

SingleAccretion commented on issue #9900:

This is an interesting one. Here's how to reproducing the failure:

> lldb.exe -o "b InstanceMethod" -o "r" -- wasmtime.exe -Ccache=n -Ddebug-info generic.wasm

   25     int InstanceMethod() {
-> 26       debug_break();
   27       return BaseValue + DerivedValue;
   28     }
   29
(lldb) v
(WasmtimeVMContext *) __vmctx = 0x000001dc1b2b2e10
(WebAssemblyPtrWrapper<DerivedType>) this = (__ptr = <read memory from 0x1dc38812e1c failed (0 of 4 bytes read)>)

Here's the expression for this/__this:

Variable: id = {0x0000048d}, name = "this", type = "WebAssemblyPtrWrapper<DerivedType>", valid ranges = <block>,
          location = DW_OP_breg4 RSI+0, DW_OP_plus_uconst 0xc, DW_OP_breg5 RDI+96, DW_OP_deref, DW_OP_swap, DW_OP_const4u 0xffffffff, DW_OP_and, DW_OP_plus, decl =
          {RSI + 0xc, [RDI + 96]} ==> {[RDI + 96], RSI + 0xc} ==> [RDI + 96] + (RSI + 0xc) & 0xffffffff
; [RDI + 96] is __vmctx->memory
; RSI is the linear memory frame pointer (WASM level frame base, sans offset)

Here's annotated disassembly:

    0x262b49d1520 <+0>:   push   rbp
    0x262b49d1521 <+1>:   mov    rbp, rsp
    0x262b49d1524 <+4>:   mov    r10, qword ptr [rdi + 0x8]
    0x262b49d1528 <+8>:   mov    r10, qword ptr [r10 + 0x10]
    0x262b49d152c <+12>:  add    r10, 0x30
    0x262b49d1530 <+16>:  cmp    r10, rsp
    0x262b49d1533 <+19>:  ja     0x262b49d15ae  ; <+142> at generic.cpp:27:5
    0x262b49d1539 <+25>:  sub    rsp, 0x20
    0x262b49d153d <+29>:  mov    qword ptr [rsp], rbx
    0x262b49d1541 <+33>:  mov    qword ptr [rsp + 0x8], r12
    0x262b49d1546 <+38>:  mov    qword ptr [rsp + 0x10], r13
    0x262b49d154b <+43>:  mov    qword ptr [rsp + 0x18], r15
    0x262b49d1550 <+48>:  mov    r15d, dword ptr [rdi + 0xa0] ; r15 = __vmctx->__stack_pointer
    0x262b49d1557 <+55>:  mov    rsi, r15                     ; rsi = r15 (__stack_pointer)
    0x262b49d155a <+58>:  sub    esi, 0x10                    ; rsi -= 16
    0x262b49d155d <+61>:  mov    dword ptr [rdi + 0xa0], esi  ; __vmctx->__stack_pointer = rsi
    0x262b49d1563 <+67>:  mov    r12, qword ptr [rdi + 0x60]
    0x262b49d1567 <+71>:  mov    dword ptr [r12 + rsi + 0xc], edx
    0x262b49d156c <+76>:  mov    rbx, rdi
    0x262b49d156f <+79>:  mov    r13, rdx
    0x262b49d1572 <+82>:  mov    rsi, rbx ; rsi = __vmctx
->  0x262b49d1575 <+85>:  call   0x262b49d1180  ; debug_break at generic.h:9:28
    0x262b49d157a <+90>:  mov    rdx, r13
    0x262b49d157d <+93>:  mov    edx, edx
    0x262b49d157f <+95>:  mov    ecx, dword ptr [r12 + rdx]
    0x262b49d1583 <+99>:  mov    rdx, qword ptr [r12 + rdx + 0x8]
    0x262b49d1588 <+104>: mov    dword ptr [rbx + 0xa0], r15d ; __vmctx->__stack_pointer = r15
    0x262b49d158f <+111>: lea    eax, [rcx + rdx]
    0x262b49d1592 <+114>: mov    rbx, qword ptr [rsp]
    0x262b49d1596 <+118>: mov    r12, qword ptr [rsp + 0x8]
    0x262b49d159b <+123>: mov    r13, qword ptr [rsp + 0x10]
    0x262b49d15a0 <+128>: mov    r15, qword ptr [rsp + 0x18]
    0x262b49d15a5 <+133>: add    rsp, 0x20
    0x262b49d15a9 <+137>: mov    rsp, rbp
    0x262b49d15ac <+140>: pop    rbp
    0x262b49d15ad <+141>: ret
    0x262b49d15ae <+142>: ud2

So the culprit here is the fact rsi, which initially does indeed contain the original WASM frame base (local 0), from which everything else gets evaluated, gets overwritten just before the call we've stopped on, and this is either not reflected in the frame info, or is not accounted for correctly.

BTW, here we can also see why the WASM frame base is so prone to being "optimized out" even as it is always used at the end of the method. Local zero gets split into two distinct values and the original, not referenced by anything DI-related, gets allocated into r15.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 24 2024 at 20:26):

SingleAccretion edited a comment on issue #9900:

This is an interesting one. Here's how to reproducing the failure:

> lldb.exe -o "b InstanceMethod" -o "r" -- wasmtime.exe -Ccache=n -Ddebug-info generic.wasm

   25     int InstanceMethod() {
-> 26       debug_break();
   27       return BaseValue + DerivedValue;
   28     }
   29
(lldb) v
(WasmtimeVMContext *) __vmctx = 0x000001dc1b2b2e10
(WebAssemblyPtrWrapper<DerivedType>) this = (__ptr = <read memory from 0x1dc38812e1c failed (0 of 4 bytes read)>)

Here's the expression for this/__this:

Variable: id = {0x0000048d}, name = "this", type = "WebAssemblyPtrWrapper<DerivedType>", valid ranges = <block>,
          location = DW_OP_breg4 RSI+0, DW_OP_plus_uconst 0xc, DW_OP_breg5 RDI+96, DW_OP_deref, DW_OP_swap, DW_OP_const4u 0xffffffff, DW_OP_and, DW_OP_plus, decl =
          {RSI + 0xc, [RDI + 96]} ==> {[RDI + 96], RSI + 0xc} ==> [RDI + 96] + (RSI + 0xc) & 0xffffffff
; [RDI + 96] is __vmctx->memory
; RSI is the linear memory frame pointer (WASM level frame base, sans offset)

Here's annotated disassembly:

    0x262b49d1520 <+0>:   push   rbp
    0x262b49d1521 <+1>:   mov    rbp, rsp
    0x262b49d1524 <+4>:   mov    r10, qword ptr [rdi + 0x8]
    0x262b49d1528 <+8>:   mov    r10, qword ptr [r10 + 0x10]
    0x262b49d152c <+12>:  add    r10, 0x30
    0x262b49d1530 <+16>:  cmp    r10, rsp
    0x262b49d1533 <+19>:  ja     0x262b49d15ae  ; <+142> at generic.cpp:27:5
    0x262b49d1539 <+25>:  sub    rsp, 0x20
    0x262b49d153d <+29>:  mov    qword ptr [rsp], rbx
    0x262b49d1541 <+33>:  mov    qword ptr [rsp + 0x8], r12
    0x262b49d1546 <+38>:  mov    qword ptr [rsp + 0x10], r13
    0x262b49d154b <+43>:  mov    qword ptr [rsp + 0x18], r15
    0x262b49d1550 <+48>:  mov    r15d, dword ptr [rdi + 0xa0] ; r15 = __vmctx->__stack_pointer
    0x262b49d1557 <+55>:  mov    rsi, r15                     ; rsi = r15 (__stack_pointer)
    0x262b49d155a <+58>:  sub    esi, 0x10                    ; rsi -= 16
    0x262b49d155d <+61>:  mov    dword ptr [rdi + 0xa0], esi  ; __vmctx->__stack_pointer = rsi
    0x262b49d1563 <+67>:  mov    r12, qword ptr [rdi + 0x60]
    0x262b49d1567 <+71>:  mov    dword ptr [r12 + rsi + 0xc], edx
    0x262b49d156c <+76>:  mov    rbx, rdi
    0x262b49d156f <+79>:  mov    r13, rdx
    0x262b49d1572 <+82>:  mov    rsi, rbx ; rsi = __vmctx
->  0x262b49d1575 <+85>:  call   0x262b49d1180  ; debug_break at generic.h:9:28
    0x262b49d157a <+90>:  mov    rdx, r13
    0x262b49d157d <+93>:  mov    edx, edx
    0x262b49d157f <+95>:  mov    ecx, dword ptr [r12 + rdx]
    0x262b49d1583 <+99>:  mov    rdx, qword ptr [r12 + rdx + 0x8]
    0x262b49d1588 <+104>: mov    dword ptr [rbx + 0xa0], r15d ; __vmctx->__stack_pointer = r15
    0x262b49d158f <+111>: lea    eax, [rcx + rdx]
    0x262b49d1592 <+114>: mov    rbx, qword ptr [rsp]
    0x262b49d1596 <+118>: mov    r12, qword ptr [rsp + 0x8]
    0x262b49d159b <+123>: mov    r13, qword ptr [rsp + 0x10]
    0x262b49d15a0 <+128>: mov    r15, qword ptr [rsp + 0x18]
    0x262b49d15a5 <+133>: add    rsp, 0x20
    0x262b49d15a9 <+137>: mov    rsp, rbp
    0x262b49d15ac <+140>: pop    rbp
    0x262b49d15ad <+141>: ret
    0x262b49d15ae <+142>: ud2

So the culprit here is the fact rsi, which initially does indeed contain the original WASM frame base (local 0), from which everything else gets evaluated, gets overwritten just before the call we've stopped on, and this is either not reflected in the frame info, or is not accounted for correctly.

BTW, here we can also see why the WASM frame base is so prone to being "optimized out" even as it is always used at the end of the method. Local zero gets split into two distinct values: one referenced in code and not referenced in DI (r15), the other - not referenced in code, but referenced in DI (rsi).

view this post on Zulip Wasmtime GitHub notifications bot (Dec 25 2024 at 13:58):

SingleAccretion edited a comment on issue #9900:

This is an interesting one. Here's how to reproduce the failure:

> lldb.exe -o "b InstanceMethod" -o "r" -- wasmtime.exe -Ccache=n -Ddebug-info generic.wasm

   25     int InstanceMethod() {
-> 26       debug_break();
   27       return BaseValue + DerivedValue;
   28     }
   29
(lldb) v
(WasmtimeVMContext *) __vmctx = 0x000001dc1b2b2e10
(WebAssemblyPtrWrapper<DerivedType>) this = (__ptr = <read memory from 0x1dc38812e1c failed (0 of 4 bytes read)>)

Here's the expression for this/__this:

Variable: id = {0x0000048d}, name = "this", type = "WebAssemblyPtrWrapper<DerivedType>", valid ranges = <block>,
          location = DW_OP_breg4 RSI+0, DW_OP_plus_uconst 0xc, DW_OP_breg5 RDI+96, DW_OP_deref, DW_OP_swap, DW_OP_const4u 0xffffffff, DW_OP_and, DW_OP_plus, decl =
          {RSI + 0xc, [RDI + 96]} ==> {[RDI + 96], RSI + 0xc} ==> [RDI + 96] + (RSI + 0xc) & 0xffffffff
; [RDI + 96] is __vmctx->memory
; RSI is the linear memory frame pointer (WASM level frame base, sans offset)

Here's annotated disassembly:

    0x262b49d1520 <+0>:   push   rbp
    0x262b49d1521 <+1>:   mov    rbp, rsp
    0x262b49d1524 <+4>:   mov    r10, qword ptr [rdi + 0x8]
    0x262b49d1528 <+8>:   mov    r10, qword ptr [r10 + 0x10]
    0x262b49d152c <+12>:  add    r10, 0x30
    0x262b49d1530 <+16>:  cmp    r10, rsp
    0x262b49d1533 <+19>:  ja     0x262b49d15ae  ; <+142> at generic.cpp:27:5
    0x262b49d1539 <+25>:  sub    rsp, 0x20
    0x262b49d153d <+29>:  mov    qword ptr [rsp], rbx
    0x262b49d1541 <+33>:  mov    qword ptr [rsp + 0x8], r12
    0x262b49d1546 <+38>:  mov    qword ptr [rsp + 0x10], r13
    0x262b49d154b <+43>:  mov    qword ptr [rsp + 0x18], r15
    0x262b49d1550 <+48>:  mov    r15d, dword ptr [rdi + 0xa0] ; r15 = __vmctx->__stack_pointer
    0x262b49d1557 <+55>:  mov    rsi, r15                     ; rsi = r15 (__stack_pointer)
    0x262b49d155a <+58>:  sub    esi, 0x10                    ; rsi -= 16
    0x262b49d155d <+61>:  mov    dword ptr [rdi + 0xa0], esi  ; __vmctx->__stack_pointer = rsi
    0x262b49d1563 <+67>:  mov    r12, qword ptr [rdi + 0x60]
    0x262b49d1567 <+71>:  mov    dword ptr [r12 + rsi + 0xc], edx
    0x262b49d156c <+76>:  mov    rbx, rdi
    0x262b49d156f <+79>:  mov    r13, rdx
    0x262b49d1572 <+82>:  mov    rsi, rbx ; rsi = __vmctx
->  0x262b49d1575 <+85>:  call   0x262b49d1180  ; debug_break at generic.h:9:28
    0x262b49d157a <+90>:  mov    rdx, r13
    0x262b49d157d <+93>:  mov    edx, edx
    0x262b49d157f <+95>:  mov    ecx, dword ptr [r12 + rdx]
    0x262b49d1583 <+99>:  mov    rdx, qword ptr [r12 + rdx + 0x8]
    0x262b49d1588 <+104>: mov    dword ptr [rbx + 0xa0], r15d ; __vmctx->__stack_pointer = r15
    0x262b49d158f <+111>: lea    eax, [rcx + rdx]
    0x262b49d1592 <+114>: mov    rbx, qword ptr [rsp]
    0x262b49d1596 <+118>: mov    r12, qword ptr [rsp + 0x8]
    0x262b49d159b <+123>: mov    r13, qword ptr [rsp + 0x10]
    0x262b49d15a0 <+128>: mov    r15, qword ptr [rsp + 0x18]
    0x262b49d15a5 <+133>: add    rsp, 0x20
    0x262b49d15a9 <+137>: mov    rsp, rbp
    0x262b49d15ac <+140>: pop    rbp
    0x262b49d15ad <+141>: ret
    0x262b49d15ae <+142>: ud2

So the culprit here is the fact rsi, which initially does indeed contain the original WASM frame base (local 0), from which everything else gets evaluated, gets overwritten just before the call we've stopped on, and this is either not reflected in the frame info, or is not accounted for correctly.

BTW, here we can also see why the WASM frame base is so prone to being "optimized out" even as it is always used at the end of the method. Local zero gets split into two distinct values: one referenced in code and not referenced in DI (r15), the other - not referenced in code, but referenced in DI (rsi).

view this post on Zulip Wasmtime GitHub notifications bot (Dec 27 2024 at 05:58):

alexcrichton added the wasmtime:debugging label to Issue #9900.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 28 2024 at 21:42):

SingleAccretion commented on issue #9900:

Upgrading the logging for location descriptor building, here's what we see:

Building ranges for values:
L#0  : %rdx@[49..86) %r13@[86..94)
L#1  : %r15@[49..105)
L#3  : %rsi@[59..72)
L#4  : %rdx@[94..100)
L#5  : %rcx@[96..112)
L#6  : %rcx@[96..112)
L#8  : %rdx@[100..112)
VMCTX: %rdi@[49..86) %rbx@[86..115)

L3 here is our frame base (DW_OP_WASM_location 0x0 0x3), and it is correctly reported to live for only these three instructions:

    0x262b49d155d <+61>:  mov    dword ptr [rdi + 0xa0], esi  ; __vmctx->__stack_pointer = rsi
    0x262b49d1563 <+67>:  mov    r12, qword ptr [rdi + 0x60]
    0x262b49d1567 <+71>:  mov    dword ptr [r12 + rsi + 0xc], edx

This implies a bug in range intersection.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 09 2025 at 21:23):

SingleAccretion commented on issue #9900:

The complete range info looks like this:

Building ranges for values in scope: [0..144)
L#0   : %rdx@[49..86) %r13@[86..94)
L#1   : %r15@[49..105)
L#3   : %rsi@[59..72)
L#4   : %rdx@[94..100)
L#5   : %rcx@[96..112)
L#6   : %rcx@[96..112)
L#8   : %rdx@[100..112)
VMCTX : %rdi@[49..86) %rbx@[86..115)
Intersecting with L#3
Intersecting with VMCTX
Built ranges:
[VMCTX:%rdi, L#3:%rsi]@[59..72)

So we end up with a correct (single) range. That leads me to believe the bug is here:

https://github.com/bytecodealliance/wasmtime/blob/0fff9c10809b551730ff8e9d2524b849272210cf/crates/cranelift/src/debug/transform/attr.rs#L305-L319

We can only optimize to a single expression when the range covers the entire scope, which I suspect it almost never will.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 14 2025 at 19:51):

SingleAccretion edited a comment on issue #9900:

The complete range info looks like this:

Building ranges for values in scope: [0..144)
L#0   : %rdx@[49..86) %r13@[86..94)
L#1   : %r15@[49..105)
L#3   : %rsi@[59..72)
L#4   : %rdx@[94..100)
L#5   : %rcx@[96..112)
L#6   : %rcx@[96..112)
L#8   : %rdx@[100..112)
VMCTX : %rdi@[49..86) %rbx@[86..115)
Intersecting with L#3
Intersecting with VMCTX
Built ranges:
[VMCTX:%rdi, L#3:%rsi]@[59..72)

So we end up with a correct (single) range. That leads me to believe the bug is here:

https://github.com/bytecodealliance/wasmtime/blob/0fff9c10809b551730ff8e9d2524b849272210cf/crates/cranelift/src/debug/transform/attr.rs#L305-L319

We can only optimize to a single expression when the range covers the entire scope, which I suspect it almost never will.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 14 2025 at 19:51):

SingleAccretion edited a comment on issue #9900:

The complete range info looks like this:

Building ranges for values in scope: [0..144)
L#0   : %rdx@[49..86) %r13@[86..94)
L#1   : %r15@[49..105)
L#3   : %rsi@[59..72)
L#4   : %rdx@[94..100)
L#5   : %rcx@[96..112)
L#6   : %rcx@[96..112)
L#8   : %rdx@[100..112)
VMCTX : %rdi@[49..86) %rbx@[86..115)
Intersecting with L#3
Intersecting with VMCTX
Built ranges:
[VMCTX:%rdi, L#3:%rsi]@[59..72)

So we end up with a correct (single) range. That leads me to believe the bug is here:

https://github.com/bytecodealliance/wasmtime/blob/0fff9c10809b551730ff8e9d2524b849272210cf/crates/cranelift/src/debug/transform/attr.rs#L305-L319

We can only optimize to a single expression when the range covers the entire scope, which I suspect it almost never will.


Last updated: Jan 24 2025 at 00:11 UTC