Stream: git-wasmtime

Topic: wasmtime / issue #5553 Removing assertion while reading/a...


view this post on Zulip Wasmtime GitHub notifications bot (Jan 09 2023 at 21:27):

jameysharp commented on issue #5553:

This bug has previously been reported as #3999 as well as several duplicates, but we haven't had any suggestions for how to fix it before. Thanks for this PR!

I don't know whether removing this assert is correct. It might be! But I think the effect is that the for i in (i..j).rev() loop below doesn't run in that case because j will also be less than i. I'm suspicious that removing the assert may just be hiding the bug.

As noted in other issues about this bug:

We currently don't have anyone on the project who understands our DWARF-handling code well and has time to work on it; so while this is definitely a bug, it's not likely to have a fast resolution. We do have an intent to have someone eventually focus on this, as priorities allow, so we should keep this issue open. Just wanted to give some context on the current situation...

view this post on Zulip Wasmtime GitHub notifications bot (Jan 10 2023 at 15:42):

thaystg commented on issue #5553:

Minimal repro:

#include <stdio.h>

void
sample_func (int type, int parm)
{
    printf("sample_func\n");
    switch (type){
    case 1:
        if (type + parm!= 50)
            break;
    case 2:
        if (type + parm == 10 || type + parm == 15)
            return;
        break;
    default:
        break;
    }
}

int main() {
    int i = 2;
    sample_func (i, i);
    return 0;
}

view this post on Zulip Wasmtime GitHub notifications bot (Jan 10 2023 at 16:13):

thaystg edited a comment on issue #5553:

Minimal repro:

#include <stdio.h>

void
sample_func (int type, int parm)
{
    printf("sample_func\n");
    switch (type){
    case 1:
        if (type + parm!= 50)
            break;
    case 2:
        if (type + parm == 10 || type + parm == 15)
            return;
        break;
    default:
        break;
    }
}

int main() {
    int i = 2;
    sample_func (i, i);
    return 0;
}

Steps to Reproduce

wasi-sdk-14.0/bin/clang -g -O0 test.c -o test.wasm
lldb -- wasmtime run -g test.wasm

view this post on Zulip Wasmtime GitHub notifications bot (Jan 10 2023 at 16:15):

thaystg commented on issue #5553:

This bug has previously been reported as #3999 as well as several duplicates, but we haven't had any suggestions for how to fix it before. Thanks for this PR!

I don't know whether removing this assert is correct. It might be! But I think the effect is that the for i in (i..j).rev() loop below doesn't run in that case because j will also be less than i. I'm suspicious that removing the assert may just be hiding the bug.

As noted in other issues about this bug:

We currently don't have anyone on the project who understands our DWARF-handling code well and has time to work on it; so while this is definitely a bug, it's not likely to have a fast resolution. We do have an intent to have someone eventually focus on this, as priorities allow, so we should keep this issue open. Just wanted to give some context on the current situation...

Can't we just print a warning, remove the assertion and exit the function?
It would be better then assertion probably, because we are still able to debug if we skip this "wrong" label.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 10 2023 at 21:30):

jameysharp commented on issue #5553:

Can't we just print a warning, remove the assertion and exit the function? It would be better then assertion probably, because we are still able to debug if we skip this "wrong" label.

@alexcrichton and @fitzgen, do either of you have opinions on this suggestion?

If you can spare some more time to investigate this bug, I'd be happier if we better understood its cause.

At that point there's a somewhat complex interaction between the inst_offsets array constructed in cranelift-codegen and the debug_locations array constructed by regalloc2.

I think a good first troubleshooting step would be to assert that from_offset < to_offset before pushing a new ValueLocRange in VCode::compute_value_labels_ranges. Hopefully that assert fails instead of the assert in process_label; otherwise we'd need to investigate more to understand where these ranges are coming from.

Then we could check why these ranges are out of order. I think the only possibilities are either that regalloc.debug_locations has pairs that are out of order, or inst_offsets is out of order. If it's the former then we can switch to investigating regalloc2; otherwise we can look at VCode::emit.

Either way we'd know more than we know now and be closer to fixing this particular bug.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 10 2023 at 22:20):

fitzgen commented on issue #5553:

Can't we just print a warning, remove the assertion and exit the function? It would be better then assertion probably, because we are still able to debug if we skip this "wrong" label.

@alexcrichton and @fitzgen, do either of you have opinions on this suggestion?

I am generally very suspicious of sweeping-the-bugs-under-the-rug kind of thing. If it was a bug that was purely within the DWARF transform, then I would be more amenable to it since that code as-is has no future and needs a rewrite, but since it looks like this is bottoming out in Cranelift, I think it would be better to debug the Cranelift issue like you laid out.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 11 2023 at 14:51):

alexcrichton commented on issue #5553:

I personally know so little about the DWARF transformations Wasmtime does that I don't think I'd have much to offer over what @jameysharp has already mentioned.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 11 2023 at 15:19):

pavelsavara commented on issue #5553:

The dwarf parser in chrome doesn't fail on it.

Speculations (I know little about all this yet):
Maybe if we add assert earlier in the wasmtime pipeline, to validate input dwarf, we could learn that the flipped start/end is in the input by LLVM ?
And maybe it's how it should be, for break, which is a jmp to label, but in wasm it's some nested block for some reason ?

view this post on Zulip Wasmtime GitHub notifications bot (Jan 11 2023 at 16:47):

jameysharp commented on issue #5553:

The dwarf parser in chrome doesn't fail on it.

Speculations (I know little about all this yet): Maybe if we add assert earlier in the wasmtime pipeline, to validate input dwarf, we could learn that the flipped start/end is in the input by LLVM ? And maybe it's how it should be, for break, which is a jmp to label, but in wasm it's some nested block for some reason ?

That sounds possible to me but, like you, I know little about all of this.

I think it's unlikely, if I'm reading the Wasmtime/Cranelift source correctly. I think the point of the process_label function is to combine Cranelift's data with the input DWARF from LLVM, but I think this assert is only looking at data that was generated by Cranelift and regalloc2, not from the input DWARF. So I think it's most likely that this is detecting a bug in Cranelift or regalloc2.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 12 2023 at 18:53):

brendandburns commented on issue #5553:

fwiw, when I kicked on this a little bit, llvm-dwarfdump is perfectly happy to read out the dwarf data, so at some level, the data in the wasm file is "legal" as far as llvm is concerned, but I will also mention that when I tried exactly this fix, it just caused a panic in a different place (for my WASM file at least)

view this post on Zulip Wasmtime GitHub notifications bot (Jan 28 2023 at 17:57):

squillace commented on issue #5553:

I've asked @thaystg to have a look at this again if she has time, @jameysharp, per our conversation the other day about debugging with the crew.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 23 2023 at 09:33):

william-stacken commented on issue #5553:

fwiw, when I kicked on this a little bit, llvm-dwarfdump is perfectly happy to read out the dwarf data, so at some level, the data in the wasm file is "legal" as far as llvm is concerned, but I will also mention that when I tried exactly this fix, it just caused a panic in a different place (for my WASM file at least)

Same here, removing the assertion did not fix the problem, but changing the if statement above to continue if range_start is greater than range_end` (not just when it is equal) did fix it and I am able to finally debug my Wasm module. Given that this bug is severe and that developers have complained about it for a long time, and that no one knows the cause making this bug difficult to fix for a long time, I think that it should be "swept-under-the-rug" ASAP. Then the root cause for the bug can be fixed later. At least this is better than debugging Wasm modules not being possible.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 23 2023 at 09:33):

william-stacken edited a comment on issue #5553:

fwiw, when I kicked on this a little bit, llvm-dwarfdump is perfectly happy to read out the dwarf data, so at some level, the data in the wasm file is "legal" as far as llvm is concerned, but I will also mention that when I tried exactly this fix, it just caused a panic in a different place (for my WASM file at least)

Same here, removing the assertion did not fix the problem, but changing the if statement above to continue if range_start is greater than range_end (not just when it is equal) did fix it and I am able to finally debug my Wasm module. Given that this bug is severe and that developers have complained about it for a long time, and that no one knows the cause making this bug difficult to fix for a long time, I think that it should be "swept-under-the-rug" ASAP. Then the root cause for the bug can be fixed later. At least this is better than debugging Wasm modules not being possible.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 23 2023 at 09:42):

bjorn3 commented on issue #5553:

but changing the if statement above to continue if range_start is greater than range_end (not just when it is equal) did fix it and I am able to finally debug my Wasm module.

I'm pretty sure that can make it impossible to single step at several places and will cause incorrect assignments from machine instructions to source lines.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 28 2023 at 07:45):

william-stacken commented on issue #5553:

but changing the if statement above to continue if range_start is greater than range_end (not just when it is equal) did fix it and I am able to finally debug my Wasm module.

I'm pretty sure that can make it impossible to single step at several places and will cause incorrect assignments from machine instructions to source lines.

I was able to both single step and set breakpoints, and I didn't observe any strange instruction to source line assignments. Now I know absolutely nothing about dwarf, and I'm sure you're right that there will be bugs. But from my perspective an unstable debugger is better than no debugger at all. At least for the time being.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 25 2023 at 20:45):

alexcrichton commented on issue #5553:

In the time since this PR was opened I think https://github.com/bytecodealliance/wasmtime/pull/6931 addresed the underlying issue. I think this is no longer necessary so I'm going to close this, but please let me know if I'm wrong.


Last updated: Oct 23 2024 at 20:03 UTC