jameysharp commented on issue #5553:
This bug has previously been reported as #3999 as well as several duplicates, but we haven't had any suggestions for how to fix it before. Thanks for this PR!
I don't know whether removing this assert is correct. It might be! But I think the effect is that the
for i in (i..j).rev()
loop below doesn't run in that case becausej
will also be less thani
. I'm suspicious that removing the assert may just be hiding the bug.As noted in other issues about this bug:
We currently don't have anyone on the project who understands our DWARF-handling code well and has time to work on it; so while this is definitely a bug, it's not likely to have a fast resolution. We do have an intent to have someone eventually focus on this, as priorities allow, so we should keep this issue open. Just wanted to give some context on the current situation...
thaystg commented on issue #5553:
Minimal repro:
#include <stdio.h> void sample_func (int type, int parm) { printf("sample_func\n"); switch (type){ case 1: if (type + parm!= 50) break; case 2: if (type + parm == 10 || type + parm == 15) return; break; default: break; } } int main() { int i = 2; sample_func (i, i); return 0; }
thaystg edited a comment on issue #5553:
Minimal repro:
#include <stdio.h> void sample_func (int type, int parm) { printf("sample_func\n"); switch (type){ case 1: if (type + parm!= 50) break; case 2: if (type + parm == 10 || type + parm == 15) return; break; default: break; } } int main() { int i = 2; sample_func (i, i); return 0; }
Steps to Reproduce
wasi-sdk-14.0/bin/clang -g -O0 test.c -o test.wasm lldb -- wasmtime run -g test.wasm
thaystg commented on issue #5553:
This bug has previously been reported as #3999 as well as several duplicates, but we haven't had any suggestions for how to fix it before. Thanks for this PR!
I don't know whether removing this assert is correct. It might be! But I think the effect is that the
for i in (i..j).rev()
loop below doesn't run in that case becausej
will also be less thani
. I'm suspicious that removing the assert may just be hiding the bug.As noted in other issues about this bug:
We currently don't have anyone on the project who understands our DWARF-handling code well and has time to work on it; so while this is definitely a bug, it's not likely to have a fast resolution. We do have an intent to have someone eventually focus on this, as priorities allow, so we should keep this issue open. Just wanted to give some context on the current situation...
Can't we just print a warning, remove the assertion and exit the function?
It would be better then assertion probably, because we are still able to debug if we skip this "wrong" label.
jameysharp commented on issue #5553:
Can't we just print a warning, remove the assertion and exit the function? It would be better then assertion probably, because we are still able to debug if we skip this "wrong" label.
@alexcrichton and @fitzgen, do either of you have opinions on this suggestion?
If you can spare some more time to investigate this bug, I'd be happier if we better understood its cause.
- Looking at
process_label
, I see that it'sFunctionFrameInfo::value_ranges
which has an invalid range for some label.- As far as I can tell, the values in that map are only set by
get_function_frame_info
.- That comes from
CompiledFunction::value_labels_ranges
in the wasmtime-cranelift crate.- That comes from
CompiledCodeBase::value_labels_ranges
in the cranelift-codegen crate.- That comes from target-specific code implementing
TargetIsa::compile_function
.- Every backend implements that by calling
VCode::emit
and getting theEmitResult::value_labels_ranges
field.- That's computed by
VCode::compute_value_labels_ranges
.At that point there's a somewhat complex interaction between the
inst_offsets
array constructed in cranelift-codegen and thedebug_locations
array constructed by regalloc2.I think a good first troubleshooting step would be to assert that
from_offset < to_offset
before pushing a newValueLocRange
inVCode::compute_value_labels_ranges
. Hopefully that assert fails instead of the assert inprocess_label
; otherwise we'd need to investigate more to understand where these ranges are coming from.Then we could check why these ranges are out of order. I think the only possibilities are either that
regalloc.debug_locations
has pairs that are out of order, orinst_offsets
is out of order. If it's the former then we can switch to investigating regalloc2; otherwise we can look atVCode::emit
.Either way we'd know more than we know now and be closer to fixing this particular bug.
fitzgen commented on issue #5553:
Can't we just print a warning, remove the assertion and exit the function? It would be better then assertion probably, because we are still able to debug if we skip this "wrong" label.
@alexcrichton and @fitzgen, do either of you have opinions on this suggestion?
I am generally very suspicious of sweeping-the-bugs-under-the-rug kind of thing. If it was a bug that was purely within the DWARF transform, then I would be more amenable to it since that code as-is has no future and needs a rewrite, but since it looks like this is bottoming out in Cranelift, I think it would be better to debug the Cranelift issue like you laid out.
alexcrichton commented on issue #5553:
I personally know so little about the DWARF transformations Wasmtime does that I don't think I'd have much to offer over what @jameysharp has already mentioned.
pavelsavara commented on issue #5553:
The dwarf parser in chrome doesn't fail on it.
Speculations (I know little about all this yet):
Maybe if we add assert earlier in the wasmtime pipeline, to validate input dwarf, we could learn that the flipped start/end is in the input by LLVM ?
And maybe it's how it should be, for break, which is a jmp to label, but in wasm it's some nested block for some reason ?
jameysharp commented on issue #5553:
The dwarf parser in chrome doesn't fail on it.
Speculations (I know little about all this yet): Maybe if we add assert earlier in the wasmtime pipeline, to validate input dwarf, we could learn that the flipped start/end is in the input by LLVM ? And maybe it's how it should be, for break, which is a jmp to label, but in wasm it's some nested block for some reason ?
That sounds possible to me but, like you, I know little about all of this.
I think it's unlikely, if I'm reading the Wasmtime/Cranelift source correctly. I think the point of the
process_label
function is to combine Cranelift's data with the input DWARF from LLVM, but I think this assert is only looking at data that was generated by Cranelift and regalloc2, not from the input DWARF. So I think it's most likely that this is detecting a bug in Cranelift or regalloc2.
brendandburns commented on issue #5553:
fwiw, when I kicked on this a little bit,
llvm-dwarfdump
is perfectly happy to read out the dwarf data, so at some level, the data in the wasm file is "legal" as far as llvm is concerned, but I will also mention that when I tried exactly this fix, it just caused a panic in a different place (for my WASM file at least)
squillace commented on issue #5553:
I've asked @thaystg to have a look at this again if she has time, @jameysharp, per our conversation the other day about debugging with the crew.
william-stacken commented on issue #5553:
fwiw, when I kicked on this a little bit,
llvm-dwarfdump
is perfectly happy to read out the dwarf data, so at some level, the data in the wasm file is "legal" as far as llvm is concerned, but I will also mention that when I tried exactly this fix, it just caused a panic in a different place (for my WASM file at least)Same here, removing the assertion did not fix the problem, but changing the if statement above to continue if
range_start
is greater than range_end` (not just when it is equal) did fix it and I am able to finally debug my Wasm module. Given that this bug is severe and that developers have complained about it for a long time, and that no one knows the cause making this bug difficult to fix for a long time, I think that it should be "swept-under-the-rug" ASAP. Then the root cause for the bug can be fixed later. At least this is better than debugging Wasm modules not being possible.
william-stacken edited a comment on issue #5553:
fwiw, when I kicked on this a little bit,
llvm-dwarfdump
is perfectly happy to read out the dwarf data, so at some level, the data in the wasm file is "legal" as far as llvm is concerned, but I will also mention that when I tried exactly this fix, it just caused a panic in a different place (for my WASM file at least)Same here, removing the assertion did not fix the problem, but changing the if statement above to continue if
range_start
is greater thanrange_end
(not just when it is equal) did fix it and I am able to finally debug my Wasm module. Given that this bug is severe and that developers have complained about it for a long time, and that no one knows the cause making this bug difficult to fix for a long time, I think that it should be "swept-under-the-rug" ASAP. Then the root cause for the bug can be fixed later. At least this is better than debugging Wasm modules not being possible.
bjorn3 commented on issue #5553:
but changing the if statement above to continue if range_start is greater than range_end (not just when it is equal) did fix it and I am able to finally debug my Wasm module.
I'm pretty sure that can make it impossible to single step at several places and will cause incorrect assignments from machine instructions to source lines.
william-stacken commented on issue #5553:
but changing the if statement above to continue if range_start is greater than range_end (not just when it is equal) did fix it and I am able to finally debug my Wasm module.
I'm pretty sure that can make it impossible to single step at several places and will cause incorrect assignments from machine instructions to source lines.
I was able to both single step and set breakpoints, and I didn't observe any strange instruction to source line assignments. Now I know absolutely nothing about dwarf, and I'm sure you're right that there will be bugs. But from my perspective an unstable debugger is better than no debugger at all. At least for the time being.
alexcrichton commented on issue #5553:
In the time since this PR was opened I think https://github.com/bytecodealliance/wasmtime/pull/6931 addresed the underlying issue. I think this is no longer necessary so I'm going to close this, but please let me know if I'm wrong.
Last updated: Jan 24 2025 at 00:11 UTC