Hi guys, just upgraded our stuff to latest wasmtime & seems breakpoint behaviour has changed.
Previously got lldb::eStopReasonBreakpoint event on a breakpoint.
Now apparently lldb::eStopReasonException
Does anyone know anything about this, please. Is it intentional ? A regression of some kind ?
Do I have something setup wrong ?
Haven't worked on this code for quite some time so very rusty & may be user error.
Rebuilt my previous (working) wasmtime from scratch on same system to attempt to determine whether something else on my system responsible for regression. Triggers breakpoint as expected.
reviewing previous releases - the one I was working from was pre-1.0 ... to attempt to figure out where the regression/change occurred.
weird. 1.0.1 also generates an exception - so change isnt recent.
fresh pull of 0.3.0 which I was working from acting unexpectedly, so my issue for the time being.
v0.31.0 triggers breakpoints correctly. v0.32.0, the next tagged release triggers an exception & subsequent releases trigger exceptions.
[ above should have said 0.30.0 not 0.3.0 but in any case, that release is irrelevant - the change in behaviour occurs between 0.31.0 & 0.32.0 ]
Seems this is the cause of the regression :
Revision: ce67e7fcd1d2d6da1899d2b46cc41ef877bd9462
Author: Amanieu d'Antras <amanieu@gmail.com>
Date: 02/12/2021 11:53:04
Message:
Fix ownership in *_vec_new functions in the C API
These functions are specified to take ownership of the objects in the given slice, not clone them.
Modified: crates/c-api/src/vec.rs
yup, replacing those couple of lines with previous implementation restores Breakpoint behaviour (minimal testing thus far).
next will try migrating that fix to trunk to see if that resolves regression there.
beyond that, this is rust code I have no clue about so will have to defer to others for complete resolution.
I'm going as far as just get it working again.
yup, revert of that change to trunk gets breakpoints running again. yay :)
needs a little more testing, then will push if wanted.
as to the clone/ownership issue the change was supposed to address, its implications & why it generates an exception, dunno.
I need working recent version. this can be considered in more depth later. worked before that change, works with it removed. what it's supposed to do needs further consideration to avoid debugger breakpoint regression.
likely a duplicate release of something if was assumed cloned before & behaviour changed to ownership change. so could be fixed properly by tracking that down, but beyond scope of my current work.
that works for me :) & gets this running wasmtime trunk.
advance-software.com/develop
I dont have permission to push apparently so here's the change in its current form.
Referenced here, as perhaps the same issue : https://github.com/bytecodealliance/wasmtime/issues/3999
If you run valgrind on your program built against the latest wasmtime version without your patch does it give any errors?
Hi Bjorn, can't run valgrind as windows user but could run drmemory. cant go much deeper as we're severely resource constrained so interaction must be limited to fire fighting for now. fire out - debugging working again.
concerned this has regressed for 17 months now. does nobody else use a debugger ? do we have a test that ensures debugger breakpoints are emitted properly ?
link with issue 3999 perhaps false but the issue itself genuine & corrected with this patch.
let me see if I can get some useful diagnostics out of this.
What would be most helpful is a minimal test program demonstrating the issue. As far as I know, Wasmtime doesn't have anything to do with emitting breakpoints, so I don't understand what behavior you were expecting.
Hi Jamey,
Ran lldbg under visual studio debugger & used it to identify the regression & locate when the code regressed. That's all I've got for now. Its using some lldb API I don't fully understand yet as inherited code. Perhaps as we have a behaviour change, the issue causing the exception is in the app. Will investigate further. Will see if I can get lldb command line version to exhibit same behaviour.
My version : https://github.com/adv-sw/lldbg
With above fix, the following is hit : https://github.com/adv-sw/lldbg/blob/master/src/Application.cpp#L2127
without it, get an exception. That's all I currently know.
Specifically a lldb::eStopReasonException which isn't handled by the above code, but I can see it in the Visual Studio debugger.
Looks like will need a debug build of lldb to figure out why that's getting emitted.
I'll investigate this a little but with respect, its not my problem. The patch caused a regression in a key component - lldb debugger.
The patch should be backed out as debugger was functioning perfectly well before it was introduced and does so again with it removed.
Patch should be re-accepted when it doesnt cause failure side effect. Fixing may be non-trivial & production work needs to proceed.
An issue of contention may be, is lldbg using lldb API correctly. perhaps there is a programming error there. dont know at this point if fault is exhibited by lldb itself. will attempt to repro there when I get a minute.
Are you trying to reuse any of the objects passed to wasm_*_vec_new? If so make sure to clone them. According to the commit message it was documented that wasm_*_vec_new is taking ownership, but the implementation incorrectly cloned them instead. That commit fixes the discrepancy by changing the implementation to take ownership as documented.
See https://github.com/bytecodealliance/wasmtime/pull/3582
lldbg is an inherited project. I didnt write it. its using lldb interface. like I say, its possible there's an error in api use. unknown at this point.
I meant the use of wasm_*_vec_new in the code you wrote, not in the debugger.
Note that the wasm_* APIs are implemented not just by wasmtime, but also by wasmer and some other wasm engines. Before the commit in question wasmtime's implementation didn't match the other implementations.
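A rough sketch of the difference between taking ownership and cloning, in case it helps. This is a toy model only: `toy_type`, `toy_vec`, `toy_vec_new` and the other `toy_*` names are made up for illustration and are not the real wasm.h types or functions. The assumption is that the real wasm_*_vec_new follows the same rule: whatever is passed in belongs to the vector afterwards, so a caller that wants to keep using a value has to pass a copy.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for illustration; NOT the real wasm.h types. */
typedef struct { int kind; } toy_type;
typedef struct { size_t size; toy_type **data; } toy_vec;

static int live_objects = 0; /* toy_type allocations not yet freed */

toy_type *toy_type_new(int kind) {
    toy_type *t = malloc(sizeof *t);
    assert(t != NULL);
    t->kind = kind;
    live_objects++;
    return t;
}

/* Returns a new, independently owned object with the same contents. */
toy_type *toy_type_copy(const toy_type *t) { return toy_type_new(t->kind); }

void toy_type_delete(toy_type *t) { free(t); live_objects--; }

/* Takes ownership of every element: the caller must not use or
   delete the passed-in objects after this call. */
void toy_vec_new(toy_vec *out, size_t n, toy_type *const items[]) {
    out->size = n;
    out->data = malloc((n ? n : 1) * sizeof *out->data);
    assert(out->data != NULL);
    for (size_t i = 0; i < n; i++) out->data[i] = items[i];
}

/* Frees the vector and every element it owns. */
void toy_vec_delete(toy_vec *v) {
    for (size_t i = 0; i < v->size; i++) toy_type_delete(v->data[i]);
    free(v->data);
    v->data = NULL;
    v->size = 0;
}

/* Uses one logical value in two vectors: the first gets a copy, the
   second gets the original. Returns the live object count afterwards. */
int toy_demo(void) {
    toy_type *i32 = toy_type_new(1);
    toy_type *first[1] = { toy_type_copy(i32) }; /* copy, i32 still ours */
    toy_type *second[1] = { i32 };               /* i32 is given away here */
    toy_vec a, b;
    toy_vec_new(&a, 1, first);
    toy_vec_new(&b, 1, second);
    toy_vec_delete(&a);
    toy_vec_delete(&b); /* frees i32; deleting it again would be a double free */
    return live_objects;
}
```

In this model, passing `i32` itself to both vectors would have both vectors free the same object. Under the older cloning behaviour that reuse would have been harmless, which is one way a program could work before the change and fault after it.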
will simplify repro as far as possible so we can eliminate or otherwise that concern. thanks.
building debug lldb is taking forever. left it going overnight & still hadnt linked in the morning. trying on ms azure cloud thing.
if I'm to undertake heavy development work, I need a machine that can handle it. happy to contribute but not a magician :)
my quick fix is the patch I provided which is sufficient for our purposes for the time being. this is going beyond my needs.
I'm running a very old version of wasi. could be that's the issue if nobody else experiencing a problem. will upgrade when I can figure out where to obtain latest wasi from.
from this : https://docs.wasmtime.dev/wasm-c.html
I get latest wasi by building llvm/clang for webassembly target ?
https://github.com/WebAssembly/wasi-sdk/releases
apologies, going a little all over the place at the moment getting up to date.
built lldb & clang in azure cloud as my local machine apparently has issues. debug build completed. building release version too so we have clang tools latest for test compilation. slow progress, but progress. switched to wasi 20 & compiled a simple test successfully.
doh. yesterday's local lldb build failed coz I over compensated for an out of memory error & made a huge swap file on that drive, leaving insufficient room for build directory, so its stalled during linking :smile:
update: our stuff running latest wasmtime, latest wasi, latest llvm. now we're up to date, will investigate cause of the exception.
latest everything, same behaviour as previous: with patch, works; without, exception.
Finally, cause of that exception.
dinner then will try to get my head around why its doing that.
Error is : "RVA 0x0 for export ordinal table not found"
Currently means nothing to me. Dont yet understand what the problem is.
not understanding why an ownership change rather than clone should cause that. its not a duplicate release issue as I first thought.
next step, I think is to simplify the repro.
I see Microsoft has documentation on the "export ordinal table" as part of their PE format documentation, if that helps any: https://learn.microsoft.com/en-us/windows/win32/debug/pe-format#export-ordinal-table
So, the issue appears to be in liblldb.dll!ObjectFilePECOFF::AppendFromExportTable which is PE specific, which is Windows platform.
I had to patch lldb years ago to get it running on Windows. The patch wasnt accepted as they require a test for every patch & I'm swamped with work as it is. Practical upshot: Fairly sure nobody else is testing on windows.
Here's the patch : https://pastebin.com/L1Q1d4hS
Patch works latest lldb - 16.0.2
Not really sure what test can be provided to verify it works & hence why not accepted yet.
On Windows, it appears you need to build lldb with x86 as well as web assembly target for it to function.
I'm using
cmake -S llvm -DLLVM_TARGETS_TO_BUILD="X86;WebAssembly" -DCMAKE_BUILD_TYPE="Debug" -DLLVM_ENABLE_PROJECTS="clang;lldb;lld" -B build_d -G Ninja
Then : ninja lldb
From the build_d directory.
Seem to remember there were issues with debugging native windows binaries last time I looked at this & perhaps still now, though all this is irrelevant. We're not trying to do that. We're trying to figure out why the change of a memory op in wasmtime from clone to re-assign owner causes a fault in the debugger.
update: compiled the simplest c program possible :
void _start() {}
ran directly under wasmtime.exe rather than our runtime, under lldbg debugger. runs clean.
so my bug. next steps - reduce our stuff to the point where it no longer faults to identify cause.
sounds to me that I should try to find someone to have a look here.
working on a minimal repro. should have something soon.
the lldb patch is something I need help with. confirm resolves lldb functionality on windows & accept into trunk. can maintain out of tree but seems silly. helps everyone.
windows support obv not important for server based delivery but is for client front end (web browsers, etc.), coz like it or not, windows is the primary end user platform. so we must be solid there.
update: got simple test case. test case running standalone works correctly. test case running as plugin for our engine faults the same way our regular web assembly plugin faults. so its something configuration specific or perhaps a fault in our engine having a side effect.
fun times. will isolate.
wheeeeeeeee
hmmm ... suspect this is bug in our engine having a side effect & nothing for anyone else to worry about, but we'll see. narrowing in on it.
You DID have my attention, but now I'm merely curious.
think I've got something
http://advance-software.com/misc/repro.zip
extracts into your [wasmtime]/examples directory
so you have [wasmtime]/examples/repro not [wasmtime]/examples/repro/repro
fault: on windows, when compiled as a win64 visual studio project using included project file, wasm_module_new fails.
when compiled using \repro\native\make_d or make_r works - console version.
this has been really fiddly to put together so still possibility of something silly on my part.
for those not on windows or not wanting to download that large archive which includes full wasi sdk we're linking against & llvm compilation tools being used, here's the wasm native sdk code I'm using.
so first step is quick skim over that anyone curious, plz. any obvious derp on my part, plz.
happy to run it here so we can figure out why wasm_module_new fails & that's my next job. will figure it out all the way if I can.
debug build of the native win64 .exe doesn't fail, release does which is always a fun situation.
they both compile to the same location as of that drop - should tweak that.
update : http://advance-software.com/misc/repro_2.zip
exactly the same as the first drop except native win64 example links to repro_d.exe & repro_r.exe instead of both to repro.exe
step 1 - am I doing anything obviously stupid.
step 2 - can anyone else reproduce the problem.
repro doesn't contain or reference our engine so api use error, genuine wasmtime bug or I'm misunderstanding something.
repro release build linking against wasmtime debug build fails so can step through wasmtime & hopefully understand why. more caffeine then a look at that.
can step into it but I'm rust stupid so this will be fun :)
debug native app, targeting debug wasmtime correctly hits this :
https://github.com/bytecodealliance/wasmtime/blob/4b3c50b147f9f96717becf44bcfcff2de9b06c68/crates/wasmtime/src/module.rs#L459
release native app, targeting debug wasmtime doesn't & fails.
not sure this is the same issue. tried replacing the clone/ownership change to see if behaviour changes & it doesnt.
so I still perhaps have a stupid in here.
though regardless, this is unexpected result as far as I can tell so suggest we nail whatever this is & onwards.
native app release build failing here : https://github.com/bytecodealliance/wasmtime/blob/4b3c50b147f9f96717becf44bcfcff2de9b06c68/crates/wasmtime/src/module.rs#L590
\wasmparser-0.103.0\src\parser.rs
pub fn parse_all(self, mut data: &[u8])
data is all 0s in the native app release build that's failing.
this one is my derp. apologies. data is coming thru 0s release build. so test has an error.
so, ummmm.
int count = fread(wasm.data, file_size,1, file);
fixed by doing this instead :
int count = fread(wasm.data, 1, file_size, file);
have no idea why that was failing release build.
so test fixed & currently working. which is good coz this wasnt a repro for the original problem.
hence anyone looking at my repro, plz drop it for now. its not a repro for the problem we're looking at. was a test error.
now I have a sane test environment will see if I can repro the actual problem.
& changed that to a size_t now its working
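For reference, a minimal sketch of reading a whole file with fread and checking the result. `read_whole_file` is a hypothetical helper, not code from the repro. fread returns the number of complete items read, so with an item size of 1 and a count of file_size the return value is the number of bytes read, and size_t is the matching type. With the arguments the other way round (one item of file_size bytes) a successful read returns 1. This doesn't explain why the original call misbehaved in the release build.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads a whole file into a malloc'd buffer.
   Returns NULL on failure; on success stores the byte count in *out_len.
   The caller frees the returned buffer. */
unsigned char *read_whole_file(const char *path, size_t *out_len) {
    assert(path != NULL && out_len != NULL);

    /* "rb": binary mode, so Windows does no newline translation. */
    FILE *file = fopen(path, "rb");
    if (!file) return NULL;

    unsigned char *data = NULL;
    if (fseek(file, 0, SEEK_END) == 0) {
        long size = ftell(file);
        if (size >= 0 && fseek(file, 0, SEEK_SET) == 0) {
            data = malloc(size > 0 ? (size_t)size : 1);
            if (data) {
                /* item size 1, item count size: result is bytes read */
                size_t count = fread(data, 1, (size_t)size, file);
                if (count != (size_t)size) {
                    free(data); /* short read: treat as failure */
                    data = NULL;
                } else {
                    *out_len = count;
                }
            }
        }
    }
    fclose(file);
    return data;
}
```

Comparing the returned count against the expected size catches short reads, which a check of the form `count == 1` or an unchecked int would not distinguish as clearly.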
doesnt require anyone else on this right now as we dont have a repro yet. will try again now test environment sane. update when I have something.
Got it.
The call :
wasm_functype_new_1_1(val_i32, val_i32);
in : https://pastebin.com/ZGKuqzPt
faults with whatever trunk I'm running (will update so at tip), passes with my workaround patch : https://pastebin.com/2Yt3FG9E
don't think its specific to the wasm I'm loading but we'll see. will reduce to minimal example & move to tip.
fault exists wasmtime tip
minimal repro.
this faults for me wasmtime trunk.
windows - dunno about other o/s.
passes with my workaround patch : https://pastebin.com/2Yt3FG9E
this thread far more verbose than it needs to be. apologies.
reduced to the following hopefully actionable issue : https://github.com/bytecodealliance/wasmtime/issues/6347
thanks bjorn3 :)
weird api tripped me up, apparently. will rework as advised. excess allocation that way, but if its the spec, its the spec :)
perhaps debug build of wasmtime could fail gracefully in that api misuse case as its quite unintuitive.
you'd think what's effectively a constant could be reused multiple times.
so we never call wasm_valtype_delete ? faults when I do so assume not, so seems odd that's an api function.
resolved. thanks guys. weird api but is what it is.
from a language perspective, c has no way of saying after you pass this thing to me, you may never use it again - the api own keyword must be a null macro. best you can do is pass thing ** & inside the call *thing = NULL so caller cannot reuse.
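#include <assert.h>
#include <stddef.h>
void *owned = NULL; /* callee-side storage for the consumed pointer */
/* Takes ownership of *thing and nulls the caller's pointer so it cannot be reused. */
void Consume(void **thing)
{
    owned = *thing;
    *thing = NULL;
}
int main(void)
{
    int value = 1;
    void *my_thing = &value; /* any valid pointer will do */
    Consume(&my_thing);
    assert(my_thing == NULL);
    return 0;
}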
or, I guess, upgrade C :)
this is my new favorite soap opera
apologies. haven't upgraded our wasmtime or wasi stuff in ages so had a bit of an omg what moment :)
but now we're up to date. thanks guys :)
http://advance-software.com/develop
http://advance-software.com/products/browser
http://advance-software.com/xsg
still early but we've got a next-gen browser & next-gen web grammar (xsg) featuring web assembly secure programming interface, currently powered by wasmtime.
we're only using the wasm api so theoretically could swap out for any of the others, but happy with how things are for the now.
all the hard work very much appreciated.
happy to add some credits if there's a document for stuff like that or we can put one together.
& this thread can go :)
WUT NO
DAMMIT STEVE, PLEASE KEEP DIGGING FOR MORE
I'm done for now :) debugger needs refinement but that's a side project. lldb needs that patch pushing but can maintain out of tree.
fread weirdness - very odd but not my problem :) we're solid. thanks all :)
& wasm sdk is what it is. not a big deal, just gotta know how to drive :)
Last updated: Nov 22 2024 at 16:03 UTC