Hi, we use WAMR and I call wasm_runtime_set_exception to stop the VM, but I call it from a thread other than the VM thread. With ThreadSanitizer I detected a bunch of data races between wasm_func_call and wasm_runtime_set_exception.
I found this example, but it seems to call roughly the same non-thread-safe function: https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/samples/terminate/src/main.c#L177
P.S. I asked the same thing in GitHub Discussions: https://github.com/bytecodealliance/wasm-micro-runtime/discussions/3800#discussion-7202551
My current idea is:
wasm_runtime_set_exception
But maybe there's already an API that I missed, or somebody has dealt with the problem in another way?
This topic was moved here from #general > threadsafe api to terminate WAMR VM from another thread? by fitzgen (he/him).
Here's the API that works for WASM: https://github.com/bytecodealliance/wasm-micro-runtime/compare/main...g0djan:wasm-micro-runtime:godjan/terminate_from_another_thread?expand=1
But when I set the exception from a different thread while the VM is running AOT, the AOT code seems to continue executing inside of this line after the exception is set. Inside, it clears the stack with repeated aot_free_frame calls before invoke_native has finished.
As a result, the call stack that gets created appears to be empty.
@Wenyong Huang do you know why invoke_native continues to execute when set_exception is called from another thread?
I made a workaround by inserting these lines inside aot_free_frame:
    if (aot_copy_exception(module_inst, NULL)) {
        aot_create_call_stack(exec_env);
        aot_dump_call_stack(exec_env, true, NULL, 0);
    }
But I believe the root problem is that the stack is still being cleared after the exception is set, and I don't understand why it works that way.
@Georgii Rylov thanks for testing. I think the invoke_native you mentioned executes the AOT function. The AOT code only checks whether an exception was thrown, and whether the current thread's terminate flag is set, in a few places (see the calls to check_exception_thrown in aot_emit_function.c and the calls to check_suspend_flags in core/iwasm/compilation), so it normally continues executing until it reaches one of those places (for check_suspend_flags, the terminate flag also has to be set from the other thread).
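To illustrate why the code keeps running after the flag is set, here is a minimal sketch of cooperative termination. It is not WAMR code: demo_exec_env, demo_guest_thread, demo_request_terminate and demo_run are hypothetical names, and the loop back-edge stands in for the places where the generated code polls its flags.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-thread execution state; not a WAMR type. */
typedef struct {
    atomic_bool terminate_requested; /* written by the controlling thread */
    atomic_ulong iterations;         /* work units completed by the guest */
} demo_exec_env;

/* Stand-in for compiled guest code: the flag is only polled at an explicit
   check point, so work before the check point always runs to completion. */
static void *demo_guest_thread(void *arg)
{
    demo_exec_env *env = arg;
    for (;;) {
        /* "Guest work": cannot be interrupted by the flag. */
        atomic_fetch_add_explicit(&env->iterations, 1, memory_order_relaxed);

        /* Check point: the only place where termination takes effect. */
        if (atomic_load_explicit(&env->terminate_requested,
                                 memory_order_acquire))
            break;
    }
    return NULL;
}

/* Called from another thread: a single atomic store, so no data race. */
static void demo_request_terminate(demo_exec_env *env)
{
    atomic_store_explicit(&env->terminate_requested, true,
                          memory_order_release);
}

/* Starts the guest, requests termination, and returns the amount of work
   the guest finished before it observed the request. */
static unsigned long demo_run(void)
{
    demo_exec_env env;
    pthread_t tid;

    atomic_init(&env.terminate_requested, false);
    atomic_init(&env.iterations, 0);

    int rc = pthread_create(&tid, NULL, demo_guest_thread, &env);
    assert(rc == 0);
    (void)rc;

    demo_request_terminate(&env);
    pthread_join(tid, NULL);
    return atomic_load(&env.iterations);
}
```

Even if the request arrives before the guest thread starts, demo_run() returns at least 1, because the guest only looks at the flag after finishing a unit of work.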
Safely terminating, suspending and resuming wasm threads is really difficult to implement. Normally the idea is to stop the world first: let all the threads (except the thread asking to stop) suspend and wait until another thread notifies them to run. I once opened an issue for the feature:
https://github.com/bytecodealliance/wasm-micro-runtime/issues/2319
and did some work in the branch dev/thread_suspension:
https://github.com/bytecodealliance/wasm-micro-runtime/tree/dev/thread_suspension
But due to limited time I didn't test it much, so there may be some issues. If you are interested, maybe you can give it a try, e.g. by calling the API wasm_runtime_terminate_all in thread_manager.h and adding the --enable-multi-thread flag for wamrc when generating the AOT file.
Please refer to https://github.com/bytecodealliance/wasm-micro-runtime/compare/dev/thread_suspension for the code changes.
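The stop-the-world idea described above can be sketched roughly as follows. This is only an illustration with plain pthreads, not the code from the dev/thread_suspension branch: demo_world, demo_safe_point, demo_stop_world and demo_resume_world are hypothetical names, and the safe point stands in for wherever the runtime checks its suspend flags.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define DEMO_WORKERS 3

/* Hypothetical shared state; not a WAMR type. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool suspend_requested;
    bool exit_requested;
    int parked; /* workers currently waiting at a safe point */
} demo_world;

/* Safe point: workers call this between units of work. Returns false
   once the worker should exit. */
static bool demo_safe_point(demo_world *w)
{
    pthread_mutex_lock(&w->lock);
    if (w->suspend_requested) {
        w->parked++;
        pthread_cond_broadcast(&w->cond); /* tell the stopper we parked */
        while (w->suspend_requested)
            pthread_cond_wait(&w->cond, &w->lock);
        w->parked--;
    }
    bool keep_running = !w->exit_requested;
    pthread_mutex_unlock(&w->lock);
    return keep_running;
}

static void *demo_worker(void *arg)
{
    while (demo_safe_point(arg)) {
        /* guest work would run here */
    }
    return NULL;
}

/* Stop the world: returns only after every worker is parked. */
static int demo_stop_world(demo_world *w)
{
    pthread_mutex_lock(&w->lock);
    w->suspend_requested = true;
    while (w->parked < DEMO_WORKERS)
        pthread_cond_wait(&w->cond, &w->lock);
    int parked = w->parked;
    pthread_mutex_unlock(&w->lock);
    return parked;
}

/* Resume the world, optionally asking the workers to exit afterwards. */
static void demo_resume_world(demo_world *w, bool exit_after)
{
    pthread_mutex_lock(&w->lock);
    w->suspend_requested = false;
    w->exit_requested = exit_after;
    pthread_cond_broadcast(&w->cond);
    pthread_mutex_unlock(&w->lock);
}

/* Runs the whole cycle and returns how many workers were parked. */
static int demo_run_stop_the_world(void)
{
    demo_world w = { .suspend_requested = false,
                     .exit_requested = false,
                     .parked = 0 };
    pthread_t tids[DEMO_WORKERS];

    pthread_mutex_init(&w.lock, NULL);
    pthread_cond_init(&w.cond, NULL);
    for (int i = 0; i < DEMO_WORKERS; i++) {
        int rc = pthread_create(&tids[i], NULL, demo_worker, &w);
        assert(rc == 0);
        (void)rc;
    }

    int parked = demo_stop_world(&w); /* world is stopped here */
    demo_resume_world(&w, true);

    for (int i = 0; i < DEMO_WORKERS; i++)
        pthread_join(tids[i], NULL);
    pthread_cond_destroy(&w.cond);
    pthread_mutex_destroy(&w.lock);
    return parked;
}
```

While the world is stopped, the requesting thread could inspect or terminate the other threads without racing against them. The hard part in a real runtime is making sure every thread reaches a safe point in bounded time.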
Hey, thank you for sharing. We got back to this, but with a different approach that better suits our needs.
Our problem: we want to report stack traces when a WASM app gets stuck for some reason, so that we can debug it.
Our new proposal (I have a POC): send SIGUSR2 to the WASM VM thread and record "simplified" stack traces in the signal handler.
The main complication there is achieving async-signal safety.
For that I figured:
So in my POC I keep an atomic pointer to the stack memory and an atomic pointer to the head frame. That lets me ensure that an interruption doesn't leave the stack in an invalid state, since nothing besides the change of these two pointers can be interrupted.
In the signal handler I just check that the memory is still allocated, then iterate over the frames starting from the atomic pointer to the head frame and record the function indexes. Then I write them to a file, as that is an async-signal-safe operation.
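A minimal sketch of that idea, under several assumptions: the frame layout (demo_frame), the helpers demo_push and demo_pop, and the handler demo_on_sigusr2 are hypothetical and not taken from the POC or from WAMR; atomic pointers are assumed to be lock-free on the target; and the trace goes to stderr instead of a file to keep the example short. The handler only uses atomic loads and write(), which POSIX lists as async-signal-safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define DEMO_MAX_DEPTH 64

/* Hypothetical simplified frame; not WAMR's frame layout. */
typedef struct demo_frame {
    struct demo_frame *prev; /* caller's frame, NULL at the bottom */
    uint32_t func_idx;
} demo_frame;

static _Atomic(demo_frame *) g_top_frame = NULL; /* head frame */
static uint32_t g_trace[DEMO_MAX_DEPTH];         /* filled by the handler */
static volatile sig_atomic_t g_trace_len = 0;

static void demo_push(demo_frame *f, uint32_t func_idx)
{
    f->func_idx = func_idx;
    f->prev = atomic_load_explicit(&g_top_frame, memory_order_relaxed);
    /* Publish the frame only after it is fully initialised, so a signal
       arriving at any point sees a consistent list. */
    atomic_store_explicit(&g_top_frame, f, memory_order_release);
}

static void demo_pop(void)
{
    demo_frame *f = atomic_load_explicit(&g_top_frame, memory_order_relaxed);
    assert(f != NULL);
    atomic_store_explicit(&g_top_frame, f->prev, memory_order_release);
}

/* Formats a number by hand: printf-family functions are not
   async-signal-safe, write() is. */
static void demo_write_u32(int fd, uint32_t v)
{
    char buf[11]; /* 10 digits + newline */
    size_t i = sizeof buf;
    buf[--i] = '\n';
    do {
        buf[--i] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    ssize_t n = write(fd, buf + i, sizeof buf - i);
    (void)n;
}

static void demo_on_sigusr2(int sig)
{
    int saved_errno = errno; /* write() may clobber errno */
    int n = 0;
    (void)sig;
    for (demo_frame *f =
             atomic_load_explicit(&g_top_frame, memory_order_acquire);
         f != NULL && n < DEMO_MAX_DEPTH; f = f->prev) {
        g_trace[n++] = f->func_idx;
        demo_write_u32(STDERR_FILENO, f->func_idx);
    }
    g_trace_len = n;
    errno = saved_errno;
}

/* Installs the handler, builds a three-frame stack, and signals the
   current thread. Returns the number of frames recorded, or -1. */
static int demo_capture(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = demo_on_sigusr2;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR2, &sa, NULL) != 0)
        return -1;

    demo_frame a, b, c;
    demo_push(&a, 7);
    demo_push(&b, 42);
    demo_push(&c, 3);
    raise(SIGUSR2); /* the handler runs before raise() returns */
    demo_pop();
    demo_pop();
    demo_pop();
    return (int)g_trace_len;
}
```

After demo_capture(), g_trace holds the function indexes from the innermost frame outwards (3, 42, 7). In a real setup the signal would come from another thread, for example via pthread_kill, and the frames would live in the VM's own stack memory rather than on the C stack.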
So regarding that, I have a few questions:
Last updated: Jan 24 2025 at 00:11 UTC