Stream: wamr

Topic: threadsafe api to terminate WAMR VM from another thread?


view this post on Zulip Georgii Rylov (Sep 18 2024 at 14:28):

Hi, we use WAMR and I call wasm_runtime_set_exception to stop the VM, but I call it from a thread other than the VM thread. With ThreadSanitizer I detected a bunch of data races between wasm_func_call and wasm_runtime_set_exception.

I found this example, but it seems it's also calling roughly the same non-thread-safe function: https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/samples/terminate/src/main.c#L177

P.S. I asked the same thing in GitHub Discussions: https://github.com/bytecodealliance/wasm-micro-runtime/discussions/3800#discussion-7202551

WebAssembly Micro Runtime (WAMR). Contribute to bytecodealliance/wasm-micro-runtime development by creating an account on GitHub.

view this post on Zulip Georgii Rylov (Sep 18 2024 at 14:33):

My current idea is:

  1. Add an API that sets an atomic flag.
  2. Insert a check for this atomic flag inside HANDLE_OP_END
    and, if the flag is set, call wasm_runtime_set_exception.

But maybe there's already an API that I missed, or somebody has dealt with the problem in another way?
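
A minimal sketch of the atomic-flag idea, written against a toy interpreter loop rather than WAMR itself. Every name here (ToyVM, toy_vm_request_terminate, toy_vm_check_terminate, toy_vm_run, toy_vm_demo) is hypothetical and is not part of the WAMR API; toy_vm_check_terminate only stands in for the check proposed for HANDLE_OP_END. The point is that the foreign thread writes nothing but the atomic flag, while the exception itself is set by the VM thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy VM; none of these names are WAMR APIs. */
typedef struct ToyVM {
    atomic_bool terminate_requested; /* may be written by any thread */
    const char *exception;           /* touched only by the VM thread */
    unsigned long ops_executed;      /* touched only by the VM thread */
} ToyVM;

/* Safe to call from any thread: it only stores to the atomic flag. */
static void toy_vm_request_terminate(ToyVM *vm)
{
    atomic_store_explicit(&vm->terminate_requested, true,
                          memory_order_release);
}

/* Stand-in for a check at the end of each opcode handler. The exception
 * is set here, on the VM thread, so no other thread writes to it. */
static bool toy_vm_check_terminate(ToyVM *vm)
{
    if (atomic_load_explicit(&vm->terminate_requested,
                             memory_order_acquire)) {
        vm->exception = "terminated by user";
        return true;
    }
    return false;
}

/* Toy interpreter loop: "executes" opcodes until termination is requested. */
static void *toy_vm_run(void *arg)
{
    ToyVM *vm = arg;
    for (;;) {
        vm->ops_executed++; /* pretend to execute one opcode */
        if (toy_vm_check_terminate(vm))
            break;
    }
    return NULL;
}

/* Runs the VM on its own thread, requests termination from the calling
 * thread, joins, and returns the exception message (NULL on failure). */
static const char *toy_vm_demo(ToyVM *vm)
{
    pthread_t tid;
    assert(vm != NULL);
    atomic_init(&vm->terminate_requested, false);
    vm->exception = NULL;
    vm->ops_executed = 0;
    if (pthread_create(&tid, NULL, toy_vm_run, vm) != 0)
        return NULL;
    toy_vm_request_terminate(vm);
    pthread_join(tid, NULL); /* join makes the VM thread's writes visible */
    return vm->exception;
}
```

Calling toy_vm_demo(&vm) on a ToyVM returns once the loop has observed the flag. The cost of this design is one atomic load per opcode, which is the trade-off the proposal would need to measure in a real interpreter.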


view this post on Zulip Notification Bot (Sep 18 2024 at 14:56):

This topic was moved here from #general > threadsafe api to terminate WAMR VM from another thread? by fitzgen (he/him).

view this post on Zulip Georgii Rylov (Oct 02 2024 at 09:39):

Here's the API that works for WASM: https://github.com/bytecodealliance/wasm-micro-runtime/compare/main...g0djan:wasm-micro-runtime:godjan/terminate_from_another_thread?expand=1


view this post on Zulip Georgii Rylov (Oct 02 2024 at 09:47):

But when I try setting the exception from a different thread while the VM is running AOT code, it seems the AOT code continues to execute inside of this line after the exception is set. Inside, it clears the stack with repeated aot_free_frame calls before invoke_native has finished.

As a result, the created call stack appears to be empty.

@Wenyong Huang do you know why invoke_native continues to execute when set_exception is called from another thread?


view this post on Zulip Georgii Rylov (Oct 02 2024 at 09:50):

I made a workaround by inserting these lines inside aot_free_frame:

if (aot_copy_exception(module_inst, NULL)) {
  aot_create_call_stack(exec_env);
  aot_dump_call_stack(exec_env, true, NULL, 0);
}

But I believe the root problem is that there's an attempt to clear the stack after the exception is set, and I don't understand why it works that way.

view this post on Zulip Wenyong Huang (Oct 08 2024 at 04:39):

@Georgii Rylov thanks for testing. I think the invoke_native you mentioned is what executes the AOT function. The AOT code only checks whether an exception was thrown and whether the current thread's terminate flag is set in several places (see the calls to check_exception_thrown in aot_emit_function.c and the calls to check_suspend_flags in core/iwasm/compilation), so normally it continues to execute until one of these places is reached (for check_suspend_flags, the terminate flag should also be set from the other thread).

Safely terminating, suspending, and resuming wasm threads is really difficult to implement. Normally the idea is to stop the world first: let all the threads (except the thread asking to stop) suspend and wait until another thread notifies them to run. I once opened an issue for the feature:
https://github.com/bytecodealliance/wasm-micro-runtime/issues/2319,
and have done some work in the branch dev/thread_suspension:
https://github.com/bytecodealliance/wasm-micro-runtime/tree/dev/thread_suspension

But due to limited time I didn't test it much, so there may be some issues. If you are interested, you could give it a try, e.g. by calling the API wasm_runtime_terminate_all in thread_manager.h and adding the --enable-multi-thread flag for wamrc when generating the AOT file.

Motivation Some applications/scenarios require that the threads in a cluster run into the suspension state, or is to suspend all threads except the thread which is requesting the suspension. Normal...

view this post on Zulip Wenyong Huang (Oct 08 2024 at 04:44):

Please refer to https://github.com/bytecodealliance/wasm-micro-runtime/compare/dev/thread_suspension for the code changes.


view this post on Zulip Georgii Rylov (Dec 05 2024 at 11:23):

Hey, thank you for sharing. We got back to this, but with a different approach that better suits our needs.

Our problem: we want to report stack traces when a WASM app gets stuck for some reason, so that we can debug it.

Our new proposal (I have a POC): send SIGUSR2 to the WASM VM thread and record "simplified" stack traces in the signal handler.

The main complication there is achieving async-signal safety.
For that I figured:

  1. Once a frame is created and placed on the stack, its function index never changes.
  2. Frame positions also never change.
  3. The stack is allocated once and only freed on termination.
  4. The head of the stack changes on add/remove operations.

So in my POC I keep an atomic pointer to the stack memory and an atomic pointer to the head frame. That helps me ensure that an interruption doesn't leave the stack in an invalid state, as nothing besides the change of these two pointers could be interrupted.

In the signal handler I just check that the memory is still allocated, then iterate over the frames starting from the atomic pointer to the head frame, recording function indexes. Then I write them to a file, as that is an async-signal-safe operation.

So regarding that I have a few questions:

  1. Does this logic make sense to you?
  2. I don't see how to test or validate async signal safety, especially how to ensure future changes won't make it unsafe. Any ideas on that?
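
The handler design described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the POC and not WAMR code: Frame, frame_pool, head_frame, push_frame, pop_frame and install_stack_dump_handler are hypothetical names, the frame layout is invented, and a fixed static pool stands in for the stack that is "allocated once". The handler uses only a hand-written integer formatter and write(), which is on the POSIX list of async-signal-safe functions; printf-family calls are deliberately avoided.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical frame layout; not WAMR's real frame struct. */
typedef struct Frame {
    struct Frame *prev;  /* fixed once the frame is published */
    uint32_t func_index; /* fixed once the frame is published */
} Frame;

#define MAX_FRAMES 64
static Frame frame_pool[MAX_FRAMES]; /* allocated once, frames never move */
static _Atomic(Frame *) head_frame;  /* NULL while the stack is empty */
static size_t depth;                 /* used only by the VM thread */
static int dump_fd = STDERR_FILENO;  /* set before the handler is installed */

static void push_frame(uint32_t func_index)
{
    assert(depth < MAX_FRAMES);
    Frame *f = &frame_pool[depth++];
    f->func_index = func_index;
    f->prev = atomic_load_explicit(&head_frame, memory_order_relaxed);
    /* Publish only after the frame is fully initialised, so a signal that
     * arrives earlier still sees the previous, consistent head. */
    atomic_store_explicit(&head_frame, f, memory_order_release);
}

static void pop_frame(void)
{
    Frame *f = atomic_load_explicit(&head_frame, memory_order_relaxed);
    assert(f != NULL);
    atomic_store_explicit(&head_frame, f->prev, memory_order_release);
    depth--;
}

/* Writes v in decimal followed by '\n' without calling libc.
 * Needs at most 11 bytes; returns the number of bytes written. */
static size_t put_u32(char *out, uint32_t v)
{
    char tmp[10];
    size_t n = 0, i = 0;
    do {
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    while (n > 0)
        out[i++] = tmp[--n];
    out[i++] = '\n';
    return i;
}

/* Signal handler: walks frames from the head and writes one function
 * index per line, innermost frame first. */
static void on_sigusr2(int signo)
{
    char buf[MAX_FRAMES * 11];
    size_t len = 0;
    int saved_errno = errno; /* write() may clobber errno */
    (void)signo;
    for (Frame *f = atomic_load_explicit(&head_frame, memory_order_acquire);
         f != NULL && len + 11 <= sizeof buf; f = f->prev)
        len += put_u32(buf + len, f->func_index);
    if (len > 0) {
        ssize_t r = write(dump_fd, buf, len);
        (void)r;
    }
    errno = saved_errno;
}

/* Installs the handler; fd is where stack dumps are written. */
static int install_stack_dump_handler(int fd)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    dump_fd = fd;
    sa.sa_handler = on_sigusr2;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR2, &sa, NULL);
}
```

In this sketch another thread would trigger a dump with pthread_kill on the VM thread, or the VM thread itself with raise(SIGUSR2). On the second question, a sketch like this does not prove safety; one partial check is to keep the handler's call graph small enough to compare by hand against the POSIX async-signal-safe list.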

Last updated: Dec 23 2024 at 13:07 UTC