alexcrichton commented on Issue #1577:
cc @lostman, I wanted to ping you here about per-instance custom signal handlers (added in https://github.com/bytecodealliance/wasmtime/pull/620). There's a bit of refactoring here to avoid some unsafe `transmute` calls, but it's not required as part of this PR itself. As a question, though, I was wondering if you could elaborate a bit on y'all's usage of signal handlers? Do you catch signals originating in wasm code itself, or only in your host code? Additionally, does a per-`Store` signal handler work for you as opposed to a per-instance signal handler?
lostman commented on Issue #1577:
@alexcrichton we are using the custom signal handler for memory access tracking via SIGSEGV + mprotect (a sketch of the pattern follows below).
Setting the signal handler per-`Store` would also work, but at first glance it seems less clean. Each handler has instance-specific behavior, so we could set the handler once and add some sort of registry to map segfaults to instances. Or we could set the handler before running each instance, as I see you've done in `tests/all/custom_signal_handler.rs`. Given that many instances might be created from a single store, is that the right way to do it, though? In any case, we're happy to adapt to whatever makes the code simpler/better!
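For context, here is a minimal sketch of the SIGSEGV + mprotect write-tracking pattern described above, assuming a handler hook that can report whether the signal was handled; `record_dirty_page` and the surrounding wiring are hypothetical illustrations, not lostman's actual code:

```rust
use std::io::Error;

// Hypothetical bookkeeping hook; a real tracker would record the page in some
// instance-specific structure.
fn record_dirty_page(page: *mut libc::c_void) {
    eprintln!("first write to page {:p}", page);
}

/// Revoke write access so the first write to the region faults with SIGSEGV
/// and reaches the custom signal handler.
unsafe fn protect_for_tracking(addr: *mut libc::c_void, len: usize) -> Result<(), Error> {
    if libc::mprotect(addr, len, libc::PROT_READ) == 0 {
        Ok(())
    } else {
        Err(Error::last_os_error())
    }
}

/// Body of the custom SIGSEGV handler: record the dirty page, restore write
/// access, and return true so the faulting write is re-executed and succeeds.
unsafe fn on_write_fault(page: *mut libc::c_void, len: usize) -> bool {
    record_dirty_page(page);
    libc::mprotect(page, len, libc::PROT_READ | libc::PROT_WRITE) == 0
}
```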
eust-dfinity commented on Issue #1577:
@alexcrichton There are some other things I'd like to add, which might give us a headache:

1. This commit affects not just SIGSEGV/SIGBUS but other signals as well. Notably, division by zero in host code will now trigger a panic instead of producing a Trap (maybe this is fully intended; I'm just pointing it out in case it's not). My feeling is that life would be simpler for end users if they got consistent behavior, i.e. always a Trap on an attempted division by zero no matter where the trap originated. In our case we can probably catch it in our custom signal handler and make the whole system respond properly, though.
2. My quick tests indicate that after these changes, manually raised signals get completely ignored (this concerns SIGSEGV, SIGILL, SIGBUS, and SIGFPE); e.g. executing `libc::raise(libc::SIGFPE)` in a host call has no effect.
3. We are actually using `raise(SIGILL)` to terminate wasmtime's execution, similar to the killswitch in lucet (see the sketch below). Is there any other way in wasmtime to terminate? (The newly added interrupt functionality won't work, because we want to terminate on the spot, with a guarantee that no other instruction will get executed.)
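(For concreteness, the killswitch in point 3 amounts to something like the following hypothetical host function; the name and the fallback are illustrative, not their actual code:)

```rust
// Hypothetical lucet-style killswitch: a host call raises SIGILL to stop
// execution on the spot.
fn killswitch() -> ! {
    unsafe {
        libc::raise(libc::SIGILL);
    }
    // As reported in point 2 above, wasmtime now swallows manually raised
    // signals, so execution would fall through to this abort instead.
    std::process::abort();
}
```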
alexcrichton commented on Issue #1577:
@lostman thanks for taking a look! One thing that's a bit surprising about the per-instance behavior is that you're not actually guaranteed to run the per-instance signal handler if a signal happens while that instance's code is running. The way it's all set up today, the handler only gets registered when you enter wasm through a particular instance, but with how wasm modules are linked you're not guaranteed that any particular instance's handler runs. That was one simplification I was hoping to apply here: a per-Store signal handler is what the implementation is actually providing, because it's the only granularity at which we can guarantee the handler runs.
From an API point of view, I think it's probably best that if your usage of wasmtime can guarantee that only one instance is involved within a particular scope, then that knowledge gets encoded into your own signal handler, largely because it's not something we can guarantee in wasmtime itself (one way to do that is sketched below).
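One hypothetical way to encode that scope, sketched purely as an illustration (none of these names are wasmtime API): have the embedder track the active instance in a thread-local and consult it from the single per-Store handler.

```rust
use std::cell::Cell;

thread_local! {
    // Set by the embedder right before calling into a given instance's
    // exports; usize::MAX means "no instance currently active".
    static ACTIVE_INSTANCE: Cell<usize> = Cell::new(usize::MAX);
}

fn enter_instance(id: usize) {
    ACTIVE_INSTANCE.with(|c| c.set(id));
}

// Body of the one handler registered per Store: dispatch on the active
// instance rather than relying on per-instance handler registration. (Note
// that touching a thread-local from a signal handler is not strictly
// async-signal-safe; a production tracker would need more care here.)
fn handle_signal(signum: i32) -> bool {
    let id = ACTIVE_INSTANCE.with(|c| c.get());
    // ...instance-specific bookkeeping keyed on `id` would go here...
    id != usize::MAX && signum == libc::SIGSEGV
}
```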
@eust-dfinity oh, recovering from signals in host code is something we never intended to support. It's not memory safe to arbitrarily longjmp out of the middle of a Rust function, so this is a case where wasmtime isn't quite memory safe today and we're trying to fix it. In general, if you want to map things like divide-by-zero in the host to a trap in wasm, it's something that will need to be done explicitly rather than by relying on the signal handler.
For termination, you're asking how to terminate host code? Note that if longjmp-ing out of your host code is safe for your application, you should be able to register your own signal handler to jump out; wasmtime should defer to it once it realizes the fault didn't happen in wasm. Otherwise I would recommend using `Result`-propagation in Rust code to "quickly" return back to wasm, although it doesn't have the "zero extra instructions executed" property.
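A minimal sketch of that explicit mapping and of `Result`-propagation, written against the wasmtime API of roughly this era (`Func::wrap` plus `Trap::new`; later releases have changed these signatures):

```rust
use wasmtime::{Func, Store, Trap};

// Instead of letting the division fault and relying on a signal handler, the
// host function checks the condition itself and returns a Trap; wasmtime then
// propagates the error back out through the wasm frames.
fn host_div(a: i32, b: i32) -> Result<i32, Trap> {
    a.checked_div(b)
        .ok_or_else(|| Trap::new("host division by zero"))
}

// Expose the checked division to wasm as an importable function.
fn make_host_div(store: &Store) -> Func {
    Func::wrap(store, host_div)
}
```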
eust-dfinity commented on Issue #1577:
Thanks for the explanation! Makes sense.
Putting our problems aside, I still don't understand why `libc::raise(SIGSEGV)` gets ignored. I tried looking at the execution, and it seems to go through wasmtime's signal handler via the "not handled" path, while a true segfault or division by zero seems not to trigger the handler at all.
EDIT: Ok, I got confused there for a moment. To be precise, a real segfault does trigger the handler as it should. It's just division by zero which doesn't; it instead results in a regular Rust panic, so I guess it must be Rust's doing and this part is OK. However, I still can't get `libc::raise()` to abort the execution.
alexcrichton commented on Issue #1577:
Ah yeah, division by zero doesn't raise a signal in Rust because it's defined across all platforms to panic (otherwise I think it would be UB; see the example below). For `libc::raise(libc::SIGSEGV)` I think I see what's happening: when wasmtime decides to ignore a signal, it returns from the signal handler expecting to re-execute the faulting instruction, but in this case there is no faulting instruction, so presumably execution just returns normally from the `raise` syscall.
I'm not actually sure that there's a way to handle some segfaults but still abort with `libc::raise(libc::SIGSEGV)`. I do think, though, that you can use other, non-SIGSEGV signals to abort the process.
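(A quick standalone check of the division-by-zero behavior; `black_box` is used only so the compiler doesn't reject the obviously-zero divisor at compile time:)

```rust
use std::hint::black_box;

fn main() {
    let divisor: i32 = black_box(0);
    // Rust defines integer division by zero to panic on every platform, so no
    // SIGFPE is involved; catch_unwind observes an ordinary panic instead.
    let outcome = std::panic::catch_unwind(|| 1 / divisor);
    assert!(outcome.is_err());
    println!("division by zero panicked as expected");
}
```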
eust-dfinity commented on Issue #1577:
Right, it all makes sense now. Thanks again for the explanation!
Last updated: Nov 22 2024 at 16:03 UTC