Stream: wasmtime

Topic: ✔ Option to disable signal handling


Tyler Rockwood (Aug 22 2023 at 18:38):

Hello, would Wasmtime be open to an option that disables the use of signals for implementing traps, such as the unreachable wasm instruction generating a SIGILL? We run in an environment that has some complex interactions with signal handlers and signal blocking/unblocking. There are cases where I'm running into the process aborting or hanging due to these signals, and I was wondering if a config option would be something the project would consider.

fitzgen (he/him) (Aug 22 2023 at 18:52):

set these options to zero and you should only use explicit bounds checks for memories:

fitzgen (he/him) (Aug 22 2023 at 18:52):

(instead of relying on signals)
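A minimal sketch of what an explicit bounds check looks like, as opposed to letting an out-of-bounds access fault into a guard page and recovering in a SIGSEGV handler. The `linear_memory` struct and `load_u32_checked` are hypothetical names for illustration; they are not Wasmtime's actual data structures or generated code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical linear memory descriptor (an assumption, not Wasmtime's layout). */
typedef struct {
    uint8_t *base; /* start of the memory */
    size_t   len;  /* current size in bytes */
} linear_memory;

/* Load 4 bytes with an explicit bounds check.
 * Returns false (the caller would raise a trap) when the access is out of bounds. */
static bool load_u32_checked(const linear_memory *mem, uint64_t addr, uint32_t *out)
{
    /* Compare against len - 4 rather than computing addr + 4, which could wrap. */
    if (mem->len < sizeof(uint32_t) || addr > mem->len - sizeof(uint32_t)) {
        return false;
    }
    memcpy(out, mem->base + addr, sizeof(uint32_t));
    return true;
}
```

With a check like this on every access, an out-of-bounds address is rejected by an ordinary branch, so no signal is needed for memory safety.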

Chris Fallin (Aug 22 2023 at 19:59):

Note that we still rely on signals for floating point errors on ISAs that support that (x86 does, aarch64 doesn't, for example)

Chris Fallin (Aug 22 2023 at 20:01):

Currently we don't have a codegen option to not rely on this, it seems; I think we might have in the past and it was removed (someone would have to do some more digging here)

Chris Fallin (Aug 22 2023 at 20:02):

(this is for div-by-0 at least; we seem to still have explicit checks for INT_MIN / -1)
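As a rough illustration of the two cases mentioned here, this sketch performs signed 32-bit division and remainder with explicit checks, so that neither a divide-by-zero nor INT_MIN / -1 ever reaches the hardware divide instruction. The `div_trap` enum and function names are hypothetical; this shows the semantics only, not what Cranelift emits.

```c
#include <stdint.h>

/* Hypothetical trap codes for this sketch. */
typedef enum {
    DIV_OK = 0,
    DIV_TRAP_BY_ZERO,
    DIV_TRAP_OVERFLOW
} div_trap;

/* i32.div_s semantics: traps on a zero divisor and on INT32_MIN / -1. */
static div_trap i32_div_s_checked(int32_t lhs, int32_t rhs, int32_t *out)
{
    if (rhs == 0) {
        return DIV_TRAP_BY_ZERO;
    }
    if (lhs == INT32_MIN && rhs == -1) {
        return DIV_TRAP_OVERFLOW; /* result is not representable */
    }
    *out = lhs / rhs;
    return DIV_OK;
}

/* i32.rem_s semantics: traps only on a zero divisor; x rem -1 is defined as 0. */
static div_trap i32_rem_s_checked(int32_t lhs, int32_t rhs, int32_t *out)
{
    if (rhs == 0) {
        return DIV_TRAP_BY_ZERO;
    }
    if (rhs == -1) {
        *out = 0; /* avoids the INT32_MIN % -1 hardware fault */
        return DIV_OK;
    }
    *out = lhs % rhs;
    return DIV_OK;
}
```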

Tyler Rockwood (Aug 22 2023 at 20:26):

Thank you, yes I'm thinking about the SIGFPE for division by zero on x86 and not using ud2 and similar instructions for unreachable.

Would having a codegen option to not rely on this be something that would be considered?

Chris Fallin (Aug 22 2023 at 20:30):

I think it's reasonable to at least consider. One thing we always worry about with more configuration options is the testing and maintenance overhead; but if this is localized to a few operations (div, rem) maybe it's not so bad.

Getting from a ud2 to the trap handler may be a bit trickier though: right now our "explicit checks" still rely on that opcode to exit the Wasm. We'd need an alternate mechanism (e.g. a jump to some address that we patch in, or provide in the vmctx) for this to work.

If we do have this mode, we'd want to test it with an integration test that sets a new option on Engine to not register any signal handlers, then run either the Wasm test suite or at least specific tests we know to rely on trapping.

Especially given the difficulties we've had with signal handling on macOS, I think this could have real value; it's not just a "support some weird use-case we've never heard of" sort of PR. But starting the discussion here and then sketching it out in an issue and maybe a prototype to show the extent of the changes would help us decide for sure.

Jamey Sharp (Aug 22 2023 at 20:34):

I think many of our remaining limitations in Cranelift's test suite and fuzzing are due to not being able to handle traps, so maybe we'd want to use this mode for all Cranelift testing? That would help ensure that it's well-exercised.

Tyler Rockwood (Aug 22 2023 at 20:37):

Understand completely on the maintenance + testing bits.

Getting from a ud2 to the trap handler may be a bit trickier though: right now our "explicit checks" still rely on that opcode to exit the Wasm. We'd need an alternate mechanism (e.g. a jump to some address that we patch in, or provide in the vmctx) for this to work.

Yes, exactly. I'm thinking the easiest thing is to always jump to a predefined function; we could pass in a parameter indicating which trap. It would increase the size of the generated code, but not by much?

If we do have this mode, we'd want to test it with an integration test that sets a new option on Engine to not register any signal handlers, then run either the Wasm test suite or at least specific tests we know to rely on trapping.

You can also just use pthread_sigmask to block that signal from being delivered to the thread (assuming tests are single-threaded).

I think this could have real value

That's good to know! From a quick code search, it seems there are a bunch of places that use ud2, so I assume this would be a largish change...

Chris Fallin (Aug 22 2023 at 20:57):

Three things to consider with the code to replace ud2:

(i) We have a bunch of them; disassembly of some functions will show a whole stream of ud2 ops at the end of the function, each a specific trap-point. ud2 is two bytes (0x0f 0x0b); we'd have to take some inflation but every byte counts here. So e.g. jmp *offset(%rN) where %rN is the register holding vmctx is 8 bytes, but that's better than a full callsite with moves into registers, a call, and cleanup. (We also can't represent "non-returning callsite" and optimize based on that currently.)

(ii) We try to avoid having any relocations in the code (i.e., emit PIC where possible): this makes loading precompiled .cwasms much faster, as it lets us mmap straight from disk. So we'd want to go through a pointer in vmctx.

(iii) We can see the return address if we jump to a trampoline, so we could just use that; no need to pass other args.

Basically we want a "fake ud2" that jumps to a little handwritten assembly trampoline that then calls into the runtime and never returns.
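A loose C analogy of the "fake ud2" idea, under several stated assumptions: the `vmctx` layout, the `trap_trampoline` field, and `guest_div` are all hypothetical, and `longjmp` stands in for the runtime unwinding out of Wasm. Portable C cannot read the return address, so the trap site is passed explicitly here as a stand-in for it; the real proposal would need no argument.

```c
#include <setjmp.h>
#include <stdint.h>

typedef struct vmctx vmctx;

/* Hypothetical context: holds the trampoline pointer so generated code
 * needs no relocations, plus state for leaving the guest. */
struct vmctx {
    void (*trap_trampoline)(vmctx *ctx, uintptr_t trap_site);
    jmp_buf   exit_wasm;
    uintptr_t last_trap_site;
};

/* Never returns: records which site trapped, then unwinds to the host. */
static void trap_trampoline(vmctx *ctx, uintptr_t trap_site)
{
    ctx->last_trap_site = trap_site;
    longjmp(ctx->exit_wasm, 1);
}

/* Stand-in for a compiled Wasm function. Where the real code would place
 * a ud2, it instead transfers control through the pointer in vmctx. */
static int32_t guest_div(vmctx *ctx, int32_t a, int32_t b)
{
    if (b == 0) {
        ctx->trap_trampoline(ctx, 0x10); /* "fake ud2" for this trap site */
    }
    return a / b;
}

/* Host entry point: returns 0 on success, 1 if the guest trapped. */
static int call_guest(vmctx *ctx, int32_t a, int32_t b, int32_t *out)
{
    ctx->trap_trampoline = trap_trampoline;
    if (setjmp(ctx->exit_wasm) != 0) {
        return 1; /* arrived here via the trampoline */
    }
    *out = guest_div(ctx, a, b);
    return 0;
}
```

The point of the indirection is that the guest code only ever reads a pointer out of the context, so no signal handler and no patched address is involved in leaving the guest.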

Tyler Rockwood (Aug 22 2023 at 21:05):

disassembly of some functions will show a whole stream of ud2 ops at the end of the function, each a specific trap-point.

I don't quite follow here - are you saying that cranelift injects a stream of ud2 ops at the end of a function? Or that upstream wasm toolchains (e.g. clang) output unreachable instructions in batches at the end of functions? If cranelift is doing this - why? alignment?

Just to make sure I follow, the pointer in vmctx would point to the "fake ud2" right? And the assembly trampoline essentially reads the return address, and passes that into the runtime, and the runtime looks up the trapcode for that location?

Chris Fallin (Aug 22 2023 at 21:07):

I don't quite follow here - are you saying that cranelift injects a stream of ud2 ops at the end of a function? Or that upstream wasm toolchains (e.g. clang) output unreachable instructions in batches at the end of functions? If cranelift is doing this - why? alignment?

The former; and not as a direct action, but as a consequence of compilation. The IR has a bunch of basic blocks, each of which ends with an unreachable. We compile unreachable as ud2, and we also mark these blocks as cold so they are sunk to the bottom of the function. So the effect is a bunch of ud2s; the identity of the trapsite is determined by the address of the specific one that traps

Chris Fallin (Aug 22 2023 at 21:08):

The pointer in vmctx would point to trampoline code; the "fake ud2" is what we generate instead of ud2 in the function body itself

Chris Fallin (Aug 22 2023 at 21:08):

the address of that code (which becomes the return address in the trampoline) serves as the identity of the particular trap we took, just as the address of the ud2 did
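One way to picture "the address is the identity of the trap" is a per-function table mapping code offsets to trap codes, searched with the offset derived from the trapping address. This is only a sketch under that assumption; `trap_site`, `trap_code`, and `lookup_trap` are hypothetical names, not Wasmtime's metadata format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trap codes for this sketch. */
typedef enum {
    TRAP_UNKNOWN = 0,
    TRAP_UNREACHABLE,
    TRAP_DIV_BY_ZERO,
    TRAP_OUT_OF_BOUNDS
} trap_code;

/* One entry per trap site, keyed by its offset within the compiled code. */
typedef struct {
    uint32_t  code_offset;
    trap_code code;
} trap_site;

/* Binary search over a table sorted by ascending code_offset.
 * Returns TRAP_UNKNOWN when the offset is not a registered trap site. */
static trap_code lookup_trap(const trap_site *sites, size_t count, uint32_t offset)
{
    size_t lo = 0;
    size_t hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (sites[mid].code_offset == offset) {
            return sites[mid].code;
        }
        if (sites[mid].code_offset < offset) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return TRAP_UNKNOWN;
}
```

The same lookup works whether the address comes from a signal handler's fault address or from a return address observed in a trampoline.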

Tyler Rockwood (Aug 22 2023 at 21:22):

which ends with an unreachable.
mark these blocks as cold

Would the size bloat of replacing these ud2s with jumps to the fake ud2 be acceptable? Or would there need to be work to optimize away those ud2s? (assuming that is possible)

Chris Fallin (Aug 22 2023 at 21:24):

That's what I was getting at above about 2 vs. 8 bytes; I'm not sure, but I guess the real question is just "is it still reasonable / workable" (since this mode wouldn't be on by default) and it seems likely

Chris Fallin (Aug 22 2023 at 21:24):

(likely to be acceptable that is)

Tyler Rockwood (Aug 29 2023 at 18:00):

I tried to distill this thread into https://github.com/bytecodealliance/wasmtime/issues/6926 - please feel free to chime in if there is anything incorrect or you have other thoughts. Otherwise it's probably better to consolidate the discussion there? Thanks for this!

Currently, on unix platforms, Wasmtime relies on a few signals to operate, here's a list of the ones I'm aware of: SIGSEGV: Used for bounds checking when static memory is used. AFAIK using dynamic ...

Notification Bot (Aug 29 2023 at 18:00):

Tyler Rockwood has marked this topic as resolved.


Last updated: Oct 23 2024 at 20:03 UTC