How to register/handle new hardware trap? · wasmtime

Fritz Rehde (Jul 11 2023 at 20:14):

Can someone explain the general approach I would have to follow if I wanted to add a Trap to wasmtime that would be "triggered" by a runtime trap coming from ARM hardware extensions (which extension exactly is not that important)? I am interested in which parts of wasmtime I would have to add code to. Thanks!

fitzgen (he/him) (Jul 11 2023 at 20:30):

so the hardware extension would deliver a signal of some sort that you want to turn into a wasm trap?

is this a fast path for an existing kind of check/trap that wasm already raises?

Fritz Rehde (Jul 11 2023 at 20:36):

Thanks for the help! It's regarding https://www.kernel.org/doc/html/v5.12/arm64/memory-tagging-extension.html, which we are trying to add support for in wasmtime (it's part of a larger stack, probably not something that could be merged into wasmtime itself for now, it's more of an experiment/research project). I think wasmtime already supports arm64's pointer authentication instructions at least for preventing ROP-oriented attacks. Do you know if a special trap handler for whatever signal/trap PAC instructions "send" to wasmtime exists already? Maybe that could help in my implementation

Chris Fallin (Jul 11 2023 at 20:38):

I don't think we have anything to handle, it just falls into the "SIGSEGV turns into a wasm trap" bucket probably

Fritz Rehde (Jul 11 2023 at 21:11):

Yep, right now our MTE is also just falling into the SIGSEGV bucket and it's being labeled as a memory out of bounds access

Fritz Rehde (Jul 12 2023 at 14:28):

I am able to recognize when the MTE (hardware) trap occurs, by adding this code to traphandlers/unix.rs:

        let faulting_addr = match signum {
            libc::SIGSEGV | libc::SIGBUS => Some((*siginfo).si_addr() as usize),
            _ => None,
        };
        // end of previous code, beginning of my code

        // Add MTE error handling
        if signum == libc::SIGSEGV && (*siginfo).si_code == SEGV_MTESERR {
            // raise_lib_trap(Trap::MemoryTaggingExtensionFault);
            println!("found mte bug");
        }

but I am not sure how to pass this information on to the trap handler. I have extended trap_encoding::Trap and TrapCode for the MTE fault, but I'm not sure how to pass instances of these to the respective handler. Any ideas/help?

Fritz Rehde (Jul 12 2023 at 14:30):

Ah you can see, I tried manually raising a lib trap, but that caused a seg fault, probably because the method is marked as highly unsafe and it doesn't "clean up" after itself, as stated in its documentation. So that is probably the wrong way to go.

Alex Crichton (Jul 12 2023 at 15:28):

Is the goal to get wasm opcodes that have a new Trap designation? If that's the case then that's done via other means, but yeah raise_lib_trap won't work in the signal handler

Alex Crichton (Jul 12 2023 at 15:29):

Cranelift emits metadata for all instructions in the form of "if this instruction traps it's this trap opcode", so if your goal is to get a new trap opcode then that's part of the compilation pipeline when generating the instruction that might trap

Alex Crichton (Jul 12 2023 at 15:29):

(e.g. it's already sigsegv and it'd be caught and recognized through normal conditions)

Fritz Rehde (Jul 12 2023 at 15:39):

My goal is to somehow signal to the user of wasmtime that an MTE fault occurred (note that MTE already works as expected, it's just the error message that this is about). Instead of wasmtime printing wasm trap: out of bounds memory access, I would like it to say something like "an MTE fault occurred". As I mentioned in my previous message, I have been able to identify the MTE trap in traphandlers/unix.rs with the if signum == libc::SIGSEGV && (*siginfo).si_code == SEGV_MTESERR { line, but I'm not sure where I now need to add more code to get the user-facing error message I want. If I read your message correctly, do I have to add something to cranelift (as well)? Could you maybe point me to some locations where modifications would be necessary? Thanks for the help!

fitzgen (he/him) (Jul 12 2023 at 15:39):

FWIW, we pass unknown signals along to the next signal handler if we don't recognize it as originating from within wasm, we don't have an indiscriminate bucket per se.

Alex Crichton (Jul 12 2023 at 15:40):

If a single instruction can generate two kinds of traps then what you wrote in the signal handler will be required. If you're adding new instructions which only have one kind of trap, then this'll be done in Cranelift by registering traps at the right time during instruction emission

Alex Crichton (Jul 12 2023 at 15:41):

If threading things around is required you can follow the flow of fault_addr as it goes throughout the system and that could turn into something like:

enum TrapAux {
    Segv(usize),
    Mte,
    None,
}
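As a rough illustration of how the signal handler could fill in such an enum, here is a minimal, self-contained sketch. It is not wasmtime code: classify_signal is a hypothetical helper, and the numeric constants are hard-coded to what I believe are the Linux values so the example compiles without the libc crate (real code would use libc::SIGSEGV, libc::SIGBUS, and a SEGV_MTESERR definition from the platform headers).

```rust
// Assumed Linux values, hard-coded only to keep this sketch dependency-free.
const SIGBUS: i32 = 7;
const SIGSEGV: i32 = 11;
const SEGV_MTESERR: i32 = 9; // si_code for a synchronous MTE tag check fault

/// Extra context recorded by the signal handler, following the enum above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TrapAux {
    /// Ordinary fault, carrying the address that was accessed.
    Segv(usize),
    /// The kernel reported a synchronous MTE tag mismatch.
    Mte,
    /// The signal carries no useful address (e.g. SIGILL, SIGFPE).
    None,
}

/// Hypothetical helper: turn the raw siginfo fields into a `TrapAux`.
/// The MTE arm must come first, because an MTE fault is also a SIGSEGV.
fn classify_signal(signum: i32, si_code: i32, si_addr: usize) -> TrapAux {
    match signum {
        SIGSEGV if si_code == SEGV_MTESERR => TrapAux::Mte,
        SIGSEGV | SIGBUS => TrapAux::Segv(si_addr),
        _ => TrapAux::None,
    }
}
```

The value returned here would then travel wherever faulting_addr travels today, instead of the plain Option<usize>.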

Fritz Rehde (Jul 12 2023 at 16:16):

That's the tricky part, the MTE traps aren't generated by just one specific instruction. MTE works by tagging memory regions and pointers, and trapping if some memory access is performed where the tag of the pointer doesn't match the tag of the region. So it could be loads, stores etc. From https://www.kernel.org/doc/html/v5.12/arm64/memory-tagging-extension.html, I found

The kernel raises a SIGSEGV synchronously, with .si_code = SEGV_MTESERR and .si_addr = <fault-address>. The memory access is not performed.

So I assumed the way to handle the MTE trap was to watch for a SIGSEGV and compare the si_code, which I did in the code snippet above. So basically I am trying to handle this MTE trap, and display a new, custom Trap to the user. My thought process was: in traphandlers/unix.rs, I can identify the MTE trap, so now I want to somehow "send" a custom Trap type to whatever handler handles this kind of stuff. Is this possible?

Fritz Rehde (Jul 12 2023 at 16:18):

I thought the runtime was best suited to make my additions to the code, because the MTE trap only occurs at runtime, and can't be identified at compile time. Maybe I am wrong in that.

Alex Crichton (Jul 12 2023 at 16:21):

Ok that makes sense. In that case you'll want to, in the signal handling context as you are, determine that this is an MTE trap and then thread that through with the faulting_addr that's currently threaded everywhere. Then here you can process faulting_addr (which would be renamed to handle MTE stuff) in conjunction with the trap opcode. For example if the opcode says Trap::MemoryOutOfBounds but MTE was detected you'd change that to Trap::YourNewCustomTrapCode or something like that

Alex Crichton (Jul 12 2023 at 16:22):

the code I linked represents that a trap at a particular native code address was caught and the faulting_addr is sort of "optional context" from the original trap. That's then processed via cranelift-generated lookup tables to convert the pc to an opcode, and you'll be updating that to generate a new opcode
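To make the "pc to opcode" step concrete, here is a small sketch of the idea under simplified assumptions. TrapTable and resolve_trap are hypothetical names, the table layout is invented for illustration (wasmtime's real encoding differs), and MemoryTaggingExtensionFault is the custom variant discussed in this thread rather than an existing trap code.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Trap {
    MemoryOutOfBounds,
    IntegerDivisionByZero,
    MemoryTaggingExtensionFault, // hypothetical new variant
}

/// Simplified stand-in for the compiler-generated metadata:
/// (offset of an instruction that may trap, trap code it raises).
struct TrapTable {
    entries: Vec<(u32, Trap)>,
}

impl TrapTable {
    fn new(mut entries: Vec<(u32, Trap)>) -> Self {
        // Keep the entries sorted so lookups can use binary search.
        entries.sort_by_key(|&(offset, _)| offset);
        TrapTable { entries }
    }

    /// Returns the registered trap code, or `None` if this offset was
    /// never registered as an instruction that is allowed to trap.
    fn lookup(&self, offset: u32) -> Option<Trap> {
        self.entries
            .binary_search_by_key(&offset, |&(o, _)| o)
            .ok()
            .map(|i| self.entries[i].1)
    }
}

/// Combine the table's answer with the context from the signal handler.
fn resolve_trap(table: &TrapTable, offset: u32, mte_detected: bool) -> Option<Trap> {
    match (table.lookup(offset)?, mte_detected) {
        // Memory instructions default to "out of bounds"; refine that
        // when the signal handler saw an MTE fault.
        (Trap::MemoryOutOfBounds, true) => Some(Trap::MemoryTaggingExtensionFault),
        (code, _) => Some(code),
    }
}
```

A None result corresponds to a fault at an address that was not registered as able to trap, which in this sketch would not be treated as a wasm trap at all.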

Fritz Rehde (Jul 12 2023 at 16:24):

I guess the part I am unclear about is then thread that through with the faulting_addr that's currently threaded everywhere. What do you mean with this? What does "threading" mean in this context?

Alex Crichton (Jul 12 2023 at 16:24):

Another way to put all this I think is that we're already catching MTE traps and what's necessary next is to plumb the metadata around to classify the trap as an MTE-related trap rather than an out of bounds trap because, by default, all memory-related instructions assume that a signal must mean the access was out of bounds (which is no longer true with MTE)

fitzgen (he/him) (Jul 12 2023 at 16:25):

wait but would these MTE traps get raised because a correctly implemented Wasm program attempted to do something it shouldn't do? (like access OOB memory for example) or would the trap get raised because of a bug in the runtime/compiler? If the latter then this should just hard kill wasmtime and we shouldn't generate custom Trap types and pass them around

Alex Crichton (Jul 12 2023 at 16:25):

oh so right now we record on segfaults not only the address of the faulting instruction but the address that was faulted on (e.g. you loaded from 0x000f00 or something like that). This "context" of the faulting_addr needs to make its way from the signal handler into the rest of Wasmtime, and you'll need to shepherd along the MTE information alongside this other information

Fritz Rehde (Jul 12 2023 at 16:28):

@fitzgen (he/him) Ah, my bad for not clarifying earlier. The MTE-functionality we added is to increase memory safety of the wasm program we are executing with wasmtime. If we have a wasm program (that might have been compiled from unsafe C), then we want to use MTE to detect memory unsafe things like use-after-free or other memory-related bugs.

fitzgen (he/him) (Jul 12 2023 at 16:28):

Fritz Rehde (Jul 12 2023 at 16:31):

fitzgen (he/him) (Jul 12 2023 at 16:32):

okay yeah then you'd want to recognize when you get an MTE signal that is indeed from within Wasm and not because some other part of the host is also using MTE (can look at the offending PC) and then do all the stuff that Alex has been saying

Alex Crichton (Jul 12 2023 at 16:33):

that part I think is already handled because wasmtime only catches signals for instructions which are registered as being able to trap

fitzgen (he/him) (Jul 12 2023 at 16:33):

Alex Crichton (Jul 12 2023 at 16:34):

fitzgen (he/him) (Jul 12 2023 at 16:34):

Fritz Rehde (Jul 12 2023 at 16:50):

@Alex Crichton So, you're saying I should pass along ("thread") the MTE Trap as a TrapReason::Jit (I originally thought TrapReason::Wasm sounded more fitting, but not sure) by calling info.set_jit_trap(pc, fp, faulting_addr);, with the difference that I have to add my MTE information to that somehow (probably just adding a boolean argument)? Or do you mean I shouldn't change anything in traphandlers/unix.rs and just insert my MTE check in the from_runtime_box snippet you posted? Also, with opcode, you don't mean (*siginfo).si_code, which is what I need to check whether it's an MTE trap, right?

Alex Crichton (Jul 12 2023 at 16:53):

More-or-less, yes. You're right in that you're going to want to modify TrapReason::Jit. Currently that only has faulting_addr: Option<usize> and yeah you may want to add is_mte_fault: bool or something like that (sorry I don't know anything about MTE so I don't know what would be appropriate here). That would then make its way to from_runtime_box where you can convert a MemoryOutOfBounds trap into an MTE-specific trap depending on the state in TrapReason::Jit.

Alex Crichton (Jul 12 2023 at 16:53):

You'll need to inspect si_code to determine how to construct TrapReason::Jit still
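Putting the last few messages together, a minimal sketch of the shape this could take is below. The types are simplified stand-ins for the real ones: only the Jit variant is modelled, final_trap is a hypothetical helper standing in for the relevant part of from_runtime_box, and the MTE message wording is made up.

```rust
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Trap {
    MemoryOutOfBounds,
    MemoryTaggingExtensionFault, // hypothetical new variant
}

impl fmt::Display for Trap {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let desc = match self {
            Trap::MemoryOutOfBounds => "out of bounds memory access",
            // Assumed wording; pick whatever message suits the project.
            Trap::MemoryTaggingExtensionFault => "memory tagging extension fault",
        };
        write!(f, "wasm trap: {}", desc)
    }
}

/// Simplified stand-in for the runtime's trap reason.
#[allow(dead_code)]
enum TrapReason {
    Jit {
        pc: usize,
        faulting_addr: Option<usize>,
        /// New field, set in the signal handler from si_code.
        is_mte_fault: bool,
    },
}

/// Hypothetical helper: `code` is what the pc lookup produced; upgrade it
/// to the MTE trap only when it was the default out-of-bounds code.
fn final_trap(reason: &TrapReason, code: Trap) -> Trap {
    match reason {
        TrapReason::Jit { is_mte_fault: true, .. } if code == Trap::MemoryOutOfBounds => {
            Trap::MemoryTaggingExtensionFault
        }
        TrapReason::Jit { .. } => code,
    }
}
```

With this shape the user-facing message changes purely as a consequence of which Trap variant is returned, so no separate error-reporting path is needed.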

Fritz Rehde (Jul 12 2023 at 17:04):

Ok, that makes sense, thank you for the help and time! One more thing: I'm looking at from_runtime_box right now, and I don't see any mention of anything related to MemoryOutOfBounds. Which enum that contains MemoryOutOfBounds do you mean? I'm not sure how to return my MTE trap error message to the user here. Is that done by returning a type Error here?

Alex Crichton (Jul 12 2023 at 17:07):

The code variable has type Trap which is likely storing Trap::MemoryOutOfBounds today for your MTE traps. This code is what you'll want to change to something MTE-related. By default all memory accesses, if they fault, report "memory out of bounds", which is why that's the case today

Fritz Rehde (Jul 12 2023 at 17:59):

Fritz Rehde (Jul 12 2023 at 20:05):

Follow-up: I know wasmtime already uses ARM's Pointer Authentication (PAC) for preventing ROP-oriented attacks. Do you "throw custom error messages", like I implemented for MTE, for PAC?

Alex Crichton (Jul 12 2023 at 20:06):

Not currently no because if a PAC error trips that's a critical compiler error which should take down the entire process. It's a defense-in-depth mechanism as opposed to a feature given to content to detect issues in-content

Fritz Rehde (Jul 12 2023 at 20:20):

I don't quite understand. Wasmtime's PAC support is for wasm guests/programs that wasmtime executes, right? PAC is also a runtime error that wasmtime might encounter/have to handle, similar to MTE. Which process do you mean when you say "taking down the process"? Wasmtime itself or the wasm guest?

Jamey Sharp (Jul 12 2023 at 20:24):

Since a wasm guest can only make calls and branches to safe targets, enforced during wasm validation, pointer authentication checks "can't" fail. If they fail anyway, that indicates we screwed up in the compiler, at which point all our safety guarantees are shot and we should fail really noisily. That's why Alex says the entire Wasmtime process should abort at that point.

Fritz Rehde (Jul 12 2023 at 20:31):

I am still slightly confused. Are pointer auth checks only done during compilation (or validation, though I'm not entirely sure what that is) in wasmtime? I don't understand why a PAC error/trap is a critical compiler error. In my understanding, the PAC instructions are, for instance, inserted to protect the return address. If this is somehow overwritten by an attacker (through some other vulnerability in the wasm code), then the PAC instruction would fail to authenticate the address and, I think, crash/trap somehow. Are you saying the wasmtime process noisily aborts when encountering such PAC crashes/traps?

fitzgen (he/him) (Jul 12 2023 at 20:32):

it means the compiler successfully produced code, but by having a PAC failure, we determined that code was incorrect at runtime

fitzgen (he/him) (Jul 12 2023 at 20:33):

Fritz Rehde (Jul 12 2023 at 20:33):

Does this have something to do with Linux sending a SIGILL signal instead of SIGSEGV?

fitzgen (he/him) (Jul 12 2023 at 20:34):

the return address could be wrong either because of an attacker trying to do ROP (by leveraging a compiler or runtime bug) or because of a general bug with our compiler and the code it generates (as discussed above)

Jamey Sharp (Jul 12 2023 at 20:34):

A wasm guest should not be able to write anywhere that we have a pointer to code, such as writing to the native stack; if we allowed a stray write like that then we've already lost and the PAC failure is just detecting the bug sometime later.

fitzgen (he/him) (Jul 12 2023 at 20:35):

I personally have no idea why certain hardware features map to certain signals vs other signals. seems semi-arbitrary.

Fritz Rehde (Jul 12 2023 at 20:41):

Ok, thanks for the explanations. But how does it work in practice? Do you actually identify PAC errors/traps, and, when aborting wasmtime, provide some sort of error message to users? Or is such a PAC trap not identified by itself, and belongs to a larger group of traps/signals, that might all lead to aborting wasmtime?

Chris Fallin (Jul 12 2023 at 20:42):

The latter; we don't catch SIGSEGVs that do not map to expected points where wasm could semantically hit an error (e.g. out-of-bounds)

Chris Fallin (Jul 12 2023 at 20:43):

@Fritz Rehde to tie the above to some good search-phrases, Wasm has "CFI" (control-flow integrity); this is what implies the property that Jamey describes above, and makes PAC purely a defense-in-depth thing

Chris Fallin (Jul 12 2023 at 20:43):

we are able to guarantee CFI even on platforms without PAC (because Wasm semantics require it)

Fritz Rehde (Jul 12 2023 at 21:25):

Ok, interesting.
Say I have added pointer authentication into wasmtime for a different purpose as well, not just preventing ROP attacks like wasmtime currently does. In my case, I wouldn't exactly consider a situation where a PAC trap is encountered at runtime to be a fault in cranelift itself. I would consider it a bug in the wasm guest code, just like MTE prevents buffer overflows and use-after-frees, which I also consider wasm guest errors, not errors in the cranelift compiler.
In this situation, if I possibly can, I would like to exit with an error similar to what I implemented for MTE. MTE was quite simple, I just had to compare a Linux constant (SEGV_MTESERR) with the si_code. But I think/read online that identifying PAC traps/exceptions is more complicated. In my understanding, a PAC trap would cause a SIGILL, but that can mean many different problems, not necessarily a PAC error. However, I am not sure how I could continue from there. I read that PAC traps are asynchronous, meaning that by the time the signal handler receives the SIGILL signal, the program might have advanced beyond where the PAC error actually occurred. So maybe an analysis to detect whether the encountered trap is a PAC trap is non-deterministic at best, and probably quite hard to implement.
