wasmtime pulley and opcode info/dwarf · wasmtime

Alexander Ivanov (Apr 10 2025 at 16:14):

Hey! We're researching wasm interpreters that are easy to patch, as part of our recording/debugging tools. Wasmtime has a permissive Apache 2.0 license and the Pulley interpreter. Many interpreters don't really propagate the original wasm binary offsets of the instructions to their internal IR/interpreter loops; this is a problem for us, as we want to match them with their DWARF info. Does Pulley propagate/preserve this?

bjorn3 (Apr 10 2025 at 16:29):

Yes, it does, but the information is only useful for traps and debuginfo. Pulley internally follows basically the same compilation pipeline as when compiling for native architectures. It uses the same code to lower wasm to Cranelift IR, optimizes this using the regular Cranelift optimization passes (which can merge clif ir instructions from multiple wasm instructions together, losing information about which actual wasm instruction was executed) and only at the end compiles to Pulley bytecode rather than conventional machine code. If you only need it for debuginfo, this is obviously fine, but if you also want to manipulate execution or have a guaranteed 100% correct extraction of the full wasm-level state, this won't cut it.

fitzgen (he/him) (Apr 10 2025 at 16:30):

DWARF isn't enabled for pulley because we haven't defined a DWARF register mapping. it also isn't clear to me what value it would provide since the DWARF is usually used by the system's native tools like gdb and perf but those don't apply to pulley.

we do have address maps that we use internally to map trapping instructions to source locations:

$ wasmtime compile -D address-map=y --target pulley64 ~/scratch/foo.wat -o ~/scratch/foo.cwasm

$ wasmtime objdump --addrmap ~/scratch/foo.cwasm
wasm[0]::function[0]:
            push_frame
            ╰─╼ trap: StackOverflow
            vconst128 v0, 32768
            ╰─╼ addrmap: 0x28
            call2 x0, x0, 0x9    // target = 0x1e
            ╰─╼ addrmap: 0x3a
            pop_frame
            ╰─╼ addrmap: 0x3c
            ret

wasm[0]::function[1]:
            push_frame
            ╰─╼ trap: StackOverflow
            vwidenlow8x16_s v5, v0
            ╰─╼ addrmap: 0x41
            vwidenhigh8x16_s v6, v0
            vaddpairwisei16x8_s v0, v5, v6
            pop_frame
            ╰─╼ addrmap: 0x43
            ret
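
As a rough illustration of how a listing like the one above could be consumed, here is a minimal sketch that collects, per function, each instruction and the Wasm offset attached to it (if any). The `Inst` type and `parse_addrmap` function are hypothetical names, and the text layout is inferred only from this one listing, so it should not be read as a stable or documented format.

```rust
/// One disassembled instruction plus the Wasm offset attached to it, if any.
#[derive(Debug, PartialEq)]
struct Inst {
    text: String,
    wasm_offset: Option<u32>,
}

/// Parse text shaped like the `wasmtime objdump --addrmap` listing into
/// (function label, instructions) pairs. Hypothetical helper: the layout is
/// guessed from a single example listing.
fn parse_addrmap(listing: &str) -> Vec<(String, Vec<Inst>)> {
    let mut funcs: Vec<(String, Vec<Inst>)> = Vec::new();
    for raw in listing.lines() {
        let line = raw.trim();
        if line.is_empty() {
            continue;
        }
        if let Some(note) = line.strip_prefix("╰─╼ ") {
            // Annotation on the previous instruction; only `addrmap:` is kept,
            // other annotations such as `trap:` are ignored.
            if let Some(hex) = note.strip_prefix("addrmap: 0x") {
                let last = funcs.last_mut().and_then(|(_, insts)| insts.last_mut());
                if let (Some(inst), Ok(off)) = (last, u32::from_str_radix(hex, 16)) {
                    inst.wasm_offset = Some(off);
                }
            }
        } else if !raw.starts_with(char::is_whitespace) && line.ends_with(':') {
            // An unindented line ending in ':' starts a new function.
            funcs.push((line.trim_end_matches(':').to_string(), Vec::new()));
        } else if let Some((_, insts)) = funcs.last_mut() {
            // Any other line is treated as an instruction of the current function.
            insts.push(Inst { text: line.to_string(), wasm_offset: None });
        }
    }
    funcs
}
```

Run on the second function above, this would leave `vwidenhigh8x16_s` and `vaddpairwisei16x8_s` with no offset of their own, which is the kind of gap a trace producer would have to handle.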

can you share more about your goals here? patching pulley bytecode is not something that is supported. pulley is very low level, basically the same as an actual ISA like x86 or riscv64, so patching the bytecode will very easily lead to wild unsafety. any attempt to patch it under the covers is pretty scary. additionally, the bytecode is usually mapped read-only from disk, and so any attempt to write to it will trap.

bjorn3 (Apr 10 2025 at 16:32):

The pulley interpreter could have a gdbstub to which you would attach gdb, as opposed to attaching gdb to the wasmtime process as a whole. The gdbstub crate allows implementing a gdbstub by implementing a bunch of traits.

bjorn3 (Apr 10 2025 at 16:34):

This is also how, for example, qemu and valgrind allow debugging emulated or instrumented code as if the original code was running rather than their jitted code.

Alexander Ivanov (Apr 10 2025 at 17:51):

the goal is to produce full execution traces for some wasm-based programs for our CodeTracer environment

Alexander Ivanov (Apr 10 2025 at 17:51):

so that's where the DWARF info is useful: it lets us map the executed steps easily to high level code, and to know where certain local variables are, etc

Alexander Ivanov (Apr 10 2025 at 17:53):

currently, among the interpreter implementations we've looked at, wazero's is mostly the only one we've found that propagates the original wasm binary instruction offsets to its final representation, apparently because it uses DWARF in limited cases

Alexander Ivanov (Apr 10 2025 at 17:56):

btw years ago, bjorn answered a similar question of mine about the rust/llvm call instrumentation hooks here on zulip, it really seems the debugging world is small

bjorn3 (Apr 10 2025 at 18:09):

The way I got involved with rustc is through writing the rustc_codegen_cranelift rustc backend, which as the name says depends on Cranelift, just like Wasmtime does.

bjorn3 (Apr 10 2025 at 18:15):

@Alexander Ivanov Assuming you are referring to https://github.com/metacraft-labs/codetracer, once it supports loading rr traces, everything should work with native compilation rather than Pulley if you enable debuginfo generation in Wasmtime and you implement the gdb jit protocol in CodeTracer to get the DWARF debuginfo generated by Wasmtime out of the recorded trace.

Alexander Ivanov (Apr 10 2025 at 18:32):

yes: the rr traces are somewhat supported internally, just in a separate proprietary backend (and yes: this backend is not stable yet)

Alexander Ivanov (Apr 10 2025 at 18:33):

they do have some tradeoffs compared to the "db-backend"-based ones, but the needs of some clients are related to producing specifically this kind of open source-based db trace mechanism

Alexander Ivanov (Apr 10 2025 at 18:36):

btw we tried a weirder approach for the db backend: just running the natively compiled wasm under lldb and producing such a trace (that's extremely slow, but it was valuable experience in assessing that kind of dwarf for natively compiled wasm: it works well)

Alexander Ivanov (Apr 10 2025 at 18:41):

ok, i have to look at the source i guess: I assume this might be sufficient, if they do indeed preserve the original offset/equivalent info at the final interpreter point

Chris Fallin (Apr 10 2025 at 18:50):

Note that we don't guarantee that every original Wasm opcode has a corresponding location in the final interpreter bytecode: Pulley's compilation is an optimizing one, which means that we may hoist code, GVN it, DCE it, etc. If you're looking to build a record/replay infrastructure on top of this at the Wasm virtual machine level, it's probably not what you want

Chris Fallin (Apr 10 2025 at 18:50):

FWIW, we do have a roadmap in our debugging RFC (merged, but work hasn't really started yet) aiming at our own record/replay infrastructure, and the plan there is to use Winch, since it does preserve program-point-for-program-point

Alex Crichton (Apr 10 2025 at 18:59):

I'll note that this still requires gdb to have an understanding of the "native architecture", and afaik gdb doesn't understand wasm and definitely doesn't understand pulley, so this would be of limited use I think.

Wasmtime sort of does and sort of doesn't have this info with Pulley. As others have mentioned, this is the goal of the DWARF that Wasmtime emits. The wasm module itself has its own DWARF, which is wasm-relative, and Wasmtime translates it to native-relative DWARF. This has all the pitfalls Chris and Nick have mentioned: as an optimizing compiler, we can't translate the wasm-dwarf 1:1 to native-dwarf.

Additionally though there is no implementation of mapping wasm-dwarf to pulley-dwarf because pulley has no meaning in dwarf. It probably wouldn't be too too hard to add this though! That would enable you to read the pulley-dwarf and then couple that with pulley interpreter state to go back to the high-level program. With the understanding of course that this would be lossy in the same way that the wasm-to-native-dwarf translation is lossy.

Alexander Ivanov (Apr 10 2025 at 20:09):

thank you! some level of optimization might be ok, if it preserves the original source for at least the final operands, even if some were left out

Alexander Ivanov (Apr 10 2025 at 20:10):

however i do assume it might not be done like that, if it wasn't specifically written with that in mind

Alex Crichton (Apr 10 2025 at 20:11):

Cranelift does have -Oopt-level=0 which disables most optimizations. While that still doesn't preserve precise 1:1 mapping with wasm instructions it can work better than -Oopt-level=2, the default

Alexander Ivanov (Apr 10 2025 at 20:12):

@Chris Fallin interesting: it seems winch is a compiler, so do you plan to do something like rr? recording and replaying only the outside effects?

Alexander Ivanov (Apr 10 2025 at 20:16):

@Alex Crichton i see; still, going through that many layers does seem to make it harder to ensure that exact info gets mapped in the end. some interpreters directly implement/use a wasm reader and do simple transformations after that, maybe that's a bit closer to our specific usage

Chris Fallin (Apr 10 2025 at 20:17):

Right, the main thing to understand about Pulley is that it is not what one would first expect when imagining a "Wasm interpreter": it does not interpret Wasm bytecode; it interprets the result of our usual compilation pipeline, to a new ISA we've invented; so all the usual pitfalls of that re: observability will apply

Alexander Ivanov (Apr 10 2025 at 20:17):

our use case is more specific currently, initially for certain kinds of wasm-based contracts: it seems it's only by accident that it would be applicable to wasm programs overall, this wasn't entirely our initial focus

Alexander Ivanov (Apr 10 2025 at 20:19):

@Chris Fallin yes, makes sense, we're going through all kinds of runtimes to compare them in those aspects, learned a lot

Alexander Ivanov (Apr 10 2025 at 20:20):

rfcs/accepted/wasmtime-debugging.md at main · bytecodealliance/rfcs

Alexander Ivanov (Apr 10 2025 at 20:24):

in our current case we need to produce a different kind of record though, with different trade-offs, but it would be great to try with some real wasm programs

Alexander Ivanov (Apr 10 2025 at 20:26):

that's a very good observation, haven't thought about this case enough: we assumed that DWARF contains correct info for locals location

Alexander Ivanov (Apr 10 2025 at 20:27):

but i assume what is meant here, is that certain implementations can obviously store them in a different way in optimized mode

Alexander Ivanov (Apr 10 2025 at 20:30):

our plan was to simply match the interpreter pc with the wasm binary offset and this with DWARF, and to produce a trace based on combination of interpreter state and the debuginfo
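
A minimal sketch of that two-step lookup, under the assumption that both tables are available as slices sorted by their key. `AddrMapEntry`, `LineRow`, `floor_lookup` and `pc_to_line` are hypothetical names for illustration, not types from Wasmtime or from any DWARF library.

```rust
/// Hypothetical address-map entry: bytecode starting at `pc` (up to the next
/// entry) is attributed to the Wasm instruction at `wasm_offset`.
#[derive(Clone, Copy)]
struct AddrMapEntry {
    pc: u32,
    wasm_offset: u32,
}

/// Hypothetical line-table row, as one might extract from the module's DWARF.
#[derive(Clone, Copy)]
struct LineRow {
    wasm_offset: u32,
    line: u32,
}

/// Return the last element whose key is <= `key` in a slice sorted by that key.
fn floor_lookup<T: Copy>(sorted: &[T], key: u32, key_of: impl Fn(&T) -> u32) -> Option<T> {
    // Index of the first element whose key is greater than `key`.
    let idx = sorted.partition_point(|e| key_of(e) <= key);
    idx.checked_sub(1).map(|i| sorted[i])
}

/// Resolve an interpreter pc to a source line, or `None` if either table has
/// no entry at or before the requested key.
fn pc_to_line(addr_map: &[AddrMapEntry], lines: &[LineRow], pc: u32) -> Option<u32> {
    let entry = floor_lookup(addr_map, pc, |e| e.pc)?;
    let row = floor_lookup(lines, entry.wasm_offset, |r| r.wasm_offset)?;
    Some(row.line)
}
```

Note that the floor lookup attributes any bytecode without its own entry to the preceding Wasm instruction. With an optimizing pipeline that can hoist or merge code, that attribution is a guess rather than a guarantee.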

Alexander Ivanov (Apr 10 2025 at 20:32):

there are some things we'd need a bit later, for which an interpreter is a better fit than adapting a native binary/the compilation

bjorn3 (Apr 10 2025 at 20:45):

If you do the same DWARF translation as for native you only need Pulley support. I guess you could pretend that the target arch is actually riscv and then map the pulley registers to riscv registers. Disassembling obviously won't work, you need to do the same remapping in the debuginfo and you will need to force usage of "hardware" breakpoints as opposed to software breakpoints (as the latter would try to insert riscv trap instructions at the breakpoint location), but everything else should work I think.
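
To illustrate the "hardware"-style breakpoints mentioned here, the sketch below uses a toy stack machine whose interpreter loop checks a set of breakpoint addresses before each instruction, so the bytecode itself is never rewritten. The `Op`, `Stop` and `Vm` types are made up for this example and have nothing to do with Pulley's actual instruction set or interpreter.

```rust
use std::collections::HashSet;

/// Toy bytecode for illustration only; not Pulley's instruction set.
#[derive(Clone, Copy)]
enum Op {
    Push(i64),
    Add,
    Halt,
}

/// Why the interpreter loop returned.
#[derive(Debug, PartialEq)]
enum Stop {
    Breakpoint(usize),
    Halted,
}

struct Vm {
    code: Vec<Op>,
    pc: usize,
    stack: Vec<i64>,
    breakpoints: HashSet<usize>,
}

impl Vm {
    /// Run until a breakpoint or the end of the program. Breakpoints are
    /// checked by the loop itself, so `code` is never modified. Pass
    /// `resuming = true` to step past a breakpoint at the current pc.
    fn run(&mut self, mut resuming: bool) -> Stop {
        loop {
            if !resuming && self.breakpoints.contains(&self.pc) {
                return Stop::Breakpoint(self.pc);
            }
            resuming = false;
            match self.code.get(self.pc).copied() {
                // Running off the end is treated like an explicit halt.
                None | Some(Op::Halt) => return Stop::Halted,
                Some(Op::Push(v)) => self.stack.push(v),
                Some(Op::Add) => {
                    let b = self.stack.pop().expect("stack underflow");
                    let a = self.stack.pop().expect("stack underflow");
                    self.stack.push(a + b);
                }
            }
            self.pc += 1;
        }
    }
}
```

The per-instruction set lookup costs some speed, but it works even when the bytecode is mapped read-only, which is the situation described earlier in the thread.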
