Hey! We're researching wasm interpreters that are easy to patch as part of our recording/debugging tools. Wasmtime looks interesting: it has a permissive Apache 2 license and the Pulley interpreter. Many interpreters don't really propagate info about the original wasm binary offsets of the instructions to their internal IR/interpreter loops;
this is a problem for us, as we want to match them with their DWARF info. Does Pulley propagate/preserve this?
Yes, it does, but the information is only useful for traps and debuginfo. Pulley internally follows basically the same compilation pipeline as when compiling for native architectures. It uses the same code to lower wasm to Cranelift IR, optimizes this using the regular Cranelift optimization passes (which can merge CLIF IR instructions from multiple wasm instructions together, losing information about which wasm instruction was actually executed) and only at the end compiles to Pulley bytecode rather than conventional machine code. If you only need it for debuginfo, this is obviously fine, but if you also want to manipulate execution or have a guaranteed 100% correct extraction of the full wasm-level state, this won't cut it.
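To make the "merging" point concrete, here is a toy sketch of global value numbering over instructions tagged with the wasm offset they were lowered from. The IR, the Inst type and the offsets are all hypothetical, not Cranelift's data structures: when two wasm instructions compute the same value, only one IR instruction survives, and with it only one wasm offset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inst:
    dest: str         # SSA value defined by this instruction
    op: str           # operation name, e.g. "add"
    args: tuple       # SSA values used as operands
    wasm_offset: int  # offset of the wasm instruction it was lowered from


def gvn(insts):
    """Toy value numbering: drop instructions whose (op, args) were already
    computed and rewrite later uses to the first result. The surviving
    instruction keeps its own wasm offset; the duplicate's offset is lost."""
    seen = {}    # (op, canonical args) -> surviving SSA value
    rename = {}  # eliminated SSA value -> surviving SSA value
    out = []
    for inst in insts:
        args = tuple(rename.get(a, a) for a in inst.args)
        key = (inst.op, args)
        if key in seen:
            rename[inst.dest] = seen[key]
            continue
        seen[key] = inst.dest
        out.append(Inst(inst.dest, inst.op, args, inst.wasm_offset))
    return out


program = [
    Inst("v2", "add", ("v0", "v1"), 0x30),  # first i32.add
    Inst("v3", "add", ("v0", "v1"), 0x36),  # same computation later on
    Inst("v4", "mul", ("v2", "v3"), 0x38),
]
optimized = gvn(program)
print([hex(i.wasm_offset) for i in optimized])  # ['0x30', '0x38']
```

After the pass nothing in the output points at offset 0x36, even though that instruction's result is still used, which is exactly why a step-by-step wasm trace can't be reconstructed from optimized code.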
DWARF isn't enabled for pulley because we haven't defined a DWARF register mapping. it also isn't clear to me what value it would provide since the DWARF is usually used by the system's native tools like gdb and perf but those don't apply to pulley.
we do have address maps that we use internally to map trapping instructions to source locations:
$ wasmtime compile -D address-map=y --target pulley64 ~/scratch/foo.wat -o ~/scratch/foo.cwasm
$ wasmtime objdump --addrmap ~/scratch/foo.cwasm
wasm[0]::function[0]:
push_frame
╰─╼ trap: StackOverflow
vconst128 v0, 32768
╰─╼ addrmap: 0x28
call2 x0, x0, 0x9 // target = 0x1e
╰─╼ addrmap: 0x3a
pop_frame
╰─╼ addrmap: 0x3c
ret
wasm[0]::function[1]:
push_frame
╰─╼ trap: StackOverflow
vwidenlow8x16_s v5, v0
╰─╼ addrmap: 0x41
vwidenhigh8x16_s v6, v0
vaddpairwisei16x8_s v0, v5, v6
pop_frame
╰─╼ addrmap: 0x43
ret
but this is really an internal implementation detail.
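If you want to experiment with those address maps anyway, here is a small sketch that parses text in the format of the objdump listing above into per-function (instruction, wasm offset) pairs. Both the text format and the lookup rule are assumptions about an internal, unstable detail: the sketch attaches each annotation to the instruction printed just before it and, as is usual for sorted address maps, lets an addrmap entry cover the following instructions until the next entry.

```python
ANNOTATION = "╰─╼"


def parse_addrmap(text):
    """Parse objdump --addrmap style text into
    {function name: [(instruction, wasm offset or None), ...]}."""
    funcs, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("wasm[") and line.endswith(":"):
            current = funcs.setdefault(line[:-1], [])
        elif line.startswith(ANNOTATION):
            kind, _, value = line[len(ANNOTATION):].strip().partition(":")
            if kind == "addrmap" and current:
                # An annotation belongs to the instruction printed just before it.
                current[-1] = (current[-1][0], int(value.strip(), 16))
            # Other annotations (e.g. "trap: StackOverflow") are ignored here.
        elif current is None:
            raise ValueError(f"instruction outside any function: {line!r}")
        else:
            current.append((line, None))
    # Assumed lookup rule: an entry covers following instructions until the next one.
    for insts in funcs.values():
        last = None
        for i, (instr, off) in enumerate(insts):
            if off is None:
                insts[i] = (instr, last)
            else:
                last = off
    return funcs


sample = """\
wasm[0]::function[1]:
push_frame
╰─╼ trap: StackOverflow
vwidenlow8x16_s v5, v0
╰─╼ addrmap: 0x41
vwidenhigh8x16_s v6, v0
vaddpairwisei16x8_s v0, v5, v6
pop_frame
╰─╼ addrmap: 0x43
ret
"""
for instr, off in parse_addrmap(sample)["wasm[0]::function[1]"]:
    print(f"{instr:<32} {hex(off) if off is not None else '-'}")
```

With that rule, vwidenhigh8x16_s and vaddpairwisei16x8_s are attributed to wasm offset 0x41 even though they carry no annotation of their own, and push_frame has no wasm offset at all.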
can you share more about your goals here? patching pulley bytecode is not something that is supported. pulley is very low level, basically the same as an actual ISA like x86 or riscv64, so patching the bytecode will very easily lead to wild unsafety. any attempt to patch it under the covers is pretty scary. additionally, the bytecode is usually mapped read-only from disk, and so any attempt to write to it will trap.
DWARF isn't enabled for pulley because we haven't defined a DWARF register mapping. it also isn't clear to me what value it would provide since the DWARF is usually used by the system's native tools like gdb and perf but those don't apply to pulley.
The pulley interpreter could have a gdbstub to which you would attach gdb, as opposed to attaching gdb to the wasmtime process as a whole. The gdbstub crate lets you implement a gdbstub by implementing a bunch of traits.
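Whatever crate is used, a gdbstub ultimately speaks GDB's Remote Serial Protocol, where every packet is framed as $payload#checksum and the checksum is the sum of the payload bytes modulo 256, written as two hex digits. A minimal sketch of just that framing layer, ignoring the protocol's escaping and run-length encoding rules (this is not the gdbstub crate's API):

```python
def frame_packet(payload: str) -> str:
    """Frame an RSP packet as $<payload>#<two-hex-digit checksum>."""
    checksum = sum(payload.encode("ascii")) % 256
    return f"${payload}#{checksum:02x}"


def parse_packet(packet: str) -> str:
    """Validate the framing and checksum of an incoming packet and return its payload."""
    if len(packet) < 4 or not packet.startswith("$") or packet[-3] != "#":
        raise ValueError(f"malformed packet: {packet!r}")
    payload, checksum = packet[1:-3], int(packet[-2:], 16)
    if sum(payload.encode("ascii")) % 256 != checksum:
        raise ValueError(f"bad checksum in {packet!r}")
    return payload


print(frame_packet("OK"))     # $OK#9a
print(frame_packet("S05"))    # $S05#b8  (stop reply: stopped with SIGTRAP)
print(parse_packet("$g#67"))  # g  (request: read all registers)
```

Everything above this layer (register reads, memory reads, breakpoints) is where the interpreter state would have to be exposed.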
interesting, TIL!
This is also how, for example, qemu and valgrind let you debug emulated or instrumented code as if the original code were running rather than their jitted code.
@bjorn3 thanks, this makes sense
@fitzgen (he/him) hey, you're one of the gimli guys!
yes: we don't want to patch the bytecode, but the interpreter itself
the goal is to produce full execution traces for some wasm-based programs for our CodeTracer environment
so that's where the DWARF info is useful: it lets us map the executed steps easily to high level code, and to know where certain local variables are, etc
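Patching the interpreter rather than the bytecode usually comes down to adding a hook to the dispatch loop. A toy sketch of that idea, using a hypothetical three-opcode stack machine rather than Pulley's interpreter: before each instruction the loop reports the program counter, opcode and a snapshot of the operand stack to a callback, and the bytecode itself is never touched.

```python
from typing import Callable, List, Tuple

Instr = Tuple[str, int]  # (opcode, immediate)


def run(code: List[Instr], on_step: Callable[[int, str, List[int]], None]) -> List[int]:
    """Tiny stack-machine interpreter whose dispatch loop calls
    on_step(pc, op, stack) before executing every instruction."""
    stack: List[int] = []
    pc = 0
    while pc < len(code):
        op, imm = code[pc]
        on_step(pc, op, list(stack))  # copy so later mutation doesn't alter the trace
        if op == "push":
            stack.append(imm)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "jmp_if_lt":  # jump to imm if top of stack < 10 (does not pop)
            if stack[-1] < 10:
                pc = imm
                continue
        else:
            raise ValueError(f"unknown opcode {op}")
        pc += 1
    return stack


trace = []
program = [("push", 1), ("push", 4), ("add", 0), ("jmp_if_lt", 1)]
result = run(program, lambda pc, op, stack: trace.append((pc, op, stack)))
print(result, len(trace))  # [13] 10
```

In a real interpreter the recorded pc is what would later be translated through the address map and DWARF into source positions; the cost is one extra call per executed instruction.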
currently, among the interpreter impls we've looked at, we've mostly found the wazero interpreter to propagate the original wasm binary instruction offsets to its final representation, because they seem to use DWARF in limited cases
btw years ago, bjorn answered a similar question of mine about the rust/llvm call instrumentation hooks here on zulip; the debugging world really does seem small
Alexander Ivanov said:
btw years ago, bjorn answered a similar question of mine about the rust/llvm call instrumentation hooks here on zulip; the debugging world really does seem small
The way I got involved with rustc is through writing the rustc_codegen_cranelift rustc backend, which as the name says depends on Cranelift, just like Wasmtime does.
@Alexander Ivanov Assuming you are referring to https://github.com/metacraft-labs/codetracer, once it supports loading rr traces, everything should work with native compilation rather than Pulley if you enable debuginfo generation in Wasmtime and you implement the gdb jit protocol in CodeTracer to get the DWARF debuginfo generated by Wasmtime out of the recorded trace.
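For the GDB JIT route: a JIT registers each in-memory object file (for Wasmtime with debuginfo enabled, an ELF image carrying the translated DWARF) by linking a jit_code_entry into the global __jit_debug_descriptor and calling __jit_debug_register_code. A tool that can read the recorded process memory could recover those images roughly as follows. The struct layouts follow GDB's documented JIT interface for a 64-bit little-endian target; the Memory class and all addresses are hypothetical stand-ins for reading the trace.

```python
import struct

DESCRIPTOR = struct.Struct("<IIQQ")  # version, action_flag, relevant_entry, first_entry
CODE_ENTRY = struct.Struct("<QQQQ")  # next_entry, prev_entry, symfile_addr, symfile_size


class Memory:
    """Hypothetical reader over recorded memory: {base address: bytes} regions."""

    def __init__(self, regions):
        self.regions = regions

    def read(self, addr, size):
        for base, data in self.regions.items():
            if base <= addr and addr + size <= base + len(data):
                return data[addr - base:addr - base + size]
        raise ValueError(f"unmapped read at {addr:#x}")


def registered_symfiles(mem, descriptor_addr):
    """Walk the descriptor's linked list of code entries and return the
    registered object-file images, in list order."""
    version, _action, _relevant, entry = DESCRIPTOR.unpack(
        mem.read(descriptor_addr, DESCRIPTOR.size))
    if version != 1:
        raise ValueError(f"unexpected JIT interface version {version}")
    images, visited = [], set()
    while entry and entry not in visited:  # guard against corrupted, cyclic lists
        visited.add(entry)
        nxt, _prev, symfile_addr, symfile_size = CODE_ENTRY.unpack(
            mem.read(entry, CODE_ENTRY.size))
        images.append(mem.read(symfile_addr, symfile_size))
        entry = nxt
    return images


# Synthetic memory image: a descriptor at 0x1000 and two entries at 0x2000/0x2020.
desc = DESCRIPTOR.pack(1, 1, 0x2000, 0x2000)  # action_flag 1 = JIT_REGISTER_FN
e1 = CODE_ENTRY.pack(0x2020, 0, 0x3000, 4)
e2 = CODE_ENTRY.pack(0, 0x2000, 0x3004, 3)
mem = Memory({0x1000: desc, 0x2000: e1 + e2, 0x3000: b"\x7fELF" + b"abc"})
print(registered_symfiles(mem, 0x1000))  # [b'\x7fELF', b'abc']
```

Each returned image could then be handed to a DWARF reader as if it were an ordinary object file.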
yes: rr traces are somewhat supported internally, just in a separate proprietary backend (and yes, this backend is not stable yet)
they do have some tradeoffs compared to the "db-backend"-based ones, but some clients specifically need this kind of open-source, db-backend-based trace mechanism
otherwise what you're saying is possible indeed
btw we tried a weirder approach for the db backend: just running the natively compiled wasm under lldb and producing such a trace from that (it's extremely slow, but it was valuable experience for assessing that kind of DWARF for natively compiled wasm, and it works well)
we do have address maps that we use internally to map trapping instructions to source locations:
ok, i have to look at the source i guess: I assume this might be sufficient, if they do indeed preserve the original offset/equivalent info at the final interpreter point
Note that we don't guarantee that every original Wasm opcode has a corresponding location in the final interpreter bytecode: Pulley's compilation is an optimizing one, which means that we may hoist code, GVN it, DCE it, etc. If you're looking to build a record/replay infrastructure on top of this at the Wasm virtual machine level, it's probably not what you want
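One way to see this on a concrete module is to compare the wasm instruction offsets of a function against the offsets that appear in the address map. Anything missing has no bytecode of its own, either because the optimizer folded, moved or deleted it, or because it never produced code in the first place (a local.get, for instance, typically just names an existing value). A sketch with made-up offsets; obtaining the two sets, e.g. from a wasm parser and the objdump output, is left out:

```python
def coverage(wasm_offsets, addrmap_offsets):
    """Split wasm instruction offsets into those that have at least one
    address-map entry and those that have none."""
    present = set(addrmap_offsets)
    mapped = sorted(o for o in wasm_offsets if o in present)
    missing = sorted(o for o in wasm_offsets if o not in present)
    return mapped, missing


# Hypothetical function: five wasm instructions, three of which kept an entry.
mapped, missing = coverage([0x28, 0x2a, 0x2c, 0x3a, 0x3c], [0x28, 0x3a, 0x3c])
print([hex(o) for o in missing])  # ['0x2a', '0x2c']
```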
FWIW, we do have a roadmap in our debugging RFC (merged, but work hasn't really started yet) aiming at our own record/replay infrastructure, and the plan there is to use Winch, since it does preserve a program-point-for-program-point correspondence with the wasm
bjorn3 said:
The pulley interpreter could have a gdbstub to which you would attach gdb, as opposed to attaching gdb to the wasmtime process as a whole.
I'll note that this still requires gdb to have an understanding of the "native architecture", and afaik gdb doesn't understand wasm and definitely doesn't understand pulley, so this would be of limited use I think.
Alexander Ivanov said:
so that's where the DWARF info is useful: it lets us map the executed steps easily to high level code, and to know where certain local variables are, etc
For this, Wasmtime sort of has and sort of doesn't have this info with Pulley. As others have mentioned, this is the goal of the DWARF that Wasmtime emits: the wasm module has its own DWARF, which is wasm-relative, and that is translated to native-relative DWARF. This has all the pitfalls Chris and Nick have mentioned: as an optimizing compiler, we can't translate the wasm-dwarf 1:1 to native-dwarf.
Additionally though there is no implementation of mapping wasm-dwarf to pulley-dwarf because pulley has no meaning in dwarf. It probably wouldn't be too too hard to add this though! That would enable you to read the pulley-dwarf and then couple that with pulley interpreter state to go back to the high-level program. With the understanding of course that this would be lossy in the same way that the wasm-to-native-dwarf translation is lossy.
thank you! some level of optimization might be ok, if it preserves the original source for at least the final operands, even if some were left out
however i do assume it might not be done like that, if it wasn't specifically written with that in mind
Cranelift does have -Oopt-level=0, which disables most optimizations. While that still doesn't preserve a precise 1:1 mapping with wasm instructions, it can work better than -Oopt-level=2, the default.
@Chris Fallin interesting: so winch is a compiler; do you plan to do something like rr, recording and replaying only the outside effects?
Yes, exactly, that's the plan
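The rr-style idea in a nutshell: since wasm execution is deterministic apart from what crosses the host boundary, it is enough to log the results of imported (host) calls while recording and feed them back while replaying. A minimal sketch, assuming host imports can be modelled as plain callables; the HostRecorder class and the guest function are hypothetical, and this is not the design from the RFC.

```python
import random


class HostRecorder:
    """Wraps host imports. In "record" mode it calls the real import and logs
    the result; in "replay" mode it returns logged results without calling it."""

    def __init__(self, mode, log=None):
        self.mode = mode
        self.log = [] if log is None else log
        self.cursor = 0

    def wrap(self, name, func):
        def wrapper(*args):
            if self.mode == "record":
                result = func(*args)
                self.log.append((name, args, result))
                return result
            logged_name, logged_args, result = self.log[self.cursor]
            if (logged_name, logged_args) != (name, args):
                raise RuntimeError(
                    f"divergence: expected {logged_name}{logged_args}, got {name}{args}")
            self.cursor += 1
            return result
        return wrapper


def guest(random_u32):
    """Stand-in for deterministic guest code that only sees nondeterminism via imports."""
    return [random_u32() % 100 for _ in range(3)]


rec = HostRecorder("record")
first = guest(rec.wrap("random_u32", lambda: random.getrandbits(32)))
rep = HostRecorder("replay", rec.log)
second = guest(rep.wrap("random_u32", lambda: 0))  # the real import is never called
print(first == second)  # True
```

The divergence check matters in practice: if the replayed guest makes a different sequence of host calls, the recording no longer describes the execution.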
@Alex Crichton i see; still, going through that many layers does seem to make it harder to ensure that the exact info gets mapped through to the end. Some interpreters directly implement/use a wasm reader and only do simple transformations after that; maybe that's a bit closer to our specific usage
@Chris Fallin interesting, i'll search for it
Right, the main thing to understand about Pulley is that it is not what one would first expect when imagining a "Wasm interpreter": it does not interpret Wasm bytecode; it interprets the result of our usual compilation pipeline, to a new ISA we've invented; so all the usual pitfalls of that re: observability will apply
our use case is more specific currently, initially certain kinds of wasm-based contracts: it seems to apply to wasm programs in general only by accident; that wasn't really our initial focus
@Chris Fallin yes, makes sense, we're going through all kinds of runtimes to compare them in those aspects, learned a lot
I assume that's the record/replay & debugging doc: https://github.com/bytecodealliance/rfcs/blob/main/accepted/wasmtime-debugging.md
Yep, that's the one
interesting
a re-execution strategy works well with wasm indeed
in our current case we need to produce a different kind of record though, with different trade-offs, but it would be great to try with some real wasm programs
We will need to either maintain a mapping from locals to register/stack slot for all Wasm PC points that we might hit a breakpoint or watchpoint, or force winch to unconditionally spill locals to the stack. The latter would greatly simplify tracking local locations, while the former would greatly increase the amount of context we would need to pass into the utility function for inspecting the execution state.
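The first option is essentially a table keyed by code ranges, roughly what DWARF location lists express per variable. A sketch with hypothetical pc ranges, register names and slot offsets, showing a lookup and how unconditional spilling collapses the table to one fixed location per local:

```python
import bisect

# Hypothetical location table for one function:
# (start_pc, end_pc, {local index: location}), sorted and non-overlapping.
LOCATIONS = [
    (0x00, 0x10, {0: ("reg", "x2"), 1: ("stack", 8)}),
    (0x10, 0x24, {0: ("stack", 0), 1: ("reg", "x5")}),
    (0x24, 0x30, {0: ("stack", 0)}),  # local 1 has no location here
]
STARTS = [start for start, _, _ in LOCATIONS]


def locate(pc, local):
    """Where does `local` live at `pc`? None if it has no location there."""
    i = bisect.bisect_right(STARTS, pc) - 1
    if i < 0:
        return None
    _start, end, table = LOCATIONS[i]
    return table.get(local) if pc < end else None


def locate_spilled(local, slot_size=8):
    """With every local forced to the stack, the location no longer depends on pc."""
    return ("stack", local * slot_size)


print(locate(0x12, 1))    # ('reg', 'x5')
print(locate(0x26, 1))    # None
print(locate_spilled(1))  # ('stack', 8)
```

The trade-off from the message shows up directly: the table must be built and shipped for every point where execution can stop, while the spilled variant needs only a frame base and a slot size.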
that's a very good observation, we haven't thought about this case enough: we assumed that DWARF contains correct info for the locations of locals
but I assume what's meant here is that certain implementations can, of course, store them differently in optimized mode
our plan was to simply match the interpreter pc with the wasm binary offset, match that with the DWARF, and produce a trace based on a combination of the interpreter state and the debuginfo
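That plan is a two-step lookup, and both steps are "last entry at or before this address" searches over sorted tables. A sketch under several assumptions: an address map of (bytecode pc, wasm file offset) pairs, a line table already decoded (e.g. with gimli) into (address, file, line) rows, and wasm DWARF addresses being relative to the Code section, so a file offset is rebased by a code_section_start taken from the module's section headers. All tables and offsets below are made up.

```python
import bisect


def last_at_or_before(keys, key):
    """Index of the greatest element <= key in sorted `keys`, or None."""
    i = bisect.bisect_right(keys, key) - 1
    return i if i >= 0 else None


def pc_to_source(pc, addrmap, code_section_start, line_rows):
    """addrmap: sorted [(bytecode_pc, wasm_file_offset)].
    line_rows: sorted [(dwarf_address, file, line)]; end_sequence rows are ignored.
    Returns (wasm_file_offset, file, line) or None."""
    i = last_at_or_before([p for p, _ in addrmap], pc)
    if i is None:
        return None
    wasm_offset = addrmap[i][1]
    dwarf_addr = wasm_offset - code_section_start  # assumed rebasing rule
    j = last_at_or_before([a for a, _, _ in line_rows], dwarf_addr)
    if j is None:
        return None
    _, file, line = line_rows[j]
    return wasm_offset, file, line


addrmap = [(0x00, 0x28), (0x0c, 0x3a), (0x14, 0x3c)]
line_rows = [(0x02, "main.c", 3), (0x10, "main.c", 4), (0x14, "main.c", 5)]
print(pc_to_source(0x10, addrmap, code_section_start=0x26, line_rows=line_rows))
# (58, 'main.c', 5), i.e. wasm offset 0x3a
```

Whatever the optimizer did is baked into the first table, so the result is only as precise as the address map.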
there are some things we'd need a bit later, for which an interpreter is a better fit than adapting a native binary/the compilation
Alex Crichton said:
bjorn3 said:
The pulley interpreter could have a gdbstub to which you would attach gdb, as opposed to attaching gdb to the wasmtime process as a whole.
I'll note that this still requires gdb to have an understanding of the "native architecture", and afaik gdb doesn't understand wasm and definitely doesn't understand pulley, so this would be of limited use I think.
If you do the same DWARF translation as for native you only need Pulley support. I guess you could pretend that the target arch is actually riscv and then map the pulley registers to riscv registers. Disassembling obviously won't work, you need to do the same remapping in the debuginfo and you will need to force usage of "hardware" breakpoints as opposed to software breakpoints (as the latter would try to insert riscv trap instructions at the breakpoint location), but everything else should work I think.
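The register half of that trick is a fixed renumbering. The RISC-V psABI assigns DWARF numbers 0-31 to x0-x31, 32-63 to f0-f31 and 96-127 to v0-v31. The sketch assumes Pulley's x, f and v registers can each be treated as a 32-entry file mapped one-to-one onto those ranges (Pulley's actual register set, including special registers, would need checking). For breakpoints it keeps a set of addresses filled from RSP Z1/z1 (hardware breakpoint insert/remove) packets and checked by the interpreter loop, so nothing is ever written into the bytecode; the BreakpointTable class is hypothetical.

```python
RISCV_DWARF_BASE = {"x": 0, "f": 32, "v": 96}  # RISC-V psABI DWARF register ranges


def pulley_reg_to_riscv_dwarf(reg: str) -> int:
    """Map a register name like 'x5', 'f3' or 'v0' onto the DWARF number of the
    same-index RISC-V register (assumed one-to-one renumbering)."""
    kind, index = reg[0], int(reg[1:])
    if kind not in RISCV_DWARF_BASE or not 0 <= index < 32:
        raise ValueError(f"no mapping for {reg!r}")
    return RISCV_DWARF_BASE[kind] + index


class BreakpointTable:
    """Breakpoints kept outside the bytecode and checked by the interpreter loop."""

    def __init__(self):
        self.addrs = set()

    def handle(self, payload: str) -> str:
        """Handle 'Z1,addr,kind' / 'z1,addr,kind' payloads (addr in hex).
        Anything else, including Z0 software breakpoints, gets the empty
        reply that RSP uses for unsupported packets."""
        op, _, rest = payload.partition(",")
        if op not in ("Z1", "z1"):
            return ""
        addr = int(rest.split(",")[0], 16)
        if op == "Z1":
            self.addrs.add(addr)
        else:
            self.addrs.discard(addr)
        return "OK"

    def should_stop(self, pc: int) -> bool:
        return pc in self.addrs


bp = BreakpointTable()
bp.handle("Z1,1e,4")
print(pulley_reg_to_riscv_dwarf("f3"), bp.should_stop(0x1e))  # 35 True
```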
true!
Last updated: Dec 06 2025 at 06:05 UTC