Stream: wasmtime

Topic: wasmtime pulley and opcode info/dwarf


view this post on Zulip Alexander Ivanov (Apr 10 2025 at 16:14):

Hey! we're researching wasm interpreters that are easy to patch, as part of our recording/debugging tools. Wasmtime has a permissive Apache 2 license and the pulley interpreter. many interpreters don't really propagate info about the original wasm binary offsets of the instructions to their internal IR/interpreter loops;
this is a problem for us, as we want to match them with their DWARF info: does pulley propagate/preserve this?

view this post on Zulip bjorn3 (Apr 10 2025 at 16:29):

Yes, it does, but the information is only useful for traps and debuginfo. Pulley internally follows basically the same compilation pipeline as when compiling for native architectures. It uses the same code to lower wasm to Cranelift IR, optimizes this using the regular Cranelift optimization passes (which can merge clif ir instructions from multiple wasm instructions together, losing information about which actual wasm instruction was executed) and only at the end compiles to Pulley bytecode rather than conventional machine code. If you only need it for debuginfo, this is obviously fine, but if you also want to manipulate execution or have a guaranteed 100% correct extraction of the full wasm-level state, this won't cut it.

view this post on Zulip fitzgen (he/him) (Apr 10 2025 at 16:30):

DWARF isn't enabled for pulley because we haven't defined a DWARF register mapping. it also isn't clear to me what value it would provide since the DWARF is usually used by the system's native tools like gdb and perf but those don't apply to pulley.

we do have address maps that we use internally to map trapping instructions to source locations:

$ wasmtime compile -D address-map=y --target pulley64 ~/scratch/foo.wat -o ~/scratch/foo.cwasm

$ wasmtime objdump --addrmap ~/scratch/foo.cwasm
wasm[0]::function[0]:
            push_frame
            ╰─╼ trap: StackOverflow
            vconst128 v0, 32768
            ╰─╼ addrmap: 0x28
            call2 x0, x0, 0x9    // target = 0x1e
            ╰─╼ addrmap: 0x3a
            pop_frame
            ╰─╼ addrmap: 0x3c
            ret

wasm[0]::function[1]:
            push_frame
            ╰─╼ trap: StackOverflow
            vwidenlow8x16_s v5, v0
            ╰─╼ addrmap: 0x41
            vwidenhigh8x16_s v6, v0
            vaddpairwisei16x8_s v0, v5, v6
            pop_frame
            ╰─╼ addrmap: 0x43
            ret

but this is really an internal implementation detail.
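As a rough illustration of how an address map like the one printed above could be consulted, here is a minimal Rust sketch that maps a bytecode offset back to a Wasm binary offset. The `AddrMapEntry` type and the `lookup_wasm_offset` function are hypothetical names, not Wasmtime APIs, and the rule that an entry applies until the next entry begins is an assumption made for the example.

```rust
/// One address-map entry: the bytecode offset where a region starts and the
/// Wasm binary offset it was lowered from, if any.
/// (Hypothetical layout, not Wasmtime's internal format.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AddrMapEntry {
    pub code_offset: u32,
    pub wasm_offset: Option<u32>,
}

/// Returns the Wasm offset covering `pc`.
/// Assumes `map` is sorted by `code_offset` and that each entry applies
/// until the next entry begins.
pub fn lookup_wasm_offset(map: &[AddrMapEntry], pc: u32) -> Option<u32> {
    // Index of the first entry that starts strictly after `pc`.
    let idx = map.partition_point(|e| e.code_offset <= pc);
    // The entry just before it (if any) is the one covering `pc`.
    idx.checked_sub(1).and_then(|i| map[i].wasm_offset)
}
```

With entries at code offsets 0 (no Wasm offset), 4 (0x28) and 10 (0x3a), a lookup at offset 9 would yield 0x28, and a lookup at offset 0 would yield nothing.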

can you share more about your goals here? patching pulley bytecode is not something that is supported. pulley is very low level, basically the same as an actual ISA like x86 or riscv64, so patching the bytecode will very easily lead to wild unsafety. any attempt to patch it under the covers is pretty scary. additionally, the bytecode is usually mapped read-only from disk, and so any attempt to write to it will trap.

view this post on Zulip bjorn3 (Apr 10 2025 at 16:32):

fitzgen (he/him) said:

DWARF isn't enabled for pulley because we haven't defined a DWARF register mapping. it also isn't clear to me what value it would provide since the DWARF is usually used by the system's native tools like gdb and perf but those don't apply to pulley.

The pulley interpreter could have a gdbstub to which you would attach gdb to as opposed to attaching gdb to the wasmtime process as a whole. The gdbstub crate allows implementing a gdbstub by implementing a bunch of traits.

view this post on Zulip fitzgen (he/him) (Apr 10 2025 at 16:33):

interesting, TIL!

view this post on Zulip bjorn3 (Apr 10 2025 at 16:34):

This is also how, for example, qemu and valgrind allow debugging emulated or instrumented code as if the original code was running rather than their jitted code.

view this post on Zulip Alexander Ivanov (Apr 10 2025 at 17:48):

@bjorn3 thanks, this makes sense

view this post on Zulip Alexander Ivanov (Apr 10 2025 at 17:49):

@fitzgen (he/him) hey, you're one of the gimli guys!

view this post on Zulip Alexander Ivanov (Apr 10 2025 at 17:50):

yes: we don't want to patch the bytecode, but the interpreter itself

view this post on Zulip Alexander Ivanov (Apr 10 2025 at 17:51):

the goal is to produce full execution traces for some wasm-based programs for our CodeTracer environment

view this post on Zulip Alexander Ivanov (Apr 10 2025 at 17:51):

so that's where the DWARF info is useful: it lets us map the executed steps easily to high level code, and to know where certain local variables are, etc

view this post on Zulip Alexander Ivanov (Apr 10 2025 at 17:53):

currently, from the interpreter impls we've tried to look at, we've found mostly the wazero interpreter to propagate the original wasm binary instruction offsets to their final representation, because they seem to use DWARF in limited cases

view this post on Zulip Alexander Ivanov (Apr 10 2025 at 17:56):

btw years ago, bjorn has been answering me a similar question for the rust/llvm call instrumentation hooks here in zulip, it really seems the debugging world is small

view this post on Zulip bjorn3 (Apr 10 2025 at 18:09):

Alexander Ivanov said:

btw years ago, bjorn has been answering me a similar question for the rust/llvm call instrumentation hooks here in zulip, it really seems the debugging world is small

The way I got involved with rustc is through writing the rustc_codegen_cranelift rustc backend, which as the name says depends on Cranelift, just like Wasmtime does.

view this post on Zulip bjorn3 (Apr 10 2025 at 18:15):

@Alexander Ivanov Assuming you are referring to https://github.com/metacraft-labs/codetracer, once it supports loading rr traces, everything should work with native compilation rather than Pulley if you enable debuginfo generation in Wasmtime and you implement the gdb jit protocol in CodeTracer to get the DWARF debuginfo generated by Wasmtime out of the recorded trace.


view this post on Zulip Alexander Ivanov (Apr 10 2025 at 18:32):

yes: the rr traces are somewhat supported internally, just in a proprietary separate backend (and yes: this backend is not stable yet)

view this post on Zulip Alexander Ivanov (Apr 10 2025 at 18:33):

they do have some tradeoffs compared to the "db-backend"-based ones, but the needs of some clients are related to producing specifically this kind of open source-based db trace mechanism

view this post on Zulip Alexander Ivanov (Apr 10 2025 at 18:34):

otherwise what you're saying is possible indeed

view this post on Zulip Alexander Ivanov (Apr 10 2025 at 18:36):

btw we tried a weirder approach for the db backend: just running the natively compiled wasm under lldb and producing such a trace (that's extremely slow, but it was valuable experience in assessing that kind of DWARF for natively compiled wasm: it works well)

view this post on Zulip Alexander Ivanov (Apr 10 2025 at 18:39):

fitzgen (he/him) said:

we do have address maps that we use internally to map trapping instructions to source locations:

view this post on Zulip Alexander Ivanov (Apr 10 2025 at 18:41):

ok, i have to look at the source i guess: I assume this might be sufficient, if they do indeed preserve the original offset/equivalent info at the final interpreter point

view this post on Zulip Chris Fallin (Apr 10 2025 at 18:50):

Note that we don't guarantee that every original Wasm opcode has a corresponding location in the final interpreter bytecode: Pulley's compilation is an optimizing one, which means that we may hoist code, GVN it, DCE it, etc. If you're looking to build a record/replay infrastructure on top of this at the Wasm virtual machine level, it's probably not what you want
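To make the point about optimization concrete, here is a toy Rust sketch (not Cranelift's actual GVN, and all names are hypothetical) in which instructions carry the Wasm offset they came from. When two instructions compute the same pure expression, only the first survives, so the Wasm offset of the second no longer corresponds to any emitted instruction.

```rust
use std::collections::HashSet;

/// A toy pure instruction tagged with the Wasm offset it came from.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Inst {
    pub wasm_offset: u32,
    /// Opcode and operand names; two instructions with an equal `expr`
    /// compute the same value.
    pub expr: (&'static str, &'static str, &'static str),
}

/// Toy redundancy elimination: keeps only the first instruction that
/// computes each expression and drops later duplicates.
pub fn dedup_redundant(insts: &[Inst]) -> Vec<Inst> {
    let mut seen = HashSet::new();
    insts
        .iter()
        // `insert` returns false when the expression was already seen.
        .filter(|i| seen.insert(i.expr))
        .cloned()
        .collect()
}
```

If the same `i32.add` of the same operands appears at Wasm offsets 0x28 and 0x3a, only the instruction tagged 0x28 remains after this pass, and a tracer stepping through the output would never observe a step at 0x3a.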

view this post on Zulip Chris Fallin (Apr 10 2025 at 18:50):

FWIW, we do have a roadmap in our debugging RFC (merged, but work hasn't really started yet) aiming at our own record/replay infrastructure, and the plan there is to use Winch, since it does preserve program-point-for-program-point

view this post on Zulip Alex Crichton (Apr 10 2025 at 18:59):

bjorn3 said:

The pulley interpreter could have a gdbstub to which you would attach gdb to as opposed to attaching gdb to the wasmtime process as a whole.

I'll note that this still requires gdb to have an understanding of the "native architecture", and afaik gdb doesn't understand wasm and definitely doesn't understand pulley, so this would be of limited use I think.

Alexander Ivanov said:

so that's where the DWARF info is useful: it lets us map the executed steps easily to high level code, and to know where certain local variables are, etc

For this, Wasmtime sort of does and sort of doesn't have this info with Pulley. As others have mentioned, this is the goal of the DWARF that Wasmtime emits. The wasm module itself has its own DWARF, which is wasm-relative and gets translated to native-relative DWARF. This has all the pitfalls Chris and Nick have mentioned: as an optimizing compiler, we can't translate the wasm-dwarf 1:1 to native-dwarf.

Additionally though there is no implementation of mapping wasm-dwarf to pulley-dwarf because pulley has no meaning in dwarf. It probably wouldn't be too too hard to add this though! That would enable you to read the pulley-dwarf and then couple that with pulley interpreter state to go back to the high-level program. With the understanding of course that this would be lossy in the same way that the wasm-to-native-dwarf translation is lossy.

view this post on Zulip Alexander Ivanov (Apr 10 2025 at 20:09):

thank you! some level of optimization might be ok, if it preserves the original source for at least the final operands, even if some were left out

view this post on Zulip Alexander Ivanov (Apr 10 2025 at 20:10):

however i do assume it might not be done like that, if it wasn't specifically written with that in mind

view this post on Zulip Alex Crichton (Apr 10 2025 at 20:11):

Cranelift does have -Oopt-level=0 which disables most optimizations. While that still doesn't preserve precise 1:1 mapping with wasm instructions it can work better than -Oopt-level=2, the default

view this post on Zulip Alexander Ivanov (Apr 10 2025 at 20:12):

@Chris Fallin interesting: it seems winch is a compiler, so do you plan to do something like rr? recording and replaying only the outside effects?

view this post on Zulip Chris Fallin (Apr 10 2025 at 20:12):

Yes, exactly, that's the plan

view this post on Zulip Alexander Ivanov (Apr 10 2025 at 20:16):

@Alex Crichton i see, still going through that many layers does seem a bit harder to ensure that exact info gets mapped in the end; some interpreters directly implement/use a wasm reader and do simple transformations after that, maybe that's a bit closer to the specific usage

view this post on Zulip Alexander Ivanov (Apr 10 2025 at 20:16):

@Chris Fallin interesting, i'll search for it

view this post on Zulip Chris Fallin (Apr 10 2025 at 20:17):

Right, the main thing to understand about Pulley is that it is not what one would first expect when imagining a "Wasm interpreter": it does not interpret Wasm bytecode; it interprets the result of our usual compilation pipeline, to a new ISA we've invented; so all the usual pitfalls of that re: observability will apply

view this post on Zulip Alexander Ivanov (Apr 10 2025 at 20:17):

our use case is more specific currently, initially for certain kinds of wasm-based contracts: it seems only by accident that it would be applicable to wasm programs overall; this wasn't entirely our initial focus

view this post on Zulip Alexander Ivanov (Apr 10 2025 at 20:19):

@Chris Fallin yes, makes sense, we're going through all kinds of runtimes to compare them in those aspects, learned a lot

view this post on Zulip Alexander Ivanov (Apr 10 2025 at 20:20):

I assume that's the record/replay & debugging doc: https://github.com/bytecodealliance/rfcs/blob/main/accepted/wasmtime-debugging.md


view this post on Zulip Chris Fallin (Apr 10 2025 at 20:21):

Yep, that's the one

view this post on Zulip Alexander Ivanov (Apr 10 2025 at 20:22):

interesting

view this post on Zulip Alexander Ivanov (Apr 10 2025 at 20:22):

a re-execution strategy works well with wasm indeed

view this post on Zulip Alexander Ivanov (Apr 10 2025 at 20:24):

in our current case we need to produce a different kind of record though, with different trade-offs, but it would be great to try with some real wasm programs

view this post on Zulip Alexander Ivanov (Apr 10 2025 at 20:25):

The RFC says:

We will need to either maintain a mapping from locals to register/stack slot for all Wasm PC points that we might hit a breakpoint or watchpoint, or force winch to unconditionally spill locals to the stack. The latter would greatly simplify tracking local locations, while the former would greatly increase the amount of context we would need to pass into the utility function for inspecting the execution state.
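The two options described above can be contrasted with a small Rust sketch. Everything here is hypothetical (the `Location` and `LocalRange` types, the function names, and the fixed slot layout) and only illustrates the trade-off, not how Winch or Wasmtime represent this.

```rust
/// Where a Wasm local currently lives (hypothetical model).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Location {
    Register(u8),
    /// Byte offset from an assumed frame base.
    StackSlot(i32),
}

/// Option 1: one record per (PC range, local) saying where the local lives.
#[derive(Debug, Clone, Copy)]
pub struct LocalRange {
    pub start_pc: u32, // inclusive
    pub end_pc: u32,   // exclusive
    pub local: u32,
    pub loc: Location,
}

/// Option 1 lookup: needs the whole range table as context.
pub fn locate_tracked(ranges: &[LocalRange], pc: u32, local: u32) -> Option<Location> {
    ranges
        .iter()
        .find(|r| r.local == local && r.start_pc <= pc && pc < r.end_pc)
        .map(|r| r.loc)
}

/// Option 2 lookup: if locals are always spilled to fixed slots, the
/// location depends only on the local index (assumed uniform slot size).
pub fn locate_spilled(local: u32, slot_size: i32) -> Location {
    Location::StackSlot(local as i32 * slot_size)
}
```

The first lookup needs a table that grows with the number of program points, while the second needs no per-PC data at all, which is the simplification the quoted passage refers to.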

view this post on Zulip Alexander Ivanov (Apr 10 2025 at 20:26):

that's a very good observation, haven't thought about this case enough: we assumed that DWARF contains correct info for locals location

view this post on Zulip Alexander Ivanov (Apr 10 2025 at 20:27):

but i assume what is meant here, is that certain implementations can obviously store them in a different way in optimized mode

view this post on Zulip Alexander Ivanov (Apr 10 2025 at 20:30):

our plan was to simply match the interpreter pc with the wasm binary offset and this with DWARF, and to produce a trace based on a combination of interpreter state and the debuginfo

view this post on Zulip Alexander Ivanov (Apr 10 2025 at 20:32):

there are some things we'd need a bit later, for which an interpreter is a better fit than adapting a native binary/the compilation

view this post on Zulip bjorn3 (Apr 10 2025 at 20:45):

Alex Crichton said:

bjorn3 said:

The pulley interpreter could have a gdbstub to which you would attach gdb to as opposed to attaching gdb to the wasmtime process as a whole.

I'll note that this still requires gdb to have an understanding of the "native architecture", and afaik gdb doesn't understand wasm and definitely doesn't understand pulley, so this would be of limited use I think.

If you do the same DWARF translation as for native you only need Pulley support. I guess you could pretend that the target arch is actually riscv and then map the pulley registers to riscv registers. Disassembling obviously won't work, you need to do the same remapping in the debuginfo and you will need to force usage of "hardware" breakpoints as opposed to software breakpoints (as the latter would try to insert riscv trap instructions at the breakpoint location), but everything else should work I think.
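A minimal Rust sketch of the register remapping idea follows. It assumes Pulley exposes three register classes (integer, float, vector) of 32 registers each, and uses the DWARF register numbers from the RISC-V psABI (0 to 31 for x registers, 32 to 63 for f registers, 96 to 127 for v registers). The `PulleyReg` enum and `riscv_dwarf_regnum` function are hypothetical and do not exist in Wasmtime or Pulley.

```rust
/// A Pulley register identified by class and index (hypothetical model).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PulleyReg {
    X(u8), // integer register
    F(u8), // float register
    V(u8), // vector register
}

/// Maps a Pulley register to the DWARF number of the RISC-V register with
/// the same class and index, or `None` if the index is out of range.
pub fn riscv_dwarf_regnum(reg: PulleyReg) -> Option<u16> {
    // Base DWARF numbers per the RISC-V psABI register numbering.
    let (base, idx) = match reg {
        PulleyReg::X(i) => (0u16, i),
        PulleyReg::F(i) => (32, i),
        PulleyReg::V(i) => (96, i),
    };
    if idx < 32 {
        Some(base + u16::from(idx))
    } else {
        None
    }
}
```

A gdbstub pretending to be a riscv target would use such a table in both directions: when emitting the translated debuginfo and when answering register read requests from gdb.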

view this post on Zulip Alex Crichton (Apr 10 2025 at 20:46):

true!


Last updated: Dec 06 2025 at 06:05 UTC