Stream: git-wasmtime

Topic: wasmtime / issue #5732 Support for Wasm Coredump


view this post on Zulip Wasmtime GitHub notifications bot (Feb 07 2023 at 11:58):

xtuc opened issue #5732:

Feature

When a Wasm instance traps, it's sometimes difficult to understand what happened. Post-mortem debugging using coredumps (which are used extensively in native environments) would help with investigating and fixing crashes.

Wasm coredumps are especially useful in serverless environments, where production binaries are stripped and/or logging is limited.

Implementation

Implement Wasm coredumps as specified by https://github.com/WebAssembly/tool-conventions/blob/main/Coredump.md.
Note that the spec is early and subject to change. Feedback very welcome!

cc @fitzgen

view this post on Zulip Wasmtime GitHub notifications bot (Feb 07 2023 at 12:25):

bjorn3 commented on issue #5732:

Reading the linear memory after a crash is already possible. Getting the locals and stack values is much more complicated: Wasmtime uses the Cranelift optimizing compiler, which can eliminate locals and stack values entirely and leave those that remain at whatever location it likes. It would be necessary to somehow prevent locals from being optimized away, at least at points where a trap could happen. There is existing debugger support for recording the locations of locals and stack values that aren't optimized away when generating debuginfo, but I'm not sure it is 100% accurate. By the way, https://github.com/bytecodealliance/wasmtime/issues/5537 is somewhat relevant to this.
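
To make the "reading linear memory is already possible" point concrete, here is a minimal sketch against the public wasmtime API (the module text and export names are invented for the example): after a call traps, the Store and the instance's exported Memory remain readable and can be dumped.

```rust
use wasmtime::{Engine, Instance, Memory, Module, Store};

fn main() -> anyhow::Result<()> {
    let engine = Engine::default();
    // Hypothetical module: writes a marker into memory, then traps.
    let module = Module::new(
        &engine,
        r#"(module
              (memory (export "memory") 1)
              (func (export "go")
                (i32.store (i32.const 0) (i32.const 0xdead))
                unreachable))"#,
    )?;
    let mut store = Store::new(&engine, ());
    let instance = Instance::new(&mut store, &module, &[])?;
    let go = instance.get_typed_func::<(), ()>(&mut store, "go")?;

    // The call fails with a trap...
    assert!(go.call(&mut store, ()).is_err());

    // ...but the instance's linear memory is still accessible afterwards.
    let memory: Memory = instance.get_memory(&mut store, "memory").unwrap();
    let data = memory.data(&store);
    assert_eq!(u32::from_le_bytes(data[0..4].try_into()?), 0xdead);
    Ok(())
}
```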

view this post on Zulip Wasmtime GitHub notifications bot (Feb 07 2023 at 15:19):

xtuc commented on issue #5732:

I don't think Wasm coredump support should prevent optimizations, given that ideally it's enabled by default.

It's not uncommon to see coredumps in native environments with values missing because they were optimized away; those missing values are usually not very helpful for debugging anyway.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 07 2023 at 15:32):

bjorn3 commented on issue #5732:

The wasm coredump format doesn't seem to allow omitting values that are optimized away, but if that were allowed, it should be possible to implement without too many changes to Cranelift. I think it would need some changes to the unwind-table generation code to store the locations of callee-saved registers, but that will need to be done anyway for handling exceptions. After that, I guess it would be a matter of telling Cranelift to generate debuginfo and then, on a crash, having Wasmtime unwind the stack and record all preserved locals and stack values for every frame.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 07 2023 at 15:37):

xtuc commented on issue #5732:

The wasm coredump format doesn't seem to allow omitting values that are optimized away

Correct, at the moment it doesn't. I'm going to add it, thanks for your input!

view this post on Zulip Wasmtime GitHub notifications bot (Feb 07 2023 at 15:55):

jameysharp commented on issue #5732:

This is an area I haven't dug into much, but doesn't Cranelift's support for GC already support tracking the information we need for this? I think we would need to mark potentially-trapping instructions as "safe points" and then request stack maps from Cranelift. And my impression was that calls are already considered safe points. But this is all conjecture based on a CVE that I was peripherally paying attention to last year, so I could have it all wrong.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 07 2023 at 18:19):

fitzgen commented on issue #5732:

This is an area I haven't dug into much, but doesn't Cranelift's support for GC already support tracking the information we need for this? I think we would need to mark potentially-trapping instructions as "safe points" and then request stack maps from Cranelift. And my impression was that calls are already considered safe points. But this is all conjecture based on a CVE that I was peripherally paying attention to last year, so I could have it all wrong.

Stack maps only track reference values (r32/r64), and only say which stack slots have live references in them. They do not supply any kind of info to help tie that back to wasm locals or even clif SSA variables.

I don't think we would want to use stack maps for this stuff.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 07 2023 at 18:29):

cfallin commented on issue #5732:

On the flip side, if you're proposing altering the generated code to assist debugging observability, @jameysharp, there is a large design space that we haven't really explored. A relatively simple change would be to define a pseudoinstruction that takes all locals as inputs, with "any" constraints to regalloc (stack slot or register), and insert these wherever a crash could happen. This "state snapshot" instruction would then guarantee observability of all values, at the cost of hindering optimization.

This goes somewhat against the "don't alter what you're observing" principle that is common in debug infrastructure, but I'll note that we do already have some hacks to keep important values alive (in this case, the vmctx, which makes all other wasm state reachable) for the whole function body.

There's also the "recovery instruction" approach, used in IonMonkey at least: whenever a value is optimized out, generate a side-sequence of instructions that can recompute it. That's a much larger compiler-infrastructure undertaking but in principle we could do it, if perfect debug observability were a goal.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 17 2023 at 12:07):

xtuc commented on issue #5732:

https://github.com/WebAssembly/tool-conventions/issues/198 has been closed. The coredump format now allows marking local/stack values as missing.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 23 2023 at 19:04):

xtuc commented on issue #5732:

I made a change to add initial/basic coredump generation: https://github.com/bytecodealliance/wasmtime/pull/5868. Could you please have a look and let me know if this is the right direction?
It uses WasmBacktrace for information about frames.
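
For readers who haven't used it: WasmBacktrace is attached to the error returned from a trapping call, and that is where the per-frame information comes from. A minimal sketch (module contents invented for the example):

```rust
use wasmtime::{Engine, Instance, Module, Store, WasmBacktrace};

fn main() -> anyhow::Result<()> {
    let engine = Engine::default();
    let module = Module::new(
        &engine,
        r#"(module
              (func $inner unreachable)
              (func (export "outer") call $inner))"#,
    )?;
    let mut store = Store::new(&engine, ());
    let instance = Instance::new(&mut store, &module, &[])?;
    let outer = instance.get_typed_func::<(), ()>(&mut store, "outer")?;

    let err = outer.call(&mut store, ()).unwrap_err();
    // The backtrace rides along as error context and lists the wasm frames
    // (function indices, plus names when a name section is present).
    if let Some(trace) = err.downcast_ref::<WasmBacktrace>() {
        for frame in trace.frames() {
            println!("func {} ({:?})", frame.func_index(), frame.func_name());
        }
    }
    Ok(())
}
```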

view this post on Zulip Wasmtime GitHub notifications bot (Feb 28 2023 at 23:18):

xtuc commented on issue #5732:

Basic coredump generation has been merged (thanks!).

Now, to have the complete debugger experience, we need to collect the following information:

From @cfallin :

A relatively simple change would be to define a pseudoinstruction that takes all locals as inputs, with "any" constraints to regalloc (stack slot or register), and insert these wherever a crash could happen. This "state snapshot" instruction would then guarantee observability of all values, at the cost of hindering optimization.

This "state snapshot" instruction could be translated from Wasm's (unreachable) instruction. I'm curious about the performance impact. Since the coredump feature is behind a flag, would it make sense to experiment with that approach?
Is my understanding correct that it won't help identifiying the values?
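
For context on the flag mentioned above, here is a rough sketch of the embedder-side opt-in. The coredump_on_trap config knob and the WasmCoreDump error-context type are assumptions about the API surface this line of work exposes; check the current wasmtime docs for the exact names and feature gates.

```rust
use wasmtime::{Config, Engine, Instance, Module, Store, WasmCoreDump};

fn main() -> anyhow::Result<()> {
    // Assumed knob: opt in to core-dump capture when a trap occurs.
    let mut config = Config::new();
    config.coredump_on_trap(true);
    let engine = Engine::new(&config)?;

    let module = Module::new(&engine, r#"(module (func (export "boom") unreachable))"#)?;
    let mut store = Store::new(&engine, ());
    let instance = Instance::new(&mut store, &module, &[])?;
    let boom = instance.get_typed_func::<(), ()>(&mut store, "boom")?;

    let err = boom.call(&mut store, ()).unwrap_err();
    // With the flag enabled, the trap error also carries core-dump state
    // alongside the usual WasmBacktrace (assumed type name: WasmCoreDump).
    if err.downcast_ref::<WasmCoreDump>().is_some() {
        println!("core dump state captured for this trap");
    }
    Ok(())
}
```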

view this post on Zulip Wasmtime GitHub notifications bot (Mar 01 2023 at 08:03):

xtuc edited a comment on issue #5732:

Basic coredump generation has been merged (thanks!).

Now, to have the complete debugger experience, we need to collect the following information:

view this post on Zulip Wasmtime GitHub notifications bot (Mar 28 2023 at 21:48):

RyanTorok commented on issue #5732:

Is there a chance we could revive this thread? I'm working on cloud
infrastructure research, and being able to take a stack snapshot in wasmtime
would allow us to get some sophisticated cold-start optimizations for
Function-as-a-Service (FaaS) functions.

There has been a plethora of academic papers published about using execution
snapshots to speed up the cold-start (startup) time in FaaS, especially when
heavyweight VMs are involved. Starting up a Module in wasmtime tends to be
faster than VMs by 2-3 orders of magnitude, but recent papers have also explored
how to snapshot the state of the function after some initialization runs, which
has a lot in common with what Wizer does.

I am trying to extend this idea with a construction called _Nondeterministic
Generators_, which will allow FaaS functions to be snapshotted at any point in the
execution. Generators rely on the observation that a function whose execution
has not yet performed any invocation-specific computation (i.e. anything using
the function arguments or any nondeterministic functions imported from the host)
can be unconditionally snapshotted, and the snapshot can be used to fast-forward
future invocations of the same function.

In addition, we can create conditional snapshots that let application developers
optimize for common patterns, such as functions that want to check that their
arguments are valid before they perform their expensive initialization. Traditional
"init function"-based cold-start speedup techniques cannot optimize that pattern
without breaking the function semantics when the invocation-specific invariant is
violated (e.g. our argument validation fails).

I was looking into Wizer quite a bit and the design decisions it makes, and I
was hoping to get some insight about the requirements Wizer lists on its docs.rs
page, under "Caveats":

Is this just a lint against the produced module being potentially non-portable
(the snapshot would rely on the outcome of a particular host's implementation of
the imported function), or is there a more fundamental reason this is not
possible? I imagine my generator design having the potential to snapshot any
time just before a generator is polled (polling calls an import function, so the
host can record the outcome of the generator function), which would necessitate
snapshotting after code that has already called into the host at least once if we
have multiple generators.

I don't anticipate that the application code running on my system will need any of
these, but I'd like some clarification about why this applies to the entire
module and not just the init function, as it does for host functions.

This makes sense. Application code in my system should not need to use these.

More fundamentally, the major roadblock to my design working with WebAssembly
modules is wasmtime's current inability to snapshot the WebAssembly _stack_. Since
my design allows execution to be snapshotted at any point, not just after some
initialization function runs (as Wizer supports), it would require all of the
application's local state to be moved to a Memory before we snapshot, which
would slow down function execution and be a very awkward paradigm to program in.

My main question (and I apologize for taking a page to get there) is: what
roadblocks would need to be overcome to make stack snapshots possible in
wasmtime? Since it will be relevant below, I should point out that the
requirements for my use case are actually a bit looser than Wizer's in two ways:

I had the intuition that the application library could just run some WebAssembly
code that copies the locals on the stack into a Memory object, but I was
concerned about how wasmtime would behave when we restored such a stack. Unlike
the core-dumping use case, I'm less concerned about the actual contents of the
stack in relation to cranelift's dead-code elimination (DCE); however, what does
concern me is this: if, during the run that produced the snapshot, cranelift
decides by DCE to eliminate an unnecessary value from the stack, is it possible
that when we restore that stack in a new instantiation of the module that skips
to the snapshot, cranelift won't perform the same optimization and it will try to
pop a value off the stack that isn't there? If I had one reason for writing this
comment, it's that I would really appreciate some clarification on how this
compilation process works, what guarantees are in place, and how that might
affect our endeavor to produce restorable stack snapshots.

Thanks everyone for reading. You all do great work, and I'd love to contribute
going forward.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 28 2023 at 22:20):

bjorn3 commented on issue #5732:

Something that may work: if you reuse the exact same compiled machine code, you could take a snapshot of the part of the native stack that contains the wasm frames and restore it later. You would have to fix up pointers (which probably requires emitting extra metadata and maybe some changes to avoid keeping pointers alive across function calls) and make sure that no native frames are on the stack, as those can't safely be snapshotted. By keeping the same compiled machine code you know that the stack layout is identical. Wasmtime already allows emitting compiled wasm modules (.cwasm extension) and loading them again, so you would only need to implement the stack snapshotting and pointer fixups. That is still not exactly trivial, but likely much easier than perfectly reconstructing the wasm VM state.
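
As a pointer for the ".cwasm" part of this idea, here is a minimal sketch using APIs that exist today (the snapshot/restore machinery itself is the hypothetical part): precompile a module, write the artifact out, and later deserialize it so the exact same machine code, and therefore the same stack layouts, are reused.

```rust
use wasmtime::{Engine, Module};

fn main() -> anyhow::Result<()> {
    let engine = Engine::default();

    // Compile once and keep the serialized artifact (what the CLI writes as .cwasm).
    // The `wat` crate is used here just to assemble a tiny test module.
    let wasm = wat::parse_str(r#"(module (func (export "f")))"#)?;
    let cwasm: Vec<u8> = engine.precompile_module(&wasm)?;
    std::fs::write("module.cwasm", &cwasm)?;

    // Later (possibly in another process with a compatible Engine), reload the
    // exact same compiled code instead of recompiling.
    let bytes = std::fs::read("module.cwasm")?;
    // `deserialize` is unsafe because the bytes must come from a trusted
    // `serialize`/`precompile_module` call for a compatible engine.
    let _module = unsafe { Module::deserialize(&engine, &bytes)? };
    Ok(())
}
```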

The initialization function may not call any imported functions. Doing so will
trigger a trap and wizer will exit.

I would guess this is a combination of there being no way to hook up imported functions from the host to wizer, and this limitation ensuring that there is no native state that wizer can't snapshot. But I'm not a contributor to it, so this is nothing but a guess.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 28 2023 at 22:23):

cfallin commented on issue #5732:

@RyanTorok there are a lot of interesting ideas in your comment (I have to admit that I skimmed it in parts; I'd encourage a "tl;dr" of points for comments this long!). A few thoughts:

So I think some form of this is possible but it's a deep research project and requires a bunch of intimate knowledge of the compiler and runtime. We likely don't have the resources to help you design this in detail, but I'm personally curious to see what you come up with...

view this post on Zulip Wasmtime GitHub notifications bot (Mar 29 2023 at 17:13):

fitzgen commented on issue #5732:

@RyanTorok,

The Wasm stack doesn't really exist anymore by the time Cranelift is done emitting machine code (it is erased very early in the pipeline, basically the first thing to go). Instead you would need to capture the actual native stack. That has the issues @bjorn3 mentioned around native frames in between Wasm frames, but even if the stack is all Wasm there will be pointers on it to things malloced by the host, namely the vm context and associated data structures. Each new process will have new ASLR and new malloc allocations, and new FaaS requests/invocations will have new stores (and their associated vm contexts), so these structures will ultimately end up at different addresses in memory. So either (a) restoring a snapshot will require having a list of places to go and update pointers, not dissimilar to relocs or a moving GC, or (b) codegen has to take extreme care to emit only indirect references to these structures (somehow? you need an actual handle to be the "root" at some point, or else a host call or something). Option (a) is a ton of work for Wasmtime/Cranelift to keep track of all these things, and option (b) is also a ton of work and additionally makes Wasm execution much slower. In both cases, if we get anything wrong (miss a stack slot or register that holds a native pointer when saving a snapshot, or accidentally emit a direct pointer reference rather than an indirection) then we have security vulnerabilities.

Supporting all this would be a large refactoring of much of Wasmtime and Cranelift, and I'm pessimistic that it would ever happen. This is the kind of thing that you ideally need to build in from the very start, and Wasmtime and Cranelift have not been built with it in mind.

Backing up a bit: this topic would be better discussed in a dedicated issue or on Zulip, since this issue is specifically about implementing the proposed standard Wasm coredump format, which won't help with this feature because it operates strictly at the Wasm level. I suggest filing a new issue or starting a Zulip thread if you have further questions.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 31 2023 at 21:01):

RyanTorok commented on issue #5732:

Thank you to everyone for the quick responses and insightful comments!

TL;DR: Issues with ASLR and the level of introspection into the runtime that would be required make stack snapshots pretty much a non-starter, and in fact they alerted me to limitations in the existing work on cold-starts I wasn't aware of.

Based on @fitzgen 's comments about ASLR, I took another look back at the existing literature on cold-starts, and it turns out that the traditional method of snapshotting the entire state of the VM or language runtime is not compatible with ASLR _at all_, and for the exact reason @fitzgen pointed out.

A summary of the problem is that language runtimes (e.g. JVM, Python, Node.js, wasmtime, ...) inherently need to compile code using native addresses, which makes the VM state not portable to different addresses. Traditionally, the way to deal with this portability issue would be to introduce another level of indirection (i.e. position-independent addresses), but @fitzgen, @cfallin, and @bjorn3 all pointed out that any such scheme would require very deep introspection into the language runtime to convert the indirect addresses to direct addresses, which would be an enormous undertaking, to the point that you'd be better off redesigning the entire runtime to support this indirection. Otherwise, you're really walking a tightrope on both performance and security (mess up the indirection once, and the tenant can read memory their program doesn't own).

The existing literature on cold-starts essentially punts on this issue; it requires all memory owned by the VM or runtime to be loaded at the same address every time. While I don't see any major reasons wasmtime couldn't support this from an implementation standpoint, I don't recommend this as a direction for multiple reasons:

To summarize (in research paper speak), there are several open problems that have to be addressed with language runtimes in general, not just wasmtime, in order for generalized snapshots to be a practical solution for the cloud. I'm going to continue looking into how we might provide a subset of this feature set via library abstractions that work with the designs of existing language runtimes.

Thanks for all your help everyone!

view this post on Zulip Wasmtime GitHub notifications bot (Mar 31 2023 at 21:09):

RyanTorok commented on issue #5732:

As an aside, I think this question from my original comment:

is it possible that when we restore that stack in a new instantiation of the module that skips to the snapshot, cranelift won't perform the same optimization and it will try to pop a value off the stack that isn't there?

was a simple misunderstanding on my part about the mechanics of cranelift. Clearly everything has to be compiled in order to run; it's just a matter of when that happens (AOT or JIT). My last project was in browser security, and in JavaScript engines we actually have to worry about code running at multiple optimization levels, which is where my confusion stemmed from. This doesn't change anything about the issues with ASLR or introspection, however.

view this post on Zulip Wasmtime GitHub notifications bot (May 19 2024 at 22:09):

whitequark commented on issue #5732:

What tools can I use to inspect the coredumps?

view this post on Zulip Wasmtime GitHub notifications bot (May 20 2024 at 21:19):

fitzgen commented on issue #5732:

@whitequark unfortunately there isn't much off-the-shelf at the moment.

There was https://github.com/xtuc/wasm-coredump/tree/main/bin/wasmgdb but as far as I know it only works with an old version of the format.

There are plans to build support for inspecting them via the debug adapter protocol in Wasmtime itself, as a stepping stone towards fuller debugging capabilities. See https://github.com/bytecodealliance/rfcs/pull/34 for more details. Unfortunately, that doesn't exist yet.

In the meantime, Wasm core dumps are just wasm modules themselves, so you can use any tool you might inspect a wasm module with to get at the information inside a core dump, e.g. wasm-tools print or wasm-objdump.

I know this isn't a great answer. I wish I had a better one. But we are planning on getting there!
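
To make the "core dumps are just wasm modules" suggestion above concrete, here is a minimal sketch that lists a core dump's sections with the wasmparser crate (the file name is hypothetical; custom-section names such as "core" and "corestack" come from the tool-conventions Coredump.md proposal):

```rust
use wasmparser::{Parser, Payload};

fn main() -> anyhow::Result<()> {
    // Hypothetical path to a core dump produced by wasmtime on a trap.
    let bytes = std::fs::read("crash.coredump")?;

    // A wasm core dump is an ordinary wasm binary; the interesting bits live
    // in custom sections (e.g. "core", "corestack") plus data sections that
    // hold the linear memory contents.
    for payload in Parser::new(0).parse_all(&bytes) {
        match payload? {
            Payload::CustomSection(reader) => {
                println!("custom section {:?}: {} bytes", reader.name(), reader.data().len());
            }
            Payload::DataSection(_) => println!("data section (memory contents)"),
            _ => {}
        }
    }
    Ok(())
}
```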

view this post on Zulip Wasmtime GitHub notifications bot (May 20 2024 at 21:50):

whitequark commented on issue #5732:

Thanks! I'll keep it in mind--I have to use wasm-objdump a lot already so, cursed as it is, this does fit into my workflow...

view this post on Zulip Wasmtime GitHub notifications bot (May 21 2024 at 09:06):

xtuc commented on issue #5732:

There was https://github.com/xtuc/wasm-coredump/tree/main/bin/wasmgdb but as far as I know it only works with an old version of the format.

Sorry about that. I'm planning to update wasmgdb to the latest spec but haven't had the time yet.


Last updated: Oct 23 2024 at 20:03 UTC