Serialize suspended fiber · wasmtime

Stream: wasmtime

Topic: Serialize suspended fiber

Will Noble (Feb 11 2025 at 00:02):

I'm just starting to investigate the feasibility of this cooked scheme and thought I'd check if anybody's already done this or thought about it at all.

Let's say the host has asynchronously invoked a component function that runs for a very long time, routinely being suspended and resumed with epoch-based interruption. At some point, the host decides to shut down the instance (e.g. because the machine the host is running on would like to shut down), but persist the running fiber to disk in a way that it could be later re-loaded and resumed using an identical Engine and instance of the same component (possibly on a different machine) in a way that's completely transparent to the guest.

I'm assuming this would require heavily unsafe modifications to Wasmtime, but it doesn't seem like it's even as crazy as some of the other things Cranelift does. Anybody know of any factors that would make this truly impossible? Serializing FiberStack seems potentially straightforward enough. What else would have to be persisted and then re-loaded? Shared memories would also require some careful handling.

Thanks in advance for any insights!

Chris Fallin (Feb 11 2025 at 00:07):

"Serializing FiberStack seems potentially straightforward enough."

This is fairly tricky: what you need is something like a "relocatable stack" (and relocatable register contents) where, when rehydrating the snapshot in a new process/context with different host memory addresses for stack and heap, we need to fix up the in-flight state appropriately. This boils down to having precise type information for all the pieces of address computations that can exist in the compiler IR, including intermediate VM data structures, that is sufficient to recompute them in a new context; or else not persisting them across suspend-points (which could be any function call).

See https://github.com/bytecodealliance/wasmtime/issues/3017 where we discussed some of this a few years ago. I don't think this is likely to be easily implementable, but someone who is sufficiently motivated with a deep enough understanding of the runtime and of Cranelift could probably do it in a few months of full-time work.
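
To make the fix-up problem concrete, here is a toy sketch of the "relocatable stack" idea. It is not Wasmtime or Cranelift code: the `LiveSlot`, `SavedSlot`, and `Bases` types and the `snapshot` and `rehydrate` functions are hypothetical names invented for illustration. The sketch assumes every saved word carries a tag saying whether it is plain data or an address into the heap or the stack, which is exactly the precise type information that real compiled code does not carry today.

```rust
// Toy model only: every name here is hypothetical, not a Wasmtime API.

/// A live stack slot: a raw machine word plus the type tag the compiler
/// would have to provide for relocation to be possible.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LiveSlot {
    Int(u64),       // plain data, copied verbatim
    HeapAddr(u64),  // absolute host address inside the heap
    StackAddr(u64), // absolute host address inside the fiber stack
}

/// The position-independent form written to disk.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SavedSlot {
    Int(u64),
    HeapOffset(u64),  // offset from the heap base
    StackOffset(u64), // offset from the stack base
}

/// Host base addresses of the heap and the fiber stack in one process.
#[derive(Debug, Clone, Copy)]
struct Bases {
    heap: u64,
    stack: u64,
}

/// Turn absolute addresses into base-relative offsets.
/// Assumes every tagged address lies at or above its base.
fn snapshot(slots: &[LiveSlot], old: Bases) -> Vec<SavedSlot> {
    slots
        .iter()
        .map(|s| match *s {
            LiveSlot::Int(v) => SavedSlot::Int(v),
            LiveSlot::HeapAddr(a) => SavedSlot::HeapOffset(a - old.heap),
            LiveSlot::StackAddr(a) => SavedSlot::StackOffset(a - old.stack),
        })
        .collect()
}

/// Recompute absolute addresses against the bases of the new process.
fn rehydrate(saved: &[SavedSlot], new: Bases) -> Vec<LiveSlot> {
    saved
        .iter()
        .map(|s| match *s {
            SavedSlot::Int(v) => LiveSlot::Int(v),
            SavedSlot::HeapOffset(o) => LiveSlot::HeapAddr(new.heap + o),
            SavedSlot::StackOffset(o) => LiveSlot::StackAddr(new.stack + o),
        })
        .collect()
}

fn main() {
    let old = Bases { heap: 0x1000, stack: 0x9000 };
    let new = Bases { heap: 0x4000, stack: 0x2000 };
    let live = [
        LiveSlot::Int(42),
        LiveSlot::HeapAddr(old.heap + 0x10),
        LiveSlot::StackAddr(old.stack + 0x80),
    ];
    let restored = rehydrate(&snapshot(&live, old), new);
    // Data is unchanged; addresses keep their offsets under the new bases.
    assert_eq!(restored[1], LiveSlot::HeapAddr(new.heap + 0x10));
    println!("{:?}", restored);
}
```

The arithmetic is trivial once the tags exist. The hard part described above is producing those tags for every register and stack slot at every possible suspend point, including intermediate values of address computations.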

Jonas Kruckenberg (Feb 25 2025 at 16:03):

you do not want to touch stack rewriting with even a ten-foot pole

Jonas Kruckenberg (Feb 25 2025 at 16:04):

i implemented a rudimentary version of this a while back for the physical to virtual memory transition of k23 but it’s uniquely cursed

Jonas Kruckenberg (Feb 25 2025 at 16:04):

i don’t think i had a worse time programming

Last updated: Apr 09 2025 at 10:04 UTC