Hi all,
I am looking into implementing some sort of checkpoint-restore functionality with WAMR for WebAssembly modules, akin to what CRIU [1] does for Linux processes. This functionality would make it possible to (i) take a snapshot (or checkpoint) of WAMR's execution environment, (ii) serialise it, and (iii) restore the module's execution environment from the snapshot.
An initial proof-of-concept would consist of the following:
i) WASM App: a simple counter written in C that sleeps for one second and prints the counter value.
ii) Initially, and to simplify the implementation, at a hardcoded iteration (e.g. 5) the program calls a native symbol (e.g. void __checkpoint(bool stop)).
iii) WASM runtime: a stripped-down version of iwasm with the additional __checkpoint native symbol.
iv) The stripped-down version of iwasm would accept a new command line argument, --restore or --restore-from-file, with which iwasm restores a module from its serialised state.
v) This functionality should work for both the classic interpreter and AOT. For the classic interpreter, we have already done something similar for the wazero runtime [2].
Note that this PoC deliberately makes no external calls other than __checkpoint.
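To make the PoC app concrete, here is a minimal sketch of the counter in C. The helper run_counter, the CHECKPOINT_ITERATION constant, the "env" import module name and the native stub (with its checkpoint_calls counter) are assumptions for illustration only; the only name taken from the description above is __checkpoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

#ifdef __wasm__
/* Imported from the host. Assumption: the runtime registers this symbol
 * under the "env" module. */
__attribute__((import_module("env"), import_name("__checkpoint")))
void __checkpoint(bool stop);
#else
/* Native stub so the sketch also builds outside a Wasm toolchain.
 * It only counts how often it was called. */
static int checkpoint_calls = 0;
void __checkpoint(bool stop)
{
    (void)stop;
    checkpoint_calls++;
}
#endif

/* Hardcoded iteration at which the app asks for a checkpoint. */
#define CHECKPOINT_ITERATION 5

/* Runs the counter loop and returns the final counter value.
 * delay_seconds is a parameter only so the loop can be tried quickly. */
int run_counter(int iterations, unsigned delay_seconds)
{
    assert(iterations >= 0);
    int counter = 0;
    for (int i = 0; i < iterations; i++) {
        if (i == CHECKPOINT_ITERATION)
            __checkpoint(false); /* false: keep running after the snapshot */
        printf("counter = %d\n", counter);
        counter++;
        sleep(delay_seconds);
    }
    return counter;
}
```

In the actual app, main would simply call something like run_counter(10, 1) to get the one-second sleep described above.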
Before getting too deep into the implementation rabbit hole, I wanted to post this message here to gather some ideas on how to implement the outlined PoC. My questions are:
i) Is all the execution state (assuming no external calls) contained in the exec_env defined in [3]?
ii) Is the answer to i) the same for AOT and interpreter mode?
iii) Assuming all execution state is contained in the exec_env, would a deep copy suffice to checkpoint all the state (modulo some masking of areas that don't change, e.g. funcs)?
iv) What would be the best way to restore execution from where we left off (assuming, initially, only one call to __checkpoint)?
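To illustrate what I mean by a deep copy in iii), here is a rough sketch on a toy state. ToyState, SnapshotHeader, toy_checkpoint and toy_restore are all hypothetical names and are not WAMR structures or APIs. The sketch assumes the state holds raw pointers, so a snapshot has to store them as offsets and rebuild them on restore.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified execution state (NOT WAMR's exec_env). */
typedef struct {
    uint8_t  *memory;      /* linear memory */
    uint32_t  memory_size; /* in bytes */
    uint32_t *stack;       /* operand stack base */
    uint32_t *sp;          /* points into stack; a byte-wise copy would dangle */
    uint32_t  stack_slots;
    uint32_t  pc;          /* offset of the next instruction */
} ToyState;

/* Pointer-free header stored at the start of the snapshot buffer. */
typedef struct {
    uint32_t memory_size;
    uint32_t stack_slots;
    uint32_t sp_offset;    /* sp - stack, in slots */
    uint32_t pc;
} SnapshotHeader;

/* Serialises the state into a malloc'ed flat buffer; NULL on failure. */
uint8_t *toy_checkpoint(const ToyState *s, size_t *out_len)
{
    assert(s != NULL && out_len != NULL);
    SnapshotHeader h = { s->memory_size, s->stack_slots,
                         (uint32_t)(s->sp - s->stack), s->pc };
    size_t stack_bytes = (size_t)s->stack_slots * sizeof(uint32_t);
    size_t len = sizeof h + s->memory_size + stack_bytes;
    uint8_t *buf = malloc(len);
    if (buf == NULL)
        return NULL;
    memcpy(buf, &h, sizeof h);
    memcpy(buf + sizeof h, s->memory, s->memory_size);
    memcpy(buf + sizeof h + s->memory_size, s->stack, stack_bytes);
    *out_len = len;
    return buf;
}

/* Rebuilds the state in fresh allocations; 0 on success, -1 on failure. */
int toy_restore(ToyState *s, const uint8_t *buf, size_t len)
{
    assert(s != NULL && buf != NULL);
    SnapshotHeader h;
    if (len < sizeof h)
        return -1;
    memcpy(&h, buf, sizeof h);
    size_t stack_bytes = (size_t)h.stack_slots * sizeof(uint32_t);
    if (len != sizeof h + h.memory_size + stack_bytes
        || h.sp_offset > h.stack_slots)
        return -1;
    uint8_t *memory = malloc(h.memory_size ? h.memory_size : 1);
    uint32_t *stack = malloc(stack_bytes ? stack_bytes : 1);
    if (memory == NULL || stack == NULL) {
        free(memory);
        free(stack);
        return -1;
    }
    memcpy(memory, buf + sizeof h, h.memory_size);
    memcpy(stack, buf + sizeof h + h.memory_size, stack_bytes);
    s->memory = memory;
    s->memory_size = h.memory_size;
    s->stack = stack;
    s->stack_slots = h.stack_slots;
    s->sp = stack + h.sp_offset; /* offset turned back into a pointer */
    s->pc = h.pc;
    return 0;
}
```

The buffer returned by toy_checkpoint is what a --restore-from-file style option would read back from disk. Whether the real exec_env can be flattened this way is exactly what I am asking.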
Thanks in advance for your help!
[1] https://github.com/checkpoint-restore/criu
[2] https://github.com/mmathys/wasmsr
[3] https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/doc/memory_tune.md
@Carlos This is an exciting idea! Thanks for providing the informational links to related projects. Although I see that using https://github.com/checkpoint-restore/criu can help to transfer the whole runtime process, I understand you are implementing a similar function for Wasm without using CRIU (not involving the kernel), right?
At which point would execution resume in the new environment? The line of code after the call to __checkpoint?
We could have an online meeting to share the idea and related knowledge quickly. It is really a great idea, and we would be happy to support you.
Hi @Wang Xin thanks for the quick reply and for showing interest!
Exactly, CRIU checkpoints the whole Linux process (or process tree), which means that it would checkpoint the whole iwasm process. And, as you guessed, it involves a lot of interaction with the kernel to dump all the process state.
Our goal is to implement this at the WebAssembly module level. Our WASM runtime using WAMR may have multiple WASM modules running in the same Linux process, so we are interested in a solution that can checkpoint just one module and not the other ones.
At which point would execution resume in the new environment? The line of code after the call to __checkpoint?
Yes!
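As a rough illustration of that resume point, here is a toy interpreter loop. The opcodes, ToyFrame and toy_run are made up for this sketch and have nothing to do with WAMR's interpreter. Because the program counter is advanced before the opcode is dispatched, the snapshot taken at the checkpoint already points at the instruction after it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Made-up opcodes for a tiny counter program. */
enum { OP_INC, OP_PRINT, OP_CHECKPOINT, OP_HALT };

/* Hypothetical frame: just a program counter and the counter value. */
typedef struct {
    uint32_t pc;
    int counter;
} ToyFrame;

/* Runs until OP_HALT or OP_CHECKPOINT.
 * Returns 1 if it stopped at a checkpoint (snapshot is filled in),
 * 0 if the program halted, -1 on an unknown opcode. */
int toy_run(const uint8_t *code, ToyFrame *f, ToyFrame *snapshot)
{
    assert(code != NULL && f != NULL && snapshot != NULL);
    for (;;) {
        uint8_t op = code[f->pc++]; /* pc now points past the current op */
        switch (op) {
        case OP_INC:
            f->counter++;
            break;
        case OP_PRINT:
            printf("counter = %d\n", f->counter);
            break;
        case OP_CHECKPOINT:
            *snapshot = *f; /* saved pc is the instruction after the checkpoint */
            return 1;
        case OP_HALT:
            return 0;
        default:
            return -1;
        }
    }
}
```

Restoring is then just a matter of copying the snapshot into a fresh ToyFrame and calling toy_run again with the same code.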
We could have an online meeting to share the idea and related knowledge quickly. It is really a great idea, and we would be happy to support you.
That would be great! I am in GMT and have a fairly flexible schedule.
Great! We can find a time next week. Could you please leave your email address so we can send an invitation? We will prepare a few diagrams of the current memory model of WAMR. @Carlos
@Chris Woods @Thomas Trenner not sure if you are interested in this topic.
Sounds good @Wang Xin !
I should be available any time 9AM-6PM GMT Mon-Fri, so feel free to pick the slot that works best for you.
My email is: carlossegarragonzalez@gmail.com
Hello all! I'm also interested in having a look at these diagrams of the current memory model of WAMR, and listening to the talk. :)
Do you guys mind if I join as well? I have a similar availability to Carlos, so you can just email me at james.menetrey@unine.ch.
Cheers!
@Jämes Ménétrey sure, we will keep you invited! We will also publish the memory model diagrams. We wanted to do it a long time ago but never really spent time on it.
Wang Xin said:
Great! We can find a time next week. Could you please leave your email address so we can send an invitation? We will prepare a few diagrams of the current memory model of WAMR. Carlos
Chris Woods Thomas Trenner not sure if you are interested in this topic.
In general: everything that helps in diagnosing and observing Wasm modules is interesting from our point of view. But, as mentioned in the GPIO/I2C/SPI API discussions, we are currently not in a position to contribute much to it.
Hi, I'm Yiwei Yang, a Ph.D. student at UCSC. Our group is working on heterogeneous live migration based on WAMR and has already implemented working interpreter migration, with JIT migration still in progress. Looking forward to collaborating!
Our final artifact and results are listed here: https://github.com/Multi-V-VM/MVVM.
Last updated: Dec 23 2024 at 12:05 UTC