Stream: wamr

Topic: checkpoint-restore


Carlos (Mar 15 2023 at 18:00):

Hi all,

I am looking into implementing some sort of checkpoint-restore functionality with WAMR for WebAssembly modules, akin to what CRIU [1] does for Linux processes. This functionality would make it possible to (i) take a snapshot (or checkpoint) of WAMR's execution environment, (ii) serialise it, and (iii) restore the module's execution environment from the snapshot.

An initial proof-of-concept would consist of the following:
i) WASM App: a simple counter written in C that sleeps for one second and prints the counter value.
ii) Initially, and to simplify the implementation, at a hardcoded iteration (e.g. 5) the program calls a native symbol (e.g. void __checkpoint(bool stop)).
iii) WASM runtime: a stripped-down version of iwasm with the additional __checkpoint native symbol.
iv) The stripped-down version of iwasm would accept a new command line argument, --restore or --restore-from-file, with which iwasm restores a module from its serialised state.
v) This functionality should work for both the classic interpreter and AOT. For the classic interpreter, we have already done something similar for the wazero runtime [2].
Note that this PoC deliberately makes no external calls other than __checkpoint.
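
As an illustration of the counter app in the list above, here is a minimal sketch in C. The function name run_counter, the checkpoint_calls variable, and the host-side stub body of __checkpoint are assumptions made for this example; in the PoC the runtime, not the app, would provide __checkpoint.

```c
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Counts invocations of the checkpoint hook (illustration only). */
static int checkpoint_calls = 0;

/*
 * Stand-in for the native symbol named in the post. Hypothetical body:
 * the real implementation would live in the runtime and snapshot the module.
 */
void __checkpoint(bool stop)
{
    checkpoint_calls++;
    printf("checkpoint requested (stop=%d)\n", stop ? 1 : 0);
}

/*
 * Print a counter for `iterations` iterations, sleeping `delay_s` seconds
 * after each one. At iteration `checkpoint_at` the app calls __checkpoint
 * once before printing. Returns the final counter value.
 */
int run_counter(int iterations, int checkpoint_at, unsigned delay_s)
{
    int counter = 0;
    for (int i = 0; i < iterations; i++) {
        if (i == checkpoint_at)
            __checkpoint(false);
        printf("counter: %d\n", counter);
        counter++;
        if (delay_s > 0)
            sleep(delay_s);
    }
    return counter;
}
```

A main function for the PoC would then call something like run_counter(10, 5, 1) to checkpoint at iteration 5 with a one-second sleep.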

Before getting too deep into the implementation rabbit hole, I wanted to post this message here to gather some ideas on how to implement the outlined PoC. My questions are:
i) Is all the execution state (assuming no external calls) contained in the exec_env defined in [3]?
ii) Is the answer to i) the same for AoT and Interpreter mode?
iii) Assuming all execution state is contained in the exec_env, would a deep copy suffice to checkpoint all the state (modulo some masking of areas that don't change, e.g. funcs)?
iv) What would be the best way to restore execution from where we left off? (Assuming, initially, only one call to __checkpoint.)
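
One consideration behind question iii) is that raw pointers inside a copied structure are not meaningful in another process, so a serialised snapshot would typically store lengths and offsets instead. The sketch below shows that idea in C. The poc_snapshot struct and all poc_* functions are hypothetical; they do not reflect the actual layout of WAMR's exec_env, and the byte format is host-endian.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical snapshot layout; NOT WAMR's exec_env. */
typedef struct {
    uint32_t resume_offset; /* bytecode offset to resume at, not a pointer */
    uint32_t stack_len;     /* number of operand-stack slots in use */
    uint32_t memory_len;    /* linear memory size in bytes */
    uint32_t *stack;        /* operand stack slots */
    uint8_t *memory;        /* linear memory contents */
} poc_snapshot;

/* Serialise into a freshly allocated buffer (caller frees); NULL on OOM. */
uint8_t *poc_snapshot_serialise(const poc_snapshot *s, size_t *out_len)
{
    uint32_t header[3] = { s->resume_offset, s->stack_len, s->memory_len };
    size_t stack_bytes = (size_t)s->stack_len * sizeof(uint32_t);
    size_t len = sizeof(header) + stack_bytes + s->memory_len;
    uint8_t *buf = malloc(len);
    if (!buf)
        return NULL;
    memcpy(buf, header, sizeof(header));
    if (stack_bytes > 0)
        memcpy(buf + sizeof(header), s->stack, stack_bytes);
    if (s->memory_len > 0)
        memcpy(buf + sizeof(header) + stack_bytes, s->memory, s->memory_len);
    *out_len = len;
    return buf;
}

/* Rebuild a snapshot from bytes; 0 on success, -1 on bad input or OOM. */
int poc_snapshot_restore(const uint8_t *buf, size_t len, poc_snapshot *out)
{
    uint32_t header[3];
    if (len < sizeof(header))
        return -1;
    memcpy(header, buf, sizeof(header));
    size_t stack_bytes = (size_t)header[1] * sizeof(uint32_t);
    if (len != sizeof(header) + stack_bytes + header[2])
        return -1; /* length does not match the header */

    uint32_t *stack = NULL;
    uint8_t *memory = NULL;
    if (stack_bytes > 0) {
        stack = malloc(stack_bytes);
        if (!stack)
            return -1;
        memcpy(stack, buf + sizeof(header), stack_bytes);
    }
    if (header[2] > 0) {
        memory = malloc(header[2]);
        if (!memory) {
            free(stack);
            return -1;
        }
        memcpy(memory, buf + sizeof(header) + stack_bytes, header[2]);
    }
    out->resume_offset = header[0];
    out->stack_len = header[1];
    out->memory_len = header[2];
    out->stack = stack;
    out->memory = memory;
    return 0;
}

/* Release buffers owned by a restored snapshot. */
void poc_snapshot_free(poc_snapshot *s)
{
    free(s->stack);
    free(s->memory);
    s->stack = NULL;
    s->memory = NULL;
}
```

Writing the returned buffer to a file and reading it back in a fresh process would be one way to back a --restore-from-file style option; a format meant to move between machines would also need a fixed byte order.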

Thanks in advance for your help!

[1] https://github.com/checkpoint-restore/criu
[2] https://github.com/mmathys/wasmsr
[3] https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/doc/memory_tune.md


Wang Xin (Mar 15 2023 at 23:25):

@Carlos This is an exciting idea! Thanks for providing the links to related projects. Although I see that https://github.com/checkpoint-restore/criu can help to transfer the whole runtime process, I understand that you are implementing a similar function for Wasm without using CRIU (not involving the kernel), right?

Where is the point for resuming in the new environment? The line of code after the call to __checkpoint?

We could have an online meeting to share the idea and related knowledge quickly. It is really a great idea, and we would be happy to support you.


Carlos (Mar 16 2023 at 09:20):

Hi @Wang Xin thanks for the quick reply and for showing interest!

Exactly, CRIU checkpoints the whole Linux process (or process tree) which means that it would checkpoint the whole iwasm process. And, as you guessed, it involves a lot of interaction with the kernel to dump all the process state.

Our goal is to implement this at the WebAssembly module level. Our WASM runtime using WAMR may have multiple WASM modules running in the same Linux process, so we are interested in a solution that can checkpoint just one module and not the other ones.

Where is the point for resuming in the new environment? The line of code after the call to __checkpoint?

Yes!
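
To make the resume point concrete, here is a toy sketch in C of an interpreter loop that stops at a checkpoint instruction and can later continue from the instruction after it. The opcodes, the toy_state struct, and toy_run are all invented for this illustration and are unrelated to WAMR's interpreter. The sketch relies on the program counter being explicit state, which holds for an interpreter; in AOT mode the equivalent state lives on the native stack and in registers.

```c
#include <stdint.h>

/* Toy opcodes, unrelated to real Wasm bytecode. */
enum { OP_INC, OP_CHECKPOINT, OP_HALT };

/* Hypothetical interpreter state: a program counter plus one register. */
typedef struct {
    uint32_t pc;
    int32_t counter;
} toy_state;

/*
 * Run until OP_HALT or the end of the code (returns 0), until
 * OP_CHECKPOINT (returns 1), or until an unknown opcode (returns -1).
 * The pc is advanced before dispatch, so when the loop stops at a
 * checkpoint the saved pc already points at the next instruction.
 */
int toy_run(const uint8_t *code, uint32_t code_len, toy_state *st)
{
    while (st->pc < code_len) {
        uint8_t op = code[st->pc++];
        switch (op) {
        case OP_INC:
            st->counter++;
            break;
        case OP_CHECKPOINT:
            return 1; /* caller may now copy *st as the snapshot */
        case OP_HALT:
            return 0;
        default:
            return -1;
        }
    }
    return 0;
}
```

Copying the toy_state after toy_run returns 1 and passing the copy to toy_run again continues execution right after the checkpoint instruction.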

We could have an online meeting to share the idea and related knowledge quickly. It is really a great idea, and we would be happy to support you.

That would be great! I am in GMT and have a fairly flexible schedule.

Wang Xin (Mar 16 2023 at 12:53):

Great! We can find a time next week. Could you please leave your email address so that we can send an invitation? We will prepare a few diagrams of WAMR's current memory model. @Carlos

@Chris Woods @Thomas Trenner not sure if you are interested in this topic.

Carlos (Mar 16 2023 at 13:00):

Sounds good @Wang Xin !

I should be available any time 9AM-6PM GMT Mon-Fri, so feel free to pick the slot that works best for you.

My email is: carlossegarragonzalez@gmail.com

Jämes Ménétrey (Mar 16 2023 at 15:44):

Hello all! I'm also interested in having a look at these diagrams of the current memory model of WAMR, and listening to the talk. :)

Do you guys mind if I join as well? I have a similar availability to Carlos, so you can just email me at james.menetrey@unine.ch.

Cheers!

Wang Xin (Mar 16 2023 at 23:22):

@Jämes Ménétrey Sure, we will keep you invited! We will also publish the memory model diagrams. We wanted to do it a long time ago but never really spent the time on it.

Thomas Trenner (Mar 20 2023 at 12:46):

Wang Xin said:

Great! We can find a time next week. Could you please leave your email address so that we can send an invitation? We will prepare a few diagrams of WAMR's current memory model. Carlos

Chris Woods Thomas Trenner not sure if you are interested in this topic.

In general: everything that helps in diagnosing and observing Wasm modules is interesting from our point of view. But, as we said in the GPIO/I2C/SPI API discussions, we are not currently in a position to contribute much to it.

Yiwei Yang (Jun 03 2023 at 05:10):


Hi, I'm Yiwei Yang, a Ph.D. student at UCSC. Our group is working on heterogeneous live migration based on WAMR; we have already implemented working interpreter migration, and JIT migration is a work in progress. Looking forward to collaborating!

Yiwei Yang (Oct 13 2024 at 13:28):


Our final artifact and results are listed here: https://github.com/Multi-V-VM/MVVM.


Last updated: Oct 23 2024 at 20:03 UTC