Stream: git-wasmtime

Topic: wasmtime / issue #2210 Module unloading and Store object ...


Wasmtime GitHub notifications bot (May 05 2022 at 16:48):

alexcrichton labeled issue #2210:


Feature

The ability to unload a module, which in practice means removing its objects from the Store.
The specification notes:

> In practice, implementations may apply techniques like garbage collection to remove objects from the store that are no longer referenced. However, such techniques are not semantically observable, and hence outside the scope of this specification.

From this I infer that whether to implement such a feature, and how, is left to the runtime. I imagine that future proposals could affect the implementation of this feature. The feature seems to be required in the long term.

Related topics and links:

Benefit

Currently, Wasm modules can be linked together, but there is no way to unload modules completely. As a result, programs that would require loading modules for temporary use or to conserve memory will "leak memory" as time goes on and eventually the program will run into issues with memory limitations. Removal of objects that are no longer referenced from anywhere would free the memory of those unused objects. The main goal is to be able to unload an entire module once the module's structures are no longer referenced or the references are removed through the host by the embedder or runtime.

Implementation

To help with reasoning about how an unloading system would be implemented, I created this graph in which two modules share different resources, including a table, a memory, and a global. There is also a call stack that refers to the functions of the modules.
The modules own only their functions and merely refer to the other resources. An arrow from A to B signifies a reference to B held by A. The red arrows signify references held by a table. The Store and the references held by the Store are omitted.
![WasmUnloading](https://user-images.githubusercontent.com/468816/99915524-e0d75d00-2d0c-11eb-8275-d5d1063cc143.png)

One approach is to reference count all objects in a store. With the counts, objects that are no longer referenced can be freed. Cycles of references can exist through a Wasm table, and these require additional handling because reference counting alone never frees a cycle.

The issue of cyclic references could be solved relatively simply by using weak references in tables. If all of the red arrows in the graph are weak references, then there are no strong references that could create a cycle, apart from cycles that involve the host. Those cycles would be resolved by the embedder freeing its references when the modules or other resources are no longer needed.
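
To make the weak-table idea concrete, here is a minimal sketch using Rust's `Rc` and `Weak`. The `Func`, `Module`, and `Table` types are hypothetical stand-ins invented for illustration and are not Wasmtime's internal types; the sketch only assumes that a module owns its functions strongly and that a table slot holds a weak reference.

```rust
use std::rc::{Rc, Weak};

/// Hypothetical stand-in for a Wasm function owned by a module.
struct Func {
    name: &'static str,
}

/// Hypothetical module: owns its functions through strong references.
struct Module {
    funcs: Vec<Rc<Func>>,
}

/// Hypothetical table: every slot is a weak reference (a "red arrow").
struct Table {
    slots: Vec<Weak<Func>>,
}

impl Table {
    /// Returns the function at `index` if its owner is still alive.
    fn get(&self, index: usize) -> Option<Rc<Func>> {
        self.slots.get(index)?.upgrade()
    }
}

/// Loads a module, registers its function in a table, then unloads it.
/// Returns whether the slot resolved before and after the unload.
fn weak_table_demo() -> (bool, bool) {
    let module = Module { funcs: vec![Rc::new(Func { name: "f" })] };
    let table = Table { slots: vec![Rc::downgrade(&module.funcs[0])] };

    let before = table.get(0).map(|f| f.name == "f").unwrap_or(false);
    drop(module); // last strong reference is gone, so the Func is freed
    let after = table.get(0).is_some();

    (before, after)
}
```

Because the table never contributes to the strong count, dropping the module is enough to free its function even though the table still mentions it.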

If an element is freed while a weak reference still points to it, the weak reference could be cleared either at that time or lazily, when something next tries to use it. However, testing whether a reference is valid on every use could add unnecessary (negligible?) overhead. I am unsure of how execution is implemented, but I assume that the call stack would hold a reference at least to each function on the stack.
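
The lazy variant could look roughly like the sketch below. Everything here is an assumption made for illustration: the `Table::call` method, the `Trap` enum, and the use of a plain `fn` pointer as a function body are not taken from Wasmtime. The point is only that the validity check and the clearing of a dead slot happen on use, and that the upgraded `Rc` keeps the callee alive for the duration of the call.

```rust
use std::rc::{Rc, Weak};

/// Hypothetical function object; the body is a plain fn pointer.
struct Func {
    body: fn(i32) -> i32,
}

/// Hypothetical trap reasons for an indirect call.
#[derive(Debug, PartialEq)]
enum Trap {
    OutOfBounds,
    UninitializedElement,
}

/// Hypothetical table whose slots are nullable weak references.
struct Table {
    slots: Vec<Option<Weak<Func>>>,
}

impl Table {
    /// Indirect call: validate the weak reference on use and clear
    /// the slot lazily if the target has already been freed.
    fn call(&mut self, index: usize, arg: i32) -> Result<i32, Trap> {
        let slot = self.slots.get_mut(index).ok_or(Trap::OutOfBounds)?;
        let live = slot.as_ref().and_then(|weak| weak.upgrade());
        match live {
            // `func` is a strong reference held for the whole call.
            Some(func) => Ok((func.body)(arg)),
            None => {
                *slot = None; // lazy cleanup of the dead entry
                Err(Trap::UninitializedElement)
            }
        }
    }
}

/// Calls through the table before and after the function is freed.
fn lazy_clear_demo() -> (Result<i32, Trap>, Result<i32, Trap>, bool) {
    let func = Rc::new(Func { body: |x: i32| x + 1 });
    let mut table = Table { slots: vec![Some(Rc::downgrade(&func))] };

    let first = table.call(0, 41);
    drop(func); // the owner goes away
    let second = table.call(0, 41);
    let cleared = table.slots[0].is_none();

    (first, second, cleared)
}
```

The per-call cost in this sketch is one `upgrade`, which is a counter check and increment; whether that is negligible in a real engine would need measuring.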

As a result of this scheme, something must hold a strong reference to every module that should stay alive for the lifetime of the program. For example, if two modules are loaded and one of them is only referenced through a table, it will be unloaded immediately. As a potential solution, the store could hold strong references that the embedder can remove. Alternatively, the host would be required to hold a list of strong references to any modules that should not be unloaded.
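
The first of those two options, a store that keeps modules alive until the embedder removes the root, might be sketched as follows. The `Store`, `load`, and `unroot` names are hypothetical and chosen only for this example; they do not correspond to the `wasmtime::Store` API.

```rust
use std::collections::HashMap;
use std::rc::{Rc, Weak};

/// Hypothetical loaded module.
struct Module {
    name: String,
}

/// Hypothetical store that pins every loaded module with a strong root.
struct Store {
    roots: HashMap<String, Rc<Module>>,
}

impl Store {
    /// Loads a module, roots it in the store, and hands back a weak handle.
    fn load(&mut self, name: &str) -> Weak<Module> {
        let module = Rc::new(Module { name: name.to_string() });
        let handle = Rc::downgrade(&module);
        self.roots.insert(name.to_string(), module);
        handle
    }

    /// Removes the store's strong root; returns false if none existed.
    /// The module is freed once no other strong references remain.
    fn unroot(&mut self, name: &str) -> bool {
        self.roots.remove(name).is_some()
    }
}

/// Shows that a module lives exactly as long as the store roots it.
fn rooted_store_demo() -> (bool, bool, bool) {
    let mut store = Store { roots: HashMap::new() };
    let handle = store.load("plugin");

    let alive_while_rooted = handle
        .upgrade()
        .map(|m| m.name == "plugin")
        .unwrap_or(false);
    let removed = store.unroot("plugin");
    let alive_after_unroot = handle.upgrade().is_some();

    (alive_while_rooted, removed, alive_after_unroot)
}
```

With this shape, a module that is only reachable through a weak table entry stays loaded until the embedder explicitly calls the unroot operation.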

Alternatives

Another approach is to use a tracing garbage collection algorithm, such as mark and sweep, which can handle cycles of references. Collection could be triggered by the runtime itself, periodically or when needed, or it could be exposed for the embedder to configure or invoke.
It seems that WAVM has implemented a GC function that can be called by the embedder. On the surface, it looks like a mark and sweep approach, but I am unsure. https://github.com/WAVM/WAVM/blob/530f33cd30c6ea5114a227175b3a7b0af77cadaa/Lib/Runtime/ObjectGC.cpp#L252
The function allows garbage collection of unused modules and objects, but it looks like it can only be invoked while the host has control. On the other hand, it gives the embedder some control over when collection occurs.
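
For comparison with reference counting, a generic mark and sweep pass over an object graph can be sketched as below. This is a textbook version written for illustration, not WAVM's or Wasmtime's implementation; the `Heap` type, the index-based edges, and the explicit `roots` slice are all assumptions.

```rust
use std::collections::HashSet;

/// Hypothetical heap: each live object lists the indices it references.
/// `None` marks a slot that has been freed.
struct Heap {
    objects: Vec<Option<Vec<usize>>>,
}

impl Heap {
    /// Frees every object not reachable from `roots`; returns how many.
    fn collect(&mut self, roots: &[usize]) -> usize {
        // Mark phase: walk the graph from the roots.
        let mut marked = HashSet::new();
        let mut worklist: Vec<usize> = roots.to_vec();
        while let Some(id) = worklist.pop() {
            if !marked.insert(id) {
                continue; // already visited, so cycles terminate
            }
            if let Some(Some(edges)) = self.objects.get(id) {
                worklist.extend(edges.iter().copied());
            }
        }

        // Sweep phase: free every live object that was not marked.
        let mut freed = 0;
        for (id, slot) in self.objects.iter_mut().enumerate() {
            if slot.is_some() && !marked.contains(&id) {
                *slot = None;
                freed += 1;
            }
        }
        freed
    }
}

/// Objects 0 -> 1 are reachable from the root; 2 <-> 3 form an
/// unreachable cycle that reference counting alone would leak.
fn mark_sweep_demo() -> (usize, Vec<bool>) {
    let mut heap = Heap {
        objects: vec![
            Some(vec![1]),
            Some(vec![]),
            Some(vec![3]),
            Some(vec![2]),
        ],
    };
    let freed = heap.collect(&[0]);
    let live = heap.objects.iter().map(|o| o.is_some()).collect();
    (freed, live)
}
```

The unreachable cycle is collected without any weak references, at the cost of a traversal that has to run at a point where the set of roots is known, which matches the observation that such a collector is easiest to invoke while the host has control.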

An alternative to automatic GC and memory management schemes is to let the embedder directly attempt to unload a given module, either as a whole or part by part. The module's memories, tables, and other structures would then be freed. This requires either that the embedder ensures no references to the module or its parts remain, or that the runtime handles leftover references by removing them or by failing gracefully when they are used.
The benefit of this approach is that it may be simpler to implement and more effective. However, references that other modules may still hold are a concern. Either the embedder is trusted, or there must be a mechanism that finds existing references and then removes them or raises an error.
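
One way to picture the "raise an error when a reference is found" variant is an unload call that refuses to proceed while other strong references exist. The sketch below uses `Rc::try_unwrap` for that check; `Module`, `UnloadError`, and `try_unload` are hypothetical names for this example and not an existing Wasmtime API.

```rust
use std::rc::Rc;

/// Hypothetical module that owns a linear memory.
struct Module {
    memory: Vec<u8>,
}

/// Hypothetical error returned when an unload is refused.
#[derive(Debug, PartialEq)]
enum UnloadError {
    StillReferenced { other_refs: usize },
}

/// Tries to unload `module`. Succeeds only if the caller holds the
/// last strong reference; returns the number of bytes released.
fn try_unload(module: Rc<Module>) -> Result<usize, UnloadError> {
    match Rc::try_unwrap(module) {
        // `owned` is dropped at the end of this arm, freeing the memory.
        Ok(owned) => Ok(owned.memory.len()),
        // `shared` is the caller's own handle, so subtract it.
        Err(shared) => Err(UnloadError::StillReferenced {
            other_refs: Rc::strong_count(&shared) - 1,
        }),
    }
}

/// First attempt is refused because an importer still holds the module;
/// the second attempt, made with the last handle, succeeds.
fn explicit_unload_demo() -> (Result<usize, UnloadError>, Result<usize, UnloadError>) {
    let module = Rc::new(Module { memory: vec![0; 65536] });
    let importer = Rc::clone(&module);

    let refused = try_unload(module);
    let released = try_unload(importer);

    (refused, released)
}
```

In this sketch a refused unload consumes the caller's handle; a real interface would more likely hand it back so the embedder can retry after removing the remaining references.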


Last updated: Nov 22 2024 at 16:03 UTC