Stream: git-wasmtime

Topic: wasmtime / Issue #2210 Store object removal or garbage co...


view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2020 at 11:12):

Rochet2 edited Issue #2210:

<!-- Please try to describe precisely what you would like to do in
Cranelift/Wasmtime and/or expect from it. You can answer the questions below if
they're relevant and delete this text before submitting. Thanks for opening an
issue! -->

Feature

<!-- What is the feature or code improvement you would like to do in
Cranelift/Wasmtime? -->
Garbage collection of Store objects that are no longer referenced.
The specification notes:

In practice, implementations may apply techniques like garbage collection to remove objects from the store that are no longer referenced. However, such techniques are not semantically observable, and hence outside the scope of this specification.

From this I infer that the implementing or not implementing such a feature and the details of the implementation are left for the runtime. I imagine that future proposals can affect the implementation this feature. The feature seems to be required in the long term.

Interestingly, the module linking proposal has a requirement not to use a GC. That requirement potentially affects this feature's implementation.

Related topics and links:

Benefit

<!-- What is the value of adding this in Cranelift/Wasmtime? -->
Currently Wasm modules can be linked together, but there is no way to unload modules completely. As a result, programs that would require loading modules for temporary use or to conserve memory will "leak memory" as time goes on and eventually the program will run into issues with memory limitations. Garbage collection of objects that are no longer referenced from anywhere would free the memory of those unused objects.

Implementation

<!-- Do you have an implementation plan, and/or ideas for data structures or
algorithms to use? -->
One approach is to use reference counting of all objects in a store. With the counting, objects that are no longer referenced can be freed. Cycles of references can exist through the Wasm Table, which means that cycles would need to be detected and all of the objects in the cycle freed if the cycle is not referenced from elsewhere. The garbage collection would be done immediately when possible or invoked by the runtime itself periodically or when needed. Potentially the collection could be configured or left to the embedder to invoke.

Alternatives

<!-- Have you considered alternative implementations? If so, how are they
better or worse than your proposal? -->
It seems that WAVM has implemented a GC function that can be called by the embedder. On the surface it looks like a mark and sweep approach, but I am unsure. https://github.com/WAVM/WAVM/blob/530f33cd30c6ea5114a227175b3a7b0af77cadaa/Lib/Runtime/ObjectGC.cpp#L252
The function to allows garbage collection of unused modules and objects, but it looks like it could only be invoked when the host has control. On the other hand it allows the embedder to have some control on when the collection should occur. The addition of a function to the API for essentially basic functionality that is required in long term may not be something that is wanted as it may never be a part of a standard API for Wasm (for example C-API).

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2020 at 11:35):

Rochet2 edited Issue #2210:

<!-- Please try to describe precisely what you would like to do in
Cranelift/Wasmtime and/or expect from it. You can answer the questions below if
they're relevant and delete this text before submitting. Thanks for opening an
issue! -->

Feature

<!-- What is the feature or code improvement you would like to do in
Cranelift/Wasmtime? -->
The ability to unload a module, which practically removes objects from Store.
The specification notes:

In practice, implementations may apply techniques like garbage collection to remove objects from the store that are no longer referenced. However, such techniques are not semantically observable, and hence outside the scope of this specification.

From this I infer that the implementing or not implementing such a feature and the details of the implementation are left for the runtime. I imagine that future proposals can affect the implementation this feature. The feature seems to be required in the long term.

Related topics and links:

Benefit

<!-- What is the value of adding this in Cranelift/Wasmtime? -->
Currently Wasm modules can be linked together, but there is no way to unload modules completely. As a result, programs that would require loading modules for temporary use or to conserve memory will "leak memory" as time goes on and eventually the program will run into issues with memory limitations. Removal of objects that are no longer referenced from anywhere would free the memory of those unused objects. The main goal is to be able to unload an entire module once the module's structures are no longer referenced or the references are removed through the host by the embedder or runtime - the removal of individual parts of a module is not necessary.

Implementation

<!-- Do you have an implementation plan, and/or ideas for data structures or
algorithms to use? -->
One approach is to use reference counting of all objects in a store. With the counting, objects that are no longer referenced can be freed.
Another approach is to use garbage collection algorithms, such as a mark and sweep algorithm.
Cycles of references can exist through the Wasm Table, which means that cycles would need to be detected and all of the objects in the cycle freed if the cycle is not referenced from elsewhere.
The removal of objects could be done automatically immediately when possible or it could be invoked by the runtime itself periodically or when needed. The collection could be configured by or left to the embedder to invoke.

If a module uses dynamic steps, such as memory allocation, during the instantiation of the module (for example when calling the start function right after instantiation step), then the a function that deallocates that memory should be called before unloading the module. This requires the ability to trigger a function when a module is unloaded. The function can be called automatically by the runtime or it can be called by the embedder just before unloading the module through the new functionality. The deallocation function seems like it should potentially be a part of Wasm specification if it is not already.

Alternatives

<!-- Have you considered alternative implementations? If so, how are they
better or worse than your proposal? -->
An alternative to automatic GC and memory management schemes is to provide the embedder the ability to directly attempt to unload a given module or all parts of a module individually. The module's memories, tables and other structures would then be freed. This would require the embedder to ensure that no references to the module or its parts exist or the references should be handled by the runtime either by removing them or by handling them gracefully when used.
The benefits of this approach is that the implementation of if may be simpler and more effective. However, the potential references that modules may have are of concern. Either the embedder is trusted or a mechanism to find and potentially remove existing references or raise an error when one is found should be implemented.

It seems that WAVM has implemented a GC function that can be called by the embedder. On the surface it looks like a mark and sweep approach, but I am unsure. https://github.com/WAVM/WAVM/blob/530f33cd30c6ea5114a227175b3a7b0af77cadaa/Lib/Runtime/ObjectGC.cpp#L252
The function to allows garbage collection of unused modules and objects, but it looks like it could only be invoked when the host has control. On the other hand it allows the embedder to have some control on when the collection should occur.


Last updated: Nov 22 2024 at 17:03 UTC