Stream: git-wasmtime

Topic: wasmtime / Issue #2210 Module unloading and Store object ...


view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2020 at 11:35):

Rochet2 edited Issue #2210:

<!-- Please try to describe precisely what you would like to do in
Cranelift/Wasmtime and/or expect from it. You can answer the questions below if
they're relevant and delete this text before submitting. Thanks for opening an
issue! -->

Feature

<!-- What is the feature or code improvement you would like to do in
Cranelift/Wasmtime? -->
The ability to unload a module, which practically removes objects from Store.
The specification notes:

In practice, implementations may apply techniques like garbage collection to remove objects from the store that are no longer referenced. However, such techniques are not semantically observable, and hence outside the scope of this specification.

From this I infer that the implementing or not implementing such a feature and the details of the implementation are left for the runtime. I imagine that future proposals can affect the implementation this feature. The feature seems to be required in the long term.

Related topics and links:

Benefit

<!-- What is the value of adding this in Cranelift/Wasmtime? -->
Currently Wasm modules can be linked together, but there is no way to unload modules completely. As a result, programs that would require loading modules for temporary use or to conserve memory will "leak memory" as time goes on and eventually the program will run into issues with memory limitations. Removal of objects that are no longer referenced from anywhere would free the memory of those unused objects. The main goal is to be able to unload an entire module once the module's structures are no longer referenced or the references are removed through the host by the embedder or runtime - the removal of individual parts of a module is not necessary.

Implementation

<!-- Do you have an implementation plan, and/or ideas for data structures or
algorithms to use? -->
One approach is to use reference counting of all objects in a store. With the counting, objects that are no longer referenced can be freed.
Another approach is to use garbage collection algorithms, such as a mark and sweep algorithm.
Cycles of references can exist through the Wasm Table, which means that cycles would need to be detected and all of the objects in the cycle freed if the cycle is not referenced from elsewhere.
The removal of objects could be done automatically immediately when possible or it could be invoked by the runtime itself periodically or when needed. The collection could be configured by or left to the embedder to invoke.

If a module uses dynamic steps, such as memory allocation, during the instantiation of the module (for example when calling the start function right after instantiation step), then the a function that deallocates that memory should be called before unloading the module. This requires the ability to trigger a function when a module is unloaded. The function can be called automatically by the runtime or it can be called by the embedder just before unloading the module through the new functionality. The deallocation function seems like it should potentially be a part of Wasm specification if it is not already.

Alternatives

<!-- Have you considered alternative implementations? If so, how are they
better or worse than your proposal? -->
An alternative to automatic GC and memory management schemes is to provide the embedder the ability to directly attempt to unload a given module or all parts of a module individually. The module's memories, tables and other structures would then be freed. This would require the embedder to ensure that no references to the module or its parts exist or the references should be handled by the runtime either by removing them or by handling them gracefully when used.
The benefits of this approach is that the implementation of if may be simpler and more effective. However, the potential references that modules may have are of concern. Either the embedder is trusted or a mechanism to find and potentially remove existing references or raise an error when one is found should be implemented.

It seems that WAVM has implemented a GC function that can be called by the embedder. On the surface it looks like a mark and sweep approach, but I am unsure. https://github.com/WAVM/WAVM/blob/530f33cd30c6ea5114a227175b3a7b0af77cadaa/Lib/Runtime/ObjectGC.cpp#L252
The function to allows garbage collection of unused modules and objects, but it looks like it could only be invoked when the host has control. On the other hand it allows the embedder to have some control on when the collection should occur.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2020 at 11:37):

Rochet2 edited Issue #2210:

<!-- Please try to describe precisely what you would like to do in
Cranelift/Wasmtime and/or expect from it. You can answer the questions below if
they're relevant and delete this text before submitting. Thanks for opening an
issue! -->

Feature

<!-- What is the feature or code improvement you would like to do in
Cranelift/Wasmtime? -->
The ability to unload a module, which practically removes objects from Store.
The specification notes:

In practice, implementations may apply techniques like garbage collection to remove objects from the store that are no longer referenced. However, such techniques are not semantically observable, and hence outside the scope of this specification.

From this I infer that the implementing or not implementing such a feature and the details of the implementation are left for the runtime. I imagine that future proposals can affect the implementation this feature. The feature seems to be required in the long term.

Related topics and links:

Benefit

<!-- What is the value of adding this in Cranelift/Wasmtime? -->
Currently Wasm modules can be linked together, but there is no way to unload modules completely. As a result, programs that would require loading modules for temporary use or to conserve memory will "leak memory" as time goes on and eventually the program will run into issues with memory limitations. Removal of objects that are no longer referenced from anywhere would free the memory of those unused objects. The main goal is to be able to unload an entire module once the module's structures are no longer referenced or the references are removed through the host by the embedder or runtime - the removal of individual parts of a module is not necessary.

Implementation

<!-- Do you have an implementation plan, and/or ideas for data structures or
algorithms to use? -->
One approach is to use reference counting of all objects in a store. With the counting, objects that are no longer referenced can be freed.

If a module uses dynamic steps, such as memory allocation, during the instantiation of the module (for example when calling the start function right after instantiation step), then the a function that deallocates that memory should be called before unloading the module. This requires the ability to trigger a function when a module is unloaded. The function can be called automatically by the runtime or it can be called by the embedder just before unloading the module through the new functionality. The deallocation function seems like it should potentially be a part of Wasm specification if it is not already.

Alternatives

<!-- Have you considered alternative implementations? If so, how are they
better or worse than your proposal? -->
Another approach is to use garbage collection algorithms, such as a mark and sweep algorithm.
Cycles of references can exist through the Wasm Table, which means that cycles would need to be detected and all of the objects in the cycle freed if the cycle is not referenced from elsewhere.
The removal of objects could be done automatically immediately when possible or it could be invoked by the runtime itself periodically or when needed. The collection could be configured by or left to the embedder to invoke.
It seems that WAVM has implemented a GC function that can be called by the embedder. On the surface it looks like a mark and sweep approach, but I am unsure. https://github.com/WAVM/WAVM/blob/530f33cd30c6ea5114a227175b3a7b0af77cadaa/Lib/Runtime/ObjectGC.cpp#L252
The function to allows garbage collection of unused modules and objects, but it looks like it could only be invoked when the host has control. On the other hand it allows the embedder to have some control on when the collection should occur.

An alternative to automatic GC and memory management schemes is to provide the embedder the ability to directly attempt to unload a given module or all parts of a module individually. The module's memories, tables and other structures would then be freed. This would require the embedder to ensure that no references to the module or its parts exist or the references should be handled by the runtime either by removing them or by handling them gracefully when used.
The benefits of this approach is that the implementation of if may be simpler and more effective. However, the potential references that modules may have are of concern. Either the embedder is trusted or a mechanism to find and potentially remove existing references or raise an error when one is found should be implemented.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2020 at 11:37):

Rochet2 commented on Issue #2210:

I have now updated the issue with more information on the requirements and intent.
I also added more information on the required implementation and more alternatives.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 13 2020 at 14:52):

tschneidereit commented on Issue #2210:

Thank you for the detailed description of the feature and its implications, @Rochet2. It does seem to me that the handling of references is really the sticky bit here. There might not be too much upside to making module unloading work in the absence of references, and the semantics of references to require them to stay valid.

I know that there's been some work on cross-store references, and perhaps the answer here might be to extend linking in a way that allows for hierarchies with multiple stores instead of trying to unload modules from a single store?

I have to confess that my understanding of all this is somewhat superficial, but I know that @lukewagner, @fitzgen, @alexcrichton, and others have thought about this more extensively.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 16 2020 at 15:33):

alexcrichton commented on Issue #2210:

Indeed thanks for the report! Technically there is a way to unload a module right now which is to deallocate a Store and all objects that refer to it, but that's not a great solution in the long-term and I agree that we should be instead ideally be enabling some form of a persistent Store.

Otherwise though unfortunately reference counting won't work due to the cycle problem you pointed out with functions being able to cross instances and create cycles through tables. Fixing that I think would require some form of GC (probably mark/sweep for us). We do already have a Store::gc method, however, for reference types that we could piggy-back onto if we ever add this.

What @tschneidereit may also work depending on the use case. We haven't implemented cross-store references today but the thinking is that you could link modules between two stores in a way that they can be easily disconnected. For example if one Store goes away then calling its functions from another Store would simply trap (or something like that)

view this post on Zulip Wasmtime GitHub notifications bot (Nov 16 2020 at 19:16):

fitzgen commented on Issue #2210:

We've talked about stacking nested stores on top of each other to allow for pushing and popping sets of instances rather than the current all-or-nothing model. We could maybe do the same with Engines and Modules (although now there are two stacks that have relationships with each other, which is losing a bit of the stack-y elegance)? This doesn't give full expressibility of adding and removing modules/instances in any order though.

Semi-aside: We've always wanted to support use cases where a full GC isn't desired. At the same time, certain Wasm proposals will require a full tracing GC. When we support those proposals, we might as well also have the GC manage things like modules and instances and stores, etc... to allow for the kind of flexibility requested in this issue. However, it isn't clear to me what we will do when that Wasm feature is not enabled and in situations where a full tracing GC isn't desired. Are we going to forever maintain two versions of the embedder API that you can choose between with cargo features? One tracing GC'd and one ref counted? Or just forever disallow the flexibility requested in OP and only use ref counting?

view this post on Zulip Wasmtime GitHub notifications bot (Nov 22 2020 at 20:12):

Rochet2 commented on Issue #2210:

To help with reasoning about how an unloading system would be implemented I created this graph where two modules share different resources, including table, memory, global. There is also a call stack that refers to the functions of modules.
The modules own only functions and only refer to other resources. The arrow from A to B signifies a reference held by A to B. The red-colored arrows signify references held by a table. The store and the references from a store to basically all elements are omitted.
![WasmUnloading](https://user-images.githubusercontent.com/468816/99915524-e0d75d00-2d0c-11eb-8275-d5d1063cc143.png)

If a GC is not desired, then reference counting could be implemented. The issue of cyclic references could be relatively simply solved by using weak references in tables. If all of the red arrows in the graph are weak references, then there are no strong references that could create a cycle, apart from the ones created between the host. However, the cycles with the host will be resolved by the embedded freeing them when the modules or other resources are no longer needed. In such an implementation all references held by a Store would probably be weak references also. As a result, the store would no longer be a key element in keeping the created elements alive.

If an element is freed and it is pointed to by a weak reference, then the weak reference could be unset at that time or when it is attempted to be used. However, testing if a reference is valid on every use could cause unnecessary (negligible?) overhead. I am unsure of how the execution is implemented, but I assume that the call stack would have a reference at least to the function in the stack. Any references to memories or tables would be possible also without creating any cycles as long as the call stack cannot refer to itself.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 23 2020 at 01:39):

Rochet2 edited a comment on Issue #2210:

To help with reasoning about how an unloading system would be implemented I created this graph where two modules share different resources, including table, memory, global. There is also a call stack that refers to the functions of modules.
The modules own only functions and only refer to other resources. The arrow from A to B signifies a reference held by A to B. The red-colored arrows signify references held by a table. The store and the references from a store to basically all elements are omitted.
![WasmUnloading](https://user-images.githubusercontent.com/468816/99915524-e0d75d00-2d0c-11eb-8275-d5d1063cc143.png)

If a GC is not desired, then reference counting could be implemented. The issue of cyclic references could be relatively simply solved by using weak references in tables. If all of the red arrows in the graph are weak references, then there are no strong references that could create a cycle, apart from the ones created between the host. However, the cycles with the host will be resolved by the embedded freeing them when the modules or other resources are no longer needed. In such an implementation all references held by a Store would probably be weak references also. As a result, the store would no longer be a key element in keeping the created elements alive.

If an element is freed and it is pointed to by a weak reference, then the weak reference could be unset at that time or when it is attempted to be used. However, testing if a reference is valid on every use could cause unnecessary (negligible?) overhead. I am unsure of how the execution is implemented, but I assume that the call stack would have a reference at least to the function in the stack. Any references to memories or tables would be possible also without creating any cycles as long as the call stack cannot refer to itself.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 23 2020 at 01:49):

Rochet2 edited a comment on Issue #2210:

To help with reasoning about how an unloading system would be implemented I created this graph where two modules share different resources, including table, memory, global. There is also a call stack that refers to the functions of modules.
The modules own only functions and only refer to other resources. The arrow from A to B signifies a reference held by A to B. The red-colored arrows signify references held by a table. The store and the references from a store to basically all elements are omitted.
![WasmUnloading](https://user-images.githubusercontent.com/468816/99915524-e0d75d00-2d0c-11eb-8275-d5d1063cc143.png)

If a GC is not desired, then reference counting could be implemented. The issue of cyclic references could be relatively simply solved by using weak references in tables. If all of the red arrows in the graph are weak references, then there are no strong references that could create a cycle, apart from the ones created between the host. However, the cycles with the host will be resolved by the embedder freeing them when the modules or other resources are no longer needed. In such an implementation all references held by a Store would probably be weak references also. As a result, the store would no longer be a key element in keeping the created elements alive.

If an element is freed and it is pointed to by a weak reference, then the weak reference could be unset at that time or when it is attempted to be used. However, testing if a reference is valid on every use could cause unnecessary (negligible?) overhead. I am unsure of how the execution is implemented, but I assume that the call stack would have a reference at least to the function in the stack. Any references to memories or tables would be possible also without creating any cycles as long as the call stack cannot refer to itself.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 23 2020 at 01:56):

Rochet2 edited Issue #2210:

<!-- Please try to describe precisely what you would like to do in
Cranelift/Wasmtime and/or expect from it. You can answer the questions below if
they're relevant and delete this text before submitting. Thanks for opening an
issue! -->

Feature

<!-- What is the feature or code improvement you would like to do in
Cranelift/Wasmtime? -->
The ability to unload a module, which practically removes objects from Store.
The specification notes:

In practice, implementations may apply techniques like garbage collection to remove objects from the store that are no longer referenced. However, such techniques are not semantically observable, and hence outside the scope of this specification.

From this I infer that the implementing or not implementing such a feature and the details of the implementation are left for the runtime. I imagine that future proposals can affect the implementation this feature. The feature seems to be required in the long term.

Related topics and links:

Benefit

<!-- What is the value of adding this in Cranelift/Wasmtime? -->
Currently, Wasm modules can be linked together, but there is no way to unload modules completely. As a result, programs that would require loading modules for temporary use or to conserve memory will "leak memory" as time goes on and eventually the program will run into issues with memory limitations. Removal of objects that are no longer referenced from anywhere would free the memory of those unused objects. The main goal is to be able to unload an entire module once the module's structures are no longer referenced or the references are removed through the host by the embedder or runtime - the removal of individual parts of a module is not necessary.

Implementation

<!-- Do you have an implementation plan, and/or ideas for data structures or
algorithms to use? -->
One approach is to use reference counting of all objects in a store. With the counting, objects that are no longer referenced can be freed. Cycles of references can exist through the Wasm Table, which requires additional handling.

To help with reasoning about how an unloading system would be implemented I created this graph where two modules share different resources, including table, memory, global. There is also a call stack that refers to the functions of modules.
The modules own only functions and only refer to other resources. The arrow from A to B signifies a reference held by A to B. The red-colored arrows signify references held by a table. The store and the references from a store to basically all elements are omitted.
![WasmUnloading](https://user-images.githubusercontent.com/468816/99915524-e0d75d00-2d0c-11eb-8275-d5d1063cc143.png)

The issue of cyclic references could be relatively simply solved by using weak references in tables. If all of the red arrows in the graph are weak references, then there are no strong references that could create a cycle, apart from the ones created between the host. However, the cycles with the host will be resolved by the embedder freeing them when the modules or other resources are no longer needed. In such an implementation all references held by a Store would probably be weak references also. As a result, the store would no longer be a key element in keeping the created elements alive.

Alternatives

<!-- Have you considered alternative implementations? If so, how are they
better or worse than your proposal? -->
Another approach is to use garbage collection algorithms, such as a mark and sweep algorithm, which can handle cycles in references. The removal of objects could be invoked by the runtime itself periodically or when needed. The collection could be configured by or left to the embedder to invoke.
It seems that WAVM has implemented a GC function that can be called by the embedder. On the surface, it looks like a mark and sweep approach, but I am unsure. https://github.com/WAVM/WAVM/blob/530f33cd30c6ea5114a227175b3a7b0af77cadaa/Lib/Runtime/ObjectGC.cpp#L252
The function allows garbage collection of unused modules and objects, but it looks like it could only be invoked when the host has control. On the other hand, it allows the embedder to have some control over when the collection should occur.

An alternative to automatic GC and memory management schemes is to provide the embedder the ability to directly attempt to unload a given module or all parts of a module individually. The module's memories, tables and other structures would then be freed. This would require the embedder to ensure that no references to the module or its parts exist or the references should be handled by the runtime either by removing them or by handling them gracefully when used.
The benefit of this approach is that the implementation of it may be simpler and more effective. However, the potential references that modules may have are of concern. Either the embedder is trusted or a mechanism to find and potentially remove existing references or raise an error when one is found should be implemented.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 23 2020 at 01:58):

Rochet2 edited Issue #2210:

<!-- Please try to describe precisely what you would like to do in
Cranelift/Wasmtime and/or expect from it. You can answer the questions below if
they're relevant and delete this text before submitting. Thanks for opening an
issue! -->

Feature

<!-- What is the feature or code improvement you would like to do in
Cranelift/Wasmtime? -->
The ability to unload a module, which practically removes objects from Store.
The specification notes:

In practice, implementations may apply techniques like garbage collection to remove objects from the store that are no longer referenced. However, such techniques are not semantically observable, and hence outside the scope of this specification.

From this I infer that the implementing or not implementing such a feature and the details of the implementation are left for the runtime. I imagine that future proposals can affect the implementation this feature. The feature seems to be required in the long term.

Related topics and links:

Benefit

<!-- What is the value of adding this in Cranelift/Wasmtime? -->
Currently, Wasm modules can be linked together, but there is no way to unload modules completely. As a result, programs that would require loading modules for temporary use or to conserve memory will "leak memory" as time goes on and eventually the program will run into issues with memory limitations. Removal of objects that are no longer referenced from anywhere would free the memory of those unused objects. The main goal is to be able to unload an entire module once the module's structures are no longer referenced or the references are removed through the host by the embedder or runtime.

Implementation

<!-- Do you have an implementation plan, and/or ideas for data structures or
algorithms to use? -->
To help with reasoning about how an unloading system would be implemented I created this graph where two modules share different resources, including table, memory, global. There is also a call stack that refers to the functions of modules.
The modules own only functions and only refer to other resources. The arrow from A to B signifies a reference held by A to B. The red-colored arrows signify references held by a table. The store and the references from a store to basically all elements are omitted.
![WasmUnloading](https://user-images.githubusercontent.com/468816/99915524-e0d75d00-2d0c-11eb-8275-d5d1063cc143.png)

One approach is to use reference counting of all objects in a store. With the counting, objects that are no longer referenced can be freed. Cycles of references can exist through the Wasm Table, which requires additional handling.

The issue of cyclic references could be relatively simply solved by using weak references in tables. If all of the red arrows in the graph are weak references, then there are no strong references that could create a cycle, apart from the ones created between the host. However, the cycles with the host will be resolved by the embedder freeing them when the modules or other resources are no longer needed. In such an implementation all references held by a Store would probably be weak references also. As a result, the store would no longer be a key element in keeping the created elements alive.

Alternatives

<!-- Have you considered alternative implementations? If so, how are they
better or worse than your proposal? -->
Another approach is to use garbage collection algorithms, such as a mark and sweep algorithm, which can handle cycles in references. The removal of objects could be invoked by the runtime itself periodically or when needed. The collection could be configured by or left to the embedder to invoke.
It seems that WAVM has implemented a GC function that can be called by the embedder. On the surface, it looks like a mark and sweep approach, but I am unsure. https://github.com/WAVM/WAVM/blob/530f33cd30c6ea5114a227175b3a7b0af77cadaa/Lib/Runtime/ObjectGC.cpp#L252
The function allows garbage collection of unused modules and objects, but it looks like it could only be invoked when the host has control. On the other hand, it allows the embedder to have some control over when the collection should occur.

An alternative to automatic GC and memory management schemes is to provide the embedder the ability to directly attempt to unload a given module or all parts of a module individually. The module's memories, tables and other structures would then be freed. This would require the embedder to ensure that no references to the module or its parts exist or the references should be handled by the runtime either by removing them or by handling them gracefully when used.
The benefit of this approach is that the implementation of it may be simpler and more effective. However, the potential references that modules may have are of concern. Either the embedder is trusted or a mechanism to find and potentially remove existing references or raise an error when one is found should be implemented.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 23 2020 at 01:59):

Rochet2 edited a comment on Issue #2210:

To help with reasoning about how an unloading system would be implemented I created this graph where two modules share different resources, including table, memory, global. There is also a call stack that refers to the functions of modules.
The modules own only functions and only refer to other resources. The arrow from A to B signifies a reference held by A to B. The red-colored arrows signify references held by a table. The store and the references from a store to basically all elements are omitted.
![WasmUnloading](https://user-images.githubusercontent.com/468816/99915524-e0d75d00-2d0c-11eb-8275-d5d1063cc143.png)

If a GC is not desired, then reference counting could be implemented. The issue of cyclic references could be relatively simply solved by using weak references in tables. If all of the red arrows in the graph are weak references, then there are no strong references that could create a cycle, apart from the ones created between the host. However, the cycles with the host will be resolved by the embedder freeing them when the modules or other resources are no longer needed. In such an implementation all references held by a Store would probably be weak references also. As a result, the store would no longer be a key element in keeping the created elements alive.

If an element is freed and it is pointed to by a weak reference, then the weak reference could be unset at that time or when it is attempted to be used. However, testing if a reference is valid on every use could cause unnecessary (negligible?) overhead. I am unsure of how the execution is implemented, but I assume that the call stack would have a reference at least to the function in the stack. Any references to memories or tables would be possible also without creating any cycles as long as the call stack cannot refer to itself.

I have now updated the main post with this same information.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 23 2020 at 02:01):

Rochet2 edited Issue #2210:

<!-- Please try to describe precisely what you would like to do in
Cranelift/Wasmtime and/or expect from it. You can answer the questions below if
they're relevant and delete this text before submitting. Thanks for opening an
issue! -->

Feature

<!-- What is the feature or code improvement you would like to do in
Cranelift/Wasmtime? -->
The ability to unload a module, which practically removes objects from Store.
The specification notes:

In practice, implementations may apply techniques like garbage collection to remove objects from the store that are no longer referenced. However, such techniques are not semantically observable, and hence outside the scope of this specification.

From this I infer that the implementing or not implementing such a feature and the details of the implementation are left for the runtime. I imagine that future proposals can affect the implementation this feature. The feature seems to be required in the long term.

Related topics and links:

Benefit

<!-- What is the value of adding this in Cranelift/Wasmtime? -->
Currently, Wasm modules can be linked together, but there is no way to unload modules completely. As a result, programs that would require loading modules for temporary use or to conserve memory will "leak memory" as time goes on and eventually the program will run into issues with memory limitations. Removal of objects that are no longer referenced from anywhere would free the memory of those unused objects. The main goal is to be able to unload an entire module once the module's structures are no longer referenced or the references are removed through the host by the embedder or runtime.

Implementation

<!-- Do you have an implementation plan, and/or ideas for data structures or
algorithms to use? -->
To help with reasoning about how an unloading system would be implemented I created this graph where two modules share different resources, including table, memory, global. There is also a call stack that refers to the functions of modules.
The modules own only functions and only refer to other resources. The arrow from A to B signifies a reference to B held by A. The red-colored arrows signify references held by a table. The Store and the references held by a Store are omitted.
![WasmUnloading](https://user-images.githubusercontent.com/468816/99915524-e0d75d00-2d0c-11eb-8275-d5d1063cc143.png)

One approach is to use reference counting of all objects in a store. With the counting, objects that are no longer referenced can be freed. Cycles of references can exist through the Wasm Table, which requires additional handling.

The issue of cyclic references could be relatively simply solved by using weak references in tables. If all of the red arrows in the graph are weak references, then there are no strong references that could create a cycle, apart from the ones created between the host. However, the cycles with the host will be resolved by the embedder freeing them when the modules or other resources are no longer needed. In such an implementation all references held by a Store would probably be weak references also. As a result, the store would no longer be a key element in keeping the created elements alive.

Alternatives

<!-- Have you considered alternative implementations? If so, how are they
better or worse than your proposal? -->
Another approach is to use garbage collection algorithms, such as a mark and sweep algorithm, which can handle cycles in references. The removal of objects could be invoked by the runtime itself periodically or when needed. The collection could be configured by or left to the embedder to invoke.
It seems that WAVM has implemented a GC function that can be called by the embedder. On the surface, it looks like a mark and sweep approach, but I am unsure. https://github.com/WAVM/WAVM/blob/530f33cd30c6ea5114a227175b3a7b0af77cadaa/Lib/Runtime/ObjectGC.cpp#L252
The function allows garbage collection of unused modules and objects, but it looks like it could only be invoked when the host has control. On the other hand, it allows the embedder to have some control over when the collection should occur.

An alternative to automatic GC and memory management schemes is to provide the embedder the ability to directly attempt to unload a given module or all parts of a module individually. The module's memories, tables, and other structures would then be freed. This would require the embedder to ensure that no references to the module or its parts exist or the references should be handled by the runtime either by removing them or by handling them gracefully when used.
The benefit of this approach is that the implementation of it may be simpler and more effective. However, the potential references that modules may have are of concern. Either the embedder is trusted or a mechanism to find and potentially remove existing references or raise an error when one is found should be implemented.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 23 2020 at 02:05):

Rochet2 edited Issue #2210:

<!-- Please try to describe precisely what you would like to do in
Cranelift/Wasmtime and/or expect from it. You can answer the questions below if
they're relevant and delete this text before submitting. Thanks for opening an
issue! -->

Feature

<!-- What is the feature or code improvement you would like to do in
Cranelift/Wasmtime? -->
The ability to unload a module, which practically removes objects from Store.
The specification notes:

In practice, implementations may apply techniques like garbage collection to remove objects from the store that are no longer referenced. However, such techniques are not semantically observable, and hence outside the scope of this specification.

From this I infer that the implementing or not implementing such a feature and the details of the implementation are left for the runtime. I imagine that future proposals can affect the implementation this feature. The feature seems to be required in the long term.

Related topics and links:

Benefit

<!-- What is the value of adding this in Cranelift/Wasmtime? -->
Currently, Wasm modules can be linked together, but there is no way to unload modules completely. As a result, programs that would require loading modules for temporary use or to conserve memory will "leak memory" as time goes on and eventually the program will run into issues with memory limitations. Removal of objects that are no longer referenced from anywhere would free the memory of those unused objects. The main goal is to be able to unload an entire module once the module's structures are no longer referenced or the references are removed through the host by the embedder or runtime.

Implementation

<!-- Do you have an implementation plan, and/or ideas for data structures or
algorithms to use? -->
To help with reasoning about how an unloading system would be implemented I created this graph where two modules share different resources, including table, memory, global. There is also a call stack that refers to the functions of modules.
The modules own only functions and only refer to other resources. The arrow from A to B signifies a reference to B held by A. The red-colored arrows signify references held by a table. The Store and the references held by a Store are omitted.
![WasmUnloading](https://user-images.githubusercontent.com/468816/99915524-e0d75d00-2d0c-11eb-8275-d5d1063cc143.png)

One approach is to use reference counting of all objects in a store. With the counting, objects that are no longer referenced can be freed. Cycles of references can exist through the Wasm Table, which requires additional handling.

The issue of cyclic references could be relatively simply solved by using weak references in tables. If all of the red arrows in the graph are weak references, then there are no strong references that could create a cycle, apart from the ones created between the host. However, the cycles with the host will be resolved by the embedder freeing them when the modules or other resources are no longer needed. In such an implementation all references held by a Store would probably be weak references also. As a result, the store would no longer be a key element in keeping the created elements alive.

If an element is freed and it is pointed to by a weak reference, then the weak reference could be unset at that time or when it is attempted to be used. However, testing if a reference is valid on every use could cause unnecessary (negligible?) overhead. I am unsure of how the execution is implemented, but I assume that the call stack would have a reference at least to the function in the stack. Any references to memories or tables would be possible also without creating any cycles as long as the call stack cannot refer to itself.

Alternatives

<!-- Have you considered alternative implementations? If so, how are they
better or worse than your proposal? -->
Another approach is to use garbage collection algorithms, such as a mark and sweep algorithm, which can handle cycles in references. The removal of objects could be invoked by the runtime itself periodically or when needed. The collection could be configured by or left to the embedder to invoke.
It seems that WAVM has implemented a GC function that can be called by the embedder. On the surface, it looks like a mark and sweep approach, but I am unsure. https://github.com/WAVM/WAVM/blob/530f33cd30c6ea5114a227175b3a7b0af77cadaa/Lib/Runtime/ObjectGC.cpp#L252
The function allows garbage collection of unused modules and objects, but it looks like it could only be invoked when the host has control. On the other hand, it allows the embedder to have some control over when the collection should occur.

An alternative to automatic GC and memory management schemes is to provide the embedder the ability to directly attempt to unload a given module or all parts of a module individually. The module's memories, tables, and other structures would then be freed. This would require the embedder to ensure that no references to the module or its parts exist or the references should be handled by the runtime either by removing them or by handling them gracefully when used.
The benefit of this approach is that the implementation of it may be simpler and more effective. However, the potential references that modules may have are of concern. Either the embedder is trusted or a mechanism to find and potentially remove existing references or raise an error when one is found should be implemented.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 23 2020 at 02:06):

Rochet2 edited Issue #2210:

<!-- Please try to describe precisely what you would like to do in
Cranelift/Wasmtime and/or expect from it. You can answer the questions below if
they're relevant and delete this text before submitting. Thanks for opening an
issue! -->

Feature

<!-- What is the feature or code improvement you would like to do in
Cranelift/Wasmtime? -->
The ability to unload a module, which practically removes objects from Store.
The specification notes:

In practice, implementations may apply techniques like garbage collection to remove objects from the store that are no longer referenced. However, such techniques are not semantically observable, and hence outside the scope of this specification.

From this I infer that the implementing or not implementing such a feature and the details of the implementation are left for the runtime. I imagine that future proposals can affect the implementation this feature. The feature seems to be required in the long term.

Related topics and links:

Benefit

<!-- What is the value of adding this in Cranelift/Wasmtime? -->
Currently, Wasm modules can be linked together, but there is no way to unload modules completely. As a result, programs that would require loading modules for temporary use or to conserve memory will "leak memory" as time goes on and eventually the program will run into issues with memory limitations. Removal of objects that are no longer referenced from anywhere would free the memory of those unused objects. The main goal is to be able to unload an entire module once the module's structures are no longer referenced or the references are removed through the host by the embedder or runtime.

Implementation

<!-- Do you have an implementation plan, and/or ideas for data structures or
algorithms to use? -->
To help with reasoning about how an unloading system would be implemented I created this graph where two modules share different resources, including table, memory, global. There is also a call stack that refers to the functions of modules.
The modules own only functions and only refer to other resources. The arrow from A to B signifies a reference to B held by A. The red-colored arrows signify references held by a table. The Store and the references held by a Store are omitted.
![WasmUnloading](https://user-images.githubusercontent.com/468816/99915524-e0d75d00-2d0c-11eb-8275-d5d1063cc143.png)

One approach is to use reference counting of all objects in a store. With the counting, objects that are no longer referenced can be freed. Cycles of references can exist through the Wasm Table, which requires additional handling.

The issue of cyclic references could be relatively simply solved by using weak references in tables. If all of the red arrows in the graph are weak references, then there are no strong references that could create a cycle, apart from the ones created between the host. However, the cycles with the host will be resolved by the embedder freeing them when the modules or other resources are no longer needed. In such an implementation all references held by a Store would probably be weak references also. As a result, the store would no longer be a key element in keeping the created elements alive.

If an element is freed and it is pointed to by a weak reference, then the weak reference could be unset at that time or when it is attempted to be used. However, testing if a reference is valid on every use could cause unnecessary (negligible?) overhead. I am unsure of how the execution is implemented, but I assume that the call stack would have a reference at least to the function in the stack.

Alternatives

<!-- Have you considered alternative implementations? If so, how are they
better or worse than your proposal? -->
Another approach is to use garbage collection algorithms, such as a mark and sweep algorithm, which can handle cycles in references. The removal of objects could be invoked by the runtime itself periodically or when needed. The collection could be configured by or left to the embedder to invoke.
It seems that WAVM has implemented a GC function that can be called by the embedder. On the surface, it looks like a mark and sweep approach, but I am unsure. https://github.com/WAVM/WAVM/blob/530f33cd30c6ea5114a227175b3a7b0af77cadaa/Lib/Runtime/ObjectGC.cpp#L252
The function allows garbage collection of unused modules and objects, but it looks like it could only be invoked when the host has control. On the other hand, it allows the embedder to have some control over when the collection should occur.

An alternative to automatic GC and memory management schemes is to provide the embedder the ability to directly attempt to unload a given module or all parts of a module individually. The module's memories, tables, and other structures would then be freed. This would require the embedder to ensure that no references to the module or its parts exist or the references should be handled by the runtime either by removing them or by handling them gracefully when used.
The benefit of this approach is that the implementation of it may be simpler and more effective. However, the potential references that modules may have are of concern. Either the embedder is trusted or a mechanism to find and potentially remove existing references or raise an error when one is found should be implemented.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 23 2020 at 02:06):

Rochet2 edited a comment on Issue #2210:

To help with reasoning about how an unloading system would be implemented I created this graph where two modules share different resources, including table, memory, global. There is also a call stack that refers to the functions of modules.
The modules own only functions and only refer to other resources. The arrow from A to B signifies a reference held by A to B. The red-colored arrows signify references held by a table. The store and the references from a store to basically all elements are omitted.
![WasmUnloading](https://user-images.githubusercontent.com/468816/99915524-e0d75d00-2d0c-11eb-8275-d5d1063cc143.png)

If a GC is not desired, then reference counting could be implemented. The issue of cyclic references could be relatively simply solved by using weak references in tables. If all of the red arrows in the graph are weak references, then there are no strong references that could create a cycle, apart from the ones created between the host. However, the cycles with the host will be resolved by the embedder freeing them when the modules or other resources are no longer needed. In such an implementation all references held by a Store would probably be weak references also. As a result, the store would no longer be a key element in keeping the created elements alive.

If an element is freed and it is pointed to by a weak reference, then the weak reference could be unset at that time or when it is attempted to be used. However, testing if a reference is valid on every use could cause unnecessary (negligible?) overhead. I am unsure of how the execution is implemented, but I assume that the call stack would have a reference at least to the function in the stack.

I have now updated the main post with this same information.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 23 2020 at 02:19):

Rochet2 edited a comment on Issue #2210:

To help with reasoning about how an unloading system would be implemented I created this graph where two modules share different resources, including table, memory, global. There is also a call stack that refers to the functions of modules.
The modules own only functions and only refer to other resources. The arrow from A to B signifies a reference held by A to B. The red-colored arrows signify references held by a table. The store and the references from a store to basically all elements are omitted.
![WasmUnloading](https://user-images.githubusercontent.com/468816/99915524-e0d75d00-2d0c-11eb-8275-d5d1063cc143.png)

If a GC is not desired, then reference counting could be implemented. The issue of cyclic references could be relatively simply solved by using weak references in tables. If all of the red arrows in the graph are weak references, then there are no strong references that could create a cycle, apart from the ones created between the host. However, the cycles with the host will be resolved by the embedder freeing them when the modules or other resources are no longer needed.

If an element is freed and it is pointed to by a weak reference, then the weak reference could be unset at that time or when it is attempted to be used. However, testing if a reference is valid on every use could cause unnecessary (negligible?) overhead. I am unsure of how the execution is implemented, but I assume that the call stack would have a reference at least to the function in the stack.

As a result of this scheme, there must be a reference to all modules that should exist for the lifetime of the program. For example, if two modules are loaded and the other is only referenced through a table, it will be unloaded automatically immediately. As a potential solution, the store could hold strong references that the embedder can remove. Alternatively, the host would be required to hold a list of strong references to any modules that should not be unloaded.

I have now updated the main post with this same information.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 23 2020 at 02:20):

Rochet2 edited Issue #2210:

<!-- Please try to describe precisely what you would like to do in
Cranelift/Wasmtime and/or expect from it. You can answer the questions below if
they're relevant and delete this text before submitting. Thanks for opening an
issue! -->

Feature

<!-- What is the feature or code improvement you would like to do in
Cranelift/Wasmtime? -->
The ability to unload a module, which practically removes objects from Store.
The specification notes:

In practice, implementations may apply techniques like garbage collection to remove objects from the store that are no longer referenced. However, such techniques are not semantically observable, and hence outside the scope of this specification.

From this I infer that the implementing or not implementing such a feature and the details of the implementation are left for the runtime. I imagine that future proposals can affect the implementation this feature. The feature seems to be required in the long term.

Related topics and links:

Benefit

<!-- What is the value of adding this in Cranelift/Wasmtime? -->
Currently, Wasm modules can be linked together, but there is no way to unload modules completely. As a result, programs that would require loading modules for temporary use or to conserve memory will "leak memory" as time goes on and eventually the program will run into issues with memory limitations. Removal of objects that are no longer referenced from anywhere would free the memory of those unused objects. The main goal is to be able to unload an entire module once the module's structures are no longer referenced or the references are removed through the host by the embedder or runtime.

Implementation

<!-- Do you have an implementation plan, and/or ideas for data structures or
algorithms to use? -->
To help with reasoning about how an unloading system would be implemented I created this graph where two modules share different resources, including table, memory, global. There is also a call stack that refers to the functions of modules.
The modules own only functions and only refer to other resources. The arrow from A to B signifies a reference to B held by A. The red-colored arrows signify references held by a table. The Store and the references held by a Store are omitted.
![WasmUnloading](https://user-images.githubusercontent.com/468816/99915524-e0d75d00-2d0c-11eb-8275-d5d1063cc143.png)

One approach is to use reference counting of all objects in a store. With the counting, objects that are no longer referenced can be freed. Cycles of references can exist through the Wasm Table, which requires additional handling.

The issue of cyclic references could be relatively simply solved by using weak references in tables. If all of the red arrows in the graph are weak references, then there are no strong references that could create a cycle, apart from the ones created between the host. However, the cycles with the host will be resolved by the embedder freeing them when the modules or other resources are no longer needed.

If an element is freed and it is pointed to by a weak reference, then the weak reference could be unset at that time or when it is attempted to be used. However, testing if a reference is valid on every use could cause unnecessary (negligible?) overhead. I am unsure of how the execution is implemented, but I assume that the call stack would have a reference at least to the function in the stack.

As a result of this scheme, there must be a reference to all modules that should exist for the lifetime of the program. For example, if two modules are loaded and the other is only referenced through a table, it will be unloaded automatically immediately. As a potential solution, the store could hold strong references that the embedder can remove. Alternatively, the host would be required to hold a list of strong references to any modules that should not be unloaded.

Alternatives

<!-- Have you considered alternative implementations? If so, how are they
better or worse than your proposal? -->
Another approach is to use garbage collection algorithms, such as a mark and sweep algorithm, which can handle cycles in references. The removal of objects could be invoked by the runtime itself periodically or when needed. The collection could be configured by or left to the embedder to invoke.
It seems that WAVM has implemented a GC function that can be called by the embedder. On the surface, it looks like a mark and sweep approach, but I am unsure. https://github.com/WAVM/WAVM/blob/530f33cd30c6ea5114a227175b3a7b0af77cadaa/Lib/Runtime/ObjectGC.cpp#L252
The function allows garbage collection of unused modules and objects, but it looks like it could only be invoked when the host has control. On the other hand, it allows the embedder to have some control over when the collection should occur.

An alternative to automatic GC and memory management schemes is to provide the embedder the ability to directly attempt to unload a given module or all parts of a module individually. The module's memories, tables, and other structures would then be freed. This would require the embedder to ensure that no references to the module or its parts exist or the references should be handled by the runtime either by removing them or by handling them gracefully when used.
The benefit of this approach is that the implementation of it may be simpler and more effective. However, the potential references that modules may have are of concern. Either the embedder is trusted or a mechanism to find and potentially remove existing references or raise an error when one is found should be implemented.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 23 2020 at 02:26):

Rochet2 edited a comment on Issue #2210:

To help with reasoning about how an unloading system would be implemented I created this graph where two modules share different resources, including table, memory, global. There is also a call stack that refers to the functions of modules.
The modules own only functions and only refer to other resources. The arrow from A to B signifies a reference held by A to B. The red-colored arrows signify references held by a table. The store and the references from a store to basically all elements are omitted.
![WasmUnloading](https://user-images.githubusercontent.com/468816/99915524-e0d75d00-2d0c-11eb-8275-d5d1063cc143.png)

If a GC is not desired, then reference counting could be implemented. The issue of cyclic references could be relatively simply solved by using weak references in tables. If all of the red arrows in the graph are weak references, then there are no strong references that could create a cycle, apart from the ones created between the host. However, the cycles with the host will be resolved by the embedder freeing them when the modules or other resources are no longer needed.

If an element is freed and it is pointed to by a weak reference, then the weak reference could be unset at that time or when it is attempted to be used. However, testing if a reference is valid on every use could cause unnecessary (negligible?) overhead. I am unsure of how the execution is implemented, but I assume that the call stack would have a reference at least to the function in the stack.

As a result of this scheme, there must be a reference to all modules that should exist for the lifetime of the program. For example, if two modules are loaded and the other is only referenced through a table, it will be unloaded automatically immediately. As a potential solution, the store could hold strong references that the embedder can remove. Alternatively, the host would be required to hold a list of strong references to any modules that should not be unloaded.

EDIT: I have now updated the main post with this same information.
EDIT2: Addressed an issue related to automatically unloading a module when only referenced through a table

view this post on Zulip Wasmtime GitHub notifications bot (Nov 23 2020 at 03:17):

Rochet2 commented on Issue #2210:

I know that there's been some work on cross-store references, and perhaps the answer here might be to extend linking in a way that allows for hierarchies with multiple stores instead of trying to unload modules from a single store?

thinking is that you could link modules between two stores in a way that they can be easily disconnected. For example, if one Store goes away then calling its functions from another Store would simply trap (or something like that)

Hmm, I have not personally considered or seen discussions yet on multiple stores. Probably need to take a look at that. The first thought is that it could potentially pave the way to nano processes sharing data efficiently.

However, to me, it would seem to create complexity and I am unsure what the benefit of that would be exactly without knowing more. If two stores can share a reference to basically any element of another store, then it would be the same as having a single store in my mind as references between all elements must still be handled one way or another.

Per the above suggestions, any invalid references could trap or be set to a null reference instead of pointing to invalid unloaded parts of the system. Unsure if there are some lifetime restrictions that references in a table should impose on the Wasm structures according to the specification. I could not directly find any.

stacking nested stores on top of each other to allow for pushing and popping sets of instances
This doesn't give full expressibility of adding and removing modules/instances in _any_ order though.

From the perspective of a system that would require an unloading mechanism as a part of its operation regularly, it would probably be quite hindering if in order to gain memory to load a new module it would need to free an entire stack of modules. Such a runtime design would probably affect the design of the modules themselves I would imagine.

Semi-aside:
We've always wanted to support use cases where a full GC isn't desired
certain Wasm proposals will require a full tracing GC

If a GC is required by Wasm, then it should probably be implemented. If a GC is not desired in a use case it could be disabled/not used/never invoked. Proposals that _might_ require a GC would probably need to be thought thoroughly through in case they don't really require one - such as this particular issue on unloading modules.

However, if a GC is not desired even as a part of the implementation when not desired in a use case then that probably imposes large changes to the runtime. Would probably need some examples of such use cases to think it through.


Last updated: Oct 23 2024 at 20:03 UTC