thibaultcha opened Issue #1911:
Hello!
Lately, I have been pondering something that puzzles me in the C API. From my understanding of it, I seem to either be facing a limitation in the C API or to have some misconceptions about the most efficient way of using it. I am also not sure whether this discussion belongs here or ultimately in the https://github.com/WebAssembly/wasm-c-api repository, but I thought that here would be a better starting point.
The use-case at hand is: in my host function callbacks (`wasmtime_func_callback_with_env_t`), I'd like to retrieve some context values from my (event-driven) host application.
Example
Below are some extracts from my embedding, which is using the Linker:
- With said Linker, we can define host functions for future imports:

      func = wasmtime_func_new_with_env(store, functype, my_host_func, ptr, NULL);
      wasmtime_linker_define(linker, module_name, func_name, wasm_func_as_extern(func));

- Later on, let's create an instance of this module that will be bound to a given event source:

      wasmtime_linker_instantiate(linker, module, &instance, &trap);

- Assuming our instance eventually gets invoked and the executed wasm code calls our above import, we end up in our host callback:

      wasm_trap_t *my_host_func(const wasmtime_caller_t *caller, void *env,
                                const wasm_val_t args[], wasm_val_t results[])
      {
          // Here, we can retrieve the instance and the `ptr` pointer
          // previously given to `wasmtime_func_new_with_env`.
          // But how can we retrieve more host context values that
          // were created *after* `wasmtime_func_new_with_env`?
      }

The inability to attach context data to an Instance seems to make it tricky to embed efficiently inside an event-driven application.
Thoughts
There were two solutions that I could think of:
- Stop using the Linker, create the instance via `wasmtime_instance_new()`, and bind imports with `wasmtime_func_new_with_env()` right after. This allows some host context pointers to be given to the callback's `void *env` argument, which solves the above issue, but raises questions about the performance trade-offs. _Am I mistaken in thinking that this approach could be more expensive than the above one with the Linker?_
- If so, then sticking to the Linker sort of makes `wasmtime_func_new_with_env` less useful (since it can only pass references to data that exists at that time). Yet, we could maybe store some data alongside an Instance for this purpose via some getter/setter API, something like: `wasmtime_instance_env_set(wasm_instance_t *instance, void *env)`. This pointer would then be given to host function callbacks in one of their arguments.
What I wonder is: am I onto an actual need for the C API, onto something that's already solved, or am I misunderstanding something bigger? Either way, I'd very much appreciate hearing some thoughts on it!
Thanks in advance!
alexcrichton commented on Issue #1911:
For the first solution you have, I'm not sure if that will be possible since the instance's imports must exist before the instance is created, so you can't bind imports after instantiation. Also FWIW using `Linker` vs raw instances shouldn't have an impact on performance.
In general though it sounds like this is where the `env` argument would be used? You could, for example, allocate some space which is filled in after instantiation, and pass that as the `env`. That way when the function is called it has access to the data stuffed into `env` after initialization. You'd just need to be careful that if the function is called during initialization it reports an error of some form. (This is a fundamental wasm limitation, and is why the `start` function isn't the most useful.)
thibaultcha commented on Issue #1911:
@alexcrichton Hi there,
> For the first solution you have, I'm not sure if that will be possible since the instance's imports must exist before the instance is created, so you can't bind imports after instantiation.

Yes, in this case, the Instance is created within the host context, and I can pass the necessary pointers to `wasmtime_func_new_with_env()`.
> Also FWIW using Linker vs raw instances shouldn't have an impact on performance.

Even assuming that for each new Instance, we loop over its imports and call `wasmtime_func_new_with_env()` for each one of them?
> You could, for example, allocate some space which is filled in after instantiation, and pass that as the env. That way when the function is called it has access to the data stuffed into env after initialization.

I did not include this solution in my above reasoning because it significantly complicates an embedding's design. Doing so would mean that `env` needs to be a global R/B Tree which is maintained by the host application in which it has to stuff context values for each created Instance. Besides the extra cost associated with that, the host application now has to deal with maintaining global states for each Instance, adding associated state when the Instance is created, and removing it when it is deleted. This seems like a breaking of encapsulation to me and a rather poor design for an embedding. Instead of having to do all of this extra work, a host application could simply associate some arbitrary data with a given Instance right after having created it. This is _significantly_ simpler and leaner (a single line of code for associating data vs. having to maintain associated global structures on the side).
alexcrichton commented on Issue #1911:
FWIW there is a `wasm_instance_set_host_info` API to set arbitrary data on the `wasm_instance_t`, but you don't have access to this as part of `wasm_func_call`. Additionally I'm not sure what you mean about looping over imports and calling `wasmtime_func_new_with_env` for them? It seems like you do the same for raw instantiation vs linker-based instantiation?
thibaultcha commented on Issue #1911:
Thank you for continuing this discussion!
With Linker-based instantiation, I call `wasmtime_func_new_with_env()` _once_ for each host function during the initialization of my host program. Then, during runtime, I call `wasmtime_linker_instantiate()` once for each execution context in which I will need an Instance.
With raw instantiation, it seems like I would have to call `wasmtime_func_new_with_env()` for each host function for each Instance that I create.
This is the source of my confusion as to whether Linker-based instantiation may be cheaper than raw instantiation. If the cost is similar, I can rely on the `env` argument of `wasmtime_func_new_with_env()` to keep track of host context values. If raw instantiation is significantly more expensive (computation and memory wise), then I would need another way to bind host context values to a given Instance.
thibaultcha commented on Issue #1911:
Digging into the Linker's code, it looks like the associated cost would be roughly identical. The difference seems to be whether the imports are resolved in the Linker's Rust code or in the host application's C code.
Because using the Linker could greatly reduce the effort required from embedders, do you think that there is a need for host context values to be given to `wasmtime_linker_instantiate()`? Such values could then be passed to the functions created by the Linker, and maybe be retrievable in the `wasmtime_func_callback_with_env_t` callbacks through their `void *env` argument?
alexcrichton commented on Issue #1911:
Er sorry I'm still not really sure what this issue is about. It's not clear to me what contextual information you're attaching where, and why the `env` pointer and/or the `set_host_info` business aren't the right solution. I'm a bit confused by the discussion of performance, too, so perhaps that could be set aside for a bit to figure out the env business?
thibaultcha commented on Issue #1911:
I sure can elaborate and be more specific!
I am embedding wasmtime inside of the Nginx web server. Upon processing of an HTTP request, Nginx creates a `ngx_http_request_t` structure. What I want to do is to create a new instance for each such `ngx_http_request_t` structure. The instance will then be called at various points during the processing of this HTTP request, until the request is freed (at which point the instance will be freed as well).
Now, when the instance associated with a request invokes an imported host function, I need to retrieve this `ngx_http_request_t` structure to know which request the callback's logic should apply to. Therefore, I need a way to bind an instance to specific request context data in order to do this:

    wasm_trap_t *my_host_func(const wasmtime_caller_t *caller, void *env,
                              const wasm_val_t args[], wasm_val_t results[])
    {
        ngx_http_request_t *r = (ngx_http_request_t *) env;
        /* ... */
    }

Now, because I am using the Linker in order to not have to implement my own imports resolver (which I previously did but would like to avoid if possible), I cannot associate this `ngx_http_request_t` pointer with a given instance created via `wasmtime_linker_instantiate()` (unless I am missing something else).
This issue is trying to solve this problem: how to retrieve host context values from a callback when instances are created via the Linker.
alexcrichton commented on Issue #1911:
What you'll likely want to do in that case is:
- Have a global `wasm_engine_t` used by all requests
- Cache a global `wasm_module_t` with the engine (using a temporary store to satisfy API requirements)
- Create a `wasm_store_t` per-request using the global engine
- Instantiate the global module within this store

The last step means you'll have to recreate functions for every request anyway, so you'll be able to pair the `env` parameter at that time.
Does that make sense? It sounds like you're caching the `wasm_linker_t` globally which may be causing my confusion.
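As a rough illustration of why per-request instantiation makes the `env` pairing straightforward, here is a plain-C model of the last step. None of these types are the wasmtime C API: `request_t` stands in for `ngx_http_request_t`, and `host_func_t`, `request_instance_t`, `instance_for_request`, and `call_import` are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not wasmtime types. */
typedef struct { int id; } request_t;   /* plays the role of ngx_http_request_t */
typedef int (*host_cb_t)(void *env);

typedef struct {
    host_cb_t cb;   /* the host function */
    void *env;      /* context paired with it at creation time */
} host_func_t;

typedef struct {
    host_func_t import;   /* one import, created for this request only */
} request_instance_t;

/* Host callback: recovers the request from env. */
static int current_request_id(void *env)
{
    request_t *r = env;
    return r->id;
}

/* Per-request setup: the import is created here, so env can be the
 * request itself and no global lookup table is needed. */
static request_instance_t *instance_for_request(request_t *r)
{
    request_instance_t *inst = malloc(sizeof(*inst));
    if (inst == NULL) {
        return NULL;
    }
    inst->import.cb = current_request_id;
    inst->import.env = r;
    return inst;
}

/* Models the guest calling its import. */
static int call_import(request_instance_t *inst)
{
    return inst->import.cb(inst->import.env);
}
```

Two requests handled this way each get their own function/`env` pair, so a callback invoked from one instance only ever sees that instance's request.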
thibaultcha commented on Issue #1911:
> It sounds like you're caching the `wasm_linker_t` globally which may be causing my confusion.

Hmm right, that's what I was doing since I was hoping that all instances could be backed by the same `wasm_store_t` (maybe misunderstanding the underlying performance implications, or memory sharing between different requests, e.g. eventually allowing for sharing globals between them, etc...).
So, what I seem to be understanding is that with your proposed implementation, instantiating the module within the request's store can be performed via raw instantiation (and binding the imports myself), or via instantiating a new Linker, itself tied to the request's store, and eventually creating the instance for me. Either way allows for pairing the `env` argument when creating the functions.
thibaultcha edited a comment on Issue #1911:
> It sounds like you're caching the `wasm_linker_t` globally which may be causing my confusion.

Hmm right, that's what I was doing since I was hoping that all instances could be backed by the same `wasm_store_t` (maybe misunderstanding the underlying performance implications? I was hoping to save the extra allocation costs).
So, what I seem to be understanding is that with your proposed implementation, instantiating the module within the request's store can be performed via raw instantiation (and binding the imports myself), or via instantiating a new Linker (itself tied to the request's store) which would eventually create the instance for me. Either way allows for pairing the `env` argument when creating the functions at that time.
alexcrichton commented on Issue #1911:
Yeah that's what I'm thinking, where if you instantiate per-request then you can create imports paired with `env` arguments.
For `wasm_store_t` the issue isn't so much about reusing allocations but rather ever freeing them. No `wasm_instance_t` is fully deallocated until the entirety of its store and all other references to the store have gone away. This is done because we don't have a full GC. If you're a long-running server (e.g. nginx) you probably want to keep memory usage under control, so you'll likely want to have a store-per-request so when the request is finished you'll be able to free all memory associated with the request.
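The memory behavior described above can be pictured with a toy model in plain C. It is only an analogy and says nothing about how wasmtime is implemented internally: `toy_store_t`, `toy_instance_t`, and the functions below are hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model: a store owns every instance created in it. */
typedef struct toy_instance {
    struct toy_instance *next;
} toy_instance_t;

typedef struct {
    toy_instance_t *instances; /* everything ever instantiated in this store */
    size_t live;               /* number of instances still held in memory */
} toy_store_t;

static toy_store_t *toy_store_new(void)
{
    return calloc(1, sizeof(toy_store_t));
}

static toy_instance_t *toy_instantiate(toy_store_t *store)
{
    toy_instance_t *inst = malloc(sizeof(*inst));
    if (inst == NULL) {
        return NULL;
    }
    inst->next = store->instances;
    store->instances = inst;
    store->live++;
    return inst;
}

/* Dropping an instance handle releases nothing in this model:
 * the memory stays with the store. */
static void toy_instance_delete(toy_store_t *store, toy_instance_t *inst)
{
    (void) store;
    (void) inst;
}

/* Only deleting the store frees its instances. Returns how many were freed. */
static size_t toy_store_delete(toy_store_t *store)
{
    size_t freed = 0;
    toy_instance_t *inst = store->instances;

    while (inst != NULL) {
        toy_instance_t *next = inst->next;
        free(inst);
        freed++;
        inst = next;
    }
    free(store);
    return freed;
}
```

With one such store per request, finishing the request and deleting its store releases everything at once; with a single long-lived store, `live` would only ever grow.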
thibaultcha commented on Issue #1911:
Thank you for the `wasm_store_t` clarifications! Alright then, this discussion answers the questions that I had on retrieving host context values and clarifies a few more points; I'll go ahead and close this issue now.
Thank you for your time @alexcrichton, much appreciated.
thibaultcha closed Issue #1911.
Last updated: Dec 23 2024 at 13:07 UTC