Stream: git-wasmtime

Topic: wasmtime / Issue #1911 C API: retrieve host context value...


Wasmtime GitHub notifications bot (Jun 23 2020 at 06:00):

thibaultcha opened Issue #1911:

Hello!

Lately, I have been pondering something that has been tripping me up in the C API. From my understanding, I am either facing a limitation in the C API or have some misconceptions about the most efficient way to use it. I am also not sure whether this discussion belongs here or ultimately in the https://github.com/WebAssembly/wasm-c-api repository, but I thought that here would be a better starting point.

The use case at hand is: in my host function callbacks (wasmtime_func_callback_with_env_t), I'd like to retrieve some context values from my (event-driven) host application.

Example

Below are some extracts from my embedding, which is using the Linker:

  1. With said Linker, we can define host functions for future imports:

     func = wasmtime_func_new_with_env(store, functype, my_host_func, ptr, NULL);
     wasmtime_linker_define(linker, module_name, func_name, wasm_func_as_extern(func));

  2. Later on, let's create an instance of this module that will be bound to a given event source:

     wasmtime_linker_instantiate(linker, module, &instance, &trap);

  3. Assuming our instance eventually gets invoked and the executed wasm code calls our above import, we end up in our host callback:

     wasm_trap_t *my_host_func(const wasmtime_caller_t *caller, void *env, const wasm_val_t args[], wasm_val_t results[]) {
       // Here, we can retrieve the instance and the `ptr` pointer
       // previously given to `wasmtime_func_new_with_env`.

       // But how can we retrieve more host context values that
       // were created *after* `wasmtime_func_new_with_env`?
     }

The inability to attach context data to an Instance seems to make it tricky to embed Wasmtime efficiently inside an event-driven application.

Thoughts

There were two solutions that I could think of:

  1. Stop using the Linker, create the instance via wasmtime_instance_new(), and bind imports with wasmtime_func_new_with_env() right after. This allows some host context pointers to be given to the callback's void *env argument, which solves the above issue, but raises questions about the performance trade-offs. _Am I mistaken in thinking that this approach could be more expensive than the above one with the Linker?_
  2. If so, then sticking to the Linker sort of makes wasmtime_func_new_with_env less useful (since it can only pass references to data that exists at that time). Yet, we could maybe store some data alongside an Instance for this purpose via some getter/setter API, something like: wasmtime_instance_env_set(wasm_instance_t *instance, void *env). This pointer would then be given to host function callbacks in one of their arguments.

What I wonder is: am I onto an actual need for the C API, onto something that's already solved, or am I misunderstanding something bigger? Either way, I'd very much appreciate hearing some thoughts on it!

Thanks in advance!

Wasmtime GitHub notifications bot (Jun 23 2020 at 06:01):

thibaultcha edited Issue #1911:

Wasmtime GitHub notifications bot (Jun 23 2020 at 13:44):

alexcrichton commented on Issue #1911:

For the first solution you have, I'm not sure if that will be possible since the instance's imports must exist before the instance is created, so you can't bind imports after instantiation. Also FWIW using Linker vs raw instances shouldn't have an impact on performance.

In general though it sounds like this is where the env argument would be used? You could, for example, allocate some space which is filled in after instantiation, and pass that as the env. That way, when the function is called, it has access to the data stuffed into env after initialization. You'd just need to be careful that if the function is called during initialization, it reports an error of some form (this is a fundamental wasm limitation, and is why the start function isn't the most useful).
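
One way to picture the suggestion above is a small slot that is handed over as env when the host function is created and only filled in once instantiation has completed. The sketch below is a self-contained model and does not use the real Wasmtime headers: host_env_t, the ready flag, and the integer return codes are all hypothetical stand-ins (a real callback would have the wasmtime_func_callback_with_env_t signature and return a wasm_trap_t * rather than an int).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical env slot: allocated before the host function is created,
 * filled in only after instantiation has finished. */
typedef struct {
    void *ctx;   /* host context, e.g. a per-request structure */
    bool  ready; /* false until the embedder fills in ctx */
} host_env_t;

/* Stand-in for a host callback (assumption: 0 means success, -1 means
 * "called too early"; a real callback would return a trap instead). */
int my_host_func(void *env, int *result)
{
    host_env_t *slot = env;

    if (!slot->ready) {
        /* Invoked during instantiation, e.g. from a start function. */
        return -1;
    }

    *result = *(int *) slot->ctx;
    return 0;
}

/* Walks through the sequence: create the slot, call early, fill, call again. */
int demo_deferred_env(void)
{
    host_env_t slot = { NULL, false };
    int request_id = 42, out = 0;

    /* Called before the slot is filled: must be rejected. */
    assert(my_host_func(&slot, &out) == -1);

    /* "Instantiation" is done; the embedder now fills in the slot. */
    slot.ctx = &request_id;
    slot.ready = true;

    assert(my_host_func(&slot, &out) == 0);
    return out;
}
```

The guard on ready is the part that matters: it turns a call made too early into a reportable error instead of a read through an unset pointer.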

Wasmtime GitHub notifications bot (Jun 23 2020 at 16:19):

thibaultcha commented on Issue #1911:

@alexcrichton Hi there,

> For the first solution you have, I'm not sure if that will be possible since the instance's imports must exist before the instance is created, so you can't bind imports after instantiation.

Yes, in this case, the Instance is created within the host context, and I can pass the necessary pointers to wasmtime_func_new_with_env().

> Also FWIW using Linker vs raw instances shouldn't have an impact on performance.

Even assuming that for each new Instance, we loop over its imports and call wasmtime_func_new_with_env() for each one of them?

> You could, for example, allocate some space which is filled in after instantiation, and pass that as the env. That way when the function is called it has access to the data stuffed into env after initialization.

I did not include this solution in my reasoning above because it significantly complicates an embedding's design. Doing so would mean that env needs to be a global R/B tree, maintained by the host application, in which it stores context values for each created Instance. Besides the extra cost, the host application now has to maintain global state for each Instance, adding it when the Instance is created and removing it when the Instance is deleted. This seems to break encapsulation and strikes me as a rather poor design for an embedding. Instead of doing all of this extra work, a host application could simply associate some arbitrary data with a given Instance right after creating it. This is _significantly_ simpler and leaner (a single line of code for associating data vs. maintaining global structures on the side).

Wasmtime GitHub notifications bot (Jun 23 2020 at 18:09):

alexcrichton commented on Issue #1911:

FWIW there is a wasm_instance_set_host_info API to set arbitrary data on the wasm_instance_t, but you don't have access to this as part of wasm_func_call. Additionally I'm not sure what you mean about looping over imports and calling wasmtime_func_new_with_env for them? It seems like you do the same for raw instantiation vs linker-based instantiation?

Wasmtime GitHub notifications bot (Jun 23 2020 at 19:20):

thibaultcha commented on Issue #1911:

Thank you for continuing this discussion!

With Linker-based instantiation, I call wasmtime_func_new_with_env() _once_ for each host function during the initialization of my host program. Then, during runtime, I call wasmtime_linker_instantiate() once for each execution context in which I will need an Instance.

With raw instantiation, it seems like I would have to call wasmtime_func_new_with_env() for each host function for each Instance that I create.

This is the source of my confusion as to whether Linker-based instantiation may be cheaper than raw instantiation. If the cost is similar, I can rely on the env argument of wasmtime_func_new_with_env() to keep track of host context values. If raw instantiation is significantly more expensive (in computation and memory), then I would need another way to bind host context values to a given Instance.

Wasmtime GitHub notifications bot (Jun 23 2020 at 19:40):

thibaultcha commented on Issue #1911:

Digging into the Linker's code, it looks like the associated cost would be roughly identical. The difference seems to be whether the imports are resolved in the Linker's Rust code or in the host application's C code.

Because using the Linker could greatly reduce the effort required from embedders, do you think there is a need for host context values to be given to wasmtime_linker_instantiate()? Such values could then be passed to the functions created by the Linker and retrieved in the wasmtime_func_callback_with_env_t callbacks via their void *env argument, maybe?

Wasmtime GitHub notifications bot (Jun 23 2020 at 20:35):

alexcrichton commented on Issue #1911:

Er sorry I'm still not really sure what this issue is about. It's not clear to me what contextual information you're attaching where, and why the env pointer and/or the set_host_info business aren't the right solution. I'm a bit confused by the discussion of performance, too, so perhaps that could be set aside for a bit to figure out the env business?

Wasmtime GitHub notifications bot (Jun 23 2020 at 21:19):

thibaultcha commented on Issue #1911:

I sure can elaborate and be more specific!

I am embedding wasmtime inside the Nginx web server. When it processes an HTTP request, Nginx creates an ngx_http_request_t structure. What I want to do is create a new instance for each such ngx_http_request_t structure. The instance will then be called at various points during the processing of this HTTP request, until the request is freed (at which point the instance will be freed as well).

Now, when the instance associated with a request invokes an imported host function, I need to retrieve this ngx_http_request_t structure to know which request the callback's logic should apply to. Therefore, I need a way to bind an instance to specific request context data in order to do this:

wasm_trap_t*
my_host_func(const wasmtime_caller_t *caller, void *env, const wasm_val_t args[], wasm_val_t results[]) {
    ngx_http_request_t *r = (ngx_http_request_t *) env;

    /* ... */
}

Now, because I am using the Linker to avoid implementing my own imports resolver (which I previously did but would like to avoid if possible), I cannot associate this ngx_http_request_t pointer with a given instance created via wasmtime_linker_instantiate() (unless I am missing something else).

This issue is trying to solve this problem: how to retrieve host context values from a callback when instances are created via the Linker.
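
The requirement described here, one env pointer per request rather than one per host function definition, can be modelled without the Wasmtime headers. In the sketch below, request_t is a hypothetical stand-in for ngx_http_request_t, and host_func_t is an assumed, simplified picture of what wasmtime_func_new_with_env pairs together (a callback and the env captured at creation time); none of these names come from the real API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ngx_http_request_t. */
typedef struct {
    int id;
} request_t;

/* A host function modelled as a callback plus the env pointer that was
 * captured when the function was created. */
typedef int (*host_cb_t)(void *env);

typedef struct {
    host_cb_t cb;
    void     *env;
} host_func_t;

/* The callback recovers the request it belongs to from env. */
int request_id_cb(void *env)
{
    request_t *r = env;
    return r->id;
}

/* Created once per request, so env can point at that request. */
host_func_t host_func_for_request(request_t *r)
{
    host_func_t f = { request_id_cb, r };
    return f;
}

/* What happens when the guest calls the import. */
int call_host_func(const host_func_t *f)
{
    return f->cb(f->env);
}

/* Two requests, two bound functions, one shared callback. */
int demo_two_requests(void)
{
    request_t a = { 1 }, b = { 2 };
    host_func_t fa = host_func_for_request(&a);
    host_func_t fb = host_func_for_request(&b);

    assert(call_host_func(&fa) == 1);
    assert(call_host_func(&fb) == 2);
    return 0;
}
```

The callback code is shared; only the captured env differs. That is why the functions have to be created at a point where the request already exists.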

Wasmtime GitHub notifications bot (Jun 23 2020 at 21:22):

thibaultcha edited a comment on Issue #1911:

Wasmtime GitHub notifications bot (Jun 23 2020 at 21:28):

alexcrichton commented on Issue #1911:

What you'll likely want to do in that case is:

The last step means you'll have to recreate functions for every request anyway, so you'll be able to pair the env parameter at that time.

Does that make sense? It sounds like you're caching the wasm_linker_t globally which may be causing my confusion.

Wasmtime GitHub notifications bot (Jun 23 2020 at 21:48):

thibaultcha commented on Issue #1911:

> It sounds like you're caching the wasm_linker_t globally which may be causing my confusion.

Hmm right, that's what I was doing since I was hoping that all instances could be backed by the same wasm_store_t (maybe misunderstanding the underlying performance implications, or memory sharing between different requests, e.g. eventually allowing for sharing globals between them, etc.).

So, what I seem to be understanding is that with your proposed implementation, instantiating the module within the request's store can be performed via raw instantiation (and binding the imports myself), or via instantiating a new Linker, itself tied to the request's store, and eventually creating the instance for me. Either way allows for pairing the env argument when creating the functions.

Wasmtime GitHub notifications bot (Jun 23 2020 at 21:50):

thibaultcha edited a comment on Issue #1911:

Wasmtime GitHub notifications bot (Jun 23 2020 at 21:51):

thibaultcha edited a comment on Issue #1911:

Wasmtime GitHub notifications bot (Jun 23 2020 at 21:57):

thibaultcha edited a comment on Issue #1911:

Wasmtime GitHub notifications bot (Jun 23 2020 at 21:59):

thibaultcha edited a comment on Issue #1911:

> It sounds like you're caching the wasm_linker_t globally which may be causing my confusion.

Hmm right, that's what I was doing since I was hoping that all instances could be backed by the same wasm_store_t (maybe misunderstanding the underlying performance implications? I was hoping to save the extra allocation costs).

So, what I seem to be understanding is that with your proposed implementation, instantiating the module within the request's store can be performed via raw instantiation (and binding the imports myself), or via instantiating a new Linker (itself tied to the request's store) which would eventually create the instance for me. Either way allows for pairing the env argument when creating the functions at that time.

Wasmtime GitHub notifications bot (Jun 24 2020 at 14:51):

alexcrichton commented on Issue #1911:

Yeah that's what I'm thinking, where if you instantiate per-request then you can create imports paired with env arguments.

For wasm_store_t the issue isn't so much about reusing allocations but rather about ever freeing them. No wasm_instance_t is fully deallocated until the entirety of its store and all other references to the store have gone away. This is done because we don't have a full GC. If you're a long-running server (e.g. nginx) you probably want to keep memory usage under control, so you'll likely want to have a store-per-request so that when the request is finished you'll be able to free all memory associated with the request.
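
The lifetime rule described above can be illustrated with a toy model in which a store owns everything allocated through it. This is only a sketch of the ownership idea, not how Wasmtime is implemented: store_t, store_alloc, instance_delete, and store_delete are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* List node placed in front of every allocation; the union keeps the
 * payload that follows it suitably aligned. */
typedef union node {
    union node *next;
    max_align_t align;
} node_t;

/* Simplified model of a store: it owns every object allocated through
 * it and releases them only when the store itself is deleted. */
typedef struct {
    node_t *head;
    size_t  live; /* objects still owned by the store */
} store_t;

store_t *store_new(void)
{
    return calloc(1, sizeof(store_t));
}

/* Allocate an object (think: an instance) owned by the store. */
void *store_alloc(store_t *s, size_t size)
{
    node_t *n = malloc(sizeof(node_t) + size);
    if (n == NULL) {
        return NULL;
    }
    n->next = s->head;
    s->head = n;
    s->live++;
    return n + 1;
}

/* Dropping an individual object frees nothing in this model. */
void instance_delete(void *instance)
{
    (void) instance;
}

/* Deleting the store releases everything; returns the number freed. */
size_t store_delete(store_t *s)
{
    size_t freed = 0;
    node_t *n = s->head;

    while (n != NULL) {
        node_t *next = n->next;
        free(n);
        n = next;
        freed++;
    }
    free(s);
    return freed;
}

/* One request: its store is created, used, and torn down with it. */
size_t demo_request_lifetime(void)
{
    store_t *store = store_new();
    assert(store != NULL);

    void *instance = store_alloc(store, 64);
    assert(instance != NULL);

    instance_delete(instance);
    assert(store->live == 1); /* still held by the store */

    return store_delete(store);
}
```

With a single global store in this model, the list would grow for as long as the server runs. With one store per request, everything is released when the request ends.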

Wasmtime GitHub notifications bot (Jun 25 2020 at 07:07):

thibaultcha commented on Issue #1911:

Thank you for the wasm_store_t clarifications! Alright then, this discussion answers the questions that I had on retrieving host context values and clarifies a few more points; I'll go ahead and close this issue now.

Thank you for your time @alexcrichton, much appreciated.

Wasmtime GitHub notifications bot (Jun 25 2020 at 07:07):

thibaultcha closed Issue #1911:


Last updated: Dec 23 2024 at 13:07 UTC