Hello, I am seeking advice regarding WASI usability and the Wasmtime linker interface (C API).
I am left with no choice but to use the C linker interface in order to use WASI (even though I would very, very much rather avoid it), but it is very unclear to me how it should be used. (The other option we are considering is to drop support for Wasmtime entirely and go with other runtimes that commit to supporting WASI through the wasm-c-api.) I am aware of https://github.com/bytecodealliance/wasmtime/issues/2974.
In this project (Nginx Wasm embedding), with other runtimes (with which it is possible to use the wasm-c-api only, even with WASI), each created instance is tied to a connection in Nginx. Each instance gets a number of env (host) function imports, all created via wasm_func_new_with_env(store, type, cb, ctx), where ctx is a pointer to said connection, which can thus be retrieved from within the host function.
Now with the _Wasmtime_ C API _and_ the requirement to enable WASI, I don't really see a way to achieve an equivalent, except if each instance gets a different linker and each imported host function is defined with wasmtime_linker_define_func, where data is the aforementioned ctx. Is this acceptable/by design? I am under the impression that wasmtime_context_set_data was designed for this purpose; unfortunately, I don't see an equivalent in the wasm-c-api for other runtimes to store some context alongside a wasm_store_t, which still leaves us incompatible with other runtimes, unless I am missing something?
For example, as an alternative to the linker interface, Wasmer defines wasi_get_imports and this works well enough; we handle the linking ourselves.
Please advise, thank you!
@Thibault Charbonnier would you be up for discussing some of the requirements of your embedding? Knowing those will probably help guide the best solution here. For example, it sounds like a requirement is that you need to use a built-in implementation of WASI rather than a custom Nginx-specific implementation? Alternatively, though, do you have performance requirements around instantiation and/or invoking wasm functions?
For some history: when we redesigned Wasmtime's C API, we decided to provide compatibility with wasm.h as a shim rather than designing the C API around that header. This was done for a number of reasons, including maintainability (wasm.h hasn't changed in quite some time and doesn't support features like reference types, SIMD, or module linking) and speed (we've found the singular method of instantiation in wasm.h difficult to optimize further given the API requirements). This means that most features of Wasmtime's C API, including the wasmtime_linker_t type, are geared primarily towards Wasmtime-specific support, and we added compatibility with wasm.h afterwards. We could provide a custom linker type for the wasm.h types, but we decided not to do so initially and to see how things pan out in the future.
In any case, it might be useful to discuss the properties/requirements of your embedding some more. Depending on those, the design of wasmtime.h may make sense, or the conclusion may be that the best thing to do would be to add more Wasmtime-specific enhancements to wasm.h. I'd be happy to help out with the former, but unfortunately we don't have the bandwidth to implement the latter at this time.
Hi @Alex Crichton, thank you for reaching out! Yes, I can certainly elaborate. The only requirement is to use the wasm-c-api only so as to give users the choice of which WASM runtime to use. So far, we have a WASM "VM" component within Nginx which is written with wasm-c-api only and allows us to load modules, do the host linking, invoke our WASM functions anywhere within Nginx, etc... It works with Wasmtime and Wasmer already. We thus use this VM to implement the proxy-wasm SDK inside of Nginx (the same SDK used by Envoy for its WASM filters). For this reason, the wasm-c-api is more than enough for our needs, we do not need any of the newer WASM features, and likely won't for quite some time.
Things started getting complicated when I tried adding support for Golang with proxy-wasm-go-sdk; this TinyGo implementation of proxy-wasm requires WASI imports. When I wrote the aforementioned VM with wasm-c-api, this was not an issue since both Wasmer and Wasmtime supported WASI with the wasm-c-api; however, https://github.com/bytecodealliance/wasmtime/issues/2974 happened in the meantime, and the approach we have taken for almost a year is now broken with Wasmtime because of this decision. So I find myself having to rewrite most of the VM specifically for Wasmtime, adding a considerable amount of unwanted complexity solely for the purpose of maintaining Wasmtime compatibility.
However, as I pointed out in my post above, I am even finding out that this might still not be a valid approach after all, unless I instantiate and use a Linker for each and every instance (i.e. connection) within Nginx; this does not seem ideal at all, so I am wondering what I am missing. When I look at wasmtime_context_set_data as an alternative to setting env context for retrieval in host functions, that alternative does not seem to exist for wasm-c-api runtimes. I am having a hard time maintaining compatibility across runtimes with or without wasm-c-api/Linkers.
This feels like a dead-end, unless Wasmtime reverts support for WASI imports with wasm-c-api; what do you think?
Thank you again for lending an ear!
This is only lightly touched on in the RFC about the new API, but a major reason we're not focusing on the wasm-c-api approach is that it has some pretty fundamental sources of overhead. In your use case in particular, performance seems really crucial, so I'd be careful about making "must use wasm-c-api" too hard a requirement.
Another issue is that the wasm-c-api leaves a huge amount of semantics unspecified, meaning that there are lots of situations where runtimes will do pretty fundamentally different things anyway, so there's a false sense of compatibility, and you really can't treat the API as an abstraction for drop-in replacements.
Given all this, I'd strongly counsel creating higher-level abstractions and using the best way to embed each relevant runtime. I'd be pretty surprised if that didn't save you work in the long (or even medium) run, in addition to the perf benefits.
One thing I would add to what Till said: it sounds like you are having trouble implementing your embedding with the current, as-of-today wasmtime.h APIs? If so, I'd be more than happy to help out there as well to figure out how to wrangle them.
Hi Till! Yes Alex, correct; even with the Wasmtime API (wasmtime.h), which is already a significant source overhaul, I still cannot figure out how to use it so as to do the equivalent of what we do with wasm-c-api.
Ah ok, would you be up for digging more into that? (e.g. outlining a bit more what you're stuck on)
The general gist of the data pointers is that there are two levels of custom data pointers: one is at the store/context level, where you get a custom void*, and the second is at the per-host-function level, where you get a void* for each host function (using wasmtime_func_new).
Sure; it all has to do with storing a context structure (keeping pointers to the current connection, the current proxy-wasm execution context, etc.) so as to retrieve it from the host callback (like wasmtime_func_callback_t's env argument, the first argument).
for that what I'd recommend is to place everything into the store/context pointer
that way you can create one wasmtime_linker_t
for the entire lifetime of the program
and it's much faster to instantiate since the linker is already created for all connections
Right; like we do with wasm-c-api, I use wasm_func_new_with_env. But I cannot find the equivalent with Wasmtime's Linker API?
and then you'll create a store-per-connection where the custom data is stored in the store's context pointer
You'll instead want to use wasmtime_linker_define_func
So that's the issue, I wish to maintain flexibility here, and while a store per connection is possible, the current approach also supports a global store.
I wouldn't recommend a global store because it means that instances are never freed
instances are only deallocated once their containing store is deallocated
Right
it also would mean that instances could accidentally leak data between each other, breaking some isolation
but if you still want to, a global store is still possible (although you'll have to synchronize access to it since it's not threadsafe)
a linker can be a one-per-program thing but if you really want it can also be a per-store thing
can't you also do multiple stores, but still reuse them across connections?
Perhaps. Now if I use multiple stores, don't I need to load modules once for each store?
Modules exist at the engine-level so you only need to compile a module once, and then it can be instantiated within separate stores
Using multiple stores should be an easy change since I was planning on supporting it too, I will try that out and try to make it work with Wasmer's wasm-c-api too. Thanks!
Last updated: Dec 23 2024 at 13:07 UTC