rvolosatovs opened PR #39 from `rvolosatovs:wasmtime-plugins` to `bytecodealliance:main`:
Refs https://github.com/bytecodealliance/wasmtime/issues/7348
programmerjake commented on PR #39:
i don't see a way to access a .so global variable from wasm...
rvolosatovs commented on PR #39:
i don't see a way to access a .so global variable from wasm...
indeed, I did not add this functionality in the current `wasi-dl` draft, it's really just an approximation sufficient for a very basic PoC
Does https://github.com/rvolosatovs/wasi-dl/pull/1 address your concern? `alloc` can be interpreted as any `ffi-type` or even a function.
Perhaps `resource symbol` is redundant altogether and lookups should just return `alloc` resources, which can be interpreted as functions
rvolosatovs updated PR #39.
rvolosatovs edited a comment on PR #39:
i don't see a way to access a .so global variable from wasm...
indeed, I did not add this functionality in the current `wasi-dl` draft, it's really just an approximation sufficient for a very basic PoC
Does https://github.com/rvolosatovs/wasi-dl/pull/1 address your concern? `alloc` can be interpreted as any `ffi-type` or even a function.
Perhaps `resource symbol` is redundant altogether and lookups should just return `alloc` resources, which can be interpreted as functions
Edit: added https://github.com/rvolosatovs/wasi-dl/pull/1/commits/30ea77f0d5679f8382603890408c6fef0bc4713b
Edit 2: updated PR with https://github.com/bytecodealliance/rfcs/pull/39/commits/b48bdaa743642403ab3ac771495bd882f571d88b
programmerjake commented on PR #39:
Does rvolosatovs/wasi-dl#1 address your concern?
yes!
rvolosatovs updated PR #39.
rvolosatovs edited PR #39:
Refs https://github.com/bytecodealliance/wasmtime/issues/7348
rvolosatovs commented on PR #39:
Did a small update to ensure the symbol lookups are typed https://github.com/bytecodealliance/rfcs/pull/39/commits/b9ae4c912db5d96956c1b57a2e74696a55db6b62
sunfishcode commented on PR #39:
Such interface is unsafe and it must be used with extreme care, however that is no different from any other host plugin, which would be loaded via `dlopen`.
There are two kinds of unsafe relevant here. One is whether the plugin code is unsafe, and I agree that this is basically the same with any host plugin system we'd design here. The other is whether Wasm code using the plugin code is unsafe.
The libffi-style approach in this proposal looks like it means that we'd additionally have to treat the Wasm that calls the code as unsafe by default, and while there are potential ways to make it safe, they aren't described here.
Also, the libffi-style approach in this proposal looks like it would mean that the Wasm would not be portable, in general, because libffi doesn't encapsulate all C ABI details. What is `off_t` a typedef for? What is the value of `ENOENT`? And so on.
[wasi-dl]-based approach also provides greater security, since the implementation of [wasi-dl] may restrict the set of libraries allowed to be loaded and potentially define the exact signatures for symbols defined in them.
This proposal does not currently describe how this would work. And, signatures alone would not be sufficient, because libffi-style bindings also include raw pointers.
Perhaps it would be possible to design an interface description language sophisticated enough to describe these interfaces, including signatures, lifetime information, synchronization information, and perhaps also resource lifetimes (eg. open files that need to be explicitly closed and not used thereafter), and perhaps eventually even a way to describe C `union`s, or some safe variant-like subset of them. If this rfc implies the design of a new interface description language, it'd be good to say more about what that looks like.
As an alternative to exposing wasi-dl interface to (plugin) components, we could use the dynamic libraries themselves as the host plugins. For that we would need to carefully design a set of conventions specific to wasmtime for such plugins to be able to define their exports and expose them to components.
Such an approach would require custom-built dynamic libraries for plugins, if an existing library was desired to be used, an "adapter" library would need to be built, which would in turn dynamically-load that library.
It looks like this proposal would also usually want "adapter" libraries too, or at least adapter layers, because I don't expect we'll want normal Wasm code talking directly to these low-level libffi-style APIs, for ergonomics, language-independence, portability, and potential security reasons. And these adapters are going to be tedious to write and maintain, because they need to be written for each source language that needs them, and they'll have a lot of repetitive low-level code. I imagine we'd pretty quickly find ourselves wanting bindings generators for this task.
And if we're going to design a language-independent sandboxable interface description language with tooling around it for generating bindings, we should think carefully about whether or not we already have one, and what relationships we want :smile:.
fitzgen submitted PR review:
Thanks for writing up this RFC!
I agree that a plugin system geared towards allowing hosts to define and expose new capabilities to Wasm guests that Wasmtime has no builtin knowledge of is very valuable.
Unfortunately, I think a missing constraint is that we fundamentally cannot trust Wasm guests, so we can't just expose `dlopen`/`dlsym` and raw FFI types to them. Therefore, I don't think the solution proposed here is something we can pursue. More details inline below.
That said, I also sketch (very roughly) an alternative approach that should address the same motivations but which avoids giving untrusted Wasm guests raw `dlopen` powers.
fitzgen created PR review comment:
I guess the answer to the previous question is "yes" then.
I see @sunfishcode's comments now, and I agree with the gist of his points.
There is a difference between whether
- the plugin internally is using `unsafe` but exposing a safe interface, and
- the plugin's interface is itself `unsafe`.
With (1) the (untrusted and potentially malicious) Wasm guest cannot trigger any memory unsafety, modulo implementation bugs in the plugin itself.
With (2) the (untrusted and potentially malicious) Wasm guest can trivially trigger memory unsafety. That is, (2) is handing security vulnerabilities to Wasm guests by design.
So (2) is a complete non-starter; it is contradictory to Wasmtime's (and the BA's) mission and values.
And -- correct me if I'm wrong! -- this RFC seems to be proposing (2) so, unless I am misunderstanding the proposal, this is not an approach we should consider or pursue any further.
fitzgen created PR review comment:
To be more constructive, I would suggest an alternative approach that maintains a safe interface to Wasm, something like:
- There is some well-known symbol that plugin `.so`s should export, describing their WIT interface (maybe literally just a `static WIT_INTERFACE: &'static str = "..."` or alternatively the binary encoding of the same thing).
- Wasmtime loads a `plugin.so` and reads its WIT interface
- Wasmtime `dlsym`s the functions described by the WIT interface
- Wasmtime adds functions for that WIT interface to a `Linker`, these functions
  - translate Wasm / canonical ABI arguments into the equivalent in some sort of native ABI
  - call their corresponding `dlsym`ed functions from `plugin.so`
  - translate the native ABI's result back into Wasm / canonical ABI
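As a rough illustration of the four steps in the list above, here is a self-contained Rust simulation. It does not call `dlopen` or use the `wasmtime` crate: `Plugin`, `Linker`, `Val`, `NativeFn`, and `HostFn` are hypothetical stand-ins invented for this sketch, the "WIT interface" is a one-line string that is only scanned for a function name, and a `HashMap` plays the role of `dlsym`. The point it tries to show is that guest-supplied values are checked on the host side before any native function is reached.

```rust
use std::collections::HashMap;

/// A dynamically typed value, as it might arrive from an untrusted guest.
#[derive(Debug, Clone, PartialEq)]
enum Val {
    S32(i32),
    Str(String),
}

/// The one native signature this toy "ABI" supports (an assumption).
type NativeFn = fn(i32, i32) -> i32;

/// Host-side wrapper that validates guest values before calling native code.
type HostFn = Box<dyn Fn(&[Val]) -> Result<Val, String>>;

/// Stand-in for a loaded `plugin.so`: the interface it declares plus a
/// symbol table that plays the role of `dlsym`.
struct Plugin {
    wit_interface: &'static str,
    symbols: HashMap<&'static str, NativeFn>,
}

fn add(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

fn load_plugin() -> Plugin {
    let mut symbols: HashMap<&'static str, NativeFn> = HashMap::new();
    symbols.insert("add", add);
    Plugin {
        wit_interface: "add: func(a: s32, b: s32) -> s32",
        symbols,
    }
}

/// Toy linker: one checked wrapper per function named in the interface.
struct Linker {
    funcs: HashMap<String, HostFn>,
}

impl Linker {
    fn link(plugin: &Plugin) -> Result<Linker, String> {
        let mut funcs: HashMap<String, HostFn> = HashMap::new();
        for line in plugin.wit_interface.lines() {
            // Real WIT parsing is out of scope; take the name before the colon.
            let name = line.split(':').next().unwrap_or("").trim();
            let native: NativeFn = *plugin
                .symbols
                .get(name)
                .ok_or_else(|| format!("missing symbol `{}`", name))?;
            let wrapper: HostFn = Box::new(move |args: &[Val]| match args {
                // Translate guest values to native arguments and back.
                [Val::S32(a), Val::S32(b)] => Ok(Val::S32(native(*a, *b))),
                _ => Err("argument count or type mismatch".to_string()),
            });
            funcs.insert(name.to_string(), wrapper);
        }
        Ok(Linker { funcs })
    }

    fn call(&self, name: &str, args: &[Val]) -> Result<Val, String> {
        let func = self
            .funcs
            .get(name)
            .ok_or_else(|| format!("unknown function `{}`", name))?;
        func(args)
    }
}
```

Calling `Linker::link(&load_plugin())` and then `call("add", ...)` with two `Val::S32` values reaches the native function; a call with a `Val::Str` or the wrong number of arguments is rejected by the wrapper and never reaches `add`.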
In the above sketch, the `plugin.so` is trusted, but the Wasm is not. Any unsafety can only come from bugs in the `plugin.so` (either from its internal implementation or if its functions' types don't match the WIT interface it claims). Notably, unsafety cannot originate from within (untrusted and potentially malicious) Wasm guests, no matter what garbage values they indirectly pass to `plugin.so`.
The tricky parts here will be:
- What is the native ABI? Can we reuse the canonical ABI or a variant of it? I could imagine a `bindgen`-y proc macro that does some variant of the canonical ABI for plugins with statically-known interfaces, but what about dynamic interfaces (i.e. the common case for the `wasmtime` cli, rather than a `wasmtime` crate embedding that happens to use plugins of a certain shape)? What can we do to avoid arg/result translation overheads?
- A `plugin.so` may want some per-`Store` state, for example if `wasi-sockets` was implemented as a plugin, it would want any open sockets to be attached to the `Store`. How do we let `plugin.so` create that per-`Store` state? Where do we keep it? How do we pass it back to `plugin.so` on each call? How do we let `plugin.so` destroy it when we drop the store?
- Finally, it isn't clear to me whether this RFC proposes that `plugin.so`s are forwards compatible with new `wasmtime` versions (i.e. new Wasmtime releases are backwards compatible with old `plugin.so`s) or not. If so, then the ABI concerns described above are doubly important and we need to make sure they remain extensible for future additions and changes, which will involve a lot of subtleties.
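One possible shape for the per-`Store` state question, sketched in plain Rust with no real `plugin.so` or `wasmtime::Store` involved: the plugin exposes create/call/destroy hooks, and the host keeps the opaque pointer in an owner type whose `Drop` runs the destroy hook. `PluginHooks`, `StoreState`, `Counter`, and `LIVE_STATES` are hypothetical names for this illustration only, and nothing here is something the RFC or Wasmtime defines.

```rust
use std::ffi::c_void;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical table of hooks a plugin could export for per-store state.
#[repr(C)]
struct PluginHooks {
    create: extern "C" fn() -> *mut c_void,
    call: extern "C" fn(state: *mut c_void, arg: u32) -> u32,
    destroy: extern "C" fn(state: *mut c_void),
}

// ---- plugin side (would live in `plugin.so`) ----

/// Number of states currently alive, kept only so the sketch can be checked.
static LIVE_STATES: AtomicUsize = AtomicUsize::new(0);

struct Counter {
    total: u32,
}

extern "C" fn create() -> *mut c_void {
    LIVE_STATES.fetch_add(1, Ordering::SeqCst);
    Box::into_raw(Box::new(Counter { total: 0 })) as *mut c_void
}

extern "C" fn call(state: *mut c_void, arg: u32) -> u32 {
    // SAFETY: the host only passes pointers returned by `create` that have
    // not been destroyed yet.
    let counter = unsafe { &mut *(state as *mut Counter) };
    counter.total = counter.total.wrapping_add(arg);
    counter.total
}

extern "C" fn destroy(state: *mut c_void) {
    // SAFETY: as above; ownership returns to the plugin exactly once.
    drop(unsafe { Box::from_raw(state as *mut Counter) });
    LIVE_STATES.fetch_sub(1, Ordering::SeqCst);
}

static HOOKS: PluginHooks = PluginHooks { create, call, destroy };

// ---- host side ----

/// Owns one plugin state for the lifetime of one (simulated) store.
struct StoreState {
    hooks: &'static PluginHooks,
    state: *mut c_void,
}

impl StoreState {
    fn new(hooks: &'static PluginHooks) -> StoreState {
        StoreState { hooks, state: (hooks.create)() }
    }

    /// Passes the stored state back to the plugin on each call.
    fn call(&mut self, arg: u32) -> u32 {
        (self.hooks.call)(self.state, arg)
    }
}

impl Drop for StoreState {
    /// Dropping the simulated store lets the plugin destroy its state.
    fn drop(&mut self) {
        (self.hooks.destroy)(self.state)
    }
}
```

Two `StoreState::new(&HOOKS)` values keep independent counters, and dropping each one runs the plugin's `destroy` hook. The sketch leaves the ABI stability and versioning questions from the list untouched.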
fitzgen created PR review comment:
Would this result in memory unsafety if the Wasm (which is untrusted, and potentially malicious) passes the wrong number or type of arguments and returns?
Or is it expected that Wasmtime will somehow dynamically check these calls?
Similar question for declaring FFI struct types and their fields.
rvolosatovs commented on PR #39:
Thanks for the feedback @sunfishcode @fitzgen!
In general I feel that perhaps I misjudged the expected level of detail for RFCs in this repository. This RFC is currently very much a high-level idea/direction, as opposed to a directly implementable design document, which seems to be what people are looking for here.
First, let's agree on some terms:
In this RFC by component composition I mostly refer to function-style composition, and not component composition as defined at https://component-model.bytecodealliance.org/creating-and-consuming/composing.html#what-is-composition
For example, `wasi-virt` can mostly fulfill the composition as would be required here.
More formally, let's assume that components are morphisms (functors) that map a set of interfaces (imports) to another set of interfaces (exports).
Their composition is depicted here: ![composition](https://upload.wikimedia.org/wikipedia/commons/e/ef/Commutative_diagram_for_morphism.svg), taken directly from Category theory Wikipedia page.
Here's an example in context of this RFC:
// Trusted Wasm targets this world
world plugin {
    // These two interfaces are provided by the host:
    import wasi:sockets/tcp;
    import wasi:dl/dl;
    // These two interfaces are provided to the guest:
    export wasi:sockets/tcp;
    export wasi:keyvalue/store;
}
// Untrusted Wasm targets this world
world guest {
    // These two interfaces are either directly provided by the plugin component or passed through to the host *statically* by the composition tool:
    import wasi:sockets/tcp;
    import wasi:keyvalue/store;
    export wasi:http/incoming-handler;
    // NOTE: This import would *not* be satisfied:
    // import wasi:dl/dl;
}
world composed {
    import wasi:sockets/tcp;
    import wasi:dl/dl;
    export wasi:http/incoming-handler;
}
@fitzgen you seem to imply that all Wasm is implicitly untrusted.
I'm not sure I agree with that statement and the assumption I'm operating upon is that whether a trusted piece of code is compiled into a native application/library or a Wasm component should not change the "trustworthiness" of the produced artifact. That's a key assumption on which this RFC is built.
Is there something specific about Wasm components I'm not aware of, that would make them inherently untrusted?
In https://github.com/bytecodealliance/rfcs/pull/39#discussion_r1825062649 you've outlined a way how a plugin could be loaded by Wasmtime:
- There is some well-known symbol that plugin `.so`s should export, describing their WIT interface (maybe literally just a `static WIT_INTERFACE: &'static str = "..."` or alternatively the binary encoding of the same thing).
- Wasmtime loads a `plugin.so` and reads its WIT interface
- Wasmtime `dlsym`s the functions described by the WIT interface
- Wasmtime adds functions for that WIT interface to a `Linker`, these functions
  - translate Wasm / canonical ABI arguments into the equivalent in some sort of native ABI
  - call their corresponding `dlsym`ed functions from `plugin.so`
  - translate the native ABI's result back into Wasm / canonical ABI
Note, that adding functions to the `Linker` using `dlopen` and instantiating the (untrusted) Wasm component using it produces a runtime object, which is effectively the composition as I defined above, except it happens at runtime.
In the context of this RFC, the plugin could operate exactly like you've outlined in https://github.com/bytecodealliance/rfcs/pull/39#discussion_r1825062649, except wasmtime CLI would load `plugin.wasm` as opposed to `plugin.so`.
Let's consider an example with a shared library plugin (this is not an API suggestion, just a quick example sketch):
wasmtime serve --plugin plugin.so untrusted.wasm
`plugin.so` would be operating directly as part of runtime's process with no sandboxing whatsoever, it has full, unconstrained access to the OS and runtime process memory.
An example usage with a Wasm component plugin could look like this:
wasmtime serve --plugin plugin.wasm -P tcp=y -P dl=y untrusted.wasm
`plugin.wasm` is operating in a different "trust mode" from `untrusted.wasm`, but still sandboxed.
The CLI user explicitly allows `plugin.wasm` to use `tcp` and `wasi-dl`, it has no access to anything else.
- `wasi-dl` access could be scoped, e.g. (again, just a quick sketch):
wasmtime serve --plugin plugin.wasm -P tcp=y -P dl=libm.so:libm.h -P dl=sqlite3.so:sqlite3.h untrusted.wasm
With an API like this the plugin could only ever load `libm.so` or `sqlite3.so` - the associated header files could be used to verify `wasi-dl` calls and would, given the shared library and associated header file correctness, guarantee memory safety.
Note, that loading C headers is probably a lot of work and I'm not suggesting doing that, rather just pointing out that there is a way to make such interface safe.
`untrusted.wasm` does not inherit `plugin.wasm` imports - `untrusted.wasm` in this scenario only has access to interfaces exported and implemented by `plugin.wasm`, nothing else.
In both cases, one way or another, `wasmtime` would need to produce a "runtime composition" of a plugin and the guest component.
Arguably, the Wasm plugin option is safer, since the runtime can control what libraries, symbols and their signatures the plugin can access.
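To make the "runtime can control what libraries the plugin can access" point concrete, here is a small Rust sketch of an allowlist built from the `dl=<library>:<header>` grants used in the quick CLI sketch above. That option syntax is only the sketch from this comment, not an existing `wasmtime` flag, and `parse_dl_grants` and `check_load` are hypothetical helper names. The sketch only decides whether a library may be loaded and which header would describe it; it does not parse C headers or verify any signatures.

```rust
use std::collections::HashMap;

/// Parses hypothetical `dl=<library>:<header>` grants into a map from
/// library name to the header that describes it. Other options are ignored.
fn parse_dl_grants(opts: &[&str]) -> Result<HashMap<String, String>, String> {
    let mut grants = HashMap::new();
    for opt in opts {
        if let Some(spec) = opt.strip_prefix("dl=") {
            let (lib, header) = spec
                .split_once(':')
                .ok_or_else(|| format!("expected <library>:<header>, got `{}`", spec))?;
            grants.insert(lib.to_string(), header.to_string());
        }
    }
    Ok(grants)
}

/// Returns the header to check calls against, or an error if the library
/// was never granted to the plugin.
fn check_load<'a>(grants: &'a HashMap<String, String>, lib: &str) -> Result<&'a str, String> {
    grants
        .get(lib)
        .map(String::as_str)
        .ok_or_else(|| format!("library `{}` was not granted", lib))
}
```

With the grants `dl=libm.so:libm.h` and `dl=sqlite3.so:sqlite3.h`, a request for `libm.so` succeeds and any other library name is refused. The unscoped `dl=y` form is deliberately rejected by this sketch because it carries no header.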
In this RFC I've decided to start with a simple approach and give the `wasmtime` CLI user more control and produce such composition ahead-of-time, drastically reducing the scope for this feature and improving performance.
If (trusted) `plugin.wasm` (with optional `wasi-dl` access) was run in a separate sandbox, would that address your concerns @fitzgen?
There are two kinds of unsafe relevant here. One is whether the plugin code is unsafe, and I agree that this is basically the same with any host plugin system we'd design here. The other is whether Wasm code _using_ the plugin code is unsafe.
The libffi-style approach in this proposal looks like it means that we'd additionally have to treat the Wasm that calls the code as unsafe by default, and while there are potential ways to make it safe, they aren't described here.
From the perspective of memory safety purely, if `wasmtime` loaded `plugin.so`, which exported `wasi:keyvalue/store`, each `wasi:keyvalue/store` interface call in the guest would be unsafe.
Whether we trust the plugin code or not, guest code directly or indirectly invoking a symbol loaded from a shared object will always be potentially memory unsafe.
Like I mentioned above, the runtime could limit `wasi-dl` access and potentially be made aware of the symbols exported by the libraries (or even statically link to libraries at compilation time and expose them via `wasi-dl` abstraction).
Effectively, `wasi-dl` could be turned into a shared object introspection interface, which would be type-safe and could even verify contract constraints not directly expressible by C type system.
One potential strategy could be using value definitions or just functions (since recursive types are not currently allowed) to either process a C header file ahead-of-time or somehow else (e.g. manually) produce something roughly similar to:
(component
  (import "wasi:dl/ffi" (instance
    (export "primitive-type" (type $primitive_type (enum
      "c-char"
      "uint64-t"
      ;; etc..
    )))
    ;; etc...
  ))
  (import "wasi:dl/dll" (instance
    (export "function" (type $function (sub resource)))
    ;; etc...
  ))
  ;; using value definition
  (export "SOMECONST" (value $primitive_type (enum "uint64-t")))
  ;; using a function, returns the C type of the constant
  (export "SOMECONST" (func (result $primitive_type)))
  ;; returns a typed `wasi:dl/dll.function`
  (export "myfunc" (func (result $function)))
)
What is the native ABI? Can we reuse the canonical ABI or a variant of it? I could imagine a `bindgen`-y proc macro that does some variant of the canonical ABI for plugins with statically-known interfaces, but what about dynamic interfaces (i.e. the common case for the `wasmtime` cli, rather than a `wasmtime` crate embedding that happens to use plugins of a certain shape)? What can we do to avoid arg/result translation overheads?
A `plugin.so` may want some per-`Store` state, for example if `wasi-sockets` was implemented as a plugin, it would want any open sockets to be attached to the `Store`. How do we let `plugin.so` create that per-`Store` state? Where do we keep it? How do we pass it back to `plugin.so` on each call? How do we let `plugin.so` destroy it when we drop the store?
Finally, it isn't clear to me whether this RFC proposes that `plugin.so`s are forwards compatible with new `wasmtime` versions (i.e. new Wasmtime releases are backwards compatible with old `plugin.so`s) or not. If so, then the ABI concerns described above are doubly important and we need to make sure they remain extensible for future additions and changes, which will involve a lot of subtleties.
If the (trusted) plugin was a Wasm component, there'd be no need for any custom symbols or ABI - answers to most of these questions would be provided directly by the component model.
Perhaps it would be possible to design an interface description language sophisticated enough to describe these interfaces, including signatures, lifetime information
[message truncated]
I think having adapter.so directly provide the wasm component interface rather than having to use an intermediate plugin.wasm is safer, faster and easier to use for the end user.
Plugin.wasm is effectively unsandboxed as any mistake in its use of wasi-dl would cause UB. It is a lot easier to directly define a safe wasm component interface in adapter.so than to export an unsafe C api and then separately consume this C api in plugin.wasm and hope that you didn't accidentally cause an ABI mismatch (as soon as you use any non-fixed size integer type (or an integer type larger than the register size) or you use a struct type or enum in your C api, it becomes non-trivial to match the ABI unless you are the C compiler that compiled adapter.so). And if adapter.so is written in Rust, avoiding a separate plugin.wasm may enable the plugin writer to entirely avoid unsafe code.
Having the intermediate plugin.wasm also requires you to copy all data twice. Once from adapter.so to plugin.wasm and once from plugin.wasm to the wasm module that uses the plugin. If adapter.so directly provides a wasm component interface, it only needs to be copied once.
And finally it is easier for the end user if only adapter.so exists. This way there can't be a version mismatch between adapter.so and plugin.wasm (which will likely cause UB) and you only need to copy a single file around to use the plugin.
rvolosatovs commented on PR #39:
Plugin.wasm is effectively unsandboxed as any mistake in its use of wasi-dl would cause UB. It is a lot easier to directly define a safe wasm component interface in adapter.so than to export an unsafe C api and then separately consume this C api in plugin.wasm and hope that you didn't accidentally cause an ABI mismatch (as soon as you use any non-fixed size integer type (or an integer type larger than the register size) or you use a struct type or enum in your C api, it becomes non-trivial to match the ABI unless you are the C compiler that compiled adapter.so)
I've outlined an example approach in https://github.com/bytecodealliance/rfcs/pull/39#issuecomment-2451778905, which would prevent UB in using `wasi-dl`.
All primitive types are read and written via resource methods in `wasi-dl` https://github.com/rvolosatovs/wasi-dl/blob/6d2000d92d96b0967eb5a7ead314a765b7f596e2/wit/dl.wit#L68-L108, so a size mismatch is not possible for non-fixed size integers - the runtime knows the sizes of C primitives at compile time, and if the component would try to write 16-bit `char` using a `u32`, it would get an error from `set-u32`
The component can query the primitive sizes at runtime using `sizeof`: https://github.com/rvolosatovs/wasi-dl/blob/6d2000d92d96b0967eb5a7ead314a765b7f596e2/wit/dl.wit#L124
Structs are also fully supported by `libffi`: https://www.chiark.greenend.org.uk/doc/libffi-dev/html/Size-and-Alignment.html
Components would read and write them using resources https://github.com/rvolosatovs/wasi-dl/blob/6d2000d92d96b0967eb5a7ead314a765b7f596e2/wit/dl.wit#L88-L117, where the runtime takes care of the alignment and size.
Having the intermediate plugin.wasm also requires you to copy all data twice. Once from adapter.so to plugin.wasm and once from plugin.wasm to the wasm module that uses the plugin. If adapter.so directly provides a wasm component interface, it only needs to be copied once.
A `plugin.wasm` would directly read data through the pointer from the shared object. That data would then need to be copied (assuming shared refs are not allowed) into the guest component's memory.
With a `plugin.so`, it depends:
If we want `plugin.so` to directly write into runtime's memory, the only single-copy approach I see is the following:
- Runtime gives a pointer into component's memory space to the plugin
- Plugin writes through the pointer
Otherwise, we'd still need two copies
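For the size-checked accessors mentioned earlier in this comment, a minimal Rust sketch of the idea could look like the following. It is not the actual `wasi-dl` implementation: `Primitive`, `Alloc`, `set_u32`, and `get_u32` are hypothetical names that loosely mirror the `set-u32` and `sizeof` methods referenced above, and the host's C primitive sizes are taken from `std::os::raw`.

```rust
use std::mem::size_of;
use std::os::raw::{c_char, c_int, c_long, c_short};

/// A few C primitive types whose widths depend on the host platform.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Primitive {
    CChar,
    CShort,
    CInt,
    CLong,
}

impl Primitive {
    /// Width in bytes on the host this code was compiled for.
    fn size_of(self) -> usize {
        match self {
            Primitive::CChar => size_of::<c_char>(),
            Primitive::CShort => size_of::<c_short>(),
            Primitive::CInt => size_of::<c_int>(),
            Primitive::CLong => size_of::<c_long>(),
        }
    }
}

/// Host-owned allocation that a component would only see as an opaque resource.
struct Alloc {
    ty: Primitive,
    bytes: Vec<u8>,
}

impl Alloc {
    fn new(ty: Primitive) -> Alloc {
        Alloc { ty, bytes: vec![0; ty.size_of()] }
    }

    /// Rejects accesses whose width differs from the allocation's C type.
    fn check_width(&self, width: usize) -> Result<(), String> {
        if self.ty.size_of() == width {
            Ok(())
        } else {
            Err(format!(
                "{:?} is {} byte(s) wide, not {}",
                self.ty,
                self.ty.size_of(),
                width
            ))
        }
    }

    fn set_u32(&mut self, value: u32) -> Result<(), String> {
        self.check_width(4)?;
        self.bytes.copy_from_slice(&value.to_ne_bytes());
        Ok(())
    }

    fn get_u32(&self) -> Result<u32, String> {
        self.check_width(4)?;
        let mut buf = [0u8; 4];
        buf.copy_from_slice(&self.bytes);
        Ok(u32::from_ne_bytes(buf))
    }
}
```

Writing a `u32` into an `Alloc` created for `Primitive::CChar` returns an error instead of touching memory, while the same call on a `Primitive::CInt` allocation succeeds on hosts where `int` is four bytes wide. The check covers width only; it says nothing about calling conventions or struct layout.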
rvolosatovs closed without merge PR #39.
rvolosatovs commented on PR #39:
Writing an RFC for `dlopen`-based plugins was never my intention, I've originally been working on an RFC for RPC-based plugins only, but after gathering some internal feedback and building a small PoC, decided to pivot to try and produce a unified plugin interface (the Wasm-component based one), which would cover all use cases. I personally do not have a use case for the `dlopen`-based plugins - RPC-based plugins being the only use case I'm after. Basing RPC-based plugins on shared libraries is certainly a non-starter for my use case.
Given that it does not appear that Wasm-based Wasmtime plugins is something people are interested in at this time, I'll take a step back and just go ahead and close this PR, instead replacing it by my original proposal: https://github.com/bytecodealliance/rfcs/pull/40
I've outlined an example approach in https://github.com/bytecodealliance/rfcs/pull/39#issuecomment-2451778905, which would prevent UB in using wasi-dl.
All primitive types are read and written via resource methods in wasi-dl https://github.com/rvolosatovs/wasi-dl/blob/6d2000d92d96b0967eb5a7ead314a765b7f596e2/wit/dl.wit#L68-L108, so a size mismatch is not possible for non-fixed size integers - the runtime knows the sizes of C primitives at compile time, and if the component would try to write 16-bit char using a u32, it would get an error from set-u32
The component can query the primitive sizes at runtime using sizeof: https://github.com/rvolosatovs/wasi-dl/blob/6d2000d92d96b0967eb5a7ead314a765b7f596e2/wit/dl.wit#L124
How does Wasmtime know what type signature that adapter.so needs?
Structs are also fully supported by libffi: https://www.chiark.greenend.org.uk/doc/libffi-dev/html/Size-and-Alignment.html
There are edge cases where even two C compilers for the same platform disagree on the right ABI. Libffi can not know which ABI to use in those cases.
Given that it does not appear that Wasm-based Wasmtime plugins is something people are interested in at this time, I'll take a step back and just go ahead and close this PR, instead replacing it by my original proposal: https://github.com/bytecodealliance/rfcs/pull/40
I personally would still love to see dylib based plugins that directly interface with wasm interface types, but RPC based plugins are also nice. While they would almost certainly be a bit slower, they would be easier to support for other wasm engines that can't support dlopen and would be much easier to sandbox at an OS level.
Last updated: Nov 22 2024 at 17:03 UTC