I have been using WIT to define language-neutral interfaces between native code for several months now and consider it a worthwhile extension of its use case.
Typically I just use unmodified guest code compiled to a shared library by the native compiler and restrict the exported symbols to the small set also exported from my wasm component by using a linker script.
So the binary interface remains identical between wasm and native, with two exceptions:
- `s32` for string and list addresses only works on 32-bit machines (rare these days)
- [...] `test:example/my-interface.[method]my-object.set` [...])
So I would like to propose that we agree on some conventions to make this use case more viable:
- [...] `pointer`); this would also help with wasm64, e.g. https://github.com/WebAssembly/wasi-libc/pull/444
- `intptr_t` / `size_t` (C) or `*const c_void` / `usize` (Rust has known issues with CHERI, where these two differ in size) for string and list lowering; this has zero effect on wasm32 and prepares for wasm64
- [...] `_` (C bindgen), `0xNN` (wasm2c) or `XNN` (w2c2); Rust guest bindgen sadly just discards the module name
I will gladly provide patches for bindgen (or even the standards docs) if we can agree on a convention here.
Of course there is no host bindgen compatible with this at the moment. I added a `--direct` flag to C++ bindgen which will target this use case, but Rust and C would be other interesting host languages.
PS: At runtime we were able to transparently choose between native compiled plugins, wasm2c AoT compiled wasm or wamr interpreted wasm, by just pointing to a different .so with the same interface.
PPS: Ways to natively express components composed of multiple core modules (likely each its own shared object) will require future design work.
I just found `idxtype` in the memory64 proposal, so that might be a good type name proposal for WIT as well. (I think CHERI compatibility is not high-priority for WIT at the moment)
I think it's reasonable to start out at least by using `size_t` or `usize` as such for pointer-sized things rather than a 32-bit integer. I'd be wary to generalize too much further into a full-on native plugin system, however. There's a number of semantics about components which do not map well to "there's only a symbol with an implicit ABI" such as:
- [...] `realloc` most likely.
- [...] `list<T>`, which requires runtime coordination to achieve this.
I wouldn't say that WIT/components are a perfect fit for a native plugin system. While WIT and such can be adapted, and I realize that it's an attractive target, the design trajectory of WIT/components is to be for components/wasm/etc and not to take into account native plugins as well. That brings up a whole host of questions, as you're running into here.
Overall I want to definitely acknowledge how using WIT for a native plugin system is attractive, and I also want to agree that the changes you're suggesting are definitely welcome insofar as they're furthering hypothetical 64-bit support in the future. I do want to avoid going "too far" though, but what that means depends on context. In that sense I'm happy to review changes as they come, for example using pointer-sized types I think is reasonable.
Relatedly, I've been exploring the idea of using source-language pointer types in the bindings, such as `*mut c_void` for the Rust bindings, in order to represent provenance.
I also agree that'd be a good idea to switch to!
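To make the pointer-sized lowering discussed above concrete, here is a minimal Rust sketch. The names `LoweredString`, `lower_string` and `lift_string` are made up for illustration and are not what any bindgen currently emits; the only point is that a pointer type plus `usize` has the same layout as two 32-bit integers on wasm32 and scales to 64-bit targets.

```rust
use std::{slice, str};

/// Hypothetical lowered form of a WIT `string`: a real pointer plus a
/// pointer-sized length, instead of two 32-bit integers.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct LoweredString {
    pub ptr: *const u8, // keeps provenance, unlike a plain integer address
    pub len: usize,     // 32 bits on wasm32, 64 bits on wasm64 or x86-64
}

/// Lower a borrowed string without copying it.
pub fn lower_string(s: &str) -> LoweredString {
    LoweredString { ptr: s.as_ptr(), len: s.len() }
}

/// Lift the pair back into a string slice.
///
/// # Safety
/// `l` must describe valid UTF-8 that stays alive for `'a`.
pub unsafe fn lift_string<'a>(l: LoweredString) -> &'a str {
    unsafe { str::from_utf8_unchecked(slice::from_raw_parts(l.ptr, l.len)) }
}
```

On any target the struct is exactly two pointer-sized words, so nothing changes for wasm32 while native 64-bit builds stop truncating addresses.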
The canonical options and compose layout (if I understand correctly this information is stored outside the core module and not specified in WIT but in e.g. wac) could map to a surrounding shared object doing the copying/conversion between imported .so's - so the mechanism could apply to native components as well, and the glue .so could be auto-generated from the same information.
But this would be a far future goal for native plugins.
Do you have any preference on the naming? Is `idxtype` indeed the best name candidate for a new WIT primvaltype? I tend towards the shorter but still unambiguous XNN encoding for names, e.g. `testX3AexampleX2FmyX2DinterfaceX2EX5BmethodX5DmyX2DobjectX2Eset`, perhaps `-` should be mapped to `_` instead of `X2D`.
or more visually creative `testBexampleZmy_interfaceOCmethodJmy_objectOset` :wink: (leveraging that WIT symbols are all lowercase)
Oh I'm not sure we need a new type in WIT, instead just updates to the lifting and lowering to use a pointer sized abstraction based on the type of memory. WIT itself doesn't deal with pointers ever at the type layer, just at the ABI layer.
For naming we will need to be somewhat careful to reserve a namespace for intrinsics and such like those used for resources. Otherwise though I'm hesitant to make this "too standard" given my thoughts above, so just about anything seems fine
Alex Crichton said:
Oh I'm not sure we need a new type in WIT, instead just updates to the lifting and lowering to use a pointer sized abstraction based on the type of memory. WIT itself doesn't deal with pointers ever at the type layer, just at the ABI layer.
You are right, I have only seen the need in two non standard use cases:
Wasi thread spawn https://github.com/WebAssembly/wasi-threads?tab=readme-ov-file#api-walk-through (mapping this to wasm64 failed exactly at lacking this datatype) and when emulating returning an object via reference (I will add the zulip discussion link later).
The second case is very common for not-in-depth-adapted-to-wit APIs which are more common with native interfaces.
Ah yeah for threads that's definitely going to be a "weird" one and it basically can't be modelled in WIT. Streams are sort of similar where building up streams is difficult to model in WIT which is why it needs native support in the component model.
To expand a bit on why I'm hesitant to add a new type to WIT, that means adding it to the component model as well. WIT can't be its own IDL in isolation but it's very closely tied to the component model, and at that abstraction layer the meaning of a pointer-sized integer type doesn't mean much (e.g. a component could have multiple modules all with different sizes of memories inside of it)
I just took a closer look at the iceoryx2 API, e.g. see the use of sample at https://github.com/eclipse-iceoryx/iceoryx2?tab=readme-ov-file#publish-subscribe, which is very similar to my use case, basically a resource object provides temporary (writable) access to a memory buffer, which is sent or released by dropping (or passing ownership over to a send method) of the resource.
So being able to (exclusively) borrow a list<u8> from a resource method would solve the need for pointers for me; the lifetime would be tied to the resource (handle). Of course the host would have to do the majority of the work to make this zero-copy communication a reality. I suspect that multi-memory will be next on my implementation list.
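The borrowing pattern described here can be illustrated in plain Rust, where the borrow checker already ties a buffer's lifetime to the object that owns it. This is only a sketch of the idea: `Sample`, `loan`, `payload_mut` and `send` are illustrative names, not the iceoryx2 API and not anything WIT can express today, and a `Vec` stands in for shared memory.

```rust
/// Hypothetical stand-in for a loaned shared-memory sample.
pub struct Sample {
    buf: Vec<u8>,
}

impl Sample {
    /// Obtain a zero-initialised sample of `size` bytes.
    pub fn loan(size: usize) -> Self {
        Sample { buf: vec![0; size] }
    }

    /// Exclusive, temporary access to the payload. The returned slice
    /// cannot outlive `self`, mirroring "lifetime tied to the resource".
    pub fn payload_mut(&mut self) -> &mut [u8] {
        &mut self.buf
    }

    /// Sending consumes the sample, so no borrow of the payload can still
    /// be alive afterwards. The queue is a placeholder for a real channel.
    pub fn send(self, queue: &mut Vec<Vec<u8>>) {
        queue.push(self.buf);
    }
}
```

Holding on to the slice returned by `payload_mut` across the call to `send` is rejected at compile time, which is the guarantee a borrowed `list<u8>` would need to provide.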
Is there a built in type for maps? I'm currently using a list of tuples instead.
There is not, currently, and a list of tuples is indeed a common substitute.
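For reference, a small Rust sketch of the list-of-tuples substitute, assuming the interface uses `list<tuple<string, u32>>` and that the binding surfaces it as `Vec<(String, u32)>`. The helper names `to_map` and `from_map` are hypothetical.

```rust
use std::collections::HashMap;

/// Assumed Rust shape of a WIT `list<tuple<string, u32>>`.
pub type Entries = Vec<(String, u32)>;

/// Build a map on the receiving side. If a key occurs twice in the list,
/// the later entry wins, because the list itself cannot enforce uniqueness.
pub fn to_map(entries: Entries) -> HashMap<String, u32> {
    entries.into_iter().collect()
}

/// Flatten a map back into a list before passing it across the interface.
/// HashMap iteration order is unspecified, so sort for a stable result.
pub fn from_map(map: HashMap<String, u32>) -> Entries {
    let mut entries: Entries = map.into_iter().collect();
    entries.sort();
    entries
}
```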
Kyle Gray has marked this topic as resolved.
I don't think this thread had a conclusion on how to map buffer borrowing to wit.
Christof Petig has marked this topic as unresolved.
@Dan Gohman I just found your work adding Pointer types to WasmType. It looks really promising.
Also I saw some usage of usize in Rust wit-bindgen (guest based resource code) and first thought that it might be related, but looking into the git log I guess this hasn't changed, yet.
Getting Rust guest code compatible with compiling to native has jumped up a bit in my priority list, but I just found that my understanding of guest resource handling was wrong: The host maintains the address to id list using canon.new/rep/drop and passes the rep (likely an object address) as the first parameter to methods, if I am not mistaken. Until today I thought that the host would pass the guest provided integer id as the first argument and the guest would have to look up the rep/address in its list.
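As a rough mental model of the handle bookkeeping mentioned here, the following sketch keeps a table from small integer indices to representations (for example object addresses). `ResourceTable` and its methods are hypothetical; they only mirror the roles of the new/rep/drop intrinsics and do not reproduce any runtime's actual data structure.

```rust
/// Hypothetical table mapping handle indices to representations.
pub struct ResourceTable {
    slots: Vec<Option<usize>>,
    free: Vec<u32>,
}

impl ResourceTable {
    pub fn new() -> Self {
        ResourceTable { slots: Vec::new(), free: Vec::new() }
    }

    /// Role of "new": store a rep and hand out an index for it.
    pub fn insert(&mut self, rep: usize) -> u32 {
        if let Some(idx) = self.free.pop() {
            self.slots[idx as usize] = Some(rep);
            idx
        } else {
            self.slots.push(Some(rep));
            (self.slots.len() - 1) as u32
        }
    }

    /// Role of "rep": look up the representation behind an index.
    pub fn rep(&self, handle: u32) -> Option<usize> {
        self.slots.get(handle as usize).copied().flatten()
    }

    /// Role of "drop": invalidate the index and return the rep so the
    /// owner can run its destructor. The slot is recycled afterwards.
    pub fn remove(&mut self, handle: u32) -> Option<usize> {
        let rep = self.slots.get_mut(handle as usize)?.take()?;
        self.free.push(handle);
        Some(rep)
    }
}
```

Whether a given function argument arrives as the index or as the rep is a separate question from who owns this table.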
For the naming of native symbols I have settled on `fooX3AfooX2FrecordsX23tuple_arg`, so the symbols are visibly as well as reversibly encoded. For guest imported functions I use `X00` as the separator between module and name.
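A minimal sketch of such an encoding in Rust, inferred from the example symbols in this thread rather than taken from the bindgen sources. It assumes names consist of lowercase ASCII letters, digits, `-` and punctuation, that `-` becomes `_`, and that every other byte becomes `X` followed by two uppercase hex digits. The function names are hypothetical.

```rust
/// Encode a component-level name into a native symbol name.
pub fn encode_symbol(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    for b in name.bytes() {
        match b {
            b'a'..=b'z' | b'0'..=b'9' => out.push(b as char),
            b'-' => out.push('_'),
            // Everything else (':', '/', '#', '.', '[', ']', ...) is escaped.
            _ => out.push_str(&format!("X{:02X}", b)),
        }
    }
    out
}

/// Reverse the encoding. Returns None on a malformed escape sequence.
pub fn decode_symbol(sym: &str) -> Option<String> {
    let bytes = sym.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'X' => {
                let hex = sym.get(i + 1..i + 3)?;
                if !hex.bytes().all(|h| h.is_ascii_hexdigit()) {
                    return None;
                }
                out.push(u8::from_str_radix(hex, 16).ok()?);
                i += 3;
            }
            b'_' => { out.push(b'-'); i += 1; }
            other => { out.push(other); i += 1; }
        }
    }
    String::from_utf8(out).ok()
}

/// Imported functions: module and function name joined by `X00`.
pub fn import_symbol(module: &str, name: &str) -> String {
    format!("{}X00{}", encode_symbol(module), encode_symbol(name))
}
```

Reversibility relies on the assumption that the original names never contain an uppercase `X` or an underscore.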
I started using the pointer types in wit-bindgen and I see that it is a larger effort to make all languages support 64 bit compatible pointer types (I mostly stubbed other languages for now).
But I also used it to create an initial working prototype of calling native plugins defined in WIT (C++ guest+host for now, because I know this generator best and hand-patching the limitations of the generator was easiest). It uses the strings.wit interface from the codegen tests and exercises both directions (the last one fails in valgrind, to be investigated).
Nevertheless if you want to take a look: https://github.com/cpetig/wit-bindgen/tree/wasm64/crates/cpp/tests/native_strings . It already contains the bindgen generated but for now hand-patched sources.
$ objdump -T libstrings.so
libstrings.so: file format elf64-x86-64
DYNAMIC SYMBOL TABLE:
0000000000000000 D *UND* 0000000000000000 Base fooX3AfooX2FstringsX00b
0000000000000000 w DF *UND* 0000000000000000 (GLIBC_2.2.5) __cxa_finalize
0000000000000000 D *UND* 0000000000000000 Base fooX3AfooX2FstringsX00a
0000000000000000 DF *UND* 0000000000000000 (GLIBC_2.2.5) abort
0000000000000000 D *UND* 0000000000000000 Base fooX3AfooX2FstringsX00c
0000000000000000 DF *UND* 0000000000000000 (GLIBC_2.4) __stack_chk_fail
0000000000000000 DF *UND* 0000000000000000 (GLIBC_2.2.5) free
0000000000000000 DF *UND* 0000000000000000 (GLIBC_2.2.5) realloc
0000000000000000 DF *UND* 0000000000000000 (CXXABI_1.3) __gxx_personality_v0
0000000000000000 w D *UND* 0000000000000000 Base _ITM_deregisterTMCloneTable
0000000000000000 DF *UND* 0000000000000000 (GCC_3.0) _Unwind_Resume
0000000000000000 w D *UND* 0000000000000000 Base __gmon_start__
0000000000000000 w D *UND* 0000000000000000 Base _ITM_registerTMCloneTable
00000000000014f6 g DF .text 00000000000000b1 Base fooX3AfooX2FstringsX23b
00000000000015a7 w DF .text 0000000000000040 Base cabi_post_fooX3AfooX2FstringsX23b
00000000000011f9 w DF .text 0000000000000052 Base cabi_realloc
00000000000015e7 g DF .text 000000000000017d Base fooX3AfooX2FstringsX23c
0000000000001764 w DF .text 0000000000000040 Base cabi_post_fooX3AfooX2FstringsX23c
000000000000144c g DF .text 00000000000000aa Base fooX3AfooX2FstringsX23a
PS: Issues 8 to 13 in that repository track the remaining code generation mistakes for this PoC.
Update: I was able to implement the same interface in a Rust shared object (guest), but it still depends on several future bindgen patches (string length offset and ret_area size) to work correctly. For now I hand-corrected the generated code and then it works well.
Update: The string code is now working flawlessly without hand patching, with a Rust guest and a C++ "host".
But I encountered a strange problem: With
package foo:foo;
interface resources {
  resource r {
    constructor(a: u32);
    add: func(b: u32);
  }
  create: func() -> r;
  borrows: func(o: borrow<r>);
  consume: func(o: r);
}
world the-world {
  import resources;
  export resources;
}
the Rust guest function `consume` expects to receive the index instead of the rep, while jco passes the rep (as I would expect). Am I correct that this is a bug in the Rust code generator?
I tried fixing it but quickly found that passing the resource R to the consume function isn't right, because it no longer has an index on the host side, so it can no longer be an `_rt::Resource<R>`. So passing the bare user defined object to the trait function `consume` seems like the most reasonable way.
Also, am I right to assume that a function consuming a type doesn't need to call `[resource-drop]` on the index afterwards?
This is my current assumption about the correct way to fix the Rust code:
--- a/crates/cpp/tests/native_resources/rust/src/the_world.rs
+++ b/crates/cpp/tests/native_resources/rust/src/the_world.rs
@@ -294,14 +294,14 @@ pub mod exports {
}
#[doc(hidden)]
#[allow(non_snake_case)]
-pub unsafe fn _export_consume_cabi<T: Guest>(arg0: i32,) {#[cfg(target_arch="wasm32")]
-_rt::run_ctors_once();T::consume(R::from_handle(arg0 as u32));
+pub unsafe fn _export_consume_cabi<T: Guest>(arg0: *mut u8,) {#[cfg(target_arch="wasm32")]
+_rt::run_ctors_once();T::consume(_rt::Box::<_RRep<T::R>>::from_raw(arg0.cast()).unwrap());
}
pub trait Guest {
type R: GuestR;
fn create() -> R;
fn borrows(o: RBorrow<'_>,);
- fn consume(o: R,);
+ fn consume(o: Self::R,);
}
pub trait GuestR: 'static {
@@ -366,7 +366,7 @@ macro_rules! __export_foo_foo_resources_cabi{
}
#[cfg_attr(target_arch = "wasm32", export_name = "foo:foo/resources#consume")]
#[cfg_attr(not(target_arch = "wasm32"), no_mangle)]
- unsafe extern "C" fn fooX3AfooX2FresourcesX23consume(arg0: i32,) {
+ unsafe extern "C" fn fooX3AfooX2FresourcesX23consume(arg0: *mut u8,) {
$($path_to_types)*::_export_consume_cabi::<$ty>(arg0)
}
No, in that case it's intentional that `consume` takes an index; it's only borrows-of-exported-resources that receive a pointer
It all started with a question by a co-worker: "Could shared-everything components directly link together without a host-side connection?"
My first reaction was to say no, but then I gave it more thought. Some minutes later I answered "it might be possible with a modified ABI".
So I selected a more symmetrical calling convention, taking the argument encoding from guest imported calls and the result encoding from guest exported calls (both don't pass ownership but just provide a view into the memory). For resources I selected the guest imported flavor. Applying this to both imported and exported interfaces made the ABI symmetric and directly link-able. This way a module only reads from its communication partner's memory, never frees it or writes to it. A new host runtime could even use this modified ABI to connect shared-nothing modules.
The API to the guest language is unchanged. Using future caller provided buffer APIs could eliminate another (local) heap allocation.
I feel for shared everything uses like native plugins and highly optimized embedded this could be a viable simpler alternative, but it clearly is a new incompatible ABI. You can find my experiments at https://github.com/cpetig/wit-bindgen/tree/main/crates/cpp/tests/meshless_strings and https://github.com/cpetig/wit-bindgen/tree/main/crates/cpp/tests/meshless_resources.
PS: Of course if more than one module offers an interface you need to rename/prefix it (see the `a_` prefix in the strings example)
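A minimal Rust sketch of how such a symmetric convention might look for one string-to-string function, so that two natively compiled modules can call each other directly. All names (`StrView`, `greet`, `post_greet`, `call_greet`) are hypothetical, and the post-return hook is an assumption modelled on the `cabi_post_` symbols in the objdump listing above; the real experiments are in the linked repositories.

```rust
use std::{ptr, slice, str};

/// A non-owning view into the other side's memory.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct StrView {
    pub ptr: *const u8,
    pub len: usize,
}

/// Callee: reads the argument in place (never frees or writes it) and
/// returns a view into memory that the callee still owns.
pub extern "C" fn greet(name: StrView) -> StrView {
    let name = unsafe { str::from_utf8_unchecked(slice::from_raw_parts(name.ptr, name.len)) };
    let boxed: Box<str> = format!("hello {name}").into_boxed_str();
    let len = boxed.len();
    StrView { ptr: Box::into_raw(boxed) as *const u8, len }
}

/// Callee-provided post-return hook: releases the result storage once the
/// caller has finished reading it.
pub extern "C" fn post_greet(ret: StrView) {
    let raw = ptr::slice_from_raw_parts_mut(ret.ptr as *mut u8, ret.len) as *mut str;
    drop(unsafe { Box::from_raw(raw) });
}

/// Caller: lends its argument, copies the result into its own heap, then
/// lets the callee clean up. Neither side frees the other's memory.
pub fn call_greet(name: &str) -> String {
    let ret = greet(StrView { ptr: name.as_ptr(), len: name.len() });
    let out = unsafe { str::from_utf8_unchecked(slice::from_raw_parts(ret.ptr, ret.len)) }.to_owned();
    post_greet(ret);
    out
}
```

The copy in `call_greet` is the local heap allocation that a caller-provided buffer API could avoid.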
Christof Petig said:
I feel for shared everything uses like native plugins and highly optimized embedded this could be a viable simpler alternative, but it clearly is a new incompatible ABI.
Said co-worker asked whether it would be feasible to use usize for resource ids and thus get rid of the (need for the) conversion table. It worked out fine and the code generation for C++ is finished, Rust is next on my list.
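The idea of a pointer-sized resource id can be sketched in Rust as follows: the handle is simply the address of the boxed object, so no index-to-rep table is needed. `Counter` loosely mirrors the resource `r` from the WIT example earlier in the thread; the function names are hypothetical, and `counter_add` returns the new value only so the effect is observable.

```rust
/// Illustrative resource with one piece of state.
pub struct Counter {
    value: u32,
}

/// The handle is the object's address, so it must be pointer-sized.
pub type Handle = usize;

/// Constructor: box the object and hand out its address as the handle.
pub fn counter_new(value: u32) -> Handle {
    Box::into_raw(Box::new(Counter { value })) as Handle
}

/// Method call: the handle converts straight back to a reference.
///
/// # Safety
/// `handle` must come from `counter_new` and must not have been dropped.
pub unsafe fn counter_add(handle: Handle, b: u32) -> u32 {
    let counter = unsafe { &mut *(handle as *mut Counter) };
    counter.value += b;
    counter.value
}

/// Destructor: reclaim the box and free it.
///
/// # Safety
/// `handle` must come from `counter_new` and must be dropped only once.
pub unsafe fn counter_drop(handle: Handle) {
    drop(unsafe { Box::from_raw(handle as *mut Counter) });
}
```

The trade-off is that a stale or forged handle is no longer caught by a table lookup, which is acceptable only when all linked components trust each other.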
The most interesting property is that the distinction between host and guest vanishes and also that you can directly link several components into a single binary. The API is still unchanged, but the type asymmetry between calling methods and being called hurts the developer experience.
Thus I closely follow the caller provided buffers proposal which would remove the need to allocate buffers inside the callee from the caller, which is the root cause for the asymmetric API.
Asynchronous ABI additions (wasi 0.3) are another area we will need to investigate.
Please keep in mind that this is simply a different ABI encoding, you can bridge between those two and also use the symmetrical ABI with shared nothing or wasm, given a runtime which knows how to decode it.
Side note: Using this ABI with wasm would require cabi_realloc to also free if the new size is zero and cloning resource handles isn't yet standardized, thus clone() for result types can't be autogenerated.
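A sketch of what a realloc with the extra "free on zero" behaviour could look like in Rust. It uses the four-argument shape (old pointer, old size, alignment, new size) that `cabi_realloc` has as far as I am aware; the name `realloc_or_free` is hypothetical and the function is deliberately not exported under the real symbol name.

```rust
use std::alloc::{self, Layout};

/// Allocate, grow, shrink, or (the addition discussed here) free a block.
///
/// # Safety
/// `old_ptr`/`old_len`/`align` must describe a live block from this
/// function, or `old_len` must be 0. `align` must be a power of two.
pub unsafe fn realloc_or_free(
    old_ptr: *mut u8,
    old_len: usize,
    align: usize,
    new_len: usize,
) -> *mut u8 {
    unsafe {
        if new_len == 0 {
            // New behaviour: a zero new size releases the old block.
            if old_len != 0 {
                alloc::dealloc(old_ptr, Layout::from_size_align_unchecked(old_len, align));
            }
            // Dangling but well-aligned pointer; must never be dereferenced.
            return align as *mut u8;
        }
        let ptr = if old_len == 0 {
            alloc::alloc(Layout::from_size_align_unchecked(new_len, align))
        } else {
            alloc::realloc(old_ptr, Layout::from_size_align_unchecked(old_len, align), new_len)
        };
        if ptr.is_null() {
            alloc::handle_alloc_error(Layout::from_size_align_unchecked(new_len, align));
        }
        ptr
    }
}
```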
Last updated: Nov 22 2024 at 17:03 UTC