Stream: wit-bindgen

Topic: Native plugins defined in WIT


view this post on Zulip Christof Petig (Feb 10 2024 at 10:05):

I have been using WIT to define language-neutral interfaces between native code for several months now and consider it a worthwhile extension of its use case.

Typically I just use unmodified guest code compiled to a shared library by the native compiler and restrict the exported symbols to the small set also exported from my wasm component by using a linker script.

So the binary interface remains identical between wasm and native with two exceptions:

So I would like to propose to agree on some conventions to make this use case more viable:

I will gladly provide patches for bindgen (or even the standards docs) if we can agree on a convention here.

Of course there is no host bindgen compatible with this at the moment, I added a --direct flag to C++ bindgen which will target this use case, but Rust and C would be other interesting host languages.

PS: At runtime we were able to transparently choose between native compiled plugins, wasm2c AoT compiled wasm or wamr interpreted wasm, by just pointing to a different .so with the same interface.

PPS: Ways to natively express components composed of multiple core modules (likely each its shared object) will require future design work.

I need to make changes to the "c_header.rs" file because there are numerous instances where "size_t" is mistakenly identified as "uint32_t," which is incompatible with wasm64.

view this post on Zulip Christof Petig (Feb 10 2024 at 10:53):

I just found idxtype in the memory64 proposal, so that might be a good type name proposal for wit as well. (I think CHERI compatibility is not high-priority for WIT at the moment)

view this post on Zulip Alex Crichton (Feb 12 2024 at 16:08):

I think it's reasonable to start out at least by using size_t or usize as such for pointer-sized things rather than a 32-bit integer. I'd be wary to generalize too much further into a full-on native plugin system, however. There's a number of semantics about components which do not map well to "there's only a symbol with an implicit ABI" such as:

I wouldn't say that WIT/components are a perfect fit for a native plugin system. While WIT and such can be adapted, and I realize that it's an attractive target, the design trajectory of WIT/components is to be for components/wasm/etc and not to take native plugins into account as well. That brings up a whole host of questions, as you're running into here.

Overall I want to definitely acknowledge how using WIT for a native plugin system is attractive, and I also want to agree that the changes you're suggesting are definitely welcome insofar as they're furthering hypothetical 64-bit support in the future. I do want to avoid going "too far" though, but what that means depends on context. In that sense I'm happy to review changes as they come, for example using pointer-sized types I think is reasonable.

view this post on Zulip Dan Gohman (Feb 12 2024 at 16:36):

Relatedly, I've been exploring the idea of using source-language pointer types in the bindings, such as *mut c_void for the Rust bindings, in order to represent provenance.

Pointers (this includes values of reference type) in Rust have two components. The pointer's "address" says where in memory the pointer is currently pointing. The pointer's "provenance" says where...

view this post on Zulip Alex Crichton (Feb 12 2024 at 16:45):

I also agree that'd be a good idea to switch to!

view this post on Zulip Christof Petig (Feb 12 2024 at 23:48):

The canonical options and compose layout (if I understand correctly this information is stored outside the core module and not specified in WIT but in e.g. wac) could map to a surrounding shared object doing the copying/conversion between imported .so's - so the mechanism could apply to native components as well, and the glue .so could be auto-generated from the same information.

But this would be a far future goal for native plugins.

Do you have any preference on the naming? Is idxtype indeed the best name candidate for a new WIT primvaltype? I tend towards the shorter but still unambiguous XNN encoding for names, e.g. testX3AexampleX2FmyX2DinterfaceX2EX5BmethodX5DmyX2DobjectX2Eset, though perhaps - should be mapped to _ instead of X2D.

view this post on Zulip Christof Petig (Feb 12 2024 at 23:54):

or more visually creative testBexampleZmy_interfaceOCmethodJmy_objectOset :wink: (leveraging that WIT symbols are all lowercase)

view this post on Zulip Alex Crichton (Feb 12 2024 at 23:57):

Oh I'm not sure we need a new type in WIT, instead just updates to the lifting and lowering to use a pointer sized abstraction based on the type of memory. WIT itself doesn't deal with pointers ever at the type layer, just at the ABI layer.
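
A minimal Rust sketch of what a pointer-sized abstraction at the ABI layer could look like for a lowered string. The names LoweredStr, lower_str and lift_str are hypothetical and are not part of wit-bindgen; the point is only that the pointer and length fields follow the target's pointer width instead of being fixed at 32 bits.

```rust
/// Hypothetical flattened form of a WIT `string` at the ABI layer.
/// `usize` is 32 bits wide on wasm32 and 64 bits wide on wasm64 or x86-64.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct LoweredStr {
    pub ptr: *const u8,
    pub len: usize,
}

/// Lower a borrowed string into a pointer/length pair without copying.
pub fn lower_str(s: &str) -> LoweredStr {
    LoweredStr { ptr: s.as_ptr(), len: s.len() }
}

/// Lift a pointer/length pair back into a string slice.
///
/// # Safety
/// `l` must describe valid UTF-8 that stays alive for `'a`.
pub unsafe fn lift_str<'a>(l: LoweredStr) -> &'a str {
    unsafe { std::str::from_utf8_unchecked(std::slice::from_raw_parts(l.ptr, l.len)) }
}
```

The WIT-level type stays `string` in this sketch; only the flattened representation changes with the memory's index type.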

For naming we will need to be somewhat careful to reserve a namespace for intrinsics and such, like those used for resources. Otherwise though I'm hesitant to make this "too standard" given my thoughts above, so just about anything seems fine.

view this post on Zulip Christof Petig (Feb 13 2024 at 06:38):

Alex Crichton said:

Oh I'm not sure we need a new type in WIT, instead just updates to the lifting and lowering to use a pointer sized abstraction based on the type of memory. WIT itself doesn't deal with pointers ever at the type layer, just at the ABI layer.

You are right, I have only seen the need in two non standard use cases:

Wasi thread spawn https://github.com/WebAssembly/wasi-threads?tab=readme-ov-file#api-walk-through (mapping this to wasm64 failed exactly at lacking this datatype) and when emulating returning an object via reference (I will add the zulip discussion link later).

The second case is very common for not-in-depth-adapted-to-wit APIs which are more common with native interfaces.

view this post on Zulip Christof Petig (Feb 13 2024 at 06:42):

https://bytecodealliance.zulipchat.com/#narrow/stream/217126-wasmtime/topic/Component.20Model.3A.20Passing.20Large.20Buffers

view this post on Zulip Alex Crichton (Feb 13 2024 at 15:36):

Ah yeah for threads that's definitely going to be a "weird" one and it basically can't be modelled in WIT. Streams are sort of similar where building up streams is difficult to model in WIT which is why it needs native support in the component model.

To expand a bit on why I'm hesitant to add a new type to WIT, that means adding it to the component model as well. WIT can't be its own IDL in isolation but it's very closely tied to the component model, and at that abstraction layer the meaning of a pointer-sized integer type doesn't mean much (e.g. a component could have multiple modules all with different sizes of memories inside of it)

view this post on Zulip Christof Petig (Feb 13 2024 at 15:59):

I just took a closer look at the iceoryx2 API, e.g. see the use of sample at https://github.com/eclipse-iceoryx/iceoryx2?tab=readme-ov-file#publish-subscribe, which is very similar to my use case, basically a resource object provides temporary (writable) access to a memory buffer, which is sent or released by dropping (or passing ownership over to a send method) of the resource.

So being able to (exclusively) borrow a list<u8> from a resource method would solve the need for pointers for me, the lifetime would be tied to the resource (handle). Of course the host would have to do the majority of the work to make this zero copy communication reality, I suspect that multi memory will be next on my implementation list.
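
A rough Rust sketch of the loan pattern described above: a resource-like object hands out temporary writable access to its buffer, and sending consumes the object. The type Sample and its methods are hypothetical and only loosely modelled on the idea; this is not the iceoryx2 API and not something WIT can express today.

```rust
/// Hypothetical resource that owns a loaned buffer.
pub struct Sample {
    buf: Vec<u8>,
}

impl Sample {
    /// Loan a zero-initialised buffer of `len` bytes.
    pub fn loan(len: usize) -> Sample {
        Sample { buf: vec![0; len] }
    }

    /// Exclusive, temporary access to the payload. The returned slice
    /// cannot outlive the borrow of `self`, which ties it to the resource.
    pub fn payload_mut(&mut self) -> &mut [u8] {
        &mut self.buf
    }

    /// Sending takes ownership, so no borrowed slice can still be alive.
    pub fn send(self) -> Vec<u8> {
        self.buf
    }
}
```

In Rust the borrow checker enforces the lifetime relation; across a component boundary the host would have to enforce it instead.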

view this post on Zulip Kyle Gray (Feb 13 2024 at 23:48):

Is there a built in type for maps? I'm currently using a list of tuples instead.

view this post on Zulip Dan Gohman (Feb 13 2024 at 23:49):

There is not, currently, and a list of tuples is indeed a common substitute.
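
As an illustration of that substitute, assuming a WIT `list<tuple<string, u32>>` surfaces in Rust as `Vec<(String, u32)>`, the conversion to and from a map could be sketched as follows. The helper names to_map and to_entries are made up for this example.

```rust
use std::collections::HashMap;

/// Build a map from a list of key/value tuples.
/// If a key appears more than once, the last entry wins.
pub fn to_map(entries: Vec<(String, u32)>) -> HashMap<String, u32> {
    entries.into_iter().collect()
}

/// Flatten a map back into a list of tuples, sorted for a stable order.
pub fn to_entries(map: &HashMap<String, u32>) -> Vec<(String, u32)> {
    let mut entries: Vec<(String, u32)> =
        map.iter().map(|(k, v)| (k.clone(), *v)).collect();
    entries.sort();
    entries
}
```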

view this post on Zulip Notification Bot (Feb 13 2024 at 23:52):

Kyle Gray has marked this topic as resolved.

view this post on Zulip Christof Petig (Feb 14 2024 at 06:26):

I don't think this thread had a conclusion on how to map buffer borrowing to wit.

view this post on Zulip Notification Bot (Feb 14 2024 at 06:27):

Christof Petig has marked this topic as unresolved.

view this post on Zulip Christof Petig (Feb 22 2024 at 21:15):

@Dan Gohman I just found your work adding Pointer types to WasmType. It looks really promising.

Also I saw some usage of usize in Rust wit-bindgen (guest based resource code) and first thought that it might be related, but looking into the git log I guess this hasn't changed, yet.

Getting Rust guest code compatible with compiling to native has jumped up a bit in my priority list, but I just found that my understanding of guest resource handling was wrong: The host maintains the address to id list using canon.new/rep/drop and passes the rep (likely an object address) as the first parameter to methods, if I am not mistaken. Until today I thought that the host would pass the guest provided integer id as the first argument and the guest would have to look up the rep/address in its list.
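
To make the handle/rep distinction concrete, here is a minimal sketch of a host-side table that maps integer handles to a guest-provided rep. HandleTable and its method names are hypothetical; the methods only loosely mirror new/rep/drop and say nothing about which of the two values a given call actually receives.

```rust
/// Hypothetical host-side handle table: maps small integer handles to a
/// guest-provided `rep` (typically an object address).
#[derive(Default)]
pub struct HandleTable {
    slots: Vec<Option<usize>>,
    free: Vec<u32>,
}

impl HandleTable {
    /// Store `rep` and return a handle for it, reusing freed slots first.
    pub fn new_handle(&mut self, rep: usize) -> u32 {
        if let Some(h) = self.free.pop() {
            self.slots[h as usize] = Some(rep);
            h
        } else {
            self.slots.push(Some(rep));
            (self.slots.len() - 1) as u32
        }
    }

    /// Look up the rep behind a live handle.
    pub fn rep(&self, handle: u32) -> Option<usize> {
        self.slots.get(handle as usize).copied().flatten()
    }

    /// Remove a live handle and hand its rep back so the owner can destroy it.
    pub fn drop_handle(&mut self, handle: u32) -> Option<usize> {
        let rep = self.slots.get_mut(handle as usize)?.take()?;
        self.free.push(handle);
        Some(rep)
    }
}
```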

For the naming of native symbols I have settled on fooX3AfooX2FrecordsX23tuple_arg, so the symbols are visibly as well as reversibly encoded. For guest imported functions I use X00 as the separator between module and name.
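
A sketch of how such an encoding could be implemented, inferred only from the example symbols in this thread: lowercase letters and digits pass through, - becomes _, and every other byte becomes X followed by two uppercase hex digits. The function names encode_symbol and decode_symbol are hypothetical, and the actual generator may differ in details.

```rust
/// Encode a WIT-style name into a linker-friendly symbol.
pub fn encode_symbol(wit_name: &str) -> String {
    let mut out = String::with_capacity(wit_name.len());
    for b in wit_name.bytes() {
        match b {
            b'a'..=b'z' | b'0'..=b'9' => out.push(b as char),
            b'-' => out.push('_'),
            // Everything else, including 'X', '_' and NUL, is hex-escaped,
            // which keeps the encoding reversible.
            _ => out.push_str(&format!("X{:02X}", b)),
        }
    }
    out
}

/// Reverse `encode_symbol`; returns `None` on a malformed escape.
pub fn decode_symbol(symbol: &str) -> Option<String> {
    let bytes = symbol.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'X' => {
                let hex = symbol.get(i + 1..i + 3)?;
                if !hex.bytes().all(|c| c.is_ascii_hexdigit()) {
                    return None;
                }
                out.push(u8::from_str_radix(hex, 16).ok()?);
                i += 3;
            }
            b'_' => {
                out.push(b'-');
                i += 1;
            }
            other => {
                out.push(other);
                i += 1;
            }
        }
    }
    String::from_utf8(out).ok()
}
```

For an imported function, the module and function name would be joined with a NUL byte before encoding, which yields the X00 separator.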

view this post on Zulip Christof Petig (Feb 25 2024 at 21:51):

I started using the pointer types in wit-bindgen and I see that it is a larger effort to make all languages support 64 bit compatible pointer types (I mostly stubbed other languages for now).

But I also used it to create an initial working prototype of calling native plugins defined in WIT (C++ guest+host for now because I know this generator best and hand-patching the limitations of the generator was easiest). It uses the strings.wit interface from the codegen tests and exercises both directions (the last one fails in valgrind, to be investigated).

Nevertheless if you want to take a look: https://github.com/cpetig/wit-bindgen/tree/wasm64/crates/cpp/tests/native_strings . It already contains the bindgen generated but for now hand-patched sources.

$ objdump -T libstrings.so

libstrings.so:     file format elf64-x86-64

DYNAMIC SYMBOL TABLE:
0000000000000000      D  *UND*  0000000000000000  Base        fooX3AfooX2FstringsX00b
0000000000000000  w   DF *UND*  0000000000000000 (GLIBC_2.2.5) __cxa_finalize
0000000000000000      D  *UND*  0000000000000000  Base        fooX3AfooX2FstringsX00a
0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.2.5) abort
0000000000000000      D  *UND*  0000000000000000  Base        fooX3AfooX2FstringsX00c
0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.4)  __stack_chk_fail
0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.2.5) free
0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.2.5) realloc
0000000000000000      DF *UND*  0000000000000000 (CXXABI_1.3) __gxx_personality_v0
0000000000000000  w   D  *UND*  0000000000000000  Base        _ITM_deregisterTMCloneTable
0000000000000000      DF *UND*  0000000000000000 (GCC_3.0)    _Unwind_Resume
0000000000000000  w   D  *UND*  0000000000000000  Base        __gmon_start__
0000000000000000  w   D  *UND*  0000000000000000  Base        _ITM_registerTMCloneTable
00000000000014f6 g    DF .text  00000000000000b1  Base        fooX3AfooX2FstringsX23b
00000000000015a7  w   DF .text  0000000000000040  Base        cabi_post_fooX3AfooX2FstringsX23b
00000000000011f9  w   DF .text  0000000000000052  Base        cabi_realloc
00000000000015e7 g    DF .text  000000000000017d  Base        fooX3AfooX2FstringsX23c
0000000000001764  w   DF .text  0000000000000040  Base        cabi_post_fooX3AfooX2FstringsX23c
000000000000144c g    DF .text  00000000000000aa  Base        fooX3AfooX2FstringsX23a

view this post on Zulip Christof Petig (Feb 25 2024 at 22:22):

PS: Issues 8 to 13 in that repository track the remaining code generation mistakes for this PoC.

view this post on Zulip Christof Petig (Mar 07 2024 at 23:09):

Update: I was able to implement the same interface in a Rust shared object (guest), but it still depends on several future bindgen patches (string length offset and ret_area size) to work correctly. For now I hand-corrected the generated code and then it works well.

view this post on Zulip Christof Petig (Mar 16 2024 at 16:26):

Update: The string code is now working flawlessly without hand patching, with a Rust guest and a C++ "host".

But I encountered a strange problem: With

package foo:foo;

interface resources {
    resource r {
        constructor(a: u32);
        add: func(b: u32);
    }
    create: func() -> r;
    borrows: func(o: borrow<r>);
    consume: func(o: r);
}

world the-world {
  import resources;
  export resources;
}

the Rust guest function consume expects to receive the index instead of the rep, while jco passes the rep (as I would expect). Am I correct that this is a bug in the Rust code generator?

I tried fixing it but quickly found that passing the resource R to the consume function isn't right - because it no longer has an index on the host side, so it can no longer be an _rt::Resource<R>. So passing the bare user defined object to the trait function consume seems like the most reasonable way.

Also, am I right to assume that a function consuming a type doesn't need to call [resource-drop] on the index afterwards?

view this post on Zulip Christof Petig (Mar 16 2024 at 16:30):

This is my current assumption about the correct way to fix the Rust code:

--- a/crates/cpp/tests/native_resources/rust/src/the_world.rs
+++ b/crates/cpp/tests/native_resources/rust/src/the_world.rs
@@ -294,14 +294,14 @@ pub mod exports {
 }
 #[doc(hidden)]
 #[allow(non_snake_case)]
-pub unsafe fn _export_consume_cabi<T: Guest>(arg0: i32,) {#[cfg(target_arch="wasm32")]
-_rt::run_ctors_once();T::consume(R::from_handle(arg0 as u32));
+pub unsafe fn _export_consume_cabi<T: Guest>(arg0: *mut u8,) {#[cfg(target_arch="wasm32")]
+_rt::run_ctors_once();T::consume(_rt::Box::<_RRep<T::R>>::from_raw(arg0.cast()).unwrap());
 }
 pub trait Guest {
   type R: GuestR;
   fn create() -> R;
   fn borrows(o: RBorrow<'_>,);
-  fn consume(o: R,);
+  fn consume(o: Self::R,);
 }
 pub trait GuestR: 'static {

@@ -366,7 +366,7 @@ macro_rules! __export_foo_foo_resources_cabi{
     }
     #[cfg_attr(target_arch = "wasm32", export_name = "foo:foo/resources#consume")]
     #[cfg_attr(not(target_arch = "wasm32"), no_mangle)]
-    unsafe extern "C" fn fooX3AfooX2FresourcesX23consume(arg0: i32,) {
+    unsafe extern "C" fn fooX3AfooX2FresourcesX23consume(arg0: *mut u8,) {
       $($path_to_types)*::_export_consume_cabi::<$ty>(arg0)
     }

view this post on Zulip Alex Crichton (Mar 18 2024 at 14:36):

No in that case it's intentional that consume takes an index, it's only borrows-of-exported-resources that receive a pointer

view this post on Zulip Christof Petig (Jun 25 2024 at 21:38):

It all started with a question by a co-worker: "Could shared-everything components directly link together without a host-side connection?"

My first reaction was to say no, but then I gave it more thought. Some minutes later I answered "it might be possible with a modified ABI".

So I selected a more symmetrical calling convention, taking the argument encoding from guest imported calls and the result encoding from guest exported calls (both don't pass ownership but just provide a view into the memory). For resources I selected the guest imported flavor. Applying this to both imported and exported interfaces made the ABI symmetric and directly link-able. This way a module only reads from its communication partner's memory, never frees it or writes to it. A new host runtime could even use this modified ABI to connect shared-nothing modules.

The API to the guest language is unchanged. Using future caller provided buffer APIs could eliminate another (local) heap allocation.

I feel for shared everything uses like native plugins and highly optimized embedded this could be a viable simpler alternative, but it clearly is a new incompatible ABI. You can find my experiments at https://github.com/cpetig/wit-bindgen/tree/main/crates/cpp/tests/meshless_strings and https://github.com/cpetig/wit-bindgen/tree/main/crates/cpp/tests/meshless_resources.

PS: Of course if more than one module offers an interface you need to rename/prefix it (see the a_ prefix in the strings example)

view this post on Zulip Christof Petig (Jul 13 2024 at 09:30):

Christof Petig said:

I feel for shared everything uses like native plugins and highly optimized embedded this could be a viable simpler alternative, but it clearly is a new incompatible ABI.

Said co-worker asked whether it would be feasible to use usize for resource ids and thus get rid of the (need for the) conversion table. It worked out fine and the code generation for C++ is finished, Rust is next on my list.
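
The idea of using a pointer-sized id so that no conversion table is needed can be sketched in Rust as follows, reusing the resource r (constructor and add) from the WIT earlier in this thread. The struct and the function names r_new, r_add and r_drop are hypothetical and do not reflect the generated code.

```rust
/// Hypothetical implementation object behind resource `r`.
pub struct R {
    total: u32,
}

/// The handle is simply the object's address, so no lookup table is needed.
pub fn r_new(a: u32) -> usize {
    Box::into_raw(Box::new(R { total: a })) as usize
}

/// # Safety
/// `handle` must come from `r_new` and must not have been dropped.
pub unsafe fn r_add(handle: usize, b: u32) {
    let r = unsafe { &mut *(handle as *mut R) };
    r.total += b;
}

/// Destroys the object and returns its final value for inspection.
///
/// # Safety
/// `handle` must come from `r_new` and must not be used afterwards.
pub unsafe fn r_drop(handle: usize) -> u32 {
    let boxed = unsafe { Box::from_raw(handle as *mut R) };
    boxed.total
}
```

The trade-off in this sketch is that a stale or forged handle is no longer caught by a table lookup, which is acceptable only when both sides share an address space and trust each other.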

The most interesting property is that the distinction between host and guest vanishes and also that you can directly link several components into a single binary. The API is still unchanged, but the type asymmetry between calling methods and being called hurts the developer experience.

Thus I closely follow the caller provided buffers proposal which would remove the need to allocate buffers inside the callee from the caller, which is the root cause for the asymmetric API.

Asynchronous ABI additions (wasi 0.3) are another area we will need to investigate.

Please keep in mind that this is simply a different ABI encoding, you can bridge between those two and also use the symmetrical ABI with shared nothing or wasm, given a runtime which knows how to decode it.

Side note: Using this ABI with wasm would require cabi_realloc to also free if the new size is zero and cloning resource handles isn't yet standardized, thus clone() for result types can't be autogenerated.
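
A hedged sketch of a realloc-style function that also frees when the new size is zero, built on Rust's std::alloc. It uses the hypothetical name cabi_realloc_freeing and is deliberately not exported, so it does not stand in for the cabi_realloc that wit-bindgen generates.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, realloc, Layout};

/// Allocate, grow/shrink, or (when `new_size` is zero) free a block.
///
/// # Safety
/// `old_ptr`/`old_size`/`align` must describe a block previously returned by
/// this function (or `old_size` must be zero), and `align` must be a nonzero
/// power of two.
pub unsafe fn cabi_realloc_freeing(
    old_ptr: *mut u8,
    old_size: usize,
    align: usize,
    new_size: usize,
) -> *mut u8 {
    unsafe {
        if new_size == 0 {
            if old_size != 0 {
                dealloc(old_ptr, Layout::from_size_align_unchecked(old_size, align));
            }
            // Non-null, aligned placeholder that must never be dereferenced.
            return align as *mut u8;
        }
        let new_layout = Layout::from_size_align_unchecked(new_size, align);
        let new_ptr = if old_size == 0 {
            alloc(new_layout)
        } else {
            let old_layout = Layout::from_size_align_unchecked(old_size, align);
            realloc(old_ptr, old_layout, new_size)
        };
        if new_ptr.is_null() {
            handle_alloc_error(new_layout);
        }
        new_ptr
    }
}
```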


Last updated: Dec 23 2024 at 12:05 UTC