Stream: general

Topic: Componentize the world! `(WIT, WAVE, wRPC)`


view this post on Zulip raskyld (Dec 06 2024 at 08:04):

Hello everyone,

I am studying the state of the art of tools out there for serialising structured data on the wire in a cross-platform/cross-language way (protobuf, Cap'n Proto, FlatBuffers).

I really like FlatBuffers and am considering using it in one of my side projects, but I also found out about wRPC and WAVE.

I am not really fluent with our ABI, but I understand lower and lift as serialise and deserialise respectively. (I find the names a bit confusing; are these terms used a lot in the related literature?) So in theory, if we implement libraries in different languages, for different platforms, to do lifting and lowering, we basically get there, right?

All the tools I mentioned have their own IDL, a corresponding serialisation/deserialisation format, and, in some cases, a default RPC implementation.

So we have a tuple (IDL, serialisation format, RPC), which in our case could be (WIT, WAVE, wRPC).
So at the end of the day, it's all about packaging such a tuple in a user-friendly way and extending WAVE to be cross-language and cross-platform, right?

I think it would be beneficial for our ecosystem to have a well-defined tuple like that, with tools to support its usage: that would allow incremental introduction of Wasm Components into legacy environments made of native / POSIX software. :tada:

Even performance-wise, if WAVE supports features like zero-copy and our end users use inter-process (or even intra-process, e.g., from host to component instance, though I guess the ABI already covers that) transport, then Wasm Component adoption becomes even simpler. We could even design fully-native systems using this tuple of technologies for their APIs, so they can interact nicely with Wasm Components (which is what I am trying to do, btw).

I know wRPC is kind of following this path since it's completely usable outside of any Wasm runtime, but in its current form, the value encoding doesn't seem standardised. IIUC, WAVE existed before the Value Definition spec, and, anyway, Value Definitions is not a general-purpose encoding specification, but is more focused on the use case of global values in a Wasm Component context.

Finally, my questions are:

  1. Would it make sense to start working on packaging such a tuple (IDL, serialisation format, RPC)?
  2. Does (WIT, WAVE, wRPC) make sense? Specifically, is WAVE the right tool for that? How does it relate to the ABI?
  3. Should we give WAVE its own repository and make it the end-user entrypoint? What I mean by that is that all three technologies I mentioned are, most of the time, known to the wider audience by their serialisation format (i.e., protobuf, Cap'n Proto, FlatBuffers).

Thanks for taking the time to read all this :praise:

Wasm component-native RPC framework. Contribute to bytecodealliance/wrpc development by creating an account on GitHub.
CLI and Rust libraries for low-level manipulation of WebAssembly modules - bytecodealliance/wasm-tools
Repository for design and specification of the Component Model - WebAssembly/component-model

view this post on Zulip Ralph (Dec 06 2024 at 08:13):

wit-bindgen has JSON support for WIT as well, to make it much easier for languages.

view this post on Zulip raskyld (Dec 06 2024 at 08:16):

wit-bindgen is focused specifically on the generation of the bindings, right? So we still need a tool to do the serde, right?

view this post on Zulip raskyld (Dec 06 2024 at 08:17):

(sorry for the dumb questions, I am just trying to get a really high-level overview of all the moving parts)

view this post on Zulip Dan Gohman (Dec 06 2024 at 14:37):

I am not really fluent with our ABI, but I understand lower and lift as serialise and deserialise respectively. (I find the names a bit confusing; are these terms used a lot in the related literature?) So in theory, if we implement libraries in different languages, for different platforms, to do lifting and lowering, we basically get there, right?

Protobufs, flatbuffers, etc. serialize to a single bytestream. The terms lifting and lowering in Wasm components describe serializing to a calling convention, which is similar, but has some important differences. For example, lifting and lowering don't store everything in bytes in memory; some of the data is transmitted as call arguments and return values instead. For another example, data in memory isn't in a single contiguous byte array; it's in buffers pointed to by pointers.
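
The contrast above can be sketched with a toy model (this is NOT the real Canonical ABI; the allocator, framing, and types here are simplified illustrations): a bytestream format puts everything in one contiguous buffer, while lowering splits a value between flat call arguments and buffers in a linear memory that the arguments point into.

```python
# Toy illustration (not the real Canonical ABI): contrast a single-bytestream
# encoding with "lowering" a value into a simulated linear memory plus flat
# call arguments, the way lifting/lowering split data between memory buffers
# and core-wasm parameters.
import struct

def serialize_bytestream(name: str, age: int) -> bytes:
    """Everything in one contiguous buffer (protobuf-style framing, simplified)."""
    raw = name.encode("utf-8")
    return struct.pack("<I", len(raw)) + raw + struct.pack("<I", age)

class LinearMemory:
    def __init__(self) -> None:
        self.data = bytearray()

    def alloc(self, raw: bytes) -> int:
        """Pretend allocator: append the bytes, return their offset (a 'pointer')."""
        ptr = len(self.data)
        self.data += raw
        return ptr

def lower(name: str, age: int, mem: LinearMemory) -> tuple[int, int, int]:
    """Lowering: the string body lives in memory; the call carries (ptr, len, age)."""
    raw = name.encode("utf-8")
    ptr = mem.alloc(raw)
    return ptr, len(raw), age  # flat call arguments, not a bytestream

def lift(args: tuple[int, int, int], mem: LinearMemory) -> tuple[str, int]:
    """Lifting: follow the pointer back into memory to reconstruct the value."""
    ptr, length, age = args
    return mem.data[ptr:ptr + length].decode("utf-8"), age

mem = LinearMemory()
args = lower("Alice", 30, mem)
assert lift(args, mem) == ("Alice", 30)
assert len(serialize_bytestream("Alice", 30)) == 13  # 4 + 5 + 4 bytes, all in one buffer
```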

So we have a tuple (IDL, serialisation format, RPC), which in our case could be (WIT, WAVE, wRPC).
So at the end of the day, it's all about packaging such a tuple in a user-friendly way and extending WAVE to be cross-language and cross-platform, right?

WAVE and wRPC both encode values with WIT types, so they're both cross-language and cross-platform.

I know wRPC is kind of following this path since it's completely usable outside of any Wasm runtime, but in its current form, the value encoding doesn't seem standardised. IIUC, WAVE existed before the Value Definition spec, and, anyway, Value Definitions is not a general-purpose encoding specification, but is more focused on the use case of global values in a Wasm Component context.

WAVE is a human-focused text format. The Value Definition spec is a binary format. Both are general-purpose encoding specifications.
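
To give a feel for the human-focused side, here is a tiny renderer that produces a WAVE-style textual form of a value (records as `{field: value}`, lists in brackets, options as `some(..)`/`none`). This is an illustrative sketch only; the authoritative grammar lives in the wasm-wave project, not in this code.

```python
def to_wave(value) -> str:
    """Render a Python value in a WAVE-like style (illustrative, not the spec):
    records as {field: value}, lists as [..], strings quoted, None as 'none'."""
    if value is None:
        return "none"
    if isinstance(value, str):
        return '"' + value + '"'
    if isinstance(value, bool):          # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, list):
        return "[" + ", ".join(to_wave(v) for v in value) + "]"
    if isinstance(value, dict):
        return "{" + ", ".join(f"{k}: {to_wave(v)}" for k, v in value.items()) + "}"
    raise TypeError(f"unsupported type: {type(value)!r}")

print(to_wave({"name": "Alice", "age": 30, "tags": ["admin"]}))
# → {name: "Alice", age: 30, tags: ["admin"]}
```

The point of the exercise: the output is meant to be read and written by humans (think REPLs, test fixtures, CLI arguments), whereas a binary format like the Value Definition encoding optimises for compactness instead.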

Finally, my questions are:

  1. Would it make sense to start working on packaging such a tuple (IDL, serialisation format, RPC) ?

It could. It depends on whether there are people volunteering to do this particular packaging.

One thing to consider is that an alternative to all of this is to build adapters between WIT and protobuf/capnproto/flatbuffers. It's a tradeoff; you'd get less direct component integration, but more compatibility with established ecosystems. Different use cases will want different things.

  2. Does (WIT, WAVE, wRPC) make sense? Specifically, is WAVE the right tool for that? How does it relate to the ABI?

I expect that particular tuple isn't what you're looking for here, because WAVE is a text format, so it's not particularly compact or efficient. (WIT, something based on Value Definition encoding, wRPC) is closer, and that's what wRPC basically already is. I suggest looking at wRPC to see if it's perhaps already one of the things you want.

  3. Should we give WAVE its own repository and make it the end-user entrypoint? What I mean by that is that all three technologies I mentioned are, most of the time, known to the wider audience by their serialisation format (i.e., protobuf, Cap'n Proto, FlatBuffers).

The Value Definition encoding doesn't currently have its own personality, tooling, and a short name, like protobuf/etc. all have. And, the Value Definition encoding doesn't currently have a design for schema evolution. For example, if someone adds new optional fields to records in their WIT, how can old data be read by new consumers? But those are all things that could be worked on, if someone were interested.
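
The schema-evolution problem can be made concrete with a hypothetical tag-based framing (this is NOT the Value Definition encoding, just a sketch of the kind of mechanism protobuf uses): fields are tagged with integers, so a new consumer can treat a missing tag as an absent optional field.

```python
import struct

# Hypothetical framing (NOT the Value Definition encoding): each field is
# (tag: u8, len: u32, bytes). A consumer that knows tags 1 (name) and
# 2 (nickname, an optional field added later) can still read old data that
# only carries tag 1, defaulting the missing optional field to none.
def encode(fields: dict[int, bytes]) -> bytes:
    out = b""
    for tag, raw in fields.items():
        out += struct.pack("<BI", tag, len(raw)) + raw
    return out

def decode(buf: bytes) -> dict[int, bytes]:
    fields, i = {}, 0
    while i < len(buf):
        tag, length = struct.unpack_from("<BI", buf, i)
        i += 5  # 1 byte tag + 4 bytes length
        fields[tag] = buf[i:i + length]
        i += length
    return fields

old_data = encode({1: b"Alice"})   # written before 'nickname' existed
fields = decode(old_data)
name = fields[1].decode()
nickname = fields.get(2)           # optional field: absent in old data -> None
assert (name, nickname) == ("Alice", None)
```

A name-keyed encoding (as WIT records are written today) would need an equivalent convention, e.g. "unknown fields are skipped, missing `option` fields decode as `none`", which is exactly the design work being described.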

wit-bindgen is focused on the generation of the bindings specifically right? so we still a tool to do the serde right?

Right; wit-bindgen is focused on the calling convention space, rather than the single-bytestream space.


view this post on Zulip raskyld (Dec 06 2024 at 17:20):

Thanks a lot for your comprehensive answer :praise:

So the generated bindings know where in the linear memory they need to place stuff (call parameters) and where to expect return values, etc. I guess I need to thoroughly read the Canonical ABI to get all the nuances.

WAVE and wRPC both encode values with WIT types, so they're both cross-language and cross-platform.

What I mean is that we actually need to write the encode/decode libraries in different languages and for different targets, right?

One thing to consider is that an alternative to all of this is to build adapters between WIT and protobuf/capnproto/flatbuffers.

That sounds like a really interesting approach; indeed, we could use a proven technology for that.

and that's what wRPC basically already is. I suggest looking at wRPC to see if it's perhaps already one of the things you want.

Yeah, I started looking at wRPC and saw it used the Value Definition encoding, but I was wondering if we could decouple the two things, so Value Definition gets its own personality, as you said!

The Value Definition encoding doesn't currently have its own personality, tooling, and a short name, like protobuf/etc. all have. And, the Value Definition encoding doesn't currently have a design for schema evolution. For example, if someone adds new optional fields to records in their WIT, how can old data be read by new consumers? But those are all things that could be worked on, if someone were interested.

I think there are two paths for that:

  1. Evolving WIT to support API evolution, but then there is the underlying question of: is WIT supposed to have such a feature?
  2. Avoiding touching WIT and instead working on adapters to other serialisation formats.

For now, I don't know which path makes the most sense, so I guess I will wait and see what people on this Zulip think, whether we reach a consensus on what our approach should be, and whether the base idea is interesting at all.

view this post on Zulip Dan Gohman (Dec 06 2024 at 17:58):

WAVE and wRPC both encode values with Wit types, so they're both cross-language and cross-platform.

What I mean is that we actually need to write the encode/decode libraries in different languages and for different targets right?

Right.

  1. Evolving WIT to support API evolution, but then there is the underlying question of: is WIT supposed to have such a feature?

WIT will surely need to address API evolution at some point, considering how important it is in other similar systems, such as protobufs.

view this post on Zulip raskyld (Dec 06 2024 at 19:37):

Hm, anyway, I guess that if we ever want to explore option (2), i.e. interoperability with other serialisation formats, we would need feature parity, i.e. making WIT support API evolution, fields tagged with integers instead of names, etc.

view this post on Zulip raskyld (Dec 11 2024 at 08:59):

Hi!

In my spare time, I have started working on a pluggable WIT compiler (I will open-source it once I get the basics done).

Basically, the library takes a pipeline defined as multiple "steps", which are just Wasm Components capable of taking a WIT AST as input and either manipulating it or translating it into the AST of a programming language, so the last component in the pipeline ends up writing actual Rust / Go / Python / whatever code. I had to implement the equivalent of reflection (as it exists for, e.g., protobuf) to manipulate WIT packages inside a WebAssembly runtime.

I realised that whatever path I choose, I will end up needing a flexible WIT compiler to generate bindings and adaptors to serialisation formats.

Incidentally, the crate could be useful for generating markdown / a single HTML page to document WIT packages.

Are you aware of anything like that in the ecosystem?

Protocol Buffers - Google's data interchange format - protocolbuffers/protobuf

view this post on Zulip Roman Volosatovs (Dec 11 2024 at 09:31):

One thing to note is that WIT provides a superset of the features available in, e.g., protobuf. While one could figure out a way to encode a resource, it seems like future and stream types, which will be available very soon in WASI 0.3, would pose a greater challenge. All that is to say that encoding any arbitrary, generic WIT value to a flat byte buffer is not possible. E.g., if a specific interface allows stream buffering, one could encode a stream as a list<T>, but that's not an assumption that can be made in a generic way.
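
A toy illustration of why the list<T> trick cannot be generic: buffering a stream means draining it, which only terminates for finite streams, so it is a per-interface assumption rather than an encoding rule.

```python
import itertools

# Toy illustration: a stream of values over time can only be collapsed into
# a list (and hence into a flat byte buffer) if it is finite. For an
# unbounded stream you can only ever capture a prefix, not the stream itself.
def buffer_stream(stream, limit=None):
    """Drain a stream into a list; 'limit' only exists to keep the demo finite."""
    if limit is not None:
        stream = itertools.islice(stream, limit)
    return list(stream)

finite = iter([1, 2, 3])
assert buffer_stream(finite) == [1, 2, 3]          # fine: the stream ends

endless = itertools.count(0)                        # an unbounded stream
prefix = buffer_stream(endless, limit=4)
assert prefix == [0, 1, 2, 3]                       # only a prefix, not the stream
```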

wRPC was specifically designed to enable async, and the Component Model encoding was effectively extracted from the original implementation. wRPC still needs to extend that encoding to define representations for resource, future and stream.

view this post on Zulip raskyld (Dec 11 2024 at 09:43):

I have been considering those cases; that is what drove me to build this WIT compiler first.
I am not yet settled on the exact serialisation format. We could totally translate WIT types into FlatBuffers tables but, as you mentioned, WIT has features which put constraints on the transport layer that will be in use. That also means the exact way some concepts are represented must be customisable.

For example, you can represent a resource as a uint64 which has meaning only in the context of a persistent connection between a client and a server at the RPC level.

Maybe you want to go one step further, and the RPC protocol you end up using is capable of issuing cryptographic handles to clients, which can be used to invoke methods on a resource without a persistent connection, etc.
Cap'n Proto has support for capabilities and promises: https://capnproto.org/rpc.html

My current mental picture is that the serialisation format MUST allow end users to produce different schemas for the same WIT concept (e.g., how resources, streams, and promises are represented on the wire).
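
The "resource as a uint64" representation mentioned above can be sketched as follows (a hypothetical model, not wRPC's actual design): the wire carries only an integer handle that indexes a per-connection, server-side table, which is exactly why the handle is meaningless outside that connection.

```python
import itertools

# Hypothetical sketch: a resource crosses the wire as a u64 handle into a
# per-connection, server-side table. The handle has no meaning outside the
# lifetime of that connection, which is the constraint discussed above.
class Connection:
    def __init__(self) -> None:
        self._next = itertools.count(1)
        self._resources: dict[int, object] = {}

    def export(self, resource: object) -> int:
        """Give the client a handle for a live server-side resource."""
        handle = next(self._next)
        self._resources[handle] = resource
        return handle

    def invoke(self, handle: int, method: str, *args):
        """Dispatch a method call on the resource behind a handle."""
        return getattr(self._resources[handle], method)(*args)

class Counter:
    def __init__(self) -> None:
        self.n = 0
    def increment(self) -> int:
        self.n += 1
        return self.n

conn = Connection()
h = conn.export(Counter())           # wire representation: just an integer
assert conn.invoke(h, "increment") == 1
assert conn.invoke(h, "increment") == 2  # state lives server-side, behind the handle
```

A cryptographic-handle scheme, as in Cap'n Proto's RPC level, would replace the plain integer with an unforgeable token so the table lookup no longer requires a persistent connection.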

view this post on Zulip raskyld (Dec 11 2024 at 09:52):

The way I plan to address that is by having different worlds (or plugin types) in the compiler:

  1. Pure WIT AST to WIT AST processors,
  2. WIT AST to language-specific AST (LSA) processors (e.g., WIT to Go AST),
  3. LSA to LSA processors (this is where we could specify / override the representation of a resource on the wire),
  4. LSA to filesystem outputters.

So RPC vendors could release plugins of type (3) just to specialise the representation of advanced concepts.
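
The pipeline idea above can be sketched as a fold over AST-transforming steps (a hypothetical model: the dict-shaped ASTs and pass names here are invented for illustration; in the real design each step would be a Wasm Component, not a Python function).

```python
# Hypothetical sketch of the pipeline described above: each step is a
# function from AST to AST, and the compiler folds the input through the
# configured steps in order. Real steps would be Wasm Components.
from typing import Any, Callable

Step = Callable[[Any], Any]

def run_pipeline(ast: Any, steps: list[Step]) -> Any:
    for step in steps:
        ast = step(ast)
    return ast

# Toy ASTs as dicts; the pass names below are invented examples.
def rename_pass(wit_ast: dict) -> dict:        # step type 1: WIT AST -> WIT AST
    return {**wit_ast, "package": wit_ast["package"].replace("-", "_")}

def to_go_ast(wit_ast: dict) -> dict:          # step type 2: WIT AST -> LSA
    return {"lang": "go", "package": wit_ast["package"], "decls": []}

def resource_as_u64(lsa: dict) -> dict:        # step type 3: LSA -> LSA (vendor plugin)
    return {**lsa, "resource_repr": "uint64"}

out = run_pipeline({"package": "my-pkg"}, [rename_pass, to_go_ast, resource_as_u64])
assert out == {"lang": "go", "package": "my_pkg", "decls": [], "resource_repr": "uint64"}
```

A step type 4 outputter would then take the final LSA and write source files; keeping it as just another step means vendors can swap in their own without touching the rest of the pipeline.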

Ofc, that's just ideation and I will likely run into challenges, but I guess I need to try first :shrug:


Last updated: Dec 23 2024 at 12:05 UTC