Stream: git-wasmtime

Topic: wasmtime / PR #8737 feat(rpc): add `wasmtime_rpc` crate


view this post on Zulip Wasmtime GitHub notifications bot (Jun 04 2024 at 16:04):

rvolosatovs edited PR #8737.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 04 2024 at 16:05):

rvolosatovs updated PR #8737.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 05 2024 at 10:38):

rvolosatovs updated PR #8737.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 05 2024 at 10:41):

rvolosatovs updated PR #8737.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 10 2024 at 19:51):

alexcrichton submitted PR review:

Thanks again for your patience while I found time to get to review this. At a high level this all looks good modulo minor comments here or there but I wanted to drill in in some more of the details before we land this.

First, to clarify, the main intention of landing this in-repo is to serve as both an example for other users and to integrate this with the wasmtime CLI, right? In that case one part of examples in theory should be a "hello world" of how to set it up, for example given a component that imports something create/compile a client that serves it in addition to the CLI flags necessary to run the component at hand. One difficult part about this is going to be that the client source will live in a separate repository (and/or have many of its dependencies there), but if the end goal is to have CLI support for this I think we'll want to plan for an example too (ideally one run in CI).

Next I'm also sort of coming at this from the perspective of if APIs in wasmtime need to change or if APIs need to be updated. For example the usage of Val here feels unnecessarily inefficient. I've long thought the representation of Val is pretty inefficient (e.g. heap allocation for nested values and things like strings-for-flags right nwo). It's also pretty unfortunate that types need to be passed around manually here instead since especially with resources that gets tricky and requires shenanigans like substituted_component_type which is pretty non-obvious. Long-term what I think we'd ideally have is something along the lines of func_new_unchecked but without the unsafety. What I'm envisioning is that host functions could be defined as fn(StoreContextMut<'_, T>, args: ComponentArgs<'_>, ret: ComponentRet<'_>) -> Result<()>. The ComponentArgs structure would serve as a deserializer of sorts and the ComponentRet structure would act as a serializer of sorts. That way you could plug those directly into this protocol and avoid an intermediate copy through the host (e.g. the creation of a Val). That would also enable args.serialize() -> Vec<u8> and ret.deserialize(&[u8]) -> Result<()> where the component encoding format could be implemented directly in those two.

I'm writing down this idea with the purpose of explicitly saying that this implementation as-is is debt that we're accruing and don't have a plan to pay it down. Debt I mean in th sense of leaning on a known-explicitly-inefficient abstraction Val without a concrete plan on improving it. It's not necessarily required that this is paid down immediately but I do think it's important to consider this at the same time.

Similarly though I'm at least personally surprised by the complexity here. Namely link_function has an 11-line where clause along with two extra generics in the arguments themselves. Much of the complexity seems to be unused too, for example I couldn't actually figure out where deferred came into play, is it perhaps resources? It also seemed a little complex to have async work at all parts of the stack here, would it be reasonable to require that a single component value is required to be entirely in-memory during serialization or deserialization?

We try to be judicious about picking up new dependencies in Wasmtime so ideally this could be trimmed down to an AsyncWrite plus AsyncRead pair or, better yet, something like async fn(&Context, &[u8]) -> Result<Vec<u8>> where this crate would define just a single trait and the trait could be implemented by downstream consumers.

Overall my current feeling is that I'm at least personally not understanding the long-term of this. I think it makes sense to add to wasmtime-the-CLI but it feels like an overly complex implementation for that use case. As-is it feels like this could suitably be an external crate to guide some API changes in Wasmtime itself, but I also realize you probably have more long-term goals for this being in-tree so I also think it would be good to get those written down to.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 13 2024 at 10:26):

rvolosatovs commented on PR #8737:

To make sure we're on the same page: this PR is a work-in-progress PoC at best, really just opened to align on the direction here, I would not expect it to be ready-for-review for another week at least. I've just recently reworked the transport abstraction, and a few changes, I think, are still coming. It is my absolute intention to add a simple to follow end-to-end example to this PR (most probably using QUIC transport I've just finished yesterday https://github.com/wrpc/wrpc/pull/127) before marking it as ready-for-review. I'm just now starting with updating the bindgen to a few changes made in value definition encoding and the new transport - that's the biggest blocker for now.

First, to clarify, the main intention of landing this in-repo is to serve as both an example for other users and to integrate this with the wasmtime CLI, right? In that case one part of examples in theory should be a "hello world" of how to set it up, for example given a component that imports something create/compile a client that serves it in addition to the CLI flags necessary to run the component at hand. One difficult part about this is going to be that the client source will live in a separate repository (and/or have many of its dependencies there), but if the end goal is to have CLI support for this I think we'll want to plan for an example too (ideally one run in CI).

The intention here is to, given an off-the-shelf wrpc_transport::Invoke implementation, likely contained within https://github.com/wrpc/wrpc (BA transfer pending), provide developers the tools to satisfy component imports of their choosing at runtime. Eventually I'd like to also add functionality to serve exports given a wrpc_transport::Serve implementation. The key benefit here is the standardized, generic traits (the Invoke and Serve) and the associated collection of fully-specified transports. I was a bit hesitant adding CLI support in this PR since I did not want to prematurely flesh out the UX of that, but since that feature would be feature gated and opt-in anyway (at least to start with), I guess I will just go ahead and add a simple CLI support for this in this PR as well. This PR will not be considered ready until there's both a "low-level", library use example and CLI usage example available (or documented).

Next I'm also sort of coming at this from the perspective of if APIs in wasmtime need to change or if APIs need to be updated. For example the usage of Val here feels unnecessarily inefficient. I've long thought the representation of Val is pretty inefficient (e.g. heap allocation for nested values and things like strings-for-flags right nwo). It's also pretty unfortunate that types need to be passed around manually here instead since especially with resources that gets tricky and requires shenanigans like substituted_component_type which is pretty non-obvious. Long-term what I think we'd ideally have is something along the lines of func_new_unchecked but without the unsafety. What I'm envisioning is that host functions could be defined as fn(StoreContextMut<'_, T>, args: ComponentArgs<'_>, ret: ComponentRet<'_>) -> Result<()>. The ComponentArgs structure would serve as a deserializer of sorts and the ComponentRet structure would act as a serializer of sorts. That way you could plug those directly into this protocol and avoid an intermediate copy through the host (e.g. the creation of a Val). That would also enable args.serialize() -> Vec<u8> and ret.deserialize(&[u8]) -> Result<()> where the component encoding format could be implemented directly in those two.

Absolutely agree, in fact I was always hoping that at one point we could make the runtime be "wRPC-aware" and directly convert to/from canon ABI values to "value definition" values. I believe, for some of these, the conversion is just an identity function, e.g. a bool encoding is exactly the same for both.
Regarding the traits, my (rough) suggestion for wRPC value trait analogues for this can be found here https://github.com/wrpc/wrpc/issues/101#issuecomment-2150525683

Similarly though I'm at least personally surprised by the complexity here. Namely link_function has an 11-line where clause along with two extra generics in the arguments themselves. Much of the complexity seems to be unused too, for example I couldn't actually figure out where deferred came into play, is it perhaps resources? It also seemed a little complex to have async work at all parts of the stack here, would it be reasonable to require that a single component value is required to be entirely in-memory during serialization or deserialization?

We try to be judicious about picking up new dependencies in Wasmtime so ideally this could be trimmed down to an AsyncWrite plus AsyncRead pair or, better yet, something like async fn(&Context, &[u8]) -> Result<Vec<u8>> where this crate would define just a single trait and the trait could be implemented by downstream consumers.

Like mentioned above, this is still a PoC and the interface will change slightly, that said, these are the two biggest complexity sources I've identified:

interface example {
   // reads bytes from rx and writes them to each stream in tx
   fn tee(rx: input-stream, tx: list<output-stream>) -> result<u64>
}

tee contract requires tx to be received before full contents of rx are available. The rx field would be encoded as option<list<u8>> in the first, "sync" pass - if full contents of rx are not available at encoding time, it would be sent as option::none, while a "deferred" writer of the underlying stream<u8> would be registered with index 0 (first parameter). tx would be fully encoded "synchronously". After the "sync" parameters are sent, the implementation would concurrently be sending more bytes of rx and await the result<u64>.

There is no resource support in PR currently, so deferred is not used indeed, I am happy to add that in this PR - was hoping to keep the scope small, but I don't mind adding more things here if that's desired.

So to answer your question, neither "just" AsyncRead/AsyncWrite, nor async fn(&Context, &[u8]) -> Result<Vec<u8>> could actually satisfy the requirements here without breaking the contracts in general case, for two reasons:

  1. values may not fit in memory and may even not even be "finite" in some sense (e.g. if the input-stream is an HTTP body stream or a TCP stream)
  2. values may be arbitrarily nested (e.g. a record { foo: stream<stream<stream<vec<stream<future<u8>>>>>> }). Wasmtime would need to implement it's own framing of some kind to differentiate different chunks of the async values, which I believe is not desired, because:

All that said, I'm happy to add a fully-synchronous Invoke trait to this crate with blanket impl for wrpc_transport::Invoke, which would also bind the error types to be wasmtime::Error and use that in bounds removing the async support from this PR for now. That would mean that this trait would require breaking changes later on, but perhaps it would be easier to align on the async design in a follow-up PR?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 13 2024 at 10:30):

rvolosatovs edited a comment on PR #8737:

To make sure we're on the same page: this PR is a work-in-progress PoC at best, really just opened to align on the direction here, I would not expect it to be ready-for-review for another week at least. I've just recently reworked the transport abstraction, and a few changes, I think, are still coming. It is my absolute intention to add a simple to follow end-to-end example to this PR (most probably using QUIC transport I've just finished yesterday https://github.com/wrpc/wrpc/pull/127) before marking it as ready-for-review. I'm just now starting with updating the bindgen to a few changes made in value definition encoding and the new transport - that's the biggest blocker for now.

First, to clarify, the main intention of landing this in-repo is to serve as both an example for other users and to integrate this with the wasmtime CLI, right? In that case one part of examples in theory should be a "hello world" of how to set it up, for example given a component that imports something create/compile a client that serves it in addition to the CLI flags necessary to run the component at hand. One difficult part about this is going to be that the client source will live in a separate repository (and/or have many of its dependencies there), but if the end goal is to have CLI support for this I think we'll want to plan for an example too (ideally one run in CI).

The intention here is to, given an off-the-shelf wrpc_transport::Invoke implementation, likely contained within https://github.com/wrpc/wrpc (BA transfer pending), provide developers the tools to satisfy component imports of their choosing at runtime. Eventually I'd like to also add functionality to serve exports given a wrpc_transport::Serve implementation. The key benefit here is the standardized, generic traits (the Invoke and Serve) and the associated collection of fully-specified transports. I was a bit hesitant adding CLI support in this PR since I did not want to prematurely flesh out the UX of that, but since that feature would be feature gated and opt-in anyway (at least to start with), I guess I will just go ahead and add a simple CLI support for this in this PR as well. This PR will not be considered ready until there's both a "low-level", library use example and CLI usage example available (or documented).

Next I'm also sort of coming at this from the perspective of if APIs in wasmtime need to change or if APIs need to be updated. For example the usage of Val here feels unnecessarily inefficient. I've long thought the representation of Val is pretty inefficient (e.g. heap allocation for nested values and things like strings-for-flags right nwo). It's also pretty unfortunate that types need to be passed around manually here instead since especially with resources that gets tricky and requires shenanigans like substituted_component_type which is pretty non-obvious. Long-term what I think we'd ideally have is something along the lines of func_new_unchecked but without the unsafety. What I'm envisioning is that host functions could be defined as fn(StoreContextMut<'_, T>, args: ComponentArgs<'_>, ret: ComponentRet<'_>) -> Result<()>. The ComponentArgs structure would serve as a deserializer of sorts and the ComponentRet structure would act as a serializer of sorts. That way you could plug those directly into this protocol and avoid an intermediate copy through the host (e.g. the creation of a Val). That would also enable args.serialize() -> Vec<u8> and ret.deserialize(&[u8]) -> Result<()> where the component encoding format could be implemented directly in those two.

Absolutely agree, in fact I was always hoping that at one point we could make the runtime be "wRPC-aware" and directly convert to/from canon ABI values to "value definition" values. I believe, for some of these, the conversion is just an identity function, e.g. a bool encoding is exactly the same for both.
Regarding the traits, my (rough) suggestion for wRPC value trait analogues for this can be found here https://github.com/wrpc/wrpc/issues/101#issuecomment-2150525683

Similarly though I'm at least personally surprised by the complexity here. Namely link_function has an 11-line where clause along with two extra generics in the arguments themselves. Much of the complexity seems to be unused too, for example I couldn't actually figure out where deferred came into play, is it perhaps resources? It also seemed a little complex to have async work at all parts of the stack here, would it be reasonable to require that a single component value is required to be entirely in-memory during serialization or deserialization?

We try to be judicious about picking up new dependencies in Wasmtime so ideally this could be trimmed down to an AsyncWrite plus AsyncRead pair or, better yet, something like async fn(&Context, &[u8]) -> Result<Vec<u8>> where this crate would define just a single trait and the trait could be implemented by downstream consumers.

Like mentioned above, this is still a PoC and the interface will change slightly, that said, these are the two biggest complexity sources I've identified:

interface example {
   // reads bytes from rx and writes them to each stream in tx
   fn tee(rx: input-stream, tx: list<output-stream>) -> result<u64>
}

tee contract requires tx to be received before full contents of rx are available. The rx field would be encoded as option<list<u8>> in the first, "sync" pass - if full contents of rx are not available at encoding time, it would be sent as option::none, while a "deferred" writer of the underlying stream<u8> would be registered with index 0 (first parameter). tx would be fully encoded "synchronously". After the "sync" parameters are sent, the implementation would concurrently be sending more bytes of rx and await the result<u64>.

There is no resource support in PR currently, so deferred is not used indeed, I am happy to add that in this PR - was hoping to keep the scope small, but I don't mind adding more things here if that's desired.

So to answer your question, neither "just" AsyncRead/AsyncWrite, nor async fn(&Context, &[u8]) -> Result<Vec<u8>> could actually satisfy the requirements here without breaking the contracts in general case, for two reasons:

  1. values may not fit in memory and may even not even be "finite" in some sense (e.g. if the input-stream is an HTTP body stream or a TCP stream), in fact, the input async value could depend on the output async value, so blocking here is not acceptable
  2. values may be arbitrarily nested (e.g. a record { foo: stream<stream<stream<vec<stream<future<u8>>>>>> }). Wasmtime would need to implement it's own framing of some kind to differentiate different chunks of the async values, which I believe is not desired, because:

All that said, I'm happy to add a fully-synchronous Invoke trait to this crate with blanket impl for wrpc_transport::Invoke, which would also bind the error types to be wasmtime::Error and use that in bounds removing the async support from this PR for now. That would mean that this trait would require breaking changes later on, but perhaps it would be easier to align on the async design in a follow-up PR?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 13 2024 at 10:33):

rvolosatovs edited a comment on PR #8737:

To make sure we're on the same page: this PR is a work-in-progress PoC at best, really just opened to align on the direction here, I would not expect it to be ready-for-review for another week at least. I've just recently reworked the transport abstraction, and a few changes, I think, are still coming. It is my absolute intention to add a simple to follow end-to-end example to this PR (most probably using QUIC transport I've just finished yesterday https://github.com/wrpc/wrpc/pull/127) before marking it as ready-for-review. I'm just now starting with updating the bindgen to a few changes made in value definition encoding and the new transport - that's the biggest blocker for now.

Note, that I will also add a more formal Wasmtime RFC for this

First, to clarify, the main intention of landing this in-repo is to serve as both an example for other users and to integrate this with the wasmtime CLI, right? In that case one part of examples in theory should be a "hello world" of how to set it up, for example given a component that imports something create/compile a client that serves it in addition to the CLI flags necessary to run the component at hand. One difficult part about this is going to be that the client source will live in a separate repository (and/or have many of its dependencies there), but if the end goal is to have CLI support for this I think we'll want to plan for an example too (ideally one run in CI).

The intention here is to, given an off-the-shelf wrpc_transport::Invoke implementation, likely contained within https://github.com/wrpc/wrpc (BA transfer pending), provide developers the tools to satisfy component imports of their choosing at runtime. Eventually I'd like to also add functionality to serve exports given a wrpc_transport::Serve implementation. The key benefit here is the standardized, generic traits (the Invoke and Serve) and the associated collection of fully-specified transports. I was a bit hesitant adding CLI support in this PR since I did not want to prematurely flesh out the UX of that, but since that feature would be feature gated and opt-in anyway (at least to start with), I guess I will just go ahead and add a simple CLI support for this in this PR as well. This PR will not be considered ready until there's both a "low-level", library use example and CLI usage example available (or documented).

Next I'm also sort of coming at this from the perspective of if APIs in wasmtime need to change or if APIs need to be updated. For example the usage of Val here feels unnecessarily inefficient. I've long thought the representation of Val is pretty inefficient (e.g. heap allocation for nested values and things like strings-for-flags right nwo). It's also pretty unfortunate that types need to be passed around manually here instead since especially with resources that gets tricky and requires shenanigans like substituted_component_type which is pretty non-obvious. Long-term what I think we'd ideally have is something along the lines of func_new_unchecked but without the unsafety. What I'm envisioning is that host functions could be defined as fn(StoreContextMut<'_, T>, args: ComponentArgs<'_>, ret: ComponentRet<'_>) -> Result<()>. The ComponentArgs structure would serve as a deserializer of sorts and the ComponentRet structure would act as a serializer of sorts. That way you could plug those directly into this protocol and avoid an intermediate copy through the host (e.g. the creation of a Val). That would also enable args.serialize() -> Vec<u8> and ret.deserialize(&[u8]) -> Result<()> where the component encoding format could be implemented directly in those two.

Absolutely agree, in fact I was always hoping that at one point we could make the runtime be "wRPC-aware" and directly convert to/from canon ABI values to "value definition" values. I believe, for some of these, the conversion is just an identity function, e.g. a bool encoding is exactly the same for both.
Regarding the traits, my (rough) suggestion for wRPC value trait analogues for this can be found here https://github.com/wrpc/wrpc/issues/101#issuecomment-2150525683

Similarly though I'm at least personally surprised by the complexity here. Namely link_function has an 11-line where clause along with two extra generics in the arguments themselves. Much of the complexity seems to be unused too, for example I couldn't actually figure out where deferred came into play, is it perhaps resources? It also seemed a little complex to have async work at all parts of the stack here, would it be reasonable to require that a single component value is required to be entirely in-memory during serialization or deserialization?

We try to be judicious about picking up new dependencies in Wasmtime so ideally this could be trimmed down to an AsyncWrite plus AsyncRead pair or, better yet, something like async fn(&Context, &[u8]) -> Result<Vec<u8>> where this crate would define just a single trait and the trait could be implemented by downstream consumers.

Like mentioned above, this is still a PoC and the interface will change slightly, that said, these are the two biggest complexity sources I've identified:

interface example {
   // reads bytes from rx and writes them to each stream in tx
   fn tee(rx: input-stream, tx: list<output-stream>) -> result<u64>
}

tee contract requires tx to be received before full contents of rx are available. The rx field would be encoded as option<list<u8>> in the first, "sync" pass - if full contents of rx are not available at encoding time, it would be sent as option::none, while a "deferred" writer of the underlying stream<u8> would be registered with index 0 (first parameter). tx would be fully encoded "synchronously". After the "sync" parameters are sent, the implementation would concurrently be sending more bytes of rx and await the result<u64>.

There is no resource support in PR currently, so deferred is not used indeed, I am happy to add that in this PR - was hoping to keep the scope small, but I don't mind adding more things here if that's desired.

So to answer your question, neither "just" AsyncRead/AsyncWrite, nor async fn(&Context, &[u8]) -> Result<Vec<u8>> could actually satisfy the requirements here without breaking the contracts in general case, for two reasons:

  1. values may not fit in memory and may even not even be "finite" in some sense (e.g. if the input-stream is an HTTP body stream or a TCP stream), in fact, the input async value could depend on the output async value, so blocking here is not acceptable
  2. values may be arbitrarily nested (e.g. a record { foo: stream<stream<stream<vec<stream<future<u8>>>>>> }). Wasmtime would need to implement it's own framing of some kind to differentiate different chunks of the async values, which I believe is not desired, because:

All that said, I'm happy to add a fully-synchronous Invoke trait to this crate with blanket impl for wrpc_transport::Invoke, which would also bind the error types to be wasmtime::Error and use that in bounds removing the async support from this PR for now. That would mean that this trait would require breaking changes later on, but perhaps it would be easier to align on the async design in a follow-up PR?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 21 2024 at 14:55):

alexcrichton commented on PR #8737:

Sorry for the delay in getting back to this, but this all sounds reasonable enough to me. I think it'd be best to go an RFC route on this as it sounds like you're already intending to do where there can be a better understanding of the high-level goals and directions of this.

For example I would not want to take on all the complexity of planning for stream<T> and future<T> at this time. I'd rather defer that to later. I would, however, find it useful to see what the plan for resources will be. Furthermore I think the basic performance profile of this will guide the implementation (e.g. Val-or-not, all-in-memory or not, etc) pretty significantly so I think it would be best to settle these sorts of high-level details through that.


Last updated: Nov 22 2024 at 16:03 UTC