Stream: wasmtime

Topic: Shared memory and components


view this post on Zulip Erik S. (Feb 27 2024 at 16:32):

Do I understand correctly that the Component Model essentially takes different Wasm programs and statically links them into a single module?
And that this single module is run as a single module instance (single isolate), which is why it is possible to pass references between components?

And that it will not be possible for different module instances to share memory by passing references in Wasm (as in the Component Model) in Wasmtime?

view this post on Zulip Pat Hickey (Feb 27 2024 at 16:36):

wasm modules are the unit of code defined in the "core" wasm spec - a component can contain 0 or more modules, but typically one.

view this post on Zulip Peter Huene (Feb 27 2024 at 16:37):

In the component model, a component may contain multiple modules inside of it that share a linear memory (and a shared "libc-like" implementation to manage that memory across the different modules). However, there is no "static" linking in the sense that the modules are merged into a single module (though technically nothing stops a sophisticated tool from rewriting them into one); they remain separate modules that link together via imports and exports

view this post on Zulip Peter Huene (Feb 27 2024 at 16:39):

many of the supported languages currently produce components that have a single linear memory and a single implementation module in the component; but some (notably python) use multiple modules; the isolation boundary in the component model is the component as there is no way for a component to share a linear memory with another component

view this post on Zulip Pat Hickey (Feb 27 2024 at 16:40):

wasm modules are just a unit of code - their imports and exports can expose any part of their internals (functions, globals, or memory) to other modules. components can instantiate a module or set of modules by wiring up those core imports and exports to each other. but when components are composed with other components, it uses a distinct type system from the core wasm functions, globals, and memory - instead that type system deals with records, enums, lists, strings, resources (a foreign reference type) and other high-level types. those component level types are converted to and from (we use the words lifting and lowering) the module's way of representing values in its linear memory through what we call the canonical abi
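The component-level types Pat lists map naturally onto host-language types. A rough, illustrative Rust analogue (in practice wit-bindgen generates the real types from a wit file; the names below are made up for the sketch):

```rust
// Rough Rust analogues of component-level types. Illustrative only;
// wit-bindgen generates the real bindings from a wit file.

// wit: record point { x: s32, y: s32 }
#[derive(Debug, PartialEq)]
struct Point { x: i32, y: i32 }

// wit: enum color { red, green, blue }
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Color { Red, Green, Blue }

// wit: variant shape { circle(f32), square(f32) }
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Shape { Circle(f32), Square(f32) }

// wit: func(points: list<point>, label: string) -> u32
fn count_labeled(points: Vec<Point>, label: String) -> u32 {
    println!("{}: {} points", label, points.len());
    points.len() as u32
}

fn main() {
    let pts = vec![Point { x: 0, y: 0 }, Point { x: 1, y: 2 }];
    assert_eq!(count_labeled(pts, "demo".to_string()), 2);
    let _c = Color::Green;
    let _s = Shape::Circle(1.0);
}
```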

view this post on Zulip Pat Hickey (Feb 27 2024 at 16:43):

so when you pass a string from a module inside one component to a module inside another component, there are instructions in the first component describing how to lift the string out of the first linear memory, and then instructions in the second component describing how to lower the string into the second linear memory. (for a string these instructions - called canon opts - say what the string encoding is, and which args are the pointer and length to the string being lifted, and how to call an allocator to write the string being lowered). a component runtime can look at the pair of lifting and lowering of a binding and come up with the most efficient way to implement it - if the string is encoded the same on both ends, its a memcpy, otherwise its a transcode between utf8 and utf16 for example
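A toy model of the string round-trip described above, assuming matching utf8 encodings on both sides (so the crossing is effectively a copy); `LinearMemory` and its bump allocator are stand-ins, not wasmtime's actual API:

```rust
// Toy model of the canonical ABI string round-trip. Names are
// illustrative; this is not wasmtime's actual implementation.

/// A stand-in for a module's linear memory plus a bump allocator.
struct LinearMemory {
    bytes: Vec<u8>,
}

impl LinearMemory {
    fn new() -> Self { LinearMemory { bytes: Vec::new() } }

    /// "cabi_realloc" stand-in: claim `len` bytes, return their offset.
    fn alloc(&mut self, len: usize) -> usize {
        let ptr = self.bytes.len();
        self.bytes.resize(ptr + len, 0);
        ptr
    }
}

/// Lift: read (ptr, len) out of the source memory into a host string.
fn lift_string(mem: &LinearMemory, ptr: usize, len: usize) -> String {
    String::from_utf8(mem.bytes[ptr..ptr + len].to_vec()).unwrap()
}

/// Lower: allocate in the destination memory and copy the bytes in.
/// With the same encoding on both sides this is just a memcpy.
fn lower_string(mem: &mut LinearMemory, s: &str) -> (usize, usize) {
    let ptr = mem.alloc(s.len());
    mem.bytes[ptr..ptr + s.len()].copy_from_slice(s.as_bytes());
    (ptr, s.len())
}

fn main() {
    let mut mem_a = LinearMemory::new();
    let mut mem_b = LinearMemory::new();

    // Component A has a string in its own linear memory.
    let (ptr_a, len_a) = lower_string(&mut mem_a, "hello");

    // Crossing the boundary: lift out of A, lower into B.
    let lifted = lift_string(&mem_a, ptr_a, len_a);
    let (ptr_b, len_b) = lower_string(&mut mem_b, &lifted);

    assert_eq!(lift_string(&mem_b, ptr_b, len_b), "hello");
}
```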

view this post on Zulip Pat Hickey (Feb 27 2024 at 16:44):

and finally for references theres a similar process for lifting and lowering across components. local to a module, a reference is just an i32 index. the canonical abi has a way for a module to pass that to another component, where it gets stuck into a component-level table that is opaque to the module during the lift, and then translated (lowered) to an i32 that is unique in the callee's module in the callee component
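The handle-table mechanism can be sketched like this; `HandleTable` and its methods are illustrative names, not the actual canonical abi:

```rust
// Toy model of cross-component resource handles. In each component a
// resource is just an i32 index into that component's own table; only
// the host sees the real object. Names here are made up.

struct HandleTable {
    entries: Vec<String>, // stand-in for the real resources
}

impl HandleTable {
    fn new() -> Self { HandleTable { entries: Vec::new() } }

    /// Lowering into a component: store the resource, hand back an i32.
    fn insert(&mut self, resource: String) -> i32 {
        self.entries.push(resource);
        (self.entries.len() - 1) as i32
    }

    /// Lifting out of a component: resolve the i32 back to the resource.
    fn resolve(&self, handle: i32) -> &str {
        &self.entries[handle as usize]
    }
}

fn main() {
    let mut table_a = HandleTable::new();
    let mut table_b = HandleTable::new();

    // Component A owns a resource; inside A it is just index 0.
    let in_a = table_a.insert("file:/tmp/log".to_string());

    // Passing it to B: lift from A's table, lower into B's table.
    // B gets its own index, unrelated to A's.
    let resource = table_a.resolve(in_a).to_string();
    let in_b = table_b.insert(resource);

    assert_eq!(table_b.resolve(in_b), "file:/tmp/log");
}
```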

view this post on Zulip Pat Hickey (Feb 27 2024 at 16:45):

thats almost certainly more detail than you required, though.

view this post on Zulip Erik S. (Feb 27 2024 at 18:19):

Thanks, I appreciate the detail, and it helps to correct my misconceptions.

I thought that the reference would allow the other module to read from the first module's memory, but I understand now that it's just essentially a plain old number for internal bookkeeping, e.g.:

B would use module A by:

      A.get_items() => list of references
      for each reference: A.get_item(reference)

I thought it would be more similar to passing a slice, and that the native compiler would allow cross-module memory access by capability principle of possessing the slice.
And that the main benefit of the component model would be to combine modules, to achieve zero cost isolation, inside a shared isolate to allow shared memory, while still having isolation guarantees by bytecode correctness.

Since I am incorrect, do I understand correctly that the Component Model is not zero-cost, that there will be a lot of copying, either when passing values, or when first passing references, and then using those references to retrieve values (that are then copied)?
I suppose there's some cost efficiency when passing through multiple modules A->B->C, but only if C can retrieve the value directly from A? (Which would require that B could bind A and C together inside itself, which I assume is not possible, since there are only imports/exports and no "passthrough" bindings?)

I hope I'm making sense, maybe I have to think about it some more :)

view this post on Zulip Pat Hickey (Feb 27 2024 at 18:23):

yes, thats correct - communication between components is for the "shared nothing" case of mutually distrusting components, and it is not zero cost. resources (references) are pretty cheap to share between components but values such as a list of bytes will incur a copy. (one day, when the component model integrates with a matured wasm-gc, we may be able to have gc languages share immutable references to a list of bytes across component boundaries, but thats very much hypothetical right now and may take years to deliver, and it would only work if both sides are wasm-gc representations, there would need to be a copy if either/both are linear memory representations)

view this post on Zulip Pat Hickey (Feb 27 2024 at 18:25):

with regards to A importing B, and B importing C, then afaik there is no way to express a resource that is implemented in C and used directly by A - B would have to implement some sort of proxy on that resource, which, if some method call on that resource was returning lists of bytes, would incur additional copies.
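The proxying Pat describes might look like this in spirit; `CResource` and `BProxy` are hypothetical, and each forwarded call that returns bytes pays an extra copy:

```rust
// Sketch of the resource-proxying idea. B cannot hand A a resource
// implemented in C directly, so it wraps it, and every forwarded call
// that returns bytes copies them again. Names are illustrative.

/// Stand-in for a resource implemented inside component C.
struct CResource;

impl CResource {
    fn read_bytes(&self) -> Vec<u8> {
        vec![1, 2, 3] // first copy: C's memory -> B's memory
    }
}

/// B's proxy over C's resource, as exposed to A.
struct BProxy {
    inner: CResource,
}

impl BProxy {
    fn read_bytes(&self) -> Vec<u8> {
        let from_c = self.inner.read_bytes(); // lowered into B
        from_c.to_vec() // second copy: lowered again into A
    }
}

fn main() {
    let proxy = BProxy { inner: CResource };
    assert_eq!(proxy.read_bytes(), vec![1, 2, 3]);
}
```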

view this post on Zulip Pat Hickey (Feb 27 2024 at 18:26):

but i'm not as 100% confident on that as i am about the rest of my thoughts here, maybe @Luke Wagner can check my math

view this post on Zulip Pat Hickey (Feb 27 2024 at 18:28):

also worth noting that the CM as it exists today is basically an MVP - if we end up learning about concrete cases where we need to express passthrough like you are thinking, thats something we can look into adding so that CM implementations can optimize out the intermediate copies. there are a lot of areas for possible optimization we havent explored yet because we are just trying to start using it and see what we learn

view this post on Zulip Pat Hickey (Feb 27 2024 at 18:29):

if you are looking for some way to share references between modules that do trust each other, thats something where you might be able to use the CM to express how those modules compose/instantiate together inside a single component, which wont have the same restrictions

view this post on Zulip Dan Gohman (Feb 27 2024 at 18:29):

It's also worth thinking about the bigger picture; if you have a large quantity of data that's going A->B->C: if B is doing non-trivial work on that data, it'll often be the case that copying the data isn't a major bottleneck compared to the actual work that B is doing. And on the other hand, if B is just forwarding data along, then we have a variety of ways to avoid copying the data into and out of B such as splice or using handles.

view this post on Zulip Pat Hickey (Feb 27 2024 at 18:30):

ah yeah if you were to use wasi streams to express moving data between them, then you can think of the wasi stream resources as present in some outer component (the host, or maybe implemented virtually by something else) and all of A, B, and C import from it

view this post on Zulip Pat Hickey (Feb 27 2024 at 18:31):

we believe wasi streams are a stepping stone towards eventually having streams be native to the CM itself so that A, B, C dont have to have the common outer import

view this post on Zulip Dan Gohman (Feb 27 2024 at 18:34):

There's also been serious discussion of adding a copy-on-write mmap from a resource into a linear memory.

view this post on Zulip Dan Gohman (Feb 27 2024 at 18:48):

In the example above, if A.get_items() would return a long list or the items have a lot of data, the best approach might be to not return a list. Instead, perhaps return a handle to a resource that iterates through the items. For example, wasi-filesystem does this with read-dir; it returns a directory-entry-stream where you call read-directory-entry repeatedly to get the items.
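The iterator-resource pattern can be sketched as follows; `ItemStream` and `next_item` are hypothetical names modeled loosely on directory-entry-stream, not the wasi-filesystem API:

```rust
// Sketch of the pattern Dan describes: instead of returning a full
// list across the boundary, expose a stream-like resource and pull
// items one at a time. Hypothetical names throughout.

struct ItemStream {
    items: Vec<String>,
    pos: usize,
}

impl ItemStream {
    fn new(items: Vec<String>) -> Self { ItemStream { items, pos: 0 } }

    /// Analogous to read-directory-entry: one item per call, None at end.
    fn next_item(&mut self) -> Option<String> {
        let item = self.items.get(self.pos).cloned();
        self.pos += 1;
        item
    }
}

fn main() {
    // The handle to the stream is cheap to pass around; only the
    // items actually read are copied across the boundary.
    let mut stream = ItemStream::new(vec!["a.txt".into(), "b.txt".into()]);
    let mut seen = Vec::new();
    while let Some(item) = stream.next_item() {
        seen.push(item);
    }
    assert_eq!(seen, vec!["a.txt".to_string(), "b.txt".to_string()]);
}
```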

view this post on Zulip Dan Gohman (Feb 27 2024 at 18:49):

That way, the handle to the directory-entry-stream can be passed around as much as you want, with no copies.

view this post on Zulip Erik S. (Feb 27 2024 at 19:18):

Thanks, that's very interesting. I've been thinking about this problem a lot. I was worried about doing redundant work, or going in an incompatible direction (both, with regards to the CM). Thanks for clarifying.

WASI streams sound like the right solution, and surprisingly identical to the design I arrived at. When will they be usable with Wasmtime?

My major worry is with memory copying.
I think memory copying speed these days is negligible?
But it could be a problem with size, if everything has to exist twice (in the extreme).
And I'm wondering if serialization time cost would be significant, from a fragmented memory to a serialized/linear/compacted message.
Is that a correct concern?

IIRC, module instances also cannot shrink memory, so a single occurrence of large data could cause the memory to grow permanently.
If A is a file reader, it would only grow to the size of the buffer, when streaming.
B would not read the stream.
C would read the stream into memory, then deallocate, but the memory would never shrink.

view this post on Zulip Pat Hickey (Feb 27 2024 at 20:05):

wasi streams are available today, theyre part of wasi 0.2. https://github.com/WebAssembly/WASI/blob/main/preview2/io/streams.wit
everything in wasi 0.2 is implemented in both wasmtime, and in the jco project for node.js

view this post on Zulip Pat Hickey (Feb 27 2024 at 20:08):

maybe one way to think of components is not as slow function calls between parts of your program, but as much more efficient than microservices. its a pretty imperfect metaphor, but you can think of crossing component boundaries as a similar level of isolation to talking to a program you dont trust, but instead of sending your function parameters across the network to an application running in a distinct hypervisor or hardware or whatever, its all in the same component model implementation, in the same process on the same host machine

view this post on Zulip Pat Hickey (Feb 27 2024 at 20:09):

so, the costs to get the same sort of isolation benefits (plus not cracking open the distributed systems can of worms) are much lower than existing isolation techniques. it is more costly than just calling into library code in your own application, but youd have to trust that library code, and it has to be written in the same language as your application, and so on

view this post on Zulip Pat Hickey (Feb 27 2024 at 20:13):

as far as shrinking linear memory - there is a proposal for core wasm that will allow more granular control over linear memory, including shrinking. it hasnt been active lately, and may take a few years to arrive. since thats about how the core wasm vm works, it has a much larger set of stakeholders than the component model currently does. https://github.com/WebAssembly/memory-control


view this post on Zulip Pat Hickey (Feb 27 2024 at 20:14):

wasi streams are designed to allow the implementation to exert backpressure, to control how much memory is allocated in the callee by the caller passing a large list<u8> to it.
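A minimal sketch of that backpressure idea: the `len` argument to read bounds the reader's footprint no matter how much data the producer has. `Source` here is a made-up producer, not the wasi-io API:

```rust
// Sketch of a bounded read loop. The `len` argument acts as the
// backpressure knob: the reader never allocates more than one chunk
// at a time. `Source` is a hypothetical producer, not wasi-io.

struct Source {
    remaining: usize, // bytes the producer still has
}

impl Source {
    /// Like input-stream.read: return at most `len` bytes.
    fn read(&mut self, len: usize) -> Vec<u8> {
        let n = len.min(self.remaining);
        self.remaining -= n;
        vec![0u8; n]
    }
}

fn main() {
    let mut src = Source { remaining: 1_000_000 };
    const CHUNK: usize = 4096; // reader's fixed memory budget
    let mut total = 0;
    let mut peak = 0;
    loop {
        let chunk = src.read(CHUNK);
        if chunk.is_empty() { break; }
        peak = peak.max(chunk.len());
        total += chunk.len();
        // process and drop `chunk` here; footprint stays bounded
    }
    assert_eq!(total, 1_000_000);
    assert!(peak <= CHUNK);
}
```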

view this post on Zulip Dan Gohman (Feb 27 2024 at 20:14):

And besides shrinking, most source languages use malloc or other memory allocators which can reuse memory. So when they deallocate, the memory becomes available for subsequent allocations.
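Dan's point can be illustrated with a toy allocator: the linear-memory high-water mark only grows, but freed regions get reused, so the footprint need not keep climbing. This is a made-up free-list allocator, not libc's:

```rust
// Toy allocator illustrating reuse: linear memory cannot shrink (the
// high-water mark only goes up), but freed regions are handed out
// again on later allocations. Hypothetical, not a real malloc.

struct Allocator {
    high_water: usize,              // total linear memory ever claimed
    free_list: Vec<(usize, usize)>, // (offset, size) of freed blocks
}

impl Allocator {
    fn new() -> Self { Allocator { high_water: 0, free_list: Vec::new() } }

    fn alloc(&mut self, size: usize) -> usize {
        // Reuse a freed block big enough for this request, if any.
        if let Some(i) = self.free_list.iter().position(|&(_, s)| s >= size) {
            return self.free_list.remove(i).0;
        }
        // Otherwise grow (think memory.grow): high water only rises.
        let offset = self.high_water;
        self.high_water += size;
        offset
    }

    fn free(&mut self, offset: usize, size: usize) {
        self.free_list.push((offset, size));
    }
}

fn main() {
    let mut a = Allocator::new();
    let big = a.alloc(4096);
    a.free(big, 4096);
    let again = a.alloc(4096);
    assert_eq!(big, again);         // freed region was reused
    assert_eq!(a.high_water, 4096); // footprint did not double
}
```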

view this post on Zulip Pat Hickey (Feb 27 2024 at 20:14):

so, if the data is really streaming (doesnt need to all be present at one time to enable some transformation) an implementation has a lot of flexibility to control its footprint

view this post on Zulip Erik S. (Feb 28 2024 at 11:03):

Pat Hickey said:

so, the costs to get the same sort of isolation benefits (plus not cracking open the distributed systems can of worms) are much lower than existing isolation techniques. it is more costly than just calling into library code in your own application, but youd have to trust that library code, and it has to be written in the same language as your application, and so on

The module isolation that I'm looking for is unfortunately much smaller, i.e. isolating the 1k dependencies in a Nodejs project (or a Rust project for that matter). Or the light-weight processes in an Erlang system. Nanoprocesses.

Which is possible without shared memory, but leads to memory redundancy, serialization cost, and runtime duplication. Which is acceptable, but it does look bad in benchmarks, which hurts adoption. That said, computational power and memory size are increasing rapidly (except for cloud users stuck on overpriced and antiquated hardware), and new users may be transitioning from slower languages like Python, and memory-heavy platforms like JVM. So the cost may even out with the gains.

I'm not sure that the microservices metaphor works, or is a useful use case: a microservice is deployed independently, scales independently, horizontal scaling gives high availability, and credentials are configured independently.

view this post on Zulip Pat Hickey (Feb 28 2024 at 17:16):

agreed its a pretty imperfect metaphor. components may not be ideal for extremely fine-grained erlang-like units of isolation at this time. maybe with wasm-gc the cost model would change enough to make it viable, but at this MVP stage you're right that theres perhaps more copying than other designs would permit.

view this post on Zulip Pat Hickey (Feb 28 2024 at 17:18):

i guess the point of my metaphor is that the "mutually distrusting" aspect of component composition is pretty load-bearing. we think that aspect of components is pretty important and valuable, but understand that it may not be in every domain. in erlang, thats not part of the design consideration at all, so they could make some pretty different design choices than components did.

view this post on Zulip Bryton (Feb 29 2024 at 02:50):

Dan Gohman said:

That way, the handle to the directory-entry-stream can be passed around as much as you want, with no copies.

I think the drawbacks of using a resource are: 1) you have to write many getter and setter methods for a resource handle; 2) the cost of calling getters may be equivalent to copies. When data is copied to the return area (stack), it is usually hot in the CPU cache since it is on the stack, so the copy takes several cycles; but the cost of calling getters is unclear, as they need assistance from the runtime.

The uncertain part is how many getter calls happen via a resource; if the call count is low and the data the resource holds is big, using a resource may well have a performance benefit.

view this post on Zulip Bryton (Feb 29 2024 at 02:56):

Pat Hickey said:

ah yeah if you were to use wasi streams to express moving data between then then you can think of the wasi stream resources as present in some outer component (the host, or maybe implemented virtually by something else) and all of A, B, and C import from it

Could you talk more about wasi streams? I'm sorry to say I don't understand how wasi streams help performance. Looking at the wit definition https://github.com/WebAssembly/WASI/blob/main/preview2/io/streams.wit, take the definition of read in input-stream as an example,

      read: func(
          /// The maximum number of bytes to read
          len: u64
      ) -> result<list<u8>, stream-error>;

the return result contains a list<u8>, which I believe still incurs copies from another component or the host into the current component.

Please let me know if my understanding is incorrect, thank you very much.

view this post on Zulip Bryton (Feb 29 2024 at 03:22):

Also, would it be possible to embed a wasm module's original source language inside the module file? If so, the number of copies between modules, component to/from component, and component/module to/from runtime might be reduced. Take wasmtime::component::bindgen as an example: it produces repr(C) data structs according to wit definitions (for example record). I believe the current implementation has at least three copies: 1) host-side data type (repr(Rust)) to wit type (repr(C)); 2) wit type (repr(C)) to linear memory (return area); 3) linear memory to local variable (stack).

But if, at component instantiation, wasmtime could get the guest implementation language, would it be possible to avoid the memory copies incurred by type conversion?

view this post on Zulip fitzgen (he/him) (Feb 29 2024 at 12:55):

Pat Hickey said:

as far as shrinking linear memory - there is a proposal for core wasm that will allow more granular control over linear memory, including shrinking. it hasnt been active lately, and may take a few years to arrive. since thats about how the core wasm vm works, it has a much larger set of stakeholders than the component model currently does. https://github.com/WebAssembly/memory-control

FWIW, this proposal is somewhat stalled at the moment because there is no portable way to implement this proposal's semantics (or any proposed new semantics) such that it is actually faster than memzero on all platforms

view this post on Zulip Erik S. (Feb 29 2024 at 15:55):

@Pat Hickey Thanks for the clarification :)

view this post on Zulip bjorn3 (Mar 01 2024 at 17:12):

It doesn't have to be faster on all platforms, just at least as fast as the alternative, right? And for reducing memory usage there is currently no alternative, not even a slow one.

view this post on Zulip Bryton (Mar 02 2024 at 01:22):

bjorn3 said:

It doesn't have to be faster on all platforms, just at least as fast as the alternative, right? And for reducing memory usage there is currently no alternative, not even a slow one.

May I ask what you mean by all platforms? Is there any data for reference? Appreciate it.


Last updated: Dec 23 2024 at 13:07 UTC