dicej opened PR #46 from dicej:lower-component to bytecodealliance:main:
Great to see this. Happy to share notes on canonical ABI edge cases from Meld if useful.
cfallin submitted PR review:
This is really important work and I'm happy to see it being developed -- thanks!
cfallin created PR review comment:
This is a really good point and I think it's very important to solve "right", and not just reject components that instantiate a module more than once: that's a fundamental capability of the component model that core Wasm (without metadata/wrapper) doesn't have, and we don't want to bifurcate the ecosystem into components that fit this restriction and those that don't.
Function duplication (your second option) seems conceptually appealing because it hides the complexity, but in practice I suspect a large majority of functions will be duplicated, because almost everything will access memory...
Maybe the best option here is to actually define a "just the module linking, please" subset of the component model semantics that gives (i) a flat index space of core modules, (ii) a wiring diagram instantiating them and connecting imports and exports? The host already has to do some work to provide some intrinsics so this proposal is not "free" in any case; so ingesting such a format should not be too much of an additional sell (though there is certainly a step-function increase from "one core module" to "graph of core modules"). It's also conceptually the cleanest IMHO: this really is a thing that the component semantics can describe that a core Wasm can't, but most core Wasm runtimes should have host APIs to instantiate a thing more than once, so we should just "pass it through".
Just to note it down, though I don't like it: I guess there could be a fourth option here, which is (at a high level) something like "reify the `vmctx` as actual Wasm state". That seems to be the most "honest" w.r.t. the lowering paradigm.
The idea is that one would reify data structures that look like Wasmtime's instance state as Wasm GC values. A Wasm memory could be an arrayref to an array-of-i8; a Wasm table could be an arrayref to an array-of-whatever. Given those, one could define a `vmctx` Wasm struct that contains memory refs and table refs as our native `vmctx` does today, as well as any globals, inlined; then the lowered functions take this `vmctx` struct ref as an implicit first arg.
This clearly would have nontrivial runtime overhead as well, since in essence we'd have two levels of indirection for any state access.
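As a rough illustration of the "reified `vmctx`" idea above, here is a minimal sketch in plain Rust rather than Wasm GC. Every name in it (`VmCtx`, `store_byte`, `load_byte`, `two_instances`) is hypothetical and does not come from the RFC or from Wasmtime; a `Vec<u8>` stands in for the array-of-i8 memory, and the context is passed as an explicit first argument.

```rust
// Hypothetical sketch: these names are assumptions, not RFC or Wasmtime APIs.

/// Instance state reified as an ordinary value, standing in for a Wasm GC struct.
#[allow(dead_code)]
struct VmCtx {
    /// Linear memory modeled as a byte array ("arrayref to an array-of-i8").
    memory: Vec<u8>,
    /// A table modeled as an array of optional function indices.
    table: Vec<Option<u32>>,
    /// A global stored inline in the context.
    global0: i32,
}

impl VmCtx {
    /// Creates a context with `pages` 64 KiB pages of zeroed memory.
    fn new(pages: usize) -> Self {
        VmCtx { memory: vec![0; pages * 65536], table: Vec::new(), global0: 0 }
    }
}

/// A "lowered" store: the implicit memory of the original function becomes
/// an access through the explicit context argument.
fn store_byte(ctx: &mut VmCtx, addr: usize, value: u8) -> Result<(), &'static str> {
    // Two indirections: context -> memory array, then array -> element.
    let slot = ctx.memory.get_mut(addr).ok_or("out-of-bounds memory access")?;
    *slot = value;
    Ok(())
}

/// A "lowered" load, bounds-checked the same way.
fn load_byte(ctx: &VmCtx, addr: usize) -> Result<u8, &'static str> {
    ctx.memory.get(addr).copied().ok_or("out-of-bounds memory access")
}

/// Two instances of the same module share the code above but not the state.
fn two_instances() -> (u8, u8) {
    let mut a = VmCtx::new(1);
    let mut b = VmCtx::new(1);
    store_byte(&mut a, 16, 7).unwrap();
    store_byte(&mut b, 16, 9).unwrap();
    (load_byte(&a, 16).unwrap(), load_byte(&b, 16).unwrap())
}
```

Because the state travels with the argument, the same function bodies serve any number of instances, at the cost of the extra indirection on every access.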
jellevandenhooff commented on PR #46:
Bit of a drive-by thought: My guess is that if the lowering tooling is performant enough, any wasm host would want to adopt it, and then the component-model spec splits in two. these new slim host bindings, and the component-model guest bindings as today. Do you think you would end up committing to API stability on the host bindings part? Standardize them? I suspect wasm runtimes would want that.
jellevandenhooff edited a comment on PR #46:
Bit of a drive-by thought: My guess is that if the lowering tooling is performant enough, any wasm host would want to adopt it, and then the component-model spec splits in two: these new slim host bindings and the component-model guest bindings as today. Do you think you would end up committing to API stability on the host bindings part? Standardize them? I suspect wasm runtimes would want that.
Do you think you would end up committing to API stability on the host bindings part? Standardize them?
Yeah, I think for this to work the API would need to at least be "officially" documented in the same way the dylink.0 convention is documented. Ideally, though, the "slim host bindings" API/ABI would just be a subset of the Component Model ABI (e.g. some or all of the `thread.*` and `context.*` canonical built-ins) and therefore not need to be documented or standardized separately. I _think_ that should work, in which case the TODO item in the `Host C API for Lowered Components` section will just be to describe the relevant CM ABI built-ins as C function declarations.
dicej submitted PR review.
dicej created PR review comment:
Maybe the best option here is to actually define a "just the module linking, please" subset of the component model semantics that gives (i) a flat index space of core modules, (ii) a wiring diagram instantiating them and connecting imports and exports?
Yeah, I expect this is what it would have to look like. One thought that crossed my mind would be to literally output a real component, but one that only uses the absolute minimum set of features needed to embed, instantiate, and link modules. Hosts would need to be able to parse and instantiate these "simple components" but not need to support the entire component model.
alexcrichton commented on PR #46:
Personally I'm all for reducing complexity as much as we can, and the motivation section of this RFC resonates with me accordingly and I agree it's a worthwhile problem to tackle. At the same time though I'm personally skeptical of this approach in terms of practicality. For example as-written the RFC is currently relatively hand-wavy in terms of what exact responsibilities lie where. I understand though this is a relatively early-stages proposal so it's naturally not going to have anything fully fleshed out on day 1, but nonetheless I want to point out that at least for me it's difficult to form a concrete opinion without having more concrete details.
As a general thrust of "make the component model simpler to implement and make components easier to run", that seems reasonable to have an RFC on-the-record from the BA blessing that approach. For me personally I don't find that too useful because if aspirations are high-level enough it runs the risk of getting agreement amongst lots of folks but being quite difficult to actually make progress.
So, a question for this: is that the purpose of this RFC? To get high-level agreement on the approach? We've done this with some debugging-related RFCs for example as an approach to have implementation details sketched but not fully fleshed out while still maintaining high-level agreement. If that's the goal then I'm happy to approve as-is. If the goal though is to get more in-depth discussion of the technical specifics, viability, etc, that's a pretty different conversation.
I also was thinking the same as @jellevandenhooff when reading over this -- whatever intermediate APIs are needed between the runtime and a lowered component effectively need to end up being standards for this to work (IMO). That raises the bar quite a lot in terms of expected quality and care to design which would be an important point to note.
So, a question for this: is that the purpose of this RFC? To get high-level agreement on the approach?
I opened this as a draft with some TODO items because I indeed wanted to gauge high-level agreement on the approach to begin with, but I also want to get into the details before calling it "done".
BTW, I went into a lot of detail in #38, but a lot of those details changed once we had real-world experience with the implementation. Personally, I think that's fine; the goal here is to be specific and make sure the details are not obviously wrong, but still be able to change things later during implementation as needed.
Anyway, yes, my goal is both high-level consensus and to get into the details as well. I'm aiming to add those by the end of the week, at which point I'll switch this out of draft mode.
alexcrichton commented on PR #46:
Ok makes sense, and yeah I would agree that trying to flesh out all the details up front is probably not worthwhile because of how much will change during the implementation as we get more experience. As the goal here is to be more-detail-oriented-than-high-level-goals, however, some thoughts I'd have on this are:
- A perhaps chief concern of mine is going to be performance/overhead. With native integration/implementation there's a lot of mechanisms to bypass overhead, and for example this change would likely require all components to have at least 2 linear memories (one for the guest, one for the runtime), which balloons 8G of virtual memory to 16G of virtual memory per-component. This is just one example, but I'd be initially wary that we would want to switch everything over in Wasmtime to this paradigm before being more confident in the performance profile, for example.
- Another thing I'd want to be pretty up-front about is that while lowering a component to a core wasm module certainly helps a lot there is still quite a lot of work for a host to do. Here it's under the guise of a bindgen but even just writing a bindgen requires significant effort/maintenance and is not something we can hand-wave away. This is fundamental to a host interacting with a component because somehow core wasm things need to get translated to host things, and this can get significantly complicated in the face of resources, futures, streams, lazy lowering, etc. Basically I don't want to give anyone the impression that this will basically delete 99% of component-model code in Wasmtime or other runtimes, my gut is that it'd be more like 50% in the end.
- Particularly w.r.t. async I don't actually know how a built-in wasm-based runtime could shave off a large chunk of the complexity burden from embedders. Of primary concern here to me is the lack of core wasm stack switching. With stack switching in theory a lot more can be moved to the guest, but without stack switching we're left with JSPI-like approaches which put quite a lot more on the host. Even still, somehow the host's notion of async needs to be bridged into the wasm concept of async and that will inevitably require a lot of careful design and probably a lot of work on the host.
Overall I think I'm actually relatively skeptical of this approach panning out in the long run. Despite that, I do want this endeavor to succeed; my point is that it's going to require significant investment and design to even just evaluate the approach. Personally at least I don't feel like there's a clear way to implement all of this which requires only figuring out some minor details, but rather the unknowns are much larger. In that sense I think it's worthwhile to experiment more here, but to truly feel comfortable about accepting this I'd personally want to see more proof-of-concept style work to flesh out more details about how these fundamentals are going to work and play out in the end.
In that sense I think it's worthwhile to experiment more here, but to truly feel comfortable about accepting this I'd personally want to see more proof-of-concept style work to flesh out more details about how these fundamentals are going to work and play out in the end
Yes, agreed that a PoC is needed before we'll really know whether this is (A) feasible, and (B) worth doing. If that means leaving this PR unmerged until the PoC is done, it's fine with me. Meanwhile, it's already generated some good discussion and serves as something we can point interested folks to.
dicej edited a comment on PR #46:
In that sense I think it's worthwhile to experiment more here, but to truly feel comfortable about accepting this I'd personally want to see more proof-of-concept style work to flesh out more details about how these fundamentals are going to work and play out in the end
Yes, agreed that a PoC is needed before we'll really know whether this is (A) feasible, and (B) worth doing. If that means leaving this PR unmerged until the PoC is done, it's fine with me. Meanwhile, it's already generated some good discussion and serves as something we can point interested folks to.
dicej edited a comment on PR #46:
In that sense I think it's worthwhile to experiment more here, but to truly feel comfortable about accepting this I'd personally want to see more proof-of-concept style work to flesh out more details about how these fundamentals are going to work and play out in the end
Yes, agreed that a PoC is needed before we'll really know whether this is (A) feasible, and (B) worth doing. If that means leaving this PR unmerged until the PoC is done, it's fine with me. Meanwhile, it's already generated some good discussion and serves as something we can point interested folks to.
Anyway, yes, my goal is both high-level consensus and to get into the details as well. I'm aiming to add those by the end of the week, at which point I'll switch this out of draft mode.
I didn't get around to this, but will try to do it early next week.
dicej updated PR #46.
dicej updated PR #46.
I just pushed an update which adds a bunch of detail regarding the proposed APIs.
dicej has marked PR #46 as ready for review.
dicej updated PR #46.
dicej updated PR #46.
dicej updated PR #46.
dicej updated PR #46.
fitzgen submitted PR review.
fitzgen created PR review comment:
I’ll echo Chris’s point here, even though I haven’t seen any disagreement with it: multiple instantiation is a core capability of the CM and we must cover all CM semantics.
I also think the output shouldn’t be a component with a minimal feature subset, because the idea is that we are implementing the CM desugaring for engines that don’t support it, so we shouldn’t assume that they can parse even a subset of it. The output should be a flat list of core modules (including those generated for fused adapters) and a flat list of instantiation and import-export wiring commands. Basically the simplest thing that covers the component model semantics, with no syntax sugar.
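To make the "flat list of core modules plus a flat list of wiring commands" shape concrete, here is a minimal sketch in Rust. The format and all names in it (`LoweredOutput`, `Command`, `Import`, `validate`, `example`) are assumptions invented for illustration; the RFC does not define them. The sketch only checks that each command refers to an existing module and wires imports from instances created earlier.

```rust
// Hypothetical output format: these names are illustrative, not from the RFC.

/// Flat list of core modules plus a flat list of commands, with no nesting.
struct LoweredOutput {
    /// Flat index space of core modules (raw bytes of each module).
    modules: Vec<Vec<u8>>,
    /// Commands the host executes in order.
    commands: Vec<Command>,
}

enum Command {
    /// Instantiate `module`, satisfying each import from an export of an
    /// instance created by an earlier command.
    Instantiate { module: usize, imports: Vec<Import> },
}

#[allow(dead_code)]
struct Import {
    /// Name of the import being satisfied.
    name: String,
    /// Index of the earlier instance that provides it.
    from_instance: usize,
    /// Export name on that instance.
    export: String,
}

/// Checks module indices and import ordering; returns the instance count.
fn validate(out: &LoweredOutput) -> Result<usize, String> {
    let mut instances = 0;
    for (i, cmd) in out.commands.iter().enumerate() {
        let Command::Instantiate { module, imports } = cmd;
        if *module >= out.modules.len() {
            return Err(format!("command {i}: unknown module {module}"));
        }
        for imp in imports {
            if imp.from_instance >= instances {
                return Err(format!("command {i}: instance {} not yet created", imp.from_instance));
            }
        }
        instances += 1;
    }
    Ok(instances)
}

/// Module 0 is instantiated twice; module 1 imports one export from each.
fn example() -> LoweredOutput {
    let import = |from_instance: usize| Import {
        name: format!("dep{from_instance}"),
        from_instance,
        export: "run".to_string(),
    };
    LoweredOutput {
        modules: vec![vec![], vec![]], // module bytes elided in this sketch
        commands: vec![
            Command::Instantiate { module: 0, imports: vec![] },
            Command::Instantiate { module: 0, imports: vec![] },
            Command::Instantiate { module: 1, imports: vec![import(0), import(1)] },
        ],
    }
}
```

Note that multiple instantiation falls out naturally here: the same module index simply appears in more than one command.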
@alexcrichton
this change would likely require all components to have at least 2 linear memories (one for the guest, one for the runtime)
Can you clarify why there would need to be a second memory for the runtime? I don’t follow how that would be required.
I agree that it would be a large problem however. I would personally be extremely surprised/concerned if we didn’t use the same number of memories in the desugared core output as were defined and instantiated in the input component.
Can you clarify why there would need to be a second memory for the runtime? I don’t follow how that would be required.
I believe he's referring to this part of the proposal (from the `lower-component` section):
In addition to the generated "fused adapter" code, the output module will
include component model runtime code, separately compiled from Rust source,
which handles, among other things:
- table management for resource and waitable values
- guest-to-guest stream and future I/O
- task and thread bookkeeping
That code will definitely need to allocate, which means it either needs to have its own memory or be able to allocate from another module's memory (e.g. via `cabi_realloc`, but note that we may be getting rid of that once lazy lowering arrives).
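To illustrate why runtime code of this kind needs to allocate, here is a minimal sketch of a handle table such as might back "table management for resource and waitable values". It is an assumption-laden simplification: `HandleTable` and its methods are hypothetical names, and the actual runtime code described in the proposal may be structured quite differently.

```rust
// Hypothetical handle table: a simplified stand-in, not the RFC's runtime code.

/// Maps small integer handles to values, reusing freed slots.
struct HandleTable<T> {
    slots: Vec<Option<T>>,
    free: Vec<u32>,
}

impl<T> HandleTable<T> {
    fn new() -> Self {
        HandleTable { slots: Vec::new(), free: Vec::new() }
    }

    /// Stores `value` and returns its handle.
    fn insert(&mut self, value: T) -> u32 {
        if let Some(index) = self.free.pop() {
            // Reuse a previously freed slot.
            self.slots[index as usize] = Some(value);
            index
        } else {
            // Growing the vector is where heap allocation happens.
            self.slots.push(Some(value));
            (self.slots.len() - 1) as u32
        }
    }

    /// Removes and returns the value, or `None` for a stale or unknown handle.
    fn remove(&mut self, handle: u32) -> Option<T> {
        let slot = self.slots.get_mut(handle as usize)?;
        let value = slot.take()?;
        self.free.push(handle);
        Some(value)
    }

    /// Looks up a handle without removing it.
    fn get(&self, handle: u32) -> Option<&T> {
        self.slots.get(handle as usize)?.as_ref()
    }
}
```

Both vectors live on the heap of whichever memory the runtime code is compiled against, which is the crux of the question about a second linear memory.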
That code will definitely need to allocate, which means it either needs to have its own memory or be able to allocate from another module's memory (e.g. via `cabi_realloc`, but note that we may be getting rid of that once lazy lowering arrives).
Also, allocating from the memory of one of the (potentially malicious and/or buggy) modules taken from the input component invites the risks of tampering and information leaks.
The other option to avoid the extra memory is to compile the component runtime into native code and run it in the host instead. The tradeoff there is that it becomes part of the TCB along with all the other host code, but that's probably fine if the component runtime is written in Rust with zero unsafe code. The code would remain runtime-agnostic and thus reusable either way.
dicej updated PR #46.
Last updated: Mar 23 2026 at 16:19 UTC