fitzgen opened PR #43 from fitzgen:compile-time-builtins to bytecodealliance:main:
Add support for defining builtin host functions at compile-time. Because these functions are early-bound at compile-time -- rather than late-bound at instantiation-time, like regular imports -- the Wasm's compilation can be specialized for these exact imports, enabling inlining without just-in-time compilation, for example. This is the rough equivalent of the [js-string-builtins proposal][js-string-builtins] but for Wasmtime's API and Wasmtime embedding environments rather than JavaScript's `WebAssembly` API and JavaScript execution environments.
[js-string-builtins]: https://github.com/WebAssembly/js-string-builtins/blob/main/proposals/js-string-builtins/Overview.md
Thanks for writing this up -- I think this will be a really powerful ability once we have it, and will be extremely important for certain kinds of applications!
When I was originally mulling over this design space for the zero-copy buffer use-case, I had been imagining something like a raw CLIF interface, but I agree that that's got a lot of downsides and is pretty much fully subsumed by the other options. I'm happy to see we're moving toward the "just define the logic in Wasm" idea (and also not the "special sublanguage" idea, though I liked that when we talked about it too) -- this cleans up a lot of duplication.
I think my input here comes in two major lines:
I suspect that the ability to make slow-path calls is going to be essential to many of the real use-cases we have imagined for this feature (certainly for zero-copy buffers: they have the "append and maybe grow" behavior that `Vec` does, so a mutable API will need this, and even read-only accessors will have more complex paths for e.g. moving to the next rope segment that may or may not be practical to write inline). So I think this question, "Should we allow self-hosted Wasm to import and call non-intrinsic functions?", might not be punt-able as "I think this is something we will want eventually, but I'd like to get something basic working first before tackling this more-complicated use case." if we want to use the thing in our embedding.
I find myself going back and forth on whether the compile-time-builtin Wasm module should be "special" (with constraints as this RFC describes) or more general and which is actually more in the spirit of the standards -- I'll address this one first as I think it addresses the first question if we resolve it another way.
Preliminarily: yes 100% to "We must not deviate from the WebAssembly language semantics." and also to the more subjective "let's not encourage people to use a nonstandard extension" (and to clarify to any readers: yes that concern is still in scope even if we are literally standards-compliant by presenting function imports, because a sufficiently tempting set of function imports can become a de-facto standard).
However I find myself wondering whether we shouldn't do something a little more general and provide
- The load/store imports this RFC defines, and
- The ability to fix a module import (of any general Wasm module) while compiling another Wasm module, and implement some (simple one-level?) inlining from that module, with none of the other restrictions this RFC names.
I'll call this the "privileged adapter module" approach.
On the "negative motivation" side first (against special Wasm subset for "self-hosted Wasm"): defining restrictions on which subset of Wasm modules is acceptable for a compile-time-builtin module could be seen as restricting/subsetting the standard somewhat. I can absolutely see the logic for it (as the RFC says, discouraging general use, but also in particular: it's simpler if we don't have a VMContext for the special module that is separate from the module that is using it) but isn't it also defining a "Wasm-prime" in the other direction?
On the "positive side" now (for fully general Wasm for "self-hosted Wasm"): the spirit of virtualization and precedent elsewhere (e.g., WASI virtualization, and also places that we talk about adapter modules, such as the debug RFC) suggests to me at least that there isn't too much danger in defining a more privileged interface (here: "load/store host memory") and then allowing any general Wasm module to be an adapter module that provides a higher-level "safe" service on top of it. (In fact we know of folks doing this in production with components.) It is still standard Wasm -- it just has a particular API available to it, and one that the embedder must enable for a specific module. If someone wanted to implement this "peek/poke API" today as ordinary host functions, they could; we are saying that we recognize the need for it because we want to move more logic into Wasm for inlinability. Philosophically, I think the notion that we can never have privileged interfaces imported into a Wasm module (because someone might write Wasm that always requires the privileged interfaces and misuse the specialized environment as a general environment) sits less well with me: it says that Wasm is somehow not universal, and can't be used to implement some parts of the system.
Said another way: "root privilege" (arbitrary load/store) then virtualization is more or less directly aligned, I think, with how capability systems are supposed to work. The idea is that the danger is in plumbing the wrong capabilities to the wrong modules, and safety requires us to put the right access-filtering or -subsetting modules inline with powerful capabilities. But this is already true, and we already trust our embedders to "wire things up right" because one can write arbitrary hostcall implementations or grant the wrong pre-opens or whatnot. Right now in Wasmtime I think we haven't seen this situation much because we have host-native filtering of most of the privileges we grant (e.g. WASI APIs) but I think there's nothing fundamental about that.
And finally, if we restrict ourselves to (i) these load/store intrinsics, provided when configured to a privileged adapter module, and (ii) early binding of this adapter module in a way that enables inlining, then all of the open design questions are addressed, as far as I can tell:
- It addresses the "should we allow self-hosted Wasm to {call non-intrinsic functions, have tables for indirect calls, have memories, ...}" questions definitely: yes, it's Wasm, so it has function calls, tables, memories, etc. (As an optimization, if it has none of these, perhaps it doesn't need its own VMContext?)
- It addresses the question of how to encapsulate "slow-path functions" and which functions are available: they can be ordinary imports to the adapter module, and not provided to the main module.
- It potentially addresses the Winch question: Winch can compile the adapter module normally (no inlining needed); we only need to implement the intrinsics or provide polyfills as imported hostcalls.
Anyway -- all strong opinions, relatively weakly held -- happy to discuss further!
I don't have a strong opinion on whether we need two-level inlining (intrinsics -> self-hosted -> component) or just a single level (intrinsics -> component) — @cfallin is already saying some of the things I was thinking. But I do want to point out how much inlining we're doing and make a plug for making that easier. In either case, we want to inline some CLIF instructions for each intrinsic, right? I'm not sure I caught where the intrinsics were to be specified (are they proposed additions to the component model?) but, in any case, I was imagining there would be some compiler code that converted a call to these special imports into some CLIF, like we currently do for CM built-ins and trampolines (?). And this is what I was hoping could become easier: I know you were kind of discarding the first idea, "exposing CLIF", but it seems helpful for this kind of problem: we tell the compiler "here are the CLIF instructions for calls to this import". I understood from your RFC the danger of misuse, so perhaps it should not be a public, embedder-accessible API, but just having an easy way to inline intrinsics could make it easier to pursue the self-hosted functions?
saulecabrera submitted PR review.
saulecabrera created PR review comment:
I believe the Winch topic was (briefly?) discussed during the Wasmtime 06-05 meeting. However, I wanted to mention that I agree with the no-inlining direction for Winch; even though I believe that some sort of inlining is feasible, especially for the types of functions outlined in this RFC, I'm also very confident that doing so has the potential to go against Winch's simplicity and compilation-performance design principles.
fitzgen submitted PR review.
fitzgen created PR review comment:
Makes total sense, thanks for weighing in.
@abrown
Yes, there will be two kinds of inlining:
- The compile-time builtins need access to intrinsics for reading/writing native memory, and these intrinsics need to be inlined to meet our performance goals. This will happen with a very ham-fisted approach during Wasm-to-CLIF translation where we immediately turn calls to these intrinsics into the relevant CLIF instructions.
- We need to inline the compile-time builtins into the Wasm application that imports them. Depending on the approach we take, and the constraints we put on the shape of compile-time builtins, this could be done with the same ham-fisted approach used for (1). My thinking recently, reflected in the last Wasmtime meeting's discussion but not yet in this RFC, has been to instead make a more general inliner-as-a-library kind of thing for Cranelift that still allows Cranelift embedders to drive overall compilation like they do today but provides hooks for them to do inlining and use their own inlining heuristics. This would allow us to also do things like inlining calls across components and their fused adapters.
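As a rough illustration of the first kind of inlining described above, here is a minimal sketch of turning calls to intrinsic imports directly into IR instructions at translation time. Everything in it is a toy stand-in: `IrInst` is not Cranelift's real IR, and the import names `native-load-u32`, `native-store-u64`, and `grow-buffer` are hypothetical, not names defined by the RFC.

```rust
/// Toy stand-in for CLIF instructions; this is NOT Cranelift's real IR.
#[derive(Debug, PartialEq)]
enum IrInst<'a> {
    /// A direct native memory load of the given width in bytes.
    Load { bytes: u8 },
    /// A direct native memory store of the given width in bytes.
    Store { bytes: u8 },
    /// An ordinary out-of-line call to the named import.
    Call(&'a str),
}

/// Translate a call to an imported function.
///
/// Calls to recognized intrinsics (hypothetical names) are replaced on the
/// spot by the memory instruction they stand for; every other import stays
/// a regular call.
fn translate_call(import: &str) -> IrInst<'_> {
    match import {
        "native-load-u32" => IrInst::Load { bytes: 4 },
        "native-store-u64" => IrInst::Store { bytes: 8 },
        other => IrInst::Call(other),
    }
}

fn main() {
    for import in ["native-load-u32", "native-store-u64", "grow-buffer"] {
        println!("{import} -> {:?}", translate_call(import));
    }
}
```

The second kind of inlining (builtins into their callers) is a different mechanism and is not modeled here.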
saulecabrera edited PR review comment.
fitzgen updated PR #43.
Heads up: I've updated this RFC with a more-concrete proposal after the discussion in this issue and at various Cranelift and Wasmtime meetings.
I've also implemented general function inlining for Wasmtime in https://github.com/bytecodealliance/wasmtime/pull/11283. Right now, that is useful for inlining calls between components (including their generated adapters), but with compile-time builtins it can also be reused to inline the definitions of compile-time builtins into their callers. And we shouldn't really have to do anything special to get that inlining; it should just happen For Free when inlining is enabled. (I do expect we will need to tweak the inlining heuristics as time goes on, but that is a separate discussion.)
I think there is just one last open question we need to resolve before moving forward: exactly how to implement the `resource.address` intrinsic (or something similar / equally powerful).
Please take a look at the updated RFC and let me know what you think and any ideas you have for that last open question!
programmerjake submitted PR review.
programmerjake created PR review comment:
you could use LLVM intrinsics as inspiration and have the import's name tell you which type and resources to use, e.g. have `resource.address.t1` for a resource in table `1` or something.
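To make the naming idea concrete, here is a small sketch of recovering a table index from such an import name. The `resource.address.t<N>` scheme is only the suggestion from the comment above, and `resource_table_from_import` is a hypothetical helper, not part of any Wasmtime API.

```rust
/// Parse a hypothetical import name of the form `resource.address.t<N>`
/// and return the table index `N`, or `None` if the name does not match.
fn resource_table_from_import(name: &str) -> Option<u32> {
    name.strip_prefix("resource.address.t")?.parse().ok()
}

fn main() {
    assert_eq!(resource_table_from_import("resource.address.t1"), Some(1));
    assert_eq!(resource_table_from_import("resource.address"), None);
    assert_eq!(resource_table_from_import("resource.address.tx"), None);
    println!("ok");
}
```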
Okay so I actually had flawed assumptions with the way that resources work in the component model, and this nicely nullifies that last open question.
The resource-definer gets to use an arbitrary `u32` as their internal representation of a resource. Resource tables are only involved for other components, and their handles to that resource. The resource-definer always gets access to their `u32` representation directly.
So given all that, the `u32` resource representation can be an index into some embedder-defined table in the `T` in a `Store<T>` and compile-time builtins can inline accesses to those embedder-defined tables if we give them an intrinsic like `store.data_address` that returns a `*mut T` (and the embedder's `T` is `repr(C)`).
Will update the RFC shortly.
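A minimal sketch of that idea, using plain Rust rather than Wasmtime itself: the `HostState` and `Buffer` types, the `store_data_address` function, and `buffer_len` are all hypothetical stand-ins chosen for illustration. The point is only that a `repr(C)` layout lets a builtin reach a table entry from a base address, a `u32` index, and fixed offsets.

```rust
use std::mem::{offset_of, size_of};

/// Hypothetical table entry owned by the embedder.
#[repr(C)]
#[derive(Clone, Copy)]
struct Buffer {
    len: u32,
    capacity: u32,
}

/// Hypothetical embedder state, playing the role of the `T` in `Store<T>`.
#[repr(C)]
struct HostState {
    buffer_count: u32,
    buffers: [Buffer; 4],
}

/// Stand-in for an intrinsic like `store.data_address`: hands out the base
/// address of the embedder's state.
fn store_data_address(state: &mut HostState) -> *mut HostState {
    state as *mut HostState
}

/// What a compile-time builtin could do with that address: treat the `u32`
/// resource representation as an index and read a field at a fixed offset.
///
/// Safety: `base` must point to a live, initialized `HostState`.
unsafe fn buffer_len(base: *mut HostState, rep: u32) -> Option<u32> {
    let bytes = base.cast::<u8>();
    let count = unsafe { bytes.add(offset_of!(HostState, buffer_count)).cast::<u32>().read() };
    if rep >= count {
        return None; // out-of-range index: not a valid resource
    }
    let entry = offset_of!(HostState, buffers) + rep as usize * size_of::<Buffer>();
    Some(unsafe { bytes.add(entry + offset_of!(Buffer, len)).cast::<u32>().read() })
}

fn main() {
    let mut state = HostState {
        buffer_count: 2,
        buffers: [Buffer { len: 0, capacity: 0 }; 4],
    };
    state.buffers[1] = Buffer { len: 7, capacity: 16 };

    let base = store_data_address(&mut state);
    assert_eq!(unsafe { buffer_len(base, 1) }, Some(7));
    assert_eq!(unsafe { buffer_len(base, 3) }, None);
    println!("ok");
}
```

The bounds check is kept in the builtin here because nothing else in this sketch validates the index.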
fitzgen updated PR #43.
RFC updated accordingly.
I think that resolves all open questions for this RFC. I will give it a little bit of time for some more discussion before officially starting the motion to merge, just to give people an extra chance to read the updated RFC first and provide feedback.
alexcrichton submitted PR review.
programmerjake submitted PR review.
programmerjake created PR review comment:
you also have to watch out for primitive type alignment changing offsets, e.g. in `#[repr(C)] struct S(u8, f64)` the `f64` field is at offset `4` on `i686-unknown-linux-gnu` but at offset `8` on `x86_64-unknown-linux-gnu`
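The effect is easy to observe with `std::mem::offset_of!` (stable since Rust 1.77). This sketch prints the offset for whichever target it is compiled for; the assertion only checks the general `repr(C)` rule that the `f64` field lands at the first offset satisfying its alignment, so it holds on both targets mentioned above.

```rust
use std::mem::{align_of, offset_of};

#[repr(C)]
struct S(u8, f64);

/// Offset of the `f64` field of `S` on the current compilation target.
fn f64_field_offset() -> usize {
    offset_of!(S, 1)
}

fn main() {
    println!("align_of::<f64>() = {}", align_of::<f64>());
    println!("offset of S.1     = {}", f64_field_offset());
    // With a single `u8` in front, repr(C) places the `f64` at the first
    // offset that is a multiple of its alignment, which differs by target.
    assert_eq!(f64_field_offset(), align_of::<f64>());
}
```

A builtin that hard-codes one target's offset would therefore read the wrong bytes on the other.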
fitzgen submitted PR review.
fitzgen created PR review comment:
That's true and I expect we will need to add some wording to our documentation for this feature around what exactly the safety conditions will need to be, such that the compile-time builtins can portably access the host data. My plan right now is to document these things in detail as I prototype and dog-food the feature.
Okay, I think this is ready and we've had a bit of time for folks to look it over, so let's officially start the process.
Motion to Finalize
Disposition: Merge
As always, details on the RFC process can be found here: https://github.com/bytecodealliance/rfcs/blob/main/accepted/rfc-process.md#making-a-decision-merge-or-close
alexcrichton commented on PR #43:
I second! (don't think I can re-approve)
As there has been sign-off from representatives of two different BA stakeholder organizations, this RFC is now entering its 10-day Final Comment Period, and the last day to raise concerns before this RFC merges is 2025-08-16.
Thanks everyone!
cfallin submitted PR review:
LGTM!
cfallin created PR review comment:
This seems to be in conflict with the later section describing intrinsics as always taking `u64` addresses (and ignoring the upper 32 bits on 32-bit targets). I like the latter a lot more and it seems nice to provide this portability boon, so perhaps this is just outdated text?
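A small sketch of why a uniform `u64` address type is portable, assuming the truncating behavior described in this comment. The function names are hypothetical; `low_32_bits` just simulates what a 32-bit target would keep so the effect is visible on any host.

```rust
/// Convert a `u64` address, as an intrinsic would receive it, into a native
/// pointer-sized address. On 64-bit targets this is the identity; on 32-bit
/// targets the `as` cast keeps only the low 32 bits.
fn native_address(addr: u64) -> usize {
    addr as usize
}

/// Simulates the 32-bit case on any host: only the low 32 bits survive.
fn low_32_bits(addr: u64) -> u32 {
    addr as u32
}

fn main() {
    let addr: u64 = 0xdead_beef_0000_1000;
    println!("native address: {:#x}", native_address(addr));
    println!("32-bit view:    {:#x}", low_32_bits(addr));
    assert_eq!(low_32_bits(addr), 0x0000_1000);
}
```

With this convention the same builtin Wasm can be compiled for either pointer width without changing its import signatures.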
fitzgen submitted PR review.
fitzgen created PR review comment:
Ah yes, sorry this is outdated text, will update momentarily.
fitzgen updated PR #43.
The final comment period has passed without any concerns being raised, so I will go ahead and merge this. Thanks again, everyone!
fitzgen merged PR #43.
Last updated: Dec 06 2025 at 06:05 UTC