fitzgen opened PR #11825 from fitzgen:new-compile-time-builtins to bytecodealliance:main:
This commit adds the extremely unsafe `wasmtime::CodeBuilder::expose_unsafe_intrinsics` method. When enabled, the Wasm being compiled is given access to special imports that correspond to direct, unchecked and unsandboxed, native load and store operations. These intrinsics are intended to be used for implementing fast, inline-able versions of WASI interfaces that are special-cased to a particular host embedding, for example.

Compile-time builtins, as originally described in the RFC, are basically made up of three parts:
- A function inliner
- Unsafe intrinsics
- Component composition to encapsulate the usage of unsafe intrinsics in a safe interface
Part (1) has been implemented in Wasmtime and Cranelift for a little while now (see `wasmtime::Config::compiler_inlining`). This commit is part (2). After this commit lands, part (3) can be done with `wac` and `wasm-compose`, although follow-up work is required to make the developer experience nicer and more integrated into Wasmtime so that the APIs can look like those proposed in the RFC.
I still have a few doc comments and examples to fill out, but I thought it would be worth opening this PR up so that folks can start taking a look now, especially as I am taking Friday off and have a super-packed day tomorrow, so I probably won't have time to cross all the Ts and dot all the Is before next week.
One thing that no one brought up during the RFC, but which started bugging me during this implementation, is whether we can expose tools for compile-time builtin authors to do Spectre mitigations. Basically, expose an intrinsic that lowers to `spectre_select_guard` or something? Seems possible, but I haven't explored the design space too much yet. Also seems like it is _probably_ something we can do in an additive fashion, without needing to figure everything out before landing any intrinsics. Interested in folks' thoughts!
fitzgen requested cfallin for a review on PR #11825.
fitzgen requested wasmtime-compiler-reviewers for a review on PR #11825.
fitzgen requested wasmtime-core-reviewers for a review on PR #11825.
fitzgen requested alexcrichton for a review on PR #11825.
github-actions[bot] commented on PR #11825:
Subscribe to Label Action
cc @saulecabrera
<details>
This issue or pull request has been labeled: "wasmtime:api", "winch"

Thus the following users have been cc'd because of the following labels:
- saulecabrera: winch
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
rvolosatovs created PR review comment:
What do you think about including `uN-native-{add,sub}(u64, uN) -> uN`, which would return the previous value and wrap around on overflow? (Potentially in a follow-up.)
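(As an editorial illustration, not code from the PR: a minimal Rust sketch of the fetch-add-with-wrapping semantics this comment seems to propose. The function name and shape are hypothetical.)

```rust
/// Hypothetical semantics for the proposed `u64-native-add` intrinsic:
/// add `v` to the value at `slot`, wrapping on overflow, and return the
/// previous value (fetch-add style).
fn native_add_u64(slot: &mut u64, v: u64) -> u64 {
    let prev = *slot;
    *slot = prev.wrapping_add(v);
    prev
}

fn main() {
    let mut x = u64::MAX;
    // Wraps around on overflow and yields the old value:
    let prev = native_add_u64(&mut x, 2);
    assert_eq!(prev, u64::MAX);
    assert_eq!(x, 1);
}
```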
rvolosatovs submitted PR review.
fitzgen created PR review comment:
Can you clarify what the intended semantics are and why additional intrinsics are necessary and regular Wasm arithmetic is insufficient? Is this for doing the underlying architecture's pointer-sized arithmetic?
fitzgen submitted PR review.
rvolosatovs submitted PR review.
rvolosatovs created PR review comment:
I would imagine that e.g. a single `u64-add` would be more efficient than a `load`, followed by an `add`, followed by a `store`. Am I wrong in assuming that's the case?
It would also be a little easier for embedders to use.
rvolosatovs edited PR review comment.
fitzgen submitted PR review.
fitzgen created PR review comment:
Cranelift can optimize load+add+store to fuse the operations into a single instruction on architectures like x86-64 where such an instruction is available.
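(For illustration, not from the PR: the load + add + store pattern written out explicitly through a raw pointer. Optimizing backends targeting architectures with memory-destination read-modify-write instructions, such as x86-64, can typically emit this as a single `add` to memory.)

```rust
/// An explicit load + add + store through a raw pointer. A backend with
/// memory-destination RMW instructions can typically fuse all three
/// operations into one instruction.
unsafe fn add_in_place(p: *mut u64, v: u64) {
    // SAFETY: the caller guarantees `p` is valid for reads and writes.
    unsafe {
        let old = p.read();           // load
        p.write(old.wrapping_add(v)); // add + store
    }
}

fn main() {
    let mut slot = 40u64;
    unsafe { add_in_place(&mut slot, 2) };
    assert_eq!(slot, 42);
}
```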
fitzgen updated PR #11825.
fitzgen updated PR #11825.
rvolosatovs submitted PR review.
rvolosatovs created PR review comment:
Oh, nice, then that solves that, yeah!
fitzgen updated PR #11825.
fitzgen updated PR #11825.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
One thing that I feel has worked out well elsewhere is using macros to define the signature, which guarantees everything stays in sync. Would it be possible to avoid manually creating the function type here through parameters, and instead procedurally derive the type from a single source of truth?
alexcrichton created PR review comment:
This is a duplicate of `TrampolineCompiler::abi_store_results`, so I was wondering if it would be possible to make a `TrampolineCompiler` here in this function and use that method? Similarly for `abi_load_params` for replacing `init` above.
alexcrichton created PR review comment:
I believe this'll need to truncate `pointer` on 32-bit platforms.
alexcrichton created PR review comment:
Similar to loads, this'll want to truncate the pointer for 32-bit targets
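(An editorial sketch, not Wasmtime's actual lowering: the intrinsics pass addresses as `u64`, so on a 32-bit host only the low 32 bits are meaningful and the value must be explicitly truncated before use.)

```rust
/// Illustrative sketch: narrow a guest-provided `u64` address to the
/// host's pointer width. On 64-bit hosts this is lossless; on 32-bit
/// hosts it drops the high 32 bits, analogous to a Cranelift `ireduce`
/// from i64 to i32.
fn truncate_to_host_pointer(addr: u64) -> usize {
    addr as usize
}

fn main() {
    // An address with bits set above the low 32:
    let addr: u64 = 0x1_0000_0004;
    // A 32-bit target keeps only the low 32 bits:
    assert_eq!(addr as u32, 0x4);
    // On a 64-bit host, the conversion is lossless:
    assert_eq!(truncate_to_host_pointer(addr), 0x1_0000_0004);
}
```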
alexcrichton created PR review comment:
If it's expected that the list of intrinsics is going to grow over time, should this perhaps be `PrimaryMap<SomethingIndex, (ModuleInternedTypeIndex, UnsafeIntrinsic)>`? Basically ignoring unused intrinsics.
alexcrichton created PR review comment:
Mind updating the comment at the top of this file too?
alexcrichton created PR review comment:
I believe this should be implementable by plumbing to the underlying Cranelift compiler?
alexcrichton created PR review comment:
Just flagging the various TODO here to get resolved before merging
alexcrichton created PR review comment:
This feels to me like it should be "iterate over what the component needs and compile those" rather than iterating over all intrinsics?
alexcrichton created PR review comment:
Technically I don't believe this is correct, and also technically the tests in this PR violate this by having a different host function return the u64 "pointer" which gets read/mutated. I think this'll want to be reworded, or perhaps even dropped entirely? Whether or not a modification/read of memory is safe is more-or-less up to Miri in a sense so we could somewhat defer to that.
alexcrichton created PR review comment:
Do you see a viable path to eventually omitting this? For example, if we were to implement a DCE pass for functions, and no intrinsics were actually imported anywhere and were inlined everywhere, then all of these should get emptied out. "Just DCE" wouldn't be sufficient because of loops like this, however; we'd also have to prune the list of intrinsics after function optimization, in theory.
alexcrichton created PR review comment:
From a Rust soundness perspective this is not sound. One reason is that, for example:
```rust
fn main() {
    let mut data = Box::new(32);
    let ptr = &mut *data as *mut i32;

    // e.g. through wasm...
    unsafe { *ptr += 10; }

    // e.g. through a host call...
    *data += 11;

    // e.g. through wasm again...
    unsafe { *ptr += 12; }
}
```

Running this through Miri shows that the third modification here (adding 12) is unsound. The reason is that the original pointer is "invalidated" once the original data is used through a different location.
This is also technically not sound because it's mutating through a `*const T` pointer which was originally derived from `&T`, which does not allow mutation. Basically, the `*mut ()` is going to need to be originally derived from `*mut T`.

One fix for this is to have `store.data()` and `data_mut` go through this pointer rather than `self.inner.data`. Another possible fix is to do some pre/post logic around wasm entry/exit (some permutation, I don't know exactly what) where we do some provenance juggling along the lines of this, which compiles to a no-op but has meaning to the compiler.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Or, better yet, another possible fix is the "provenance juggling" approach modifying the `data` and `data_mut` methods, but in such a way that it compiles down to the same thing that happens today.
alexcrichton commented on PR #11825:
Oh, also, I'd recommend using `prtest:full` on this PR, as this seems at high risk of passing on x64 and failing elsewhere.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
For example, this is Miri-safe and additionally has the expected generated assembly:

```rust
pub struct Foo {
    a: Box<i32>,
    raw_a: *mut i32,
}

#[unsafe(no_mangle)]
pub extern "C" fn mutate_raw(foo: &mut Foo) {
    unsafe {
        *foo.raw_a += 1;
    }
}

#[unsafe(no_mangle)]
pub extern "C" fn mutate_safe(foo: &mut Foo) {
    *get_a(foo) += 1;
}

#[unsafe(no_mangle)]
pub extern "C" fn get_a(foo: &mut Foo) -> &mut i32 {
    unsafe {
        let addr: *mut i32 = &raw mut *foo.a;
        &mut *foo.raw_a.with_addr(addr.addr())
    }
}

fn main() {
    let mut a = Foo {
        a: Box::new(200),
        raw_a: std::ptr::null_mut(),
    };
    a.raw_a = &mut *a.a as *mut i32;

    println!("first: raw");
    mutate_raw(&mut a);
    println!("second: safe");
    mutate_safe(&mut a);
    println!("third: raw");
    mutate_raw(&mut a);
}
```
fitzgen submitted PR review.
fitzgen created PR review comment:
Creating a whole `TrampolineCompiler` proved difficult due to it being fairly tied to component trampolines, but I did factor out the `abi_{store_results,load_params}` functions so that they are reusable from this code.
fitzgen created PR review comment:
There is no comment showing the pseudocode definition of `VMStoreContext`, just `VMContext`.
fitzgen submitted PR review.
fitzgen submitted PR review.
fitzgen created PR review comment:
Probably because we can define `VMStoreContext` as a regular Rust struct, since it doesn't have dynamically-sized array fields.
fitzgen submitted PR review.
fitzgen created PR review comment:
My thinking has been that we can cross that bridge if we get to it.
Right now, there are very few intrinsics, and I'm not concerned about the size of little arrays like this. However, it is definitely intended that a component that doesn't use these intrinsics doesn't have them compiled into its text section and doesn't have any additional space reserved for their `VMFuncRef`s and whatnot in its `vmctx` layout.
fitzgen submitted PR review.
fitzgen created PR review comment:
That would require a phase separation between regular function compilation and unsafe intrinsic compilation (so that we can determine which intrinsics are actually used via looking at CLIF external function imports after Wasm-to-CLIF translation but before inlining). This requires additional special-casing that we've been trying to remove from our compilation orchestration, so I'd rather not. I think, given how few intrinsics there currently are, that it is fine to get all of them if you expose them to a component.
In the future, I'd like to start doing gc-sections/DCE in our linking step, and rely on that to remove dead functions instead of introducing phases.
fitzgen submitted PR review.
fitzgen created PR review comment:
> Do you see a viable path to eventually omitting this?
When you say "this" what exactly are you referring to?
Already it should be the case that when unsafe intrinsics are not exposed to a component, this loop performs zero iterations and there should[^0] be zero space reserved for intrinsics' `VMFuncRef`s in the `vmctx`.

[^0]: I think there may be a bug where the space is reserved unconditionally right now, looking at CI. But that is definitely unintentional.
> For example, if we were to implement a DCE pass for functions, and no intrinsics were actually imported anywhere and were inlined everywhere, then all of these should get emptied out. "Just DCE" wouldn't be sufficient because of loops like this, however; we'd also have to prune the list of intrinsics after function optimization, in theory.
Yes, we would need to update the `env_component.unsafe_intrinsics` field after doing gc-sections/DCE during linking, same as we would need to do the moral equivalent for the `VMFuncRef`s of defined Wasm functions that are imported/exported within a component but are ultimately never called and are dead code.
fitzgen updated PR #11825.
fitzgen updated PR #11825.
fitzgen updated PR #11825.
fitzgen submitted PR review.
fitzgen created PR review comment:
fitzgen edited PR review comment.
fitzgen updated PR #11825.
fitzgen updated PR #11825.
fitzgen requested alexcrichton for a review on PR #11825.
fitzgen commented on PR #11825:
@alexcrichton I think this should be ready for another review pass
alexcrichton created PR review comment:
For the CI failures, I believe it's due to the fact that there's lingering access of the store data that doesn't go through these helpers. Could the `data` field be renamed to perhaps `data_without_provenance` or something like that, with a comment to use these accessors?

Also, these `unsafe` blocks will definitely warrant a comment explaining what's going on, as it's otherwise pretty nontrivial why they're set up the way they are.
alexcrichton submitted PR review:
Thanks for slogging through all the CI bits and handling the Miri bits; it's looking good!
To expand a bit on some of the unresolved comments from the previous review: it feels a bit weird that there are different ways of managing the list of intrinsics for a component. The `VMComponentContext` either has 0 or all of them, compilation either compiles 0 or all of them, `info::Component` tracks a full list of intrinsics but has `None` for unneeded intrinsics, `VMComponentContext` initialization "nulls out" all intrinsics, and instantiation only fills in used intrinsics. To me this feels like a random mish-mash of different strategies to manage everything. I get your point about crossing the bridge when we get there, but I also feel like this PR is moving us to a state where it's pretty inconsistent how the intrinsics are handled. Some contexts are "all or nothing" and some contexts are "only used intrinsics".

Ideally I'd prefer a system where intrinsics were compacted/compiled on demand, as opposed to ever doing an "all or nothing" approach. My read of this is that this is basically a function of the initial analysis phase of a component and how clever it is. I would naively expect that fitting into the `GlobalInitializer` infrastructure would make the implementation "just fall out" by adding a new `FooIndex` type of some kind. Basically all the hash maps and helpers and such are already there, so I would naively expect the implementation to not be all that much work.

I'm perpetually worried about the quantity of work we defer, given that the rate of burning down this deferred work is often much smaller than the rate of deferring, but this is a topic reasonable folks can disagree on. In that sense I'll lay out my concerns here, but I'll also leave it up to you whether to merge or not before addressing them. If this merges as-is, though, mind opening an issue about these future improvements?
alexcrichton created PR review comment:
For "this" I mean this entire loop in the context where unsafe intrinsics are used. This loop exists for the vanishingly rare case that intrinsics are used but are also turned into `funcref` values one way or another, but in practice they'll basically never get used.

In some sense I'm not saying much here; it's pretty clear that DCE won't make this loop go away, but a wasmtime-aware DCE pass which updated the `unsafe_intrinsics` list would. So all I'm really saying is that I think we should strive to make this loop go away in most situations where unsafe intrinsics are used, but that'll require fancy DCE.
alexcrichton created PR review comment:
To avoid the `Option` here, could this use `NonNull::dangling()` as an initial constructor? That'll still segfault if erroneously accessed, but otherwise avoids the pesky `unwrap`s.
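(A minimal editorial sketch of the suggestion, with hypothetical names: a pointer field initialized with `NonNull::dangling()` instead of `Option<NonNull<T>>`. The dangling placeholder is non-null and well-aligned but must never be dereferenced before being overwritten; an erroneous access will almost certainly fault, while every legitimate read of the field avoids an `unwrap`.)

```rust
use std::ptr::NonNull;

/// Hypothetical container that holds a pointer filled in later.
struct Slot {
    /// Starts out dangling; must be overwritten before any dereference.
    ptr: NonNull<u64>,
}

impl Slot {
    fn new() -> Slot {
        Slot { ptr: NonNull::dangling() }
    }
}

fn main() {
    let s = Slot::new();
    // The placeholder is non-null (so no `Option` niche is needed)...
    assert!(!s.ptr.as_ptr().is_null());
    // ...and aligned for the pointee type.
    assert_eq!(s.ptr.as_ptr() as usize % std::mem::align_of::<u64>(), 0);
}
```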
fitzgen commented on PR #11825:
> I'm perpetually worried about the quantity of work we defer given that the rate of burning down this deferred work is often much smaller than the rate of deferring, but this is a topic reasonable folks can disagree on.
I hear you. I think it is important that we balance incrementalism and doing things the Right Way. Personally, I think the way we strike that balance is not by accepting that things will be in a fairly suboptimal state "temporarily" (which, as you note, is often not temporary) in service of shipping something a little sooner, but instead by dropping functionality and features that we don't have time to implement well. This way, we might not have everything we ideally want, but what we do have is rock solid. So when we look at some work being left for "follow-up PRs", we should ask whether what is landing now is rock solid, and whether the follow-ups are "just" optimizations/features/functionality.[^0]
[^0]: This is all assuming we are talking about the implementation of new features. Obviously improving existing suboptimal code, even if there are still more follow ups to be done before it is fully optimal, is worth landing right away.
But of course all these generalizations are very high-level and vague. We can still come to reasonable disagreements on what is considered "rock solid" or which bits of functionality even can be left out while preserving what remains.
To make things in this PR a little more concrete when viewed through the above lens, I have been accepting the compromise that we will compile, link, and instantiate `VMFuncRef`s for all intrinsics when the flag to expose unsafe intrinsics is enabled. I think the Right Way to fix that, so that only the intrinsics that are actually used are compiled+linked+instantiated, is to do gc-sections/DCE during linking, but also that doing all that is something that can be delayed (perhaps for a very long time!) and things will be Fine in the meantime, because what is implemented should be rock solid (albeit lacking the DCE optimizations we would have in an ideal world).

Ignoring for a second the implementation details in this PR around how intrinsics are tracked through translation and the like, do you agree that the above compromise is an acceptable cut to make?
Because I do agree somewhat with the following:
> It feels a bit weird that there's different ways of managing the list of intrinsics for a component. The `VMComponentContext` either has 0 or all of them, compilation either compiles 0 or all of them, `info::Component` tracks a full list of intrinsics but has `None` for unneeded intrinsics, `VMComponentContext` initialization "nulls out" all intrinsics, and instantiation only fills in used intrinsics. To me this feels like a random mish-mash of different strategies to manage everything.

How the intrinsics are tracked through compilation and in the `wasmtime-environ` metadata structures is a little messy, and I'd like to improve it. But I don't want to expand the scope of those improvements to only compiling+linking+instantiating the intrinsics that are actually used. Doing that correctly (IMHO via DCE/gc-sections in linking) is too much for me to bite off immediately, and (unless I am missing something) the other ways of doing it require adding new special-cased code paths (which we have generally been trying to remove) and so would only bring us to local maxima, which doesn't seem worth spending the effort on.
And thanks as always for the thorough review! Will dig in some more momentarily.
fitzgen updated PR #11825.
alexcrichton closed without merge PR #11825.
alexcrichton commented on PR #11825:
High-level definitely agree with everything you say, and I also want to strive for a balance of perfection and pragmatism. I'm happy to defer things to a follow-up PR to make progress effectively whenever, and the moment I pause is when the conclusion is to open an issue. Filing an issue means that it'll likely be years, if ever, before something changes. So in that sense "let's file an issue" is equated to me as "let's just ignore this" which is partly where I come from. I basically feel that there's no balance to filing an issue as it effectively means that someone else, who's likely to be less suited to the task, will have to take care of it.
For this PR specifically I agree the DCE/VMFuncRef bits should not happen here. There's definitely no need to entangle all that and it's a huge project for not a whole lot of benefit right now. What I was mostly referring to was the compilation of the unsafe intrinsics where it's a mish-mash of everything vs the subset used. That to me feels like the perfect candidate for if an issue is filed it'll just be forgotten and never addressed. It feels frequent that small improvements like this are rarely justified to spend time on which means they just never get time spent on them.
I agree with your rock-solid quality though, and to that end I have no concerns about this PR. I see no bugs in this PR and it's just a bit sub-optimal in a few areas. Given that I'm fine to see this merged. My personal preference would be to have a follow-up adjusting the compilation side of things to change the list-of-all-intrinsics to list-of-only-the-used-intrinsics, but I'm ok settling for an issue too.
alexcrichton reopened PR #11825.
alexcrichton commented on PR #11825:
That was not the button I wanted...
fitzgen updated PR #11825.
fitzgen commented on PR #11825:
@alexcrichton the latest commit (9ca326b) cleans this up a little bit, and also happens to give us demand-based compilation of intrinsics, since it turns out we know which ones were `canon lower`ed by the time we are compiling functions. Mind taking another look? Does this help alleviate some of your concerns?
fitzgen updated PR #11825.
alexcrichton commented on PR #11825:
Agreed! I think everything at least now works in O(intrinsics_used), and data-representation-wise I'd still prefer `PrimaryMap<SomethingIndex, ...>`, but that's fine to defer to a future refactoring if necessary.
fitzgen merged PR #11825.
Last updated: Dec 06 2025 at 06:05 UTC