Stream: git-wasmtime

Topic: wasmtime / issue #12477 Cranelift: make load-sinking less...


view this post on Zulip Wasmtime GitHub notifications bot (Jan 29 2026 at 22:59):

cfallin opened issue #12477:

In CVE-2026-24116, we saw that a load-sinking instruction lowering on x86-64 could result in a 64-bit load (load.f64 Cranelift operator) becoming a memory operand on a 128-bit-wide SIMD instruction (vandpd / vandnpd). This resulted in a too-wide access to memory, hence a miscompilation and a CVE. (The issue applies equally to 32-bit loads for f32.copysign, but we'll describe only the 64-bit case here for brevity.)

The chain of logic that leads to the load-sinking is:

  1. The fcopysign lowering takes Values and feeds them to the helper constructors for vandpd and vandnpd [here]. These instructions perform operations on 128-bit XMM registers, and we rely on Cranelift's usual convention of storing narrower values (here an f64) in wider machine registers and ignoring the upper bits.

  2. These constructors take XmmMem arguments, which can represent a register or memory; when memory, the instructions perform a 128-bit load.

  3. XmmMem can be implicitly constructed from a Value by put_in_xmm_mem, which checks for a sinkable load [here].

  4. A load is sinkable if [conditions] are met that relate to code-motion, but the width of the to-be-sunk load operator is not explicitly checked against anything, because we do not know yet who will be consuming (incorporating) the sunk load.

In other words, we use a Value corresponding to an f64 with an instruction that operates on XMM registers, and we sink loads into XMM operands. Both of these patterns occur in many places: e.g., an fadd lowers to an instruction that operates on XMM registers, and can sink a load as well. The thing that makes fcopysign special is that its lowering uses instructions that operate on the full 128 bits while dealing with f64s; e.g., fadd.f64 uses addsd (add scalar double) which, if given a memory operand, will load only 64 bits.

The tricky bit here is that nothing about the lowering rule itself looks amiss -- it is essentially (simplified)

(rule (lower (has_type $F64 (fcopysign a b)))
      (simd_op
        (simd_op a)
        (simd_op b)))

and only knowledge that (i) a and b are f64s and (ii) the simd_ops do 128 bits of work leads to seeing the bug.
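To see why the width mismatch matters, here is a toy Rust model (not Cranelift code; the function and constants are illustrative) of the out-of-bounds condition: the CLIF load.f64 promises an 8-byte access, but once sunk into a memory operand on vandpd/vandnpd the hardware performs a 16-byte access at the same address.

```rust
/// Returns true if an access of `width` bytes at `offset` stays
/// inside a heap of `heap_len` bytes.
fn access_in_bounds(heap_len: usize, offset: usize, width: usize) -> bool {
    offset.checked_add(width).map_or(false, |end| end <= heap_len)
}

fn main() {
    let heap_len = 16;
    let offset = 8; // last in-bounds position for an 8-byte f64 load
    // As a load.f64, this access is fine:
    assert!(access_in_bounds(heap_len, offset, 8));
    // Sunk into a 128-bit SIMD memory operand, the same address now
    // implies a 16-byte access that runs past the end of the heap:
    assert!(!access_in_bounds(heap_len, offset, 16));
}
```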

There are a few ways we could "break the chain" above and cause this mistake to be harder to make:

  1. We could remove the implicit conversion from Value to XmmMem, i.e., force all sites where we want to support load-sinking to opt into that behavior.

    This is tempting from the point of view of explicitness and clarity when reading/auditing the code. It should be noted, however, that the ergonomics/implicitness of autoconversions go hand-in-hand with completeness, i.e., supporting load-sinking everywhere it is possible. In essence, this is admitting that the types don't fully capture the invariants, so we can't trust the type system and have to manually describe allowed behavior. As we know from experience with Rust, a type system with accurate guardrails is freeing in many ways and allows for more optimizations.

    With @avanhatt and @fitzgen we did a quick experiment to see how many sites would need explicit opt-in or opt-out if we remove the Value to XmmMem-and-family conversions; 255 type errors resulted. That's a lot, but within the realm of a big refactor.

  2. We could keep the implicit conversions but somehow carry the width of the original value dynamically in the XmmMem (or more likely the underlying Amode), and assert that the instruction consuming the Amode will do a load of the expected width.

    This is also tempting, albeit with some dynamic cost (even if the assert itself is only a debug-assert, the extra field rides along in the Amode), and also relies on fuzzing rather than the type system to find errors.

  3. We could add another dimension to the newtype universe around registers, and have XmmMem32, XmmMem64, and XmmMem128; update instruction constructors to take only the appropriate width; and create autoconversions from Value to each of these that dynamically fails if the Value has a CLIF-level type with the wrong width.

    This is better than option 2 above because, though still dynamic, it puts more in the type system; and this is desirable for verification.

  4. (Idea from @avanhatt) We could mark the left-hand side bindings where sinking is acceptable syntactically, and then only for these, provide autoconversions; e.g., match on (fcopysign (maybe_sink a) (maybe_sink b)) where maybe_sink gives us a ValueMaybeSunk and we have an autoconverter from ValueMaybeSunk to XmmMem (and then from Value only to Xmm, i.e. only the register form).

    This is nice because it addresses one difference between load-sinking cases and all other instruction selection rules: this is the only case where we don't write out the full pattern of CLIF operators that we are matching on and lowering on the left-hand side. In a sense, the current status quo composes the usual LHS patterns with the loads-become-memory-operands sinking behavior, which allows for more compact and complete rulesets (we'd otherwise have to duplicate many of the backend rules in x64, and we might miss some) but hides this behavior.
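To make options 3 and 4 concrete, here is a minimal Rust sketch of how they could compose; all names (ClifType, MaybeSinkableValue, XmmMem64) are hypothetical stand-ins, not Cranelift's actual definitions. A bare Value only ever becomes a register operand; the memory form is reachable only through the explicit wrapper, and the width-indexed conversion fails on a CLIF-type mismatch.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum ClifType { F32, F64 }

/// Stand-in for a CLIF Value carrying its type.
#[derive(Clone, Copy, Debug)]
struct Value { ty: ClifType }

/// Produced only by matching `(maybe_sink x)` on a rule's left-hand side.
struct MaybeSinkableValue(Value);

/// Operand whose memory form loads exactly 64 bits.
#[derive(Debug)]
enum XmmMem64 { Xmm(Value), Mem(Value) }

impl From<Value> for XmmMem64 {
    /// Option 4: a bare Value is always forced into a register.
    fn from(v: Value) -> Self { XmmMem64::Xmm(v) }
}

impl TryFrom<MaybeSinkableValue> for XmmMem64 {
    type Error = &'static str;
    /// Options 3 + 4 together: only an explicitly marked value may become
    /// a memory operand, and only if its CLIF type is 64 bits wide.
    /// (Real code would also run the sinkability check here.)
    fn try_from(v: MaybeSinkableValue) -> Result<Self, Self::Error> {
        match v.0.ty {
            ClifType::F64 => Ok(XmmMem64::Mem(v.0)),
            _ => Err("load width does not match operand width"),
        }
    }
}
```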

I am starting to come to the opinion that option 4 is good regardless -- the discrepancy between load-sinking and all other kinds of multi-operator matching on left hand sides is odd -- but we should also try to do something like option 3, i.e. distinguish operands of different widths at the type level, because option 4 alone (denoting sinking sites) still relies on the rule author reasoning about operand width, and possibly getting it wrong (or a later refactor getting it wrong). To make it concrete with details from this CVE, something would have needed to know that vandpd has a 128-bit operand width, and that does not match the Value; that something is either the author reading the Intel manual, deciding yes or no, and adding maybe_sink, or types added in a principled way on both ends.

cc @avanhatt @fitzgen for thoughts!

view this post on Zulip Wasmtime GitHub notifications bot (Jan 30 2026 at 14:23):

alexcrichton commented on issue #12477:

Reading over this I also quite like the maybe_sink idea. That feels like a lightweight-enough annotation to be easy enough to apply to what we have today while also making it much clearer when sinking is allowed or not. Right now we have a lot of (a Xmm a) which is a sufficient but non-obvious way to disallow sinking, and the omission of maybe_sink meaning "this can't be sunk" feels more clear.

Inevitably we'll forget maybe_sink in places that it should be, but that seems like a better problem to have than the other way around.

Also, to confirm, my assumption on that is that Value would still auto-convert to XmmMem, it'd just always be Xmm under the hood. The production of (maybe_sink a) would then do the auto-conversion we have today to XmmMem where it binds either Xmm or Mem as appropriate.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 30 2026 at 16:44):

cfallin commented on issue #12477:

Also, to confirm, my assumption on that is that Value would still auto-convert to XmmMem, it'd just always be Xmm under the hood. The production of (maybe_sink a) would then do the auto-conversion we have today to XmmMem where it binds either Xmm or Mem as appropriate.

Yes to the first -- the autoconversion would always force to a register. (maybe_sink _) as an extractor would produce a newtype wrapper around Value like MaybeSinkableValue (Alexa: your original name was ValueMaybeSunk but at this point it's not sunk yet); we'd have a (convert MaybeSinkableValue XmmMem) that does what put_in_xmm_mem does today.

In other words, the check to see if something is sinkable happens in the constructor (right-hand side); we are only labeling it with a newtype on the left-hand side. That's important, because we don't know where the value is going to be used yet on the LHS, and we haven't committed to this rule yet when the LHS is evaluated. (I guess that's also why we have this distinction historically where we have a special mechanism to "look through" the sinkability boundary rather than using ordinary LHS matching: it requires a side-effect to actually commit to the sinking and make it real.)

It's important to note here too that this alone (option 4) doesn't protect us against the CVE issue, which is a mismatch-in-width issue; that requires either the newtypes for 32/64/128-bit-wide operands (option 3) or a field in Amode indicating the expected load width (option 2) that we assert on when the amode meets a particular instruction in an RHS instruction constructor (e.g. vandpd asserts it's 128 bits, mulsd asserts it's 64 bits, mulss asserts it's 32 bits).
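The per-instruction width assertion from option 2 might look roughly like the following Rust sketch; the field and function names are hypothetical, and Cranelift's real Amode carries no such width field today.

```rust
/// Hypothetical Amode carrying the width of the CLIF load that was
/// sunk into it (option 2).
struct Amode {
    // ... base, index, offset elided ...
    expected_load_bytes: u8,
}

/// Check an instruction constructor would perform before accepting a
/// sunk load as a memory operand of `access_bytes` width; e.g. vandpd
/// would pass 16, mulsd 8, mulss 4.
fn width_matches(mem: &Amode, access_bytes: u8) -> bool {
    mem.expected_load_bytes == access_bytes
}

fn emit_vandpd(mem: &Amode) {
    // vandpd with a memory operand performs a 128-bit (16-byte) access.
    debug_assert!(width_matches(mem, 16), "sunk load width mismatch");
    // ... emit the instruction ...
}
```

With this in place, sinking an 8-byte load.f64 into vandpd would trip the debug assertion under fuzzing rather than silently emitting a too-wide access.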

view this post on Zulip Wasmtime GitHub notifications bot (Jan 30 2026 at 17:15):

avanhatt commented on issue #12477:

To add another reason why I like option 4 (possibly in addition to other changes):

From the perspective of verification, we'd really like it if the LHS can be syntactically sufficient to recover the semantics of a snippet of Clif.

E.g., without auto converters, if we wrote a version of this rule to explicitly match on a loaded arg with the LHS of roughly:

(lower (has_type $F64 (fcopysign (load a) b)))

We know that is equivalent to:

v2 = load.f64 v0
v3 = fcopysign v2, v1

Which has the semantics of: load a 64-bit value from memory, then perform the pure operator fcopysign.

The current status quo's LHS of:

(lower (has_type $F64 (fcopysign a b)))

Is strange from this perspective: the type of a (and b) is Value, which we'd like to understand as "a value in Clif, which may have come from any other sequence of Clif instructions but should reside in an SSA virtual register". From a semantic perspective, this LHS looks _pure_ in that it should not touch memory.

With the new proposed LHS:

(lower (has_type $F64 (fcopysign (maybe_sunk a) b)))

We can essentially split the semantics of the LHS into two possibilities: (1) that the first argument is indeed from a virtual register, with pure, non-memory-effecting semantics, or (2) that the first argument has equivalent semantics to a load, which is memory-effecting.


On the point of whether to do option 3, I'm less inclined one way or the other at this point, but lean toward agreeing that it would be good.

We need to be able to provide semantics to XmmMem, and again we will need to have cases: this type is either (1) a register with a width that (for verification) is determined during monomorphization before lowering to SMT, or (2) the result of a load with a specific width. Types like XmmMem32 would make getting this specification right a bit easier, but I can't say at this point whether there are cases where monomorphization would fail without this.

To @cfallin's point: option 4 alone does not prevent the CVE in terms of an ISLE metacompiler error, but it might have led the author not to write the rule with maybe_sunk, avoiding the CVE. That is, the hope is that 4/maybe_sunk would force a rule author to at least consider the possibility that the LHS they are matching on includes a load, raising mental (if not type-enforced) alarms if they then have RHS ops with known wider width.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 30 2026 at 18:26):

alexcrichton commented on issue #12477:

Personally I like the idea of sized XmmMem values (and probably GprMem too). The raw instructions are already annotated with widths, so this would, in theory, largely be a matter of plumbing and type juggling.

Doing both of these (also maybe_sink), to me, tackles the two major angles for how this bug arose.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 30 2026 at 18:46):

avanhatt commented on issue #12477:

Just to be explicit: is our thinking that only the *Mem ISLE types get widths?

That is, should Xmm always be understood to have width 128 (even if in certain cases, we only care about the low bits)?

I could see an argument that this is a nice policy:

  1. ISLE types defined in the shared prelude (Reg, WritableReg, VecReg, etc) have polymorphic/dynamic widths.
  2. Backend-specific ISLE types should have specific widths (if not explicit in the name, at least inferable from context, though explicit names would make the verification team's job easier).

But I'm not sure whether the second point is too strict of a distinction for x64.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 30 2026 at 19:14):

cfallin commented on issue #12477:

If we take your option 2 (backend-specific types with specific widths), I think it's actually a little clearer if we carry the refactor through to the reg-only kinds as well; otherwise the boundary between the two gets weird. For example, if I have an Xmm, can I autoconvert that to an XmmMem64 too, or only XmmMem128? (I guess if the width speaks only about the memory case, that's still viable, but it's subtle.)

So to be concrete, I'd imagine replacing Xmm with Xmm32/64/128, XmmMem with XmmMem32/64/128, etc. We have autoconverters from Value and MaybeSinkableValue for all of them that check the type on the Value and panic if wrong. This then does mean we'll need "widening" operators too, I think, for cases like fcopysign where we use an instruction that really does do 128 bits of work on a narrower value (as opposed to the scalar ops like mulsd etc).

Then fcopysign could either be written something like

(rule (lower (has_type $F64 (fcopysign a b)))
      (simd_op
        (simd_op (widen_64_128 a))
        (simd_op (widen_64_128 b))))

where a bound in the LHS has type Value, we have an autoconverter to Xmm64 for the operand of widen_64_128 (which has signature Xmm64 -> Xmm128; no Mem form), we get an Xmm128 out, that autoconverts to XmmMem128 (but only ever has the Reg branch), and simd_op (e.g. vandpd) is defined to take an XmmMem128.

Or we could write it without those widen operators, and instead have an autoconverter for Value to XmmMem128 that, for a value of the wrong type/width, only ever yields the register case. Maybe that's actually even cleaner: we finally are feeding the appropriate information into the sinkability logic (what is the load width wanting to use this amode) so that, as you (Alexa) had initially wanted, we can inhibit the behavior at that level. I actually very much like that!
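That last variant, feeding the consumer's load width into the sinkability decision itself, could be sketched as follows; the names are hypothetical, with F64X2 standing in for any 128-bit CLIF type and a boolean standing in for the real sinkability check.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum ClifType { F64, F64X2 }

#[derive(Clone, Copy, Debug)]
struct Value { ty: ClifType, load_is_sinkable: bool }

#[derive(Debug)]
enum XmmMem128 { Xmm(Value), Mem(Value) }

/// Width-aware variant of put_in_xmm_mem: the converter knows the
/// consumer wants a 128-bit memory operand, so a narrower Value is
/// still accepted but only ever takes the register branch.
fn put_in_xmm_mem_128(v: Value) -> XmmMem128 {
    let width_ok = v.ty == ClifType::F64X2; // only 128-bit CLIF types qualify
    if v.load_is_sinkable && width_ok {
        XmmMem128::Mem(v)
    } else {
        XmmMem128::Xmm(v)
    }
}
```

Under this scheme the fcopysign rule needs no widen operators at all: its f64 operands simply never take the Mem branch.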

view this post on Zulip Wasmtime GitHub notifications bot (Feb 02 2026 at 04:07):

alexcrichton commented on issue #12477:

If we could get away with tagging all registers with their width, I'd agree that'd be really nice. I suspect it'd be quite a large refactoring to the backends, but we've had a number of subtle bugs in the past related to mismatched register widths that this would probably close off.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 02 2026 at 17:14):

fitzgen commented on issue #12477:

Regarding bit-width specific types, I want to reiterate my potential incremental migration plan from the discussion we had in/after the Cranelift meeting.

First, we provide the following infrastructure:

Then we start making more and more use of the static types:

This is effectively a combination of (2) and (3).

(And it is also orthogonal from whether we do (1) or (4) or continue to have auto conversions between Value and {Reg,Gpr,Xmm}Mem[Imm].)

Final note regarding static types: we have a bunch of rules that work for many value widths, e.g. all values that fit in a single GPR. If we fully move to static types, we will need to "duplicate" these rules for each width (or introduce a generics/macro system...). So it may be desirable to have this hybrid dynamic+static approach not just as an incremental migration towards a fully-static system, but indefinitely.


Last updated: Feb 24 2026 at 04:36 UTC