cfallin opened issue #12477:
In CVE-2026-24116, we saw that a load-sinking instruction lowering on x86-64 could result in a 64-bit load (the `load.f64` Cranelift operator) becoming a memory operand on a 128-bit-wide SIMD instruction (`vandpd`/`vandnpd`). This resulted in a too-wide access to memory, hence a miscompilation and a CVE. (The issue applies equally to 32-bit loads for `fcopysign` on `f32`, but we'll describe only the 64-bit case here for brevity.)

The chain of logic that leads to the load-sinking is:
1. The `fcopysign` lowering takes `Value`s and feeds them to the helper constructors for `vandpd` and `vandnpd` [here]. These instructions perform operations on 128-bit XMM registers, and we rely on Cranelift's usual convention of storing narrower values (here an `f64`) in wider machine registers and ignoring the upper bits.
2. These constructors take `XmmMem` arguments, which can represent a register or memory; when memory, the instructions perform a 128-bit load.
3. `XmmMem` can be implicitly constructed from a `Value` by `put_in_xmm_mem`, which checks for a sinkable load [here].
4. A load is sinkable if [conditions] are met that relate to code-motion, but the width of the to-be-sunk load operator is not explicitly checked against anything, because we do not know yet who will be consuming (incorporating) the sunk load.

In other words, we use a `Value` corresponding to an `f64` with an instruction that operates on XMM registers, and we sink loads into XMM operands. Both of these patterns occur in many places: e.g., an `fadd` also lowers to an instruction that operates on XMM registers, and can sink a load as well. The thing that makes `fcopysign` special is that its lowering uses instructions that operate on the full 128 bits while dealing with `f64`s; e.g., `fadd.f64` uses `addsd` (add scalar double) which, if using a memory operand, will load only 64 bits.

The tricky bit here is that nothing about the lowering rule itself looks amiss -- it is essentially (simplified) `(rule (lower (has_type $F64 (fcopysign a b))) (simd_op (simd_op a) (simd_op b)))`, and only knowledge that (i) `a` and `b` are `f64`s and (ii) the `simd_op`s do 128 bits of work leads to seeing the bug.

There are a few ways we could "break the chain" above and make this mistake harder to make:
1. We could remove the implicit conversion from `Value` to `XmmMem`, i.e., force all sites where we want to support load-sinking to opt into that behavior. This is tempting from the point of view of explicitness and clarity when reading/auditing the code. It should be noted, however, that the ergonomics/implicitness of autoconversions go hand-in-hand with completeness, i.e., supporting load-sinking everywhere it is possible. In essence, this is admitting that the types don't fully capture the invariants, so we can't trust the type system, and we have to manually describe allowed behavior. As we know from experience with Rust, a type system with accurate guardrails is freeing in many ways and allows for more optimizations. With @avanhatt and @fitzgen we did a quick experiment to see how many sites would need explicit opt-in or opt-out if we remove the `Value`-to-`XmmMem`-and-family conversions; 255 type errors resulted. That's a lot, but within the realm of a big refactor.

2. We could keep the implicit conversions but somehow carry the width of the original value dynamically in the `XmmMem` (or more likely the underlying `Amode`), and assert that the instruction consuming the `Amode` will do a load of the expected width. This is also tempting, albeit with some dynamic cost (even if the assert itself is only a debug-assert, the extra field rides along in the `Amode`), and it also relies on fuzzing rather than the type system to find errors.

3. We could add another dimension to the newtype universe around registers, and have `XmmMem32`, `XmmMem64`, and `XmmMem128`; update instruction constructors to take only the appropriate width; and create autoconversions from `Value` to each of these that dynamically fail if the `Value` has a CLIF-level type of the wrong width. This is better than option 2 above because, though still dynamic, it puts more in the type system; and this is desirable for verification.

4. (Idea from @avanhatt) We could mark the left-hand-side bindings where sinking is acceptable syntactically, and then provide autoconversions only for those; e.g., match on `(fcopysign (maybe_sink a) (maybe_sink b))`, where `maybe_sink` gives us a `ValueMaybeSunk` and we have an autoconverter from `ValueMaybeSunk` to `XmmMem` (and then from `Value` only to `Xmm`, i.e., only the register form). This is nice because it addresses one difference between load-sinking cases and all other instruction-selection rules: this is the only case where we don't write out, on the left-hand side, the full pattern of CLIF operators that we are matching on and lowering. In a sense, the current status quo composes the usual LHS patterns with the loads-become-memory-operands sinking behavior, which allows for more compact and complete rulesets (we'd otherwise have to duplicate many of the backend rules in x64, and we might miss some) but hides this behavior.

I am starting to come to the opinion that option 4 is good regardless -- the discrepancy between load-sinking and all other kinds of multi-operator matching on left-hand sides is odd -- but we should also try to do something like option 3, i.e., distinguish operands of different widths at the type level, because option 4 alone (denoting sinking sites) still relies on the rule author reasoning about operand width, and possibly getting it wrong (or a later refactor getting it wrong). To make it concrete with details from this CVE, something would have needed to know that `vandpd` has a 128-bit operand width, and that does not match the `Value`; that something is either the author reading the Intel manual, deciding yes or no, and adding `maybe_sink`, or types added in a principled way on both ends.

cc @avanhatt @fitzgen for thoughts!
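Option 3's core idea -- width-tagged operand newtypes whose construction dynamically checks the value's CLIF-level width -- can be sketched in a few lines of Rust. This is an illustrative sketch only: `Value`, `XmmMem64`, and `from_value` here are simplified stand-ins, not Cranelift's actual types.

```rust
/// Simplified stand-in for a CLIF `Value` that knows its type's bit width.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Value {
    pub bits: u16,
}

/// A 64-bit register-or-memory operand: only a 64-bit `Value` may take the
/// memory (sunk-load) form, so a 128-bit instruction constructor taking a
/// hypothetical `XmmMem128` could never receive a sunk 64-bit load.
#[derive(Debug, PartialEq)]
pub enum XmmMem64 {
    Reg(Value),
    Mem(Value), // the real backend would hold an `Amode` here
}

impl XmmMem64 {
    /// The would-be autoconversion: dynamically fails if the value's
    /// CLIF-level width does not match the operand type's static width.
    pub fn from_value(v: Value, is_sinkable_load: bool) -> XmmMem64 {
        assert_eq!(v.bits, 64, "width mismatch: {}-bit value for a 64-bit operand", v.bits);
        if is_sinkable_load {
            XmmMem64::Mem(v)
        } else {
            XmmMem64::Reg(v)
        }
    }
}
```

The point of the sketch is that the width check happens once, at the conversion boundary, rather than relying on each lowering rule's author to remember it.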
alexcrichton commented on issue #12477:
Reading over this I also quite like the `maybe_sink` idea. That feels like a lightweight-enough annotation to be easy to apply to what we have today while also making it much clearer when sinking is allowed or not. Right now we have a lot of `(a Xmm a)`, which is a sufficient but non-obvious way to disallow sinking; the omission of `maybe_sink` meaning "this can't be sunk" feels clearer.

Inevitably we'll forget `maybe_sink` in places where it should be, but that seems like a better problem to have than the other way around.

Also, to confirm, my assumption is that `Value` would still auto-convert to `XmmMem`; it'd just always be `Xmm` under the hood. The production of `(maybe_sink a)` would then do the auto-conversion we have today to `XmmMem`, where it binds either `Xmm` or `Mem` as appropriate.
cfallin commented on issue #12477:

> Also, to confirm, my assumption is that `Value` would still auto-convert to `XmmMem`; it'd just always be `Xmm` under the hood. The production of `(maybe_sink a)` would then do the auto-conversion we have today to `XmmMem`, where it binds either `Xmm` or `Mem` as appropriate.

Yes to the first -- the autoconversion would always force to a register. `(maybe_sink _)` as an extractor would produce a newtype wrapper around `Value` like `MaybeSinkableValue` (Alexa: your original name was `ValueMaybeSunk`, but at this point it's not sunk yet); we'd have a `(convert MaybeSinkableValue XmmMem)` that does what `put_in_xmm_mem` does today.

In other words, the check to see if something is sinkable happens in the constructor (right-hand side); we are only labeling it with a newtype on the left-hand side. That's important, because we don't know where the value is going to be used yet on the LHS, and we haven't committed to this rule yet when the LHS is evaluated. (I guess that's also why we historically have this distinction, where we have a special mechanism to "look through" the sinkability boundary rather than using ordinary LHS matching: it requires a side effect to actually commit to the sinking and make it real.)

It's important to note here too that this alone (option 4) doesn't protect us against the CVE issue, which is a mismatch-in-width issue; that requires either the newtypes for 32/64/128-bit-wide operands (option 3) or a field in `Amode` indicating the expected load width (option 2) that we assert on when the amode meets a particular instruction in an RHS instruction constructor (e.g. `vandpd` asserts it's 128 bits, `mulsd` asserts it's 64 bits, `mulss` asserts it's 32 bits).
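The option-2 idea -- an `Amode` that dynamically carries the expected load width, checked when an instruction constructor consumes it -- might look roughly like this. The names and the error-returning shape are illustrative assumptions, not Cranelift's real API (the real check would more likely be a `debug_assert!` inside the constructor).

```rust
/// Illustrative stand-in for the x64 backend's `Amode`, extended with the
/// width of the load that was sunk into it.
#[derive(Debug)]
pub struct Amode {
    pub base_reg: u8,   // stand-in for the real addressing-mode fields
    pub load_bits: u16, // width of the original, to-be-sunk load
}

/// Stand-in for a 128-bit instruction constructor (e.g. one for `vandpd`):
/// when handed a memory operand, it checks that the sunk load's width
/// matches the 128-bit access the instruction will actually perform.
pub fn with_mem_operand_128(amode: &Amode) -> Result<(), String> {
    if amode.load_bits != 128 {
        return Err(format!(
            "sunk load is {} bits but instruction would read 128",
            amode.load_bits
        ));
    }
    Ok(())
}
```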
avanhatt commented on issue #12477:
To add another reason why I like option 4 (possibly in addition to other changes):
From the perspective of verification, we'd really like it if the LHS can be syntactically sufficient to recover the semantics of a snippet of Clif.
E.g., without auto converters, if we wrote a version of this rule to explicitly match on a loaded arg, with an LHS of roughly `(lower (has_type $F64 (fcopysign (load a) b)))`, we know that it is equivalent to:

`v2 = load.f64 v0`
`v3 = fcopysign v2, v1`

which has the semantics of: load a 64-bit value from memory, then perform the pure operator `fcopysign`.

The current status quo's LHS of `(lower (has_type $F64 (fcopysign a b)))` is strange from this perspective: the type of `a` (and `b`) is `Value`, which we'd like to understand as "a value in Clif, which may have come from any other sequence of Clif instructions but should reside in an SSA virtual register". From a semantic perspective, this LHS looks _pure_ in that it should not touch memory.

With the new proposed LHS `(lower (has_type $F64 (fcopysign (maybe_sunk a) b)))` we can essentially split the semantics of the LHS into two possibilities: (1) that the first argument is indeed from a virtual register, with pure, non-memory-effecting semantics, or (2) that the first argument has equivalent semantics to a `load`, which is memory-effecting.
On the point of whether to do option 3, I'm less inclined one way or the other at this point, but lean toward agreeing that it would be good.
We need to be able to provide semantics to `XmmMem`, and again we will need to have cases: this type is either (1) a register with a width that (for verification) is determined during monomorphization before lowering to SMT, or (2) the result of a load with a specific width. Types like `XmmMem32` would make getting this specification right a bit easier, but I can't say at this point whether there are cases where monomorphization would fail without them.

To @cfallin's point: 4 alone does not prevent the CVE in terms of an ISLE metacompiler error, but it may have caused the author not to write the rule with `maybe_sunk`, avoiding the CVE. That is, the hope is that 4/`maybe_sunk` would force a rule author to at least consider the possibility that the LHS they are matching on includes a load, raising mental (if not type-enforced) alarms if they then have RHS ops with known wider width.
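The two-possibility reading above can be made concrete with a small Rust sketch (hypothetical names; `MaybeSinkableValue` here just illustrates the proposed wrapper, not real Cranelift code): the wrapper's two cases carry exactly the pure-register and load semantics, and only the right-hand-side conversion decides which operand form results.

```rust
/// Illustrative wrapper produced by a `maybe_sink`-style LHS marker.
#[derive(Debug, PartialEq)]
pub enum MaybeSinkableValue {
    /// Case (1): from an SSA virtual register -- pure, no memory effect.
    Pure { reg: u8 },
    /// Case (2): semantically a `load` of `bits` bits -- memory-effecting.
    SinkableLoad { addr: u64, bits: u16 },
}

/// Stand-in for the RHS conversion to a register-or-memory operand; only
/// here do we commit to sinking (and could check widths, per option 3).
pub fn to_operand_text(v: &MaybeSinkableValue) -> String {
    match v {
        MaybeSinkableValue::Pure { reg } => format!("xmm{reg}"),
        MaybeSinkableValue::SinkableLoad { addr, bits } => {
            format!("{}-bit load from {:#x}", bits, addr)
        }
    }
}
```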
alexcrichton commented on issue #12477:
Personally I like the idea of sized `XmmMem` values (and probably `GprMem` too). The raw instructions are already annotated with widths, so this would, in theory, largely be a matter of plumbing and type juggling.

Doing both of these (also `maybe_sink`), to me, tackles the two major angles for how this bug arose.
avanhatt commented on issue #12477:
Just to be explicit: is our thinking that only the `*Mem` ISLE types get widths? That is, should `Xmm` always be understood to have width 128 (even if in certain cases we only care about the low bits)?

I could see an argument that this is a nice policy:

- ISLE types defined in the shared prelude (`Reg`, `WritableReg`, `VecReg`, etc.) have polymorphic/dynamic widths.
- Backend-specific ISLE types should have specific widths (if not explicit in the name, at least inferable from context, though explicit names would make the verification team's job easier).
But I'm not sure whether the second point is too strict of a distinction for x64.
cfallin commented on issue #12477:
If we take option 3, I think it's actually a little clearer if we carry the refactor through to the reg-only kinds as well; otherwise the boundary between the two gets weird. For example, if I have an `Xmm`, can I autoconvert that to an `XmmMem64` too, or only `XmmMem128`? (I guess if the width speaks only about the memory case, that's still viable, but it's subtle.)

So to be concrete, I'd imagine replacing `Xmm` with `Xmm32/64/128`, `XmmMem` with `XmmMem32/64/128`, etc. We have autoconverters from `Value` and `MaybeSinkableValue` for all of them that check the type on the `Value` and panic if wrong. This then does mean we'll need "widening" operators too, I think, for cases like `fcopysign` where we use an instruction that really does do 128 bits of work on a narrower value (as opposed to the scalar ops like `mulsd` etc.).

Then `fcopysign` could either be written something like `(rule (lower (has_type $F64 (fcopysign a b))) (simd_op (simd_op (widen_64_128 a)) (simd_op (widen_64_128 b))))`, where `a` bound in the LHS has type `Value`, we have an autoconverter to `Xmm64` for the operand of `widen_64_128` (which has signature `Xmm64 -> Xmm128`; not `Mem`), we get an `Xmm128` out, which autoconverts to `XmmMem128` (but only ever has the `Reg` branch), and `simd_op` (e.g. `vandpd`) is defined to take an `XmmMem128`.

Or we could write it without those widen operators, and instead have an autoconverter from `Value` to `XmmMem128` that, for a value of the wrong type/width, only ever yields the register case. Maybe that's actually even cleaner: we are finally feeding the appropriate information into the sinkability logic (what is the load width wanting to use this amode), so that, as you (Alexa) had initially wanted, we can inhibit the behavior at that level. I actually very much like that!
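That last variant -- a conversion that knows the consumer's load width and simply degrades to the register-only form rather than sinking a mismatched load -- might look roughly like this. The names (`put_in_xmm_mem_128`, `Operand128`) are made up for illustration, not the real helpers.

```rust
/// The two forms a 128-bit register-or-memory operand can take.
#[derive(Debug, PartialEq)]
pub enum Operand128 {
    Reg,
    Mem, // a sunk load of exactly 128 bits
}

/// Sketch of a `Value`-to-`XmmMem128`-style conversion: sinking happens
/// only when the value is a sinkable load *and* its width is exactly the
/// 128 bits the consuming instruction will read; otherwise we force the
/// value into a register, which is always safe.
pub fn put_in_xmm_mem_128(value_bits: u16, is_sinkable_load: bool) -> Operand128 {
    if is_sinkable_load && value_bits == 128 {
        Operand128::Mem
    } else {
        Operand128::Reg
    }
}
```

Note how the CVE's scenario (a 64-bit value meeting a 128-bit consumer) falls out as the safe register case instead of a too-wide load.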
alexcrichton commented on issue #12477:
If we could get away with tagging all registers with their width, I'd agree that'd be really nice. I suspect that'd be quite a large refactoring of the backends, but we've had a number of subtle bugs in the past related to mismatched register widths that this'd probably close off.
fitzgen commented on issue #12477:
Regarding bit-width specific types, I want to reiterate my potential incremental migration plan from the discussion we had in/after the Cranelift meeting.
First, we provide the following infrastructure:
- Define `{Reg,Gpr,Xmm}[Mem[Imm]]{8,16,32,..}` types that statically specify their bit width
- Additionally, make the plain `{Reg,Gpr,Xmm}[Mem[Imm]]` types without a static bit width dynamically track their bit width
  - Optionally only when `cfg(debug_assertions)`, but I'd try pretty hard to keep it on all the time, if possible
- Auto-convert from `Reg`/`Gpr`/`Xmm` to their static bit-width associated types (`Reg32`/`Gpr16`/`Xmm64`/etc...) with a dynamic assertion that the dynamic width matches the static type's width
- Also auto-convert the other way, from e.g. `Gpr64` to `Gpr`, where the static bit width becomes the dynamic bit width.

Then we start making more and more use of the static types:

- Make the assembler's codegen emit ISLE constructors that take static bit-width operands and produce static bit-width results
- Auto-convert from `Value` to dynamically typed `Reg`/`Gpr`/`Xmm`, keeping track of the value's bit width as a dynamic parameter
- Auto-convert from `Value` to statically typed `Reg8`/`Gpr16`/`Xmm32`/etc..., dynamically asserting that the value's dynamic bit width matches the result type's static width
- Start annotating rules with static bit widths

This is effectively a combination of (2) and (3). (And it is also orthogonal to whether we do (1) or (4), or continue to have auto conversions between `Value` and `{Reg,Gpr,Xmm}Mem[Imm]`.)

Final note regarding static types: we have a bunch of rules that work for many value widths, e.g. all values that fit in a single GPR. If we fully move to static types, we will need to "duplicate" these rules for each width (or introduce a generics/macro system...). So it may be desirable to have this hybrid dynamic+static approach not just as an incremental migration towards a fully-static system, but indefinitely.
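The checked conversions between the dynamically and statically width-tracked register types in this plan could be sketched as follows. These are illustrative types only; the real plan would use ISLE autoconversions, and a failed check would likely be a (debug) assertion rather than a `Result`.

```rust
/// Dynamically width-tracked register: the bit width rides along at runtime.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Gpr {
    pub hw_enc: u8,
    pub bits: u16,
}

/// Statically width-tracked register: the 64-bit width is in the type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Gpr64 {
    pub hw_enc: u8,
}

impl From<Gpr64> for Gpr {
    /// Static-to-dynamic: the static width becomes the dynamic width.
    fn from(g: Gpr64) -> Gpr {
        Gpr { hw_enc: g.hw_enc, bits: 64 }
    }
}

impl TryFrom<Gpr> for Gpr64 {
    type Error = String;

    /// Dynamic-to-static: the dynamic width must match the type's width.
    fn try_from(g: Gpr) -> Result<Gpr64, String> {
        if g.bits == 64 {
            Ok(Gpr64 { hw_enc: g.hw_enc })
        } else {
            Err(format!("expected a 64-bit register, got {}-bit", g.bits))
        }
    }
}
```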
Last updated: Feb 24 2026 at 04:36 UTC