boolean terms for x64 features · cranelift

While fixing up an EVEX PR, I noted that we currently don't handle the feature logic in a complete way. Currently we only have an OR operator, |, for concatenating features into a too-simple boolean term: 32-bit OR 64-bit OR .... @Alex Crichton brought up, correctly I believe, that everything still works out because we build up a list of all the features we expect to see, essentially treating these ORs as ANDs--so, for now, all is well (or at least overly restrictive in a safe way).

If in the future we want to use the new assembler for 32-bit compatibility mode, our current scheme breaks down: we need to write (32-bit OR 64-bit) AND .... for instructions that are valid in this mode and then actually check that the target features include 32-bit. We will eventually have the same problem with AVX10 instructions, since we'll see CPUs that have either AVX512 features or AVX10 features (?) and we'll need to calculate a boolean term like: (32-bit OR 64-bit) AND ((AVX512VL AND AVX512F) OR AVX10.1).
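To make the term above concrete, here is a minimal sketch of a boolean feature term with an evaluator. Every name in it (`Feature`, `Term`, `evex_example_term`, the `Mode32`/`Mode64` variants) is hypothetical and chosen for illustration; it is not the assembler's actual representation.

```rust
use std::collections::HashSet;

/// Hypothetical feature names, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Feature {
    Mode32,
    Mode64,
    Avx512f,
    Avx512vl,
    Avx10_1,
}

/// A boolean term over features: a leaf, a conjunction, or a disjunction.
#[derive(Clone, Debug)]
pub enum Term {
    Feature(Feature),
    And(Box<Term>, Box<Term>),
    Or(Box<Term>, Box<Term>),
}

impl Term {
    /// Evaluate the term against the set of features the target provides.
    pub fn eval(&self, available: &HashSet<Feature>) -> bool {
        match self {
            Term::Feature(f) => available.contains(f),
            Term::And(a, b) => a.eval(available) && b.eval(available),
            Term::Or(a, b) => a.eval(available) || b.eval(available),
        }
    }
}

/// Helper that boxes a leaf term.
fn feat(f: Feature) -> Box<Term> {
    Box::new(Term::Feature(f))
}

/// Builds (32-bit OR 64-bit) AND ((AVX512VL AND AVX512F) OR AVX10.1).
pub fn evex_example_term() -> Term {
    use Feature::*;
    Term::And(
        Box::new(Term::Or(feat(Mode32), feat(Mode64))),
        Box::new(Term::Or(
            Box::new(Term::And(feat(Avx512vl), feat(Avx512f))),
            feat(Avx10_1),
        )),
    )
}
```

With this shape, a target that has only `Mode64` and `Avx512f` fails the term, while one that has `Mode32` and `Avx10_1` passes it, which is the distinction the OR-as-AND list cannot express.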

I was planning to build out the functionality to evaluate these boolean terms but thought I would check here first: is there any reason _not_ to prefer a more complete implementation? The main one I can think of is performance: perhaps evaluating the boolean term is slower than the current approach of building up a Vec and running iter().all(matches_isa_flags) on it (?). I suspect we could mitigate this with some memoization, where we hash the feature set and store off a boolean for each unique feature set to avoid re-evaluating the term. But, then again, this might be overkill.
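The memoization idea could look roughly like the sketch below. It assumes the feature set fits in a `u64` bitmask, which then serves directly as the cache key; the constants, `Term`, and `MemoizedTerm` are all made up for this example, and whether the cache pays for itself is exactly the open question raised above.

```rust
use std::collections::HashMap;

// Hypothetical bit assignments for a few features.
pub const AVX512F: u64 = 1 << 0;
pub const AVX512VL: u64 = 1 << 1;
pub const AVX10_1: u64 = 1 << 2;

/// A boolean term whose leaves are single-bit masks.
pub enum Term {
    Flag(u64),
    And(Box<Term>, Box<Term>),
    Or(Box<Term>, Box<Term>),
}

impl Term {
    /// Evaluate the term against a bitmask of available features.
    pub fn eval(&self, flags: u64) -> bool {
        match self {
            Term::Flag(bit) => (flags & *bit) != 0,
            Term::And(a, b) => a.eval(flags) && b.eval(flags),
            Term::Or(a, b) => a.eval(flags) || b.eval(flags),
        }
    }
}

/// Wraps a term and remembers the result for each feature set seen so far.
pub struct MemoizedTerm {
    term: Term,
    cache: HashMap<u64, bool>,
    evaluations: usize,
}

impl MemoizedTerm {
    pub fn new(term: Term) -> Self {
        MemoizedTerm { term, cache: HashMap::new(), evaluations: 0 }
    }

    /// Returns the cached answer if this feature set was seen before;
    /// otherwise evaluates the term once and stores the result.
    pub fn is_satisfied(&mut self, flags: u64) -> bool {
        if let Some(&hit) = self.cache.get(&flags) {
            return hit;
        }
        self.evaluations += 1;
        let result = self.term.eval(flags);
        self.cache.insert(flags, result);
        result
    }

    /// Number of times the term was actually evaluated (cache misses).
    pub fn evaluations(&self) -> usize {
        self.evaluations
    }
}
```

Since a compilation typically runs against a single feature set, each term would be evaluated once and every later query would be a hash lookup; for small terms the lookup may well cost more than re-evaluating.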

x64: EVEX Encoding for new assembler by rahulchaphalkar · Pull Request #11153 · bytecodealliance/wasmtime

Adds evex encoding for new assembler. Adds vaddpd evex instr. Following items are not implemented in this PR / left out intentionally - Broadcast attribute (bcst). Will be implemented in future pr...

Andrew Brown (Jul 16 2025 at 23:24):

Chris Fallin (Jul 16 2025 at 23:28):

Alex Crichton (Jul 17 2025 at 00:01):

I've been a little worried about the preexisting inefficiency of the current implementation, but it's never bubbled up enough that I've bothered to worry about it. In the current system it feels pretty inefficient codegen-wise to generate source that, for each instruction, builds a vec that is mostly a bitset. Then at runtime we, for each instruction, allocate this vector, transform it into a small vector, then iterate over the small vector, then check the bits.
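As a rough illustration of the "vec that is mostly a bitset" point, the sketch below contrasts a list-based check with a single mask comparison. `Flag`, `FakeIsaFlags`, and both functions are hypothetical stand-ins rather than Cranelift's real types, and the mask form only covers the all-features-required (AND) case.

```rust
/// Hypothetical feature flags, each mapped to a bit position.
#[derive(Clone, Copy)]
pub enum Flag {
    Sse42 = 0,
    Avx = 1,
    Avx512f = 2,
}

/// Stand-in for the real ISA flags: one bit per feature.
pub struct FakeIsaFlags {
    pub bits: u32,
}

impl FakeIsaFlags {
    /// True if the given feature's bit is set.
    pub fn has(&self, f: Flag) -> bool {
        (self.bits & (1u32 << (f as u32))) != 0
    }
}

/// List style: walk a list of required flags and test each one.
pub fn available_list(required: &[Flag], isa: &FakeIsaFlags) -> bool {
    required.iter().all(|&f| isa.has(f))
}

/// Fold a list of required flags into one mask (could be done at codegen time).
pub fn mask(required: &[Flag]) -> u32 {
    required.iter().fold(0u32, |m, &f| m | (1u32 << (f as u32)))
}

/// Mask style: one AND and one compare, with no allocation or iteration.
pub fn available_mask(required_mask: u32, isa: &FakeIsaFlags) -> bool {
    (isa.bits & required_mask) == required_mask
}
```

Both functions agree on every input; the difference is only in how much work happens per instruction at runtime.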

I have no doubts that this could all be far more efficient though. For example we could probably remove available_in_any_isa entirely and operate exclusively on assembler instructions (as they're basically the only ones left), and then we could generate a function that takes the IsaFlags directly (sort of, probably via a trait since it's in another crate) and tests whether it's valid.

Effectively if we're gonna overhaul things I'd prefer to move towards something that's a bit leaner both for codegen and for runtime testing. How about something like we generate this:


Alex Crichton (Jul 17 2025 at 00:02):

Alex Crichton (Jul 17 2025 at 00:04):

trait CpuFeatures {
    fn sse42(&self) -> bool;
    // ...
}

impl Inst {
    fn is_available(&self, features: &impl CpuFeatures) -> bool {
        match self {
            // ...
        }
    }
}

impl addb_mi {
    fn is_available(&self, _features: &impl CpuFeatures) -> bool {
        true // no features here, or maybe fake _64b and compat?
    }
}

impl vaddpd_mi {
    fn is_available(&self, features: &impl CpuFeatures) -> bool {
        features.avx()
    }
}

impl vaddpd_evex_version {
    fn is_available(&self, features: &impl CpuFeatures) -> bool {
        features.avx512() || features.avx10()
    }
}
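For readers who want to compile the idea, here is one way the sketch above could be filled in. The trait methods `avx`, `avx512`, and `avx10`, the unit-struct instruction types, the `Inst` variants, and the `HostFeatures` implementor are all assumptions made for this example; the real generated code may look quite different.

```rust
/// Feature queries the assembler needs; an embedder would implement this.
pub trait CpuFeatures {
    fn avx(&self) -> bool;
    fn avx512(&self) -> bool;
    fn avx10(&self) -> bool;
}

// Placeholder instruction types (the real ones would carry operands).
#[allow(non_camel_case_types)]
pub struct addb_mi;
#[allow(non_camel_case_types)]
pub struct vaddpd_mi;
#[allow(non_camel_case_types)]
pub struct vaddpd_evex_version;

impl addb_mi {
    pub fn is_available(&self, _features: &impl CpuFeatures) -> bool {
        true // no feature requirements
    }
}

impl vaddpd_mi {
    pub fn is_available(&self, features: &impl CpuFeatures) -> bool {
        features.avx()
    }
}

impl vaddpd_evex_version {
    pub fn is_available(&self, features: &impl CpuFeatures) -> bool {
        features.avx512() || features.avx10()
    }
}

/// One variant per instruction; dispatches to the per-instruction check.
#[allow(non_camel_case_types)]
pub enum Inst {
    addb_mi(addb_mi),
    vaddpd_mi(vaddpd_mi),
    vaddpd_evex_version(vaddpd_evex_version),
}

impl Inst {
    pub fn is_available(&self, features: &impl CpuFeatures) -> bool {
        match self {
            Inst::addb_mi(i) => i.is_available(features),
            Inst::vaddpd_mi(i) => i.is_available(features),
            Inst::vaddpd_evex_version(i) => i.is_available(features),
        }
    }
}

/// Hypothetical implementor, standing in for something backed by IsaFlags.
pub struct HostFeatures {
    pub avx: bool,
    pub avx512: bool,
    pub avx10: bool,
}

impl CpuFeatures for HostFeatures {
    fn avx(&self) -> bool { self.avx }
    fn avx512(&self) -> bool { self.avx512 }
    fn avx10(&self) -> bool { self.avx10 }
}
```

Each check compiles down to plain boolean logic over trait calls, so nothing is allocated per instruction.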

Alex Crichton (Jul 17 2025 at 00:05):

then for the codegen side of things I think it's not the end of the world to continue to overload the | and & operators to build a symbolic expression, but I'm wary there too of doing too much because we have to codegen, when compiling the assembler itself, all the code to generate the code to be the assembler

Alex Crichton (Jul 17 2025 at 00:06):

in that a | b | c creates a lot of Vec<Flags> temporaries today with a lot of drop flags that rustc has to generate. The compile-time impact isn't huge from what I've tried to measure in the past but striving to generate less code is in theory always better
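For reference, overloading `|` and `&` to build a symbolic expression can be sketched as below. The `Features` enum and `flag` helper are hypothetical names for this example. This version boxes its operands, so it still generates drop code; it shows the shape of the approach and makes no claim about compile-time cost.

```rust
use std::ops::{BitAnd, BitOr};

/// A symbolic boolean expression over named feature flags.
#[derive(Clone, Debug, PartialEq)]
pub enum Features {
    Flag(&'static str),
    And(Box<Features>, Box<Features>),
    Or(Box<Features>, Box<Features>),
}

/// `a | b` builds an `Or` node instead of computing anything.
impl BitOr for Features {
    type Output = Features;
    fn bitor(self, rhs: Features) -> Features {
        Features::Or(Box::new(self), Box::new(rhs))
    }
}

/// `a & b` builds an `And` node.
impl BitAnd for Features {
    type Output = Features;
    fn bitand(self, rhs: Features) -> Features {
        Features::And(Box::new(self), Box::new(rhs))
    }
}

/// Shorthand for a leaf flag.
pub fn flag(name: &'static str) -> Features {
    Features::Flag(name)
}
```

An instruction definition could then write `(flag("_32b") | flag("_64b")) & ((flag("avx512vl") & flag("avx512f")) | flag("avx10_1"))` and get the full tree back. Rust gives `&` higher precedence than `|`, so the parentheses around each OR group are needed.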

Andrew Brown (Jul 17 2025 at 00:06):

Alex Crichton (Jul 17 2025 at 00:06):

but in any case nothing I'm thinking about is a dealbreaker one way or another and it's "just a refactor" away from any other solution

Alex Crichton (Jul 17 2025 at 00:07):

the main thing I'd like to see is to have the boolean tree reified in generated rust code for the assembler, not like an AST we forward and then parse in cranelift
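One reading of "reified in generated Rust code" is that the generator walks the tree at build time and prints ordinary `&&`/`||` expressions, so nothing tree-shaped survives into Cranelift. A minimal sketch under that assumption follows; `Term`, `to_rust`, and `generate_is_available` are invented names, and the emitted method signature simply mirrors the one proposed earlier in this thread.

```rust
/// Build-time representation of a feature term (hypothetical).
pub enum Term {
    Flag(&'static str),
    And(Box<Term>, Box<Term>),
    Or(Box<Term>, Box<Term>),
}

impl Term {
    /// Render the term as a Rust boolean expression over a `features` value.
    pub fn to_rust(&self) -> String {
        match self {
            Term::Flag(name) => format!("features.{}()", name),
            Term::And(a, b) => format!("({} && {})", a.to_rust(), b.to_rust()),
            Term::Or(a, b) => format!("({} || {})", a.to_rust(), b.to_rust()),
        }
    }
}

/// Emit the source text of an `is_available` method for one instruction.
pub fn generate_is_available(inst: &str, term: &Term) -> String {
    format!(
        "impl {} {{\n    pub fn is_available(&self, features: &impl CpuFeatures) -> bool {{\n        {}\n    }}\n}}\n",
        inst,
        term.to_rust()
    )
}
```

The output is plain source text that rustc compiles like any other method, so the consumer never parses or interprets an AST at runtime.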

Alex Crichton (Jul 17 2025 at 00:07):

Andrew Brown (Jul 17 2025 at 00:07):

Jacob Lifshay (Jul 18 2025 at 09:00):

Since AVX10 now always has 512-bit registers, I'd expect that all future CPUs with AVX10 also support the corresponding AVX512 features. That said, this wouldn't be the first time Intel or AMD decided to do something weird.

Stream: cranelift

Topic: boolean terms for x64 features

Andrew Brown (Jul 16 2025 at 23:23):
