While fixing up an EVEX PR, I noted that we currently don't handle the feature logic in a complete way. Currently we only have an OR operator, |, for concatenating features into a too-simple boolean term: 32-bit OR 64-bit OR .... @Alex Crichton brought up, correctly I believe, that everything still works out because we build up a list of all the features we expect to see, essentially treating these ORs as ANDs--so, for now, all is well (or at least overly restrictive in a safe way).
If in the future we want to use the new assembler for 32-bit compatibility mode, our current scheme breaks down: we need to write (32-bit OR 64-bit) AND .... for instructions that are valid in this mode and then actually check that the target features include 32-bit. We eventually will have the same problem with AVX10 instructions, since we'll see CPUs that have either AVX512 features or AVX10 features (?) and we'll need to calculate a boolean term like: (32-bit OR 64-bit) AND ((AVX512VL AND AVX512F) OR AVX10.1).
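To make the term above concrete, here is a minimal sketch of a boolean feature term and its evaluation. The `Feature` and `Term` types and the `example_term` helper are hypothetical names made up for illustration; they are not the assembler's actual types.

```rust
use std::collections::HashSet;

/// Hypothetical feature names, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Feature {
    Mode32,
    Mode64,
    Avx512vl,
    Avx512f,
    Avx10v1,
}

/// A boolean term over features (assumed shape, not the real implementation).
#[derive(Clone, Debug)]
pub enum Term {
    Feature(Feature),
    And(Box<Term>, Box<Term>),
    Or(Box<Term>, Box<Term>),
}

impl Term {
    /// Recursively evaluate the term against the set of enabled features.
    pub fn eval(&self, enabled: &HashSet<Feature>) -> bool {
        match self {
            Term::Feature(f) => enabled.contains(f),
            Term::And(a, b) => a.eval(enabled) && b.eval(enabled),
            Term::Or(a, b) => a.eval(enabled) || b.eval(enabled),
        }
    }
}

/// Builds (32-bit OR 64-bit) AND ((AVX512VL AND AVX512F) OR AVX10.1).
pub fn example_term() -> Term {
    use Feature::*;
    let leaf = |f| Box::new(Term::Feature(f));
    let mode = Term::Or(leaf(Mode32), leaf(Mode64));
    let avx512 = Term::And(leaf(Avx512vl), leaf(Avx512f));
    let simd = Term::Or(Box::new(avx512), leaf(Avx10v1));
    Term::And(Box::new(mode), Box::new(simd))
}
```

With this shape, a target with 64-bit mode and AVX10.1 satisfies the term, while a target with 64-bit mode and only AVX512F does not.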
I was planning to build out the functionality to evaluate these boolean terms but thought I would check here first: is there any reason _not_ to prefer a more complete implementation? The main one I can think of is performance: perhaps evaluating the boolean term is slower than the current approach of building up a Vec and running iter().all(matches_isa_flags) on it (?). I suspect we could mitigate this with some memoization, where we hash the feature set and store off a boolean for each unique feature set to avoid re-evaluating the term. But, then again, this might be overkill.
Any thoughts on this?
cc: @Alex Crichton, @Chris Fallin, @fitzgen (he/him)
Having a little boolean expression grammar seems reasonable to me!
I've been a little worried about the preexisting inefficiency of the current implementation, but it's never bubbled up enough that I've bothered to worry about it. In the current system it feels pretty inefficient codegen-wise to generate source that, for each instruction, builds a vec that is mostly a bitset. Then at runtime we, for each instruction, allocate this vector, transform it into a small vector, then iterate over the small vector, then check the bits.
I have no doubt that this could all be far more efficient though. For example we could probably remove available_in_any_isa entirely in favor of operating exclusively on assembler instructions (as they're basically the only ones left) and then we could generate a function that takes the IsaFlags directly (sort of, probably via a trait since it's in another crate) and tests whether it's valid.
Effectively if we're gonna overhaul things I'd prefer to move towards something that's a bit leaner both for codegen and for runtime testing. How about something like we generate this:
trait CpuFeatures {
    fn sse42(&self) -> bool;
    // ...
}
impl Inst {
    fn is_available(&self, features: &impl CpuFeatures) -> bool {
        match self {
            // ...
        }
    }
}
impl addb_mi {
    fn is_available(&self, _features: &impl CpuFeatures) -> bool {
        true // no features here, or maybe fake _64b and compat?
    }
}
impl vaddpd_mi {
    fn is_available(&self, features: &impl CpuFeatures) -> bool {
        features.avx()
    }
}
impl vaddpd_evex_version {
    fn is_available(&self, features: &impl CpuFeatures) -> bool {
        features.avx512() || features.avx10()
    }
}
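For what it's worth, here is a compilable sketch of how the trait approach could cover the compound AVX512/AVX10 term from earlier in the thread. Everything here is an assumption for illustration: the method names `avx512vl`, `avx512f`, and `avx10_1`, the `FakeFlags` stand-in for flags that live in another crate, and the free function `evex_inst_is_available`.

```rust
/// Hypothetical trait the generated code would call into; method names are assumed.
pub trait CpuFeatures {
    fn avx512vl(&self) -> bool;
    fn avx512f(&self) -> bool;
    fn avx10_1(&self) -> bool;
}

/// Hypothetical stand-in for the real ISA flags, which live in another crate.
#[derive(Default)]
pub struct FakeFlags {
    pub avx512vl: bool,
    pub avx512f: bool,
    pub avx10_1: bool,
}

impl CpuFeatures for FakeFlags {
    fn avx512vl(&self) -> bool {
        self.avx512vl
    }
    fn avx512f(&self) -> bool {
        self.avx512f
    }
    fn avx10_1(&self) -> bool {
        self.avx10_1
    }
}

/// What the generated check for one EVEX instruction might look like:
/// (AVX512VL AND AVX512F) OR AVX10.1, as plain Rust with no allocation.
pub fn evex_inst_is_available(features: &impl CpuFeatures) -> bool {
    (features.avx512vl() && features.avx512f()) || features.avx10_1()
}
```

The point of this shape is that the boolean term is ordinary short-circuiting Rust, so there is no vector to build or iterate per instruction.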
then for the codegen side of things I think it's not the end of the world to continue to overload the | and & operators to build a symbolic expression, but I'm wary there too of doing too much because we have to codegen, when compiling the assembler itself, all the code to generate the code to be the assembler
in that a | b | c creates a lot of Vec<Flags> temporaries today with a lot of drop flags that rustc has to generate. The compile-time impact isn't huge from what I've tried to measure in the past but striving to generate less code is in theory always better
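A rough sketch of what overloading | and & to build a symbolic expression could look like, with the tree then rendered as Rust source at codegen time. The `Expr` type, the `to_rust` method, and the `features.<name>()` accessor convention are all hypothetical, not the existing generator's API.

```rust
use std::ops::{BitAnd, BitOr};

/// Hypothetical symbolic expression over feature flags.
#[derive(Clone, Debug, PartialEq)]
pub enum Expr {
    Flag(&'static str),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

impl BitOr for Expr {
    type Output = Expr;
    /// `a | b` builds an `Or` node instead of evaluating anything.
    fn bitor(self, rhs: Expr) -> Expr {
        Expr::Or(Box::new(self), Box::new(rhs))
    }
}

impl BitAnd for Expr {
    type Output = Expr;
    /// `a & b` builds an `And` node instead of evaluating anything.
    fn bitand(self, rhs: Expr) -> Expr {
        Expr::And(Box::new(self), Box::new(rhs))
    }
}

impl Expr {
    /// Render the tree as a Rust boolean expression that calls assumed
    /// accessor methods on a `features` value.
    pub fn to_rust(&self) -> String {
        match self {
            Expr::Flag(name) => format!("features.{}()", name),
            Expr::And(a, b) => format!("({} && {})", a.to_rust(), b.to_rust()),
            Expr::Or(a, b) => format!("({} || {})", a.to_rust(), b.to_rust()),
        }
    }
}
```

For example, `(Expr::Flag("avx512vl") & Expr::Flag("avx512f")) | Expr::Flag("avx10_1")` renders as `((features.avx512vl() && features.avx512f()) || features.avx10_1())`, so the tree ends up as generated code rather than as data that has to be traversed at runtime.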
yeah, makes sense
but in any case nothing I'm thinking about is a dealbreaker one way or another and it's "just a refactor" away from any other solution
the main thing I'd like to see is to have the boolean tree reified in generated rust code for the assembler, not like an AST we forward and then parse in cranelift
er, traverse, not parse
yup, I was just going to say that
Andrew Brown said:
We eventually will have the same problem with AVX10 instructions, since we'll see CPUs that have either AVX512 features or AVX10 features (?) and we'll need to calculate a boolean term like:
(32-bit OR 64-bit) AND ((AVX512VL AND AVX512F) OR AVX10.1).
Since AVX10 now always has 512-bit registers, I'd expect that all future CPUs with AVX10 also support the corresponding AVX512 features. That said, this wouldn't be the first time Intel or AMD decided to do something weird.
Last updated: Dec 06 2025 at 07:03 UTC