abrown opened issue #12157:
Feature
Add support for APX instructions in Cranelift's x64 backend. For a description of APX and links to the specification, see this [white paper].
[white paper]: https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html
Benefit
If APX were available in Cranelift it would provide:
- more registers: for legacy instructions (not VEX or EVEX), we could now use the REX2 prefix to access 32 registers instead of 16; Cranelift certainly could benefit from more registers to allocate, reducing register pressure (e.g., faster regalloc at compile time, fewer spills and reloads at runtime).
- three operands: switching legacy instructions to APX's extended EVEX encoding not only gives access to all 32 registers (like REX2) but also allows instructions to use three operands (i.e., new data destination, NDD); Cranelift already expects three-operand instructions and currently hacks the x64 backend to pretend, so this is also a desirable change.
- backwards compatibility: in Cranelift, we would perform a CPUID check and only emit APX instructions when the target allows them; Cranelift would continue to emit the current set of instructions (i.e., REX, VEX, and EVEX) for older targets. This approach would allow mixing APX and legacy instructions.
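
A minimal sketch of the CPUID check mentioned in the last bullet, assuming the APX specification's enumeration of `APX_F` as bit 21 of EDX in CPUID leaf 7, sub-leaf 1. The function names are hypothetical and are not Cranelift's actual feature-detection API; a complete check would also confirm that the OS has enabled the APX state in XCR0, which is omitted here.

```rust
/// Bit position of APX_F in EDX of CPUID leaf 7, sub-leaf 1
/// (assumption taken from the APX specification, not from Cranelift).
const APX_F_BIT: u32 = 21;

/// Decode the APX_F bit from a raw EDX value (hypothetical helper).
pub fn edx_reports_apx(leaf7_sub1_edx: u32) -> bool {
    (leaf7_sub1_edx >> APX_F_BIT) & 1 == 1
}

/// Query the host CPU. OS support (XCR0) is deliberately not checked here.
#[cfg(target_arch = "x86_64")]
#[allow(unused_unsafe)]
pub fn host_reports_apx() -> bool {
    use std::arch::x86_64::{__cpuid, __cpuid_count};
    unsafe {
        // Leaf 7 must exist, and leaf 7 must report at least one sub-leaf.
        if __cpuid(0).eax < 7 {
            return false;
        }
        if __cpuid_count(7, 0).eax < 1 {
            return false;
        }
        edx_reports_apx(__cpuid_count(7, 1).edx)
    }
}

/// Non-x86-64 hosts never report APX.
#[cfg(not(target_arch = "x86_64"))]
pub fn host_reports_apx() -> bool {
    false
}
```

A backend could consult a flag like this once, when the target is configured, and fall back to the existing REX, VEX, and EVEX encodings whenever it is false.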
Implementation
Much work has already been accomplished to support this. Cranelift has a new assembler, `cranelift-assembler-x64`, that can emit the EVEX encodings APX will need; it does not yet have logic for the REX2 encodings, but this can fit in beside the other encodings (REX, VEX, EVEX). `regalloc2` has a new operand constraint, `Limit`, that will allow us to control how many registers each instruction can allocate to (https://github.com/bytecodealliance/regalloc2/pull/239). And the assembler can communicate this information, via `Inst::num_registers_avaliable`, up to Cranelift (https://github.com/bytecodealliance/wasmtime/pull/11714). What remains is to wire these things together (no trivial task!).

Here is how I would do it:
- [ ] enable fuzzing of `Limit` constraints in `regalloc2` (see my [fuzz-limit-constraints-rebased] branch)
- [ ] conditionally extend the number of registers available in Cranelift (see my [apx-extend-registers] branch)
- [ ] allow existing AVX512 instructions to use 32 registers; with `Limit`, allow EVEX-encoded instructions to allocate to all available registers (this is a good intermediate proof point)
- [ ] design a way to link x64 legacy instructions to their APX equivalent; this could reuse the existing `.alt(...)` API and something similar to the ISLE `...or_avx` helpers
- [ ] add some initial subset of APX instructions
- [ ] enable testing in CI (@rahulchaphalkar has a PoC using an emulator)
- [ ] add remaining APX instructions
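
The wiring between an instruction's encoding and its register limit can be sketched roughly as follows. Everything here is a hypothetical stand-in: `Encoding`, `Constraint`, `register_limit`, and `operand_constraint` are illustrative names, not the real `regalloc2` or `cranelift-assembler-x64` types. The sketch assumes `Limit(n)` means "allocate only within the first `n` registers of the class".

```rust
/// Encoding families named in this issue (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Encoding {
    Rex,
    Vex,
    Evex,
    Rex2,
}

/// Stand-in for a register-allocator operand constraint (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Constraint {
    /// Any register of the class.
    Reg,
    /// Only the first `n` registers of the class.
    Limit(usize),
}

/// How many registers an encoding can address: REX and VEX reach 16,
/// while EVEX and REX2 reach 32.
pub fn register_limit(enc: Encoding) -> usize {
    match enc {
        Encoding::Rex | Encoding::Vex => 16,
        Encoding::Evex | Encoding::Rex2 => 32,
    }
}

/// Pick the constraint for an operand, given how many registers the
/// target currently exposes to the allocator (`class_size`).
pub fn operand_constraint(enc: Encoding, class_size: usize) -> Constraint {
    let limit = register_limit(enc);
    if limit >= class_size {
        // The encoding reaches every exposed register: no restriction.
        Constraint::Reg
    } else {
        Constraint::Limit(limit)
    }
}
```

With 32 registers exposed, a REX-encoded instruction would get `Limit(16)` while an EVEX-encoded one stays unrestricted; with only 16 exposed, nothing needs limiting.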
[fuzz-limit-constraints-rebased]: https://github.com/abrown/regalloc2/tree/fuzz-limit-constraints-rebased
[apx-extend-registers]: https://github.com/abrown/wasmtime/tree/apx-extend-registers

What I described above assumes the use of EVEX to encode instructions since this allows the use of three operands via the NDD flag. This encoding, however, could increase code size, so there is a decision to be made whether to use the REX2 prefix (a) instead or (b) alongside those instructions. The REX2 prefix would allow allocating to 32 registers but not three operands. If we chose to go down approach (b), it would complicate the instruction selection helper somewhat: do we use the legacy form? the REX2 form? the EVEX form? I would propose measuring code size between some pair of steps above to make this decision with actual data before finishing the last few steps.
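
To make the code-size trade-off concrete, here is a rough model of approach (b) for a 64-bit register-to-register ALU operation. It assumes a 1-byte REX prefix, a 2-byte REX2 prefix, and a 4-byte EVEX prefix, each followed by one opcode byte and one ModRM byte, and it assumes a two-operand form needs an extra `mov` of the same length whenever the destination differs from the first source. `Form`, `sequence_len`, and `smallest_form` are hypothetical names; this is not a measurement and not Cranelift's selection logic.

```rust
/// Candidate encodings for one instruction (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Form {
    Legacy,
    Rex2,
    EvexNdd,
}

/// Prefix bytes, assuming a 64-bit operation (so legacy needs REX.W).
pub fn prefix_len(form: Form) -> usize {
    match form {
        Form::Legacy => 1,
        Form::Rex2 => 2,
        Form::EvexNdd => 4,
    }
}

/// Total bytes for `dst = src1 op src2`, including any extra `mov`.
pub fn sequence_len(form: Form, dst_is_src1: bool) -> usize {
    let op = prefix_len(form) + 2; // prefix + opcode + ModRM
    match form {
        // NDD writes a separate destination directly.
        Form::EvexNdd => op,
        // Two-operand forms first copy src1 into dst when they differ.
        Form::Legacy | Form::Rex2 => {
            if dst_is_src1 {
                op
            } else {
                2 * op
            }
        }
    }
}

/// Pick the shortest valid form; ties go to the earlier candidate.
pub fn smallest_form(apx: bool, uses_extended_regs: bool, dst_is_src1: bool) -> Option<Form> {
    [Form::Legacy, Form::Rex2, Form::EvexNdd]
        .iter()
        .copied()
        .filter(|f| match f {
            Form::Legacy => !uses_extended_regs,
            Form::Rex2 | Form::EvexNdd => apx,
        })
        .min_by_key(|f| sequence_len(*f, dst_is_src1))
}
```

Under these assumptions, REX2 wins when an extended register is used in place, EVEX with NDD wins when an extended register is used and the destination differs from the first source, and the legacy form remains competitive otherwise. Real measurements could differ, since register allocation may remove many of the modeled moves.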
abrown commented on issue #12157:
cc: @rahulchaphalkar, @jlb6740
Last updated: Dec 13 2025 at 19:03 UTC