x64 flag-producing conventions · cranelift

I was replacing x64 instructions with ones emitted by the new assembler, and noticed a problem in how we lower instructions that rely on flags. Previously, we had decided to keep around rules to the effect of "if we add/sub an 8/16-bit register, just use the wider version to avoid CPU dependencies" (e.g., here). But this rule is a problem when lowering instructions such as sadd_overflow, since the overflow flag won't reflect the narrow width if we're using a wider register. What to do? It seems nice to keep the "use the wider version" rule, but it's unclear how to reuse all the lowering rules except that one when we're doing flag-related operations. Any opinions on what this should look like?
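To make the flag problem concrete, here is a minimal Rust sketch (not Cranelift code). The function names are hypothetical, and it assumes the 8-bit inputs are sign-extended before the 32-bit add, which real registers do not guarantee. It only shows that the wide add yields the same low 8 bits but never reports the 8-bit signed overflow.

```rust
/// Signed 8-bit add performed at its exact width.
/// The returned bool plays the role of the overflow flag (OF).
fn add_i8_exact(a: i8, b: i8) -> (i8, bool) {
    a.overflowing_add(b)
}

/// The same add performed as a 32-bit operation and truncated afterwards,
/// modeling "use the wider instruction for an 8-bit value".
/// Assumption: inputs are sign-extended to 32 bits first.
fn add_i8_via_i32(a: i8, b: i8) -> (i8, bool) {
    let (wide, overflow) = (a as i32).overflowing_add(b as i32);
    // Truncation keeps the low 8 bits, which match the exact-width result,
    // but `overflow` describes the 32-bit add, not the 8-bit one.
    (wide as i8, overflow)
}
```

With inputs 100 and 100, both functions return the same 8-bit value, but only the exact-width version reports overflow.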

Andrew Brown (Apr 14 2025 at 22:29):

Alex Crichton (Apr 14 2025 at 22:37):

I figured the types would help here? E.g. x64_add produces a Gpr so it doesn't care about flags, but x64_add_with_flags_paired (and/or other variants) would return ProducesFlags which would use the precisely-sized thing and not use the x64_add helper

Alex Crichton (Apr 14 2025 at 22:37):

but also yeah I would imagine that add-with-flags is just forced to use the precisely-sized instruction, we've got no other option there

Andrew Brown (Apr 14 2025 at 22:38):

I think to avoid repeating ourselves, we want to reuse some of the same matching over the types.

Alex Crichton (Apr 14 2025 at 22:38):

but that's not possible with x64_add b/c that emits an instruction, where the flags-based version wouldn't emit an instruction?

Andrew Brown (Apr 14 2025 at 22:39):

;; Helper for creating raw `add` instructions.
(decl x64_add_raw (Type Gpr GprMemImm) AssemblerOutputs)

;; Match 8-bit immediates first; allows a smaller instruction encoding.
(rule 2 (x64_add_raw $I32 src1 (is_simm8 src2))   (x64_addl_mi_sxb_raw src1 src2))
(rule 2 (x64_add_raw $I64 src1 (is_simm8 src2))   (x64_addq_mi_sxb_raw src1 src2))

;; Match the remaining immediates.
(rule 1 (x64_add_raw $I8  src1 (is_imm8 src2))    (x64_addb_mi_raw src1 src2))
(rule 1 (x64_add_raw $I16 src1 (is_imm16 src2))   (x64_addw_mi_raw src1 src2))
(rule 1 (x64_add_raw $I32 src1 (is_imm32 src2))   (x64_addl_mi_raw src1 src2))
(rule 1 (x64_add_raw $I64 src1 (is_simm32 src2))  (x64_addq_mi_sxl_raw src1 src2))

;; Match the operand size to the instruction width.
(rule 0 (x64_add_raw $I8  src1 (is_gpr_mem src2)) (x64_addb_rm_raw src1 src2))
(rule 0 (x64_add_raw $I16 src1 (is_gpr_mem src2)) (x64_addw_rm_raw src1 src2))
(rule 0 (x64_add_raw $I32 src1 (is_gpr_mem src2)) (x64_addl_rm_raw src1 src2))
(rule 0 (x64_add_raw $I64 src1 (is_gpr_mem src2)) (x64_addq_rm_raw src1 src2))

;; When the overflow flag is not considered, we can use wider instructions than
;; necessary for 8/16-bit register-to-register operations to avoid CPU false
;; dependencies.
(decl x64_add_break_deps (Type Gpr GprMemImm) AssemblerOutputs)
(rule 1 (x64_add_break_deps $I8  src1 (is_gpr src2))     (x64_addl_rm_raw src1 src2))
(rule 1 (x64_add_break_deps $I16 src1 (is_gpr src2))     (x64_addl_rm_raw src1 src2))
(rule 0 (x64_add_break_deps ty   src1 src2)              (x64_add_raw ty src1 src2))

;; Normal use of `add` returns a `Gpr` register.
(decl x64_add (Type Gpr GprMemImm) Gpr)
(rule (x64_add ty src1 src2)
      (emit_ret_gpr (x64_add_break_deps ty src1 src2)))

;; When using `add` for its overflow flag (OF), we track that the flags are
;; changed (and avoid the "dependency-breaking" rules that short-circuit
;; overflow).
(decl x64_add_with_flags_paired (Type Gpr GprMemImm) ProducesFlags)
(rule (x64_add_with_flags_paired ty src1 src2)
      (asm_produce_flags (x64_add_raw ty src1 src2)))
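For readers less used to ISLE, the layering above can be modeled roughly in Rust. This is only an illustrative sketch: the types `Ty`, `Src`, `Gpr`, and `ProducesFlags` here are stand-ins invented for the example, immediates are omitted, and the "instruction" is just a mnemonic string. It shows the shape of the idea: one exact-width selector, a dependency-breaking wrapper used only by the non-flag path, and a flag-producing path that skips that wrapper.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Ty { I8, I16, I32, I64 }

#[derive(Clone, Copy, Debug, PartialEq)]
enum Src { Gpr, Mem }

/// Stand-ins for the two result kinds; each just records the chosen mnemonic.
#[derive(Debug, PartialEq)]
struct Gpr(&'static str);
#[derive(Debug, PartialEq)]
struct ProducesFlags(&'static str);

/// Models `x64_add_raw` (register/memory rules only): exact-width selection.
fn add_raw(ty: Ty, _src: Src) -> &'static str {
    match ty {
        Ty::I8 => "addb_rm",
        Ty::I16 => "addw_rm",
        Ty::I32 => "addl_rm",
        Ty::I64 => "addq_rm",
    }
}

/// Models `x64_add_break_deps`: widen 8/16-bit register-to-register adds,
/// otherwise fall through to the exact-width selector.
fn add_break_deps(ty: Ty, src: Src) -> &'static str {
    match (ty, src) {
        (Ty::I8 | Ty::I16, Src::Gpr) => "addl_rm",
        _ => add_raw(ty, src),
    }
}

/// Models `x64_add`: flags are ignored, so widening is allowed.
fn add(ty: Ty, src: Src) -> Gpr {
    Gpr(add_break_deps(ty, src))
}

/// Models `x64_add_with_flags_paired`: flags matter, so use the exact width.
fn add_with_flags(ty: Ty, src: Src) -> ProducesFlags {
    ProducesFlags(add_raw(ty, src))
}
```

In this model, `add(Ty::I8, Src::Gpr)` picks the 32-bit form while `add_with_flags(Ty::I8, Src::Gpr)` keeps the 8-bit form, and the two paths share everything else.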

Alex Crichton (Apr 14 2025 at 22:41):

this is part of how I find the ProducesFlags bits a bit awkward, you can't really abstract over them and it has to be pushed to the leaves

Andrew Brown (Apr 14 2025 at 22:43):

It seems like the place to abstract over them is at the AssemblerOutputs level; saves some repeating. What do you think of x64_add_break_deps? It moves the offending rules out to a different matcher...

Alex Crichton (Apr 14 2025 at 22:45):

it doesn't seem unreasonable to me yeah, but personally I'd prefer to keep AssemblerOutputs out of ISLE in the sense that I thought we were going to try to have the ISLE interface be pure-ISLE-things and not mention assembler types

Alex Crichton (Apr 14 2025 at 22:46):

at some point I had a branch which generated ISLE constructors for ProducesFlags and such, I'm not sure if that landed or got lost though
