Stream: cranelift

Topic: x64 flag-producing conventions


view this post on Zulip Andrew Brown (Apr 14 2025 at 22:29):

I was replacing x64 instructions with ones emitted by the new assembler and noticed a problem in how we lower instructions that rely on flags. Previously, we had decided to keep around rules to the effect of "if we add/sub an 8/16-bit register, just use the wider version to avoid CPU dependencies" (e.g., here). But this rule is a problem when lowering sadd_overflow and similar instructions: if we use a wider register, the add won't overflow at the 8/16-bit boundary, so the flags no longer reflect the narrow operation. What to do? It seems nice to keep the "use the wider version" rule, but it's unclear how to reuse all the lowering rules except that one when we're doing flag-related operations. Any opinions on what this should look like?
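
To make the problem concrete, here is a minimal Rust sketch (not Cranelift code). The function names are hypothetical, and the model assumes the 8-bit operands are sign-extended before the wider add, which is a simplification. It shows that the low byte of the result is the same at either width, while the overflow indication differs.

```rust
/// Overflow flag as an 8-bit `add` would report it.
fn overflows_at_8_bits(a: i8, b: i8) -> bool {
    a.overflowing_add(b).1
}

/// Overflow flag if the same operands are added with a 32-bit `add`
/// (assumption: operands sign-extended to 32 bits first).
fn overflows_at_32_bits(a: i8, b: i8) -> bool {
    (a as i32).overflowing_add(b as i32).1
}

/// The low 8 bits of the wide result match the narrow result, which is
/// why widening is fine when only the value (not the flags) is used.
fn low_byte_matches(a: i8, b: i8) -> bool {
    (a as i32).wrapping_add(b as i32) as i8 == a.wrapping_add(b)
}
```

For example, with `a = 100` and `b = 100`, the 8-bit add overflows but the 32-bit add does not, even though both produce the same low byte.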

view this post on Zulip Andrew Brown (Apr 14 2025 at 22:29):

cc: @Chris Fallin, @Alex Crichton, @fitzgen (he/him)

view this post on Zulip Alex Crichton (Apr 14 2025 at 22:37):

I figured the types would help here? E.g. x64_add produces a Gpr so it doesn't care about flags, but x64_add_with_flags_paired (and/or other variants) would return ProducesFlags which would use the precisely-sized thing and not use the x64_add helper

view this post on Zulip Alex Crichton (Apr 14 2025 at 22:37):

but also yeah I would imagine that add-with-flags is just forced to use the precisely-sized instruction, we've got no other option there

view this post on Zulip Andrew Brown (Apr 14 2025 at 22:38):

I think to avoid repeating ourselves, we want to reuse some of the same matching over the types.

view this post on Zulip Alex Crichton (Apr 14 2025 at 22:38):

but that's not possible with x64_add b/c that emits an instruction, where the flags-based version wouldn't emit an instruction?

view this post on Zulip Andrew Brown (Apr 14 2025 at 22:39):

I'm considering a refactoring that looks like:

view this post on Zulip Andrew Brown (Apr 14 2025 at 22:39):

;; Helper for creating raw `add` instructions.
(decl x64_add_raw (Type Gpr GprMemImm) AssemblerOutputs)

;; Match 8-bit immediates first; allows a smaller instruction encoding.
(rule 2 (x64_add_raw $I32 src1 (is_simm8 src2))   (x64_addl_mi_sxb_raw src1 src2))
(rule 2 (x64_add_raw $I64 src1 (is_simm8 src2))   (x64_addq_mi_sxb_raw src1 src2))

;; Match the remaining immediates.
(rule 1 (x64_add_raw $I8  src1 (is_imm8 src2))    (x64_addb_mi_raw src1 src2))
(rule 1 (x64_add_raw $I16 src1 (is_imm16 src2))   (x64_addw_mi_raw src1 src2))
(rule 1 (x64_add_raw $I32 src1 (is_imm32 src2))   (x64_addl_mi_raw src1 src2))
(rule 1 (x64_add_raw $I64 src1 (is_simm32 src2))  (x64_addq_mi_sxl_raw src1 src2))

;; Match the operand size to the instruction width.
(rule 0 (x64_add_raw $I8  src1 (is_gpr_mem src2)) (x64_addb_rm_raw src1 src2))
(rule 0 (x64_add_raw $I16 src1 (is_gpr_mem src2)) (x64_addw_rm_raw src1 src2))
(rule 0 (x64_add_raw $I32 src1 (is_gpr_mem src2)) (x64_addl_rm_raw src1 src2))
(rule 0 (x64_add_raw $I64 src1 (is_gpr_mem src2)) (x64_addq_rm_raw src1 src2))

;; When the overflow flag is not considered, we can use wider instructions than
;; necessary for 8/16-bit register-to-register operations to avoid CPU false
;; dependencies.
(decl x64_add_break_deps (Type Gpr GprMemImm) AssemblerOutputs)
(rule 1 (x64_add_break_deps $I8  src1 (is_gpr src2))     (x64_addl_rm_raw src1 src2))
(rule 1 (x64_add_break_deps $I16 src1 (is_gpr src2))     (x64_addl_rm_raw src1 src2))
(rule 0 (x64_add_break_deps ty   src1 src2)              (x64_add_raw ty src1 src2))

;; Normal use of `add` returns a `Gpr` register.
(decl x64_add (Type Gpr GprMemImm) Gpr)
(rule (x64_add ty src1 src2)
      (emit_ret_gpr (x64_add_break_deps ty src1 src2)))

;; When using `add` for its overflow flag (OF), we track that the flags are
;; changed (and avoid the "dependency-breaking" rules that short-circuit
;; overflow).
(decl x64_add_with_flags_paired (Type Gpr GprMemImm) ProducesFlags)
(rule (x64_add_with_flags_paired ty src1 src2)
      (asm_produce_flags (x64_add_raw ty src1 src2)))
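
The rule structure above can be modeled outside ISLE to check which instruction each entry point would pick. The following Rust sketch is only an illustration: `Ty`, `Operand`, and the function names are hypothetical stand-ins rather than Cranelift types, the returned strings simply echo the assembler constructor names, and the immediate-range checks of `is_imm8`/`is_imm16`/`is_imm32`/`is_simm32` are omitted for brevity.

```rust
// Hypothetical model of the ISLE rules above; not Cranelift APIs.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Ty { I8, I16, I32, I64 }

#[derive(Clone, Copy, Debug, PartialEq)]
enum Operand { Gpr, Mem, Imm(i64) }

/// Mirrors `x64_add_raw`: always matches the operand size to the
/// instruction width. Match arms are ordered like the rule priorities.
fn add_raw(ty: Ty, src2: Operand) -> &'static str {
    match (ty, src2) {
        // Priority 2: sign-extended 8-bit immediates (smaller encoding).
        (Ty::I32, Operand::Imm(i)) if (-128..=127).contains(&i) => "addl_mi_sxb",
        (Ty::I64, Operand::Imm(i)) if (-128..=127).contains(&i) => "addq_mi_sxb",
        // Priority 1: the remaining immediates (range checks omitted).
        (Ty::I8, Operand::Imm(_)) => "addb_mi",
        (Ty::I16, Operand::Imm(_)) => "addw_mi",
        (Ty::I32, Operand::Imm(_)) => "addl_mi",
        (Ty::I64, Operand::Imm(_)) => "addq_mi_sxl",
        // Priority 0: register or memory operands.
        (Ty::I8, _) => "addb_rm",
        (Ty::I16, _) => "addw_rm",
        (Ty::I32, _) => "addl_rm",
        (Ty::I64, _) => "addq_rm",
    }
}

/// Mirrors `x64_add_break_deps`: widen 8/16-bit register-to-register
/// adds to 32 bits, otherwise defer to `add_raw`.
fn add_break_deps(ty: Ty, src2: Operand) -> &'static str {
    match (ty, src2) {
        (Ty::I8 | Ty::I16, Operand::Gpr) => "addl_rm",
        _ => add_raw(ty, src2),
    }
}

/// Mirrors `x64_add`: only the value is used, so widening is allowed.
fn add(ty: Ty, src2: Operand) -> &'static str {
    add_break_deps(ty, src2)
}

/// Mirrors `x64_add_with_flags_paired`: flags are observed, so the
/// precisely-sized instruction must be used.
fn add_with_flags(ty: Ty, src2: Operand) -> &'static str {
    add_raw(ty, src2)
}
```

In this model, `add(Ty::I8, Operand::Gpr)` selects the 32-bit form while `add_with_flags(Ty::I8, Operand::Gpr)` keeps the 8-bit form, and both share every other rule through `add_raw`.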

view this post on Zulip Alex Crichton (Apr 14 2025 at 22:41):

I think we'll just have to avoid that sort of refactoring

view this post on Zulip Alex Crichton (Apr 14 2025 at 22:41):

this is part of why I find the ProducesFlags bits a bit awkward: you can't really abstract over them, and it has to be pushed to the leaves

view this post on Zulip Andrew Brown (Apr 14 2025 at 22:43):

It seems like the place to abstract over them is at the AssemblerOutputs level; saves some repeating. What do you think of x64_add_break_deps? It moves the offending rules out to a different matcher...

view this post on Zulip Alex Crichton (Apr 14 2025 at 22:45):

it doesn't seem unreasonable to me, yeah, but personally I'd prefer to keep AssemblerOutputs out of ISLE; I thought we were going to try to keep the ISLE interface to pure-ISLE things and not mention assembler types

view this post on Zulip Alex Crichton (Apr 14 2025 at 22:46):

at some point I had a branch which generated ISLE constructors for ProducesFlags and such, I'm not sure if that landed or got lost though

view this post on Zulip Andrew Brown (Apr 14 2025 at 22:46):

Technically AssemblerOutputs is a cranelift-codegen type now...

view this post on Zulip Alex Crichton (Apr 14 2025 at 22:47):

oh ok, I think I'm lost enough that I'm probably not helping here...

view this post on Zulip Andrew Brown (Apr 14 2025 at 22:47):

ok, let me put the PR up and we can discuss there

view this post on Zulip Andrew Brown (Apr 14 2025 at 23:37):

https://github.com/bytecodealliance/wasmtime/pull/10580

This change replaces ISLE lowerings for the ProducesFlags and ConsumesFlags wrappers with instructions from the new assembler. This is a necessary step towards fully using the new assembler for ALU...

Last updated: Dec 06 2025 at 06:05 UTC