Stream: git-wasmtime

Topic: wasmtime / PR #13334 cranelift(x64): lower bare ctz/clz b...


Wasmtime GitHub notifications bot (May 11 2026 at 15:57):

ggreif opened PR #13334 from ggreif:gabor/ctz-clz-brif-lowering to bytecodealliance:main:

Summary

Follow-up to #13332. That PR added egraph rules collapsing (eq (ctz X) 0) / (ne (ctz X) 0) / (eq (clz X) 0) / (ne (clz X) 0) to direct LSB / sign-bit tests — but only when the comparison is mediated by an explicit icmp. The wasm front-end translates wasm if (ctz X) to brif (ireduce.i32 (ctz.i64 X)) directly (no icmp), so the egraph rules don't fire on the wasm-natural shape.

This PR closes the gap by specialising is_nonzero in the x64 backend — the helper that all brif/select/trapif lowerings funnel through.
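The bit-level identities behind both the egraph rules and the is_nonzero specialisation can be checked with a small Rust sketch. It is not code from the PR: the function names are made up for illustration, and it assumes that ctz/clz of zero yield the bit width, which is how Rust's `trailing_zeros`/`leading_zeros` behave.

```rust
/// `ctz(x) != 0`, computed the long way through the bit count.
fn ctz_nonzero_slow(x: u64) -> bool {
    x.trailing_zeros() != 0
}

/// The same predicate as a single LSB test: the count is nonzero
/// exactly when bit 0 is clear (this also covers x == 0).
fn ctz_nonzero_fast(x: u64) -> bool {
    x & 1 == 0
}

/// `clz(x) != 0`, computed the long way through the bit count.
fn clz_nonzero_slow(x: u64) -> bool {
    x.leading_zeros() != 0
}

/// The same predicate as a sign-bit test: the count is nonzero
/// exactly when the top bit is clear (this also covers x == 0).
fn clz_nonzero_fast(x: u64) -> bool {
    (x as i64) >= 0
}

/// Compares the slow and fast forms on a handful of edge cases.
fn identities_hold() -> bool {
    let samples = [0u64, 1, 2, 3, 8, u64::MAX, 1 << 63, (1 << 63) - 1];
    samples.iter().all(|&x| {
        ctz_nonzero_slow(x) == ctz_nonzero_fast(x)
            && clz_nonzero_slow(x) == clz_nonzero_fast(x)
    })
}
```

The fast forms correspond to what a test-and-branch pair checks: the zero flag after masking with 1 for ctz, and the sign flag for clz.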

Rules

In cranelift/codegen/src/isa/x64/inst.isle:

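;; ctz(val) != 0 iff bit 0 of val is clear: `test val, 1` sets ZF in that case.
(rule 3 (is_nonzero (ctz (ty_32_or_64 ty) val))
      (CondResult.CC (x64_test ty val (RegMemImm.Imm 1)) (CC.Z)))
;; Same test when the ctz result is first narrowed by ireduce.
(rule 3 (is_nonzero (ireduce _ (ctz (ty_32_or_64 ty) val)))
      (CondResult.CC (x64_test ty val (RegMemImm.Imm 1)) (CC.Z)))
;; clz(val) != 0 iff the sign bit of val is clear: `test val, val` leaves SF clear in that case.
(rule 3 (is_nonzero (clz (ty_32_or_64 ty) val))
      (let ((gpr Gpr val)) (CondResult.CC (x64_test ty gpr gpr) (CC.NS))))
;; Same test when the clz result is first narrowed by ireduce.
(rule 3 (is_nonzero (ireduce _ (clz (ty_32_or_64 ty) val)))
      (let ((gpr Gpr val)) (CondResult.CC (x64_test ty gpr gpr) (CC.NS))))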

The ireduce variant catches the wasm front-end's i32.wrap_i64 over a 64-bit ctz/clz — a no-op on values in [0, bitwidth].
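Why the narrowing is harmless can be sketched in Rust as follows. This is an illustrative model rather than Cranelift code: the helper names are hypothetical, and the 64-bit count is widened to `u64` only to mimic the `ctz.i64` result type before the wrap.

```rust
/// Models `ireduce.i32 (ctz.i64 x)`: a 64-bit count truncated to 32 bits.
fn ctz64_wrapped(x: u64) -> u32 {
    let count: u64 = x.trailing_zeros() as u64; // always in 0..=64
    count as u32 // like i32.wrap_i64: keep only the low 32 bits
}

/// Models `ireduce.i32 (clz.i64 x)` in the same way.
fn clz64_wrapped(x: u64) -> u32 {
    let count: u64 = x.leading_zeros() as u64; // always in 0..=64
    count as u32
}

/// The truncation loses nothing because the count never exceeds 64,
/// so zero-ness before and after the wrap is identical.
fn wrap_preserves_count(x: u64) -> bool {
    ctz64_wrapped(x) as u64 == x.trailing_zeros() as u64
        && clz64_wrapped(x) as u64 == x.leading_zeros() as u64
}
```

Since the wrapped value equals the unwrapped one, a rule that matches through the ireduce can reuse the same flag test as the bare rule.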

Test deltas (tests/disas/ctz-clz-bool-condition.wat)

| consumer | before | after |
|---|---|---|
| if_ctz_bare_i32 | 5 insns (bsfl + cmovel + test + jne) | 2 (testl $1, %edx; je) |
| if_ctz_bare_i64 | 5 insns (bsfq + cmovq + test + jne) | 2 (testq $1, %rdx; je) |
| if_clz_bare_i32 | 7 insns (bsr + cmov + sub + test + jne) | 2 (testl + jns) |

The icmp-mediated cases (collapsed by #13332's egraph rules) are unchanged. The numeric-comparison negative test ((ctz X) == 4) stays untouched.

Motivation

Motoko's moc codegen emits i64.ctz X; i32.wrap_i64; if for compactness/sign tests in the EOP backend (see caffeinelabs/motoko#6103). Before this PR, that lowers to 5 native instructions per dispatch; after, 2.

A concrete idiomatic example: in Motoko, the let-else pattern over Result

let #ok payload = queryProp(...) else return defaultValue;

desugars to a 2-arm refutable variant match (#ok vs #err). The variant-tag hashes are `hash("ok") = 0x611C` (LSB 0) and `hash("err") = 0x4D0765` (LSB 1), so the LSB alone is enough to tell the two tags apart. The planned variant-switch `BitTest` dispatch (caffeinelabs/motoko's `gabor/variant-switch`) recognizes this and emits a single LSB test for the dispatch; combined with this PR, the entire let-else lowers to `load hash; testq $1, ...; jcc` on x64: three instructions for a pattern match. Every `Result`-returning API and every `let-else`-style early return collapses to this shape.
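The LSB-based dispatch can be illustrated with a minimal Rust sketch. Only the two hash constants come from the description above; the `Arm` enum and both function names are hypothetical, and this is not how moc or the `BitTest` dispatch is actually implemented.

```rust
// Variant-tag hashes quoted in the PR description.
const HASH_OK: u64 = 0x611C; // LSB 0
const HASH_ERR: u64 = 0x4D0765; // LSB 1

#[derive(Debug, PartialEq)]
enum Arm {
    Ok,
    Err,
}

/// Reference dispatch: compare the tag against each known hash.
fn dispatch_by_compare(tag: u64) -> Option<Arm> {
    match tag {
        HASH_OK => Some(Arm::Ok),
        HASH_ERR => Some(Arm::Err),
        _ => None,
    }
}

/// Bit-test dispatch: valid only under the assumption that `tag`
/// is one of the two hashes above, which differ in their lowest bit.
fn dispatch_by_lsb(tag: u64) -> Arm {
    if tag & 1 == 0 {
        Arm::Ok
    } else {
        Arm::Err
    }
}
```

For the two legal tags both functions agree, which is what lets a two-arm match be decided by one masked test instead of a full comparison.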

Aggregated across hot paths (variant-switch dispatch, GC compact/heap discriminator, sign tests, …) this is meaningful.

Follow-ups (not in this PR)


Last updated: Jun 01 2026 at 09:49 UTC