Stream: git-wasmtime

Topic: wasmtime / PR #13334 cranelift(x64): lower bare `ctz`/`cl...


view this post on Zulip Wasmtime GitHub notifications bot (May 11 2026 at 15:58):

ggreif edited PR #13334.

view this post on Zulip Wasmtime GitHub notifications bot (May 11 2026 at 16:16):

ggreif edited PR #13334:

Summary

Follow-up to #13332. That PR added egraph rules collapsing (eq (ctz X) 0) / (ne (ctz X) 0) / (eq (clz X) 0) / (ne (clz X) 0) to direct LSB / sign-bit tests — but only when the comparison is mediated by an explicit icmp. The wasm front-end translates wasm if (ctz X) to brif (ireduce.i32 (ctz.i64 X)) directly (no icmp), so the egraph rules don't fire on the wasm-natural shape.

This PR closes the gap by specialising is_nonzero in the x64 backend — the helper that all brif/select/trapif lowerings funnel through.

Rules

In cranelift/codegen/src/isa/x64/inst.isle:

(rule 3 (is_nonzero (ctz (ty_32_or_64 ty) val))
      (CondResult.CC (x64_test ty val (RegMemImm.Imm 1)) (CC.Z)))
(rule 3 (is_nonzero (ireduce _ (ctz (ty_32_or_64 ty) val)))
      (CondResult.CC (x64_test ty val (RegMemImm.Imm 1)) (CC.Z)))
(rule 3 (is_nonzero (clz (ty_32_or_64 ty) val))
      (let ((gpr Gpr val)) (CondResult.CC (x64_test ty gpr gpr) (CC.NS))))
(rule 3 (is_nonzero (ireduce _ (clz (ty_32_or_64 ty) val)))
      (let ((gpr Gpr val)) (CondResult.CC (x64_test ty gpr gpr) (CC.NS))))

The ireduce variant catches the wasm front-end's i32.wrap_i64 over a 64-bit ctz/clz — a no-op on values in [0, bitwidth].

Test deltas (tests/disas/ctz-clz-bool-condition.wat)

consumer before after
if_ctz_bare_i32 5 insns (bsfl + cmovel + test + jne) 2 (testl $1, %edx; je)
if_ctz_bare_i64 5 insns (bsfq + cmovq + test + jne) 2 (testq $1, %rdx; je)
if_clz_bare_i32 7 insns (bsr + cmov + sub + test + jne) 2 (testl + jns)

The icmp-mediated cases (collapsed by #13332's egraph rules) are unchanged. The numeric-comparison negative test ((ctz X) == 4) stays untouched.

Motivation

Motoko's moc codegen emits i64.ctz X; i32.wrap_i64; if for compactness/sign tests in the EOP backend (see caffeinelabs/motoko#6103). Before this PR, that lowers to 5 native instructions per dispatch; after, 2.

A concrete idiomatic example: in Motoko, the let-else pattern over Result

let #ok payload = queryProp(...) else return defaultValue;

desugars to a 2-arm refutable variant match (#ok vs #err). The variant-tag hashes are hash("ok") = 0x611C (LSB 0) and hash("err") = 0x4D0765 (LSB 1) — they differ exactly at the LSB. The planned variant-switch BitTest dispatch (caffeinelabs/motoko's gabor/variant-switch) recognizes this and emits a single LSB-test for the dispatch; combined with this PR, the entire let-else lowers to load hash; testq $1, ...; jcc on x64 — three instructions for a pattern match. Every Result-returning API + every let-else-style early return collapses to this shape.

Aggregated across hot paths (variant-switch dispatch, GC compact/heap discriminator, sign tests, …) this is meaningful.

Follow-ups (not in this PR)

view this post on Zulip Wasmtime GitHub notifications bot (May 11 2026 at 16:17):

ggreif has marked PR #13334 as ready for review.

view this post on Zulip Wasmtime GitHub notifications bot (May 11 2026 at 16:17):

ggreif requested uweigand for a review on PR #13334.

view this post on Zulip Wasmtime GitHub notifications bot (May 11 2026 at 16:17):

ggreif requested wasmtime-compiler-reviewers for a review on PR #13334.

view this post on Zulip Wasmtime GitHub notifications bot (May 11 2026 at 16:17):

ggreif requested pchickey for a review on PR #13334.

view this post on Zulip Wasmtime GitHub notifications bot (May 11 2026 at 16:17):

ggreif requested wasmtime-core-reviewers for a review on PR #13334.

view this post on Zulip Wasmtime GitHub notifications bot (May 11 2026 at 17:35):

ggreif edited PR #13334.

view this post on Zulip Wasmtime GitHub notifications bot (May 11 2026 at 21:06):

github-actions[bot] added the label cranelift on PR #13334.

view this post on Zulip Wasmtime GitHub notifications bot (May 11 2026 at 21:06):

github-actions[bot] added the label cranelift:area:x64 on PR #13334.

view this post on Zulip Wasmtime GitHub notifications bot (May 12 2026 at 01:49):

alexcrichton unassigned uweigand from PR #13334 cranelift(x64): lower bare ctz/clz boolean tests via test+CC.

view this post on Zulip Wasmtime GitHub notifications bot (May 12 2026 at 01:52):

alexcrichton commented on PR #13334:

@cfallin or @fitzgen do y'all have any ideas about how to sort of deduplicate this with the optimization rules landed in https://github.com/bytecodealliance/wasmtime/pull/13332? It feels a bit unfortunate that we need basically the same rules twice, once for general expressions and once because cond != 0 is implicit in all conditional-y locations (brif, select, trapnz, etc). Without redefining how those instructions work I'm not sure how to avoid the duplication myself, but figured I'd ask if y'all had ideas. Another option might be to only have these rules in backends as opposed to also the mid-end, but that also doesn't feel great, the duplication probably isn't so bad.

view this post on Zulip Wasmtime GitHub notifications bot (May 12 2026 at 16:07):

cfallin commented on PR #13334:

Left a comment over here; putting these rules in the backend is definitely the wrong place IMHO, as aside from the software-engineering aspects (repetition) we want these simplifications to compose with other mid-end opts when possible.

I agree it'd be great to find a way to factor out simplifications for all "non-zero test" cases in the mid-end. I suppose we could define a helper simplify_nonzero and then invoke that with toplevel simplify_skeleton rules on brif, trapnz, etc...

view this post on Zulip Wasmtime GitHub notifications bot (May 12 2026 at 21:12):

ggreif commented on PR #13334:

Following on from @cfallin's (brif (clz x))(brif (band x (iconst 1))) sketch in #13336: I think the LSB extract pairs with ctz, not clz. There are actually four cases — two per producer — and the bit-extract direction is determined by the producer:

pattern tests mid-end simplification
(brif (eqz (ctz x))) LSB set (brif (band x 1)) or (brif (shl x N-1))
(brif (ctz x)) LSB clear inverse of either
(brif (eqz (clz x))) HSB set (brif (ushr x N-1))
(brif (clz x)) HSB clear (brif (eqz (ushr x N-1)))

For the LSB case band x 1 and shl x (N-1) have the same wasm byte cost (small-const + op). For the HSB case ushr x (N-1) is strictly cheaper than the equivalent band with 1 << (N-1): small consts like 31/63 are 1 byte LEB128, but 0x80000000 is 5 bytes — so prefer the shift form for HSB.

I'll try the simplify_skeleton-on-brif route locally with these rules and report back. If it lets me drop the backend duplication in this PR (and #13336), I'll push; otherwise I'll close and rethink.

view this post on Zulip Wasmtime GitHub notifications bot (May 12 2026 at 21:33):

ggreif commented on PR #13334:

Update — the simplify_skeleton extension does work, and it lets us retire both this PR and #13336.

Approach (~50 lines total):

  1. New SkeletonInstSimplification::ReplaceBranchCond(Value) variant — narrow rewrite that swaps argument 0 of an existing brif without touching its opcode or successors. CFG-safe by construction.
  2. Driver patch in cranelift/codegen/src/egraph/mod.rs: allow Opcode::Brif through the previously-blanket is_branch() skip; apply ReplaceBranchCond by writing through inst_args_mut.
  3. prelude_opt.isle adds a replace_branch_cond constructor.
  4. Two new rules in opts/icmp.isle:
    isle (rule (simplify_skeleton (brif (ctz x_ty X) _ _)) (replace_branch_cond (eq $I8 (band x_ty X (iconst_u x_ty 1)) (iconst_u x_ty 0)))) (rule (simplify_skeleton (brif (clz x_ty X) _ _)) (replace_branch_cond (sge $I8 X (iconst_u x_ty 0))))

Results on the 2-op brif (ctz x) / brif (clz x) patterns this PR targets:

platform producer mid-end alone (no backend rules)
x86_64 brif (ctz v0) testl $1, %edi; je ✓ (matches what this PR's x64 rules produce)
x86_64 brif (clz v0) testl %edi, %edi; jge ✓ (matches #13336's intent)
aarch64 brif (ctz v0) tbz w0, #0 — single-instruction test-and-branch, tighter than #13336's tst+cmp+b.cc

All 70 cranelift egraph filetests pass. The replace_branch_cond mechanism composes with future similar brif-cond simplifications (any value-level rewrite whose result is "0/non-0-equivalent to the original" qualifies).

I'll push the mid-end branch as a new PR, with revert commits for this PR and #13336. Closing once that lands cleanly.

view this post on Zulip Wasmtime GitHub notifications bot (May 12 2026 at 21:57):

ggreif commented on PR #13334:

Closing in favor of #13343 — the mid-end simplify_skeleton-on-brif extension subsumes this PR's x64 backend rules, the aarch64 rules in #13336, and covers riscv64 and s390x at the same time without any per-backend rule. Cross-backend comparison table is in the PR comment.

view this post on Zulip Wasmtime GitHub notifications bot (May 12 2026 at 21:57):

:cross_mark: ggreif closed without merge PR #13334.


Last updated: Jun 01 2026 at 09:49 UTC