Stream: git-wasmtime

Topic: wasmtime / PR #13343 cranelift: fold `ctz`/`clz` directly...


Wasmtime GitHub notifications bot (May 12 2026 at 21:38):

ggreif opened PR #13343 from ggreif:gabor/brif-cond-simplify to bytecodealliance:main:

Motivation

PR #13332 landed mid-end rules that fold (eq/ne (ctz/clz X) 0) icmp shapes into direct bit tests on X. Those rules hinge on an icmp interposed between the bit-counter and its consumer — i.e. the wasm 3-op pattern i32.ctz; i32.eqz; br_if.

Frontends that emit the 2-op form i32.ctz; br_if (with no i32.eqz between them; e.g. Motoko's moc, after its byte-size peephole and 1; eqz; br_if → ctz; br_if) feed (brif (ctz X)) into cranelift, with no icmp for the existing rules to match. #13334 (x64) and #13336 (aarch64) added backend lowering rules to cover that gap. As @cfallin pointed out in #13336, the backend is the wrong place, both for SWE reasons (rule duplication per ISA) and because we want these simplifications to compose with other mid-end opts.

Approach

This PR extends simplify_skeleton to rewrite the condition operand of an existing brif in place. The CFG is preserved by construction: the opcode and successor blocks stay; only argument 0 changes.

Concretely:

  1. New SkeletonInstSimplification::ReplaceBranchCond(Value) variant in prelude_opt.isle — a narrow rewrite that carries just the new cond value.
  2. Driver patch in cranelift/codegen/src/egraph/mod.rs: allow Opcode::Brif through the previously-blanket is_branch() skip; apply ReplaceBranchCond by writing through inst_args_mut. Other branches (jump, br_table, return, trap) still skip — their rewrites would change CFG.
  3. replace_branch_cond constructor in prelude_opt.isle.
  4. Two ISLE rules in opts/icmp.isle:

    (rule (simplify_skeleton (brif (ctz x_ty X) _ _))
          (replace_branch_cond
            (eq $I8 (band x_ty X (iconst_u x_ty 1)) (iconst_u x_ty 0))))

    (rule (simplify_skeleton (brif (clz x_ty X) _ _))
          (replace_branch_cond (sge $I8 X (iconst_u x_ty 0))))
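The two rules rest on a pair of bit-level identities: a branch on ctz X is taken exactly when the low bit of X is clear, and a branch on clz X is taken exactly when X is non-negative as a signed integer. The following is a minimal Rust sketch that checks those identities on 32-bit values. It models brif as "taken when the condition is nonzero" and uses Rust's trailing_zeros / leading_zeros as stand-ins for ctz / clz; all function names are hypothetical and nothing here is Cranelift code.

```rust
/// `brif` takes its first successor when the condition value is nonzero.
fn brif_taken(cond: u32) -> bool {
    cond != 0
}

/// Original shape: `brif (ctz x)`.
fn ctz_cond(x: u32) -> bool {
    brif_taken(x.trailing_zeros())
}

/// Rewritten shape: `(eq (band x 1) 0)`.
fn ctz_rewritten(x: u32) -> bool {
    (x & 1) == 0
}

/// Original shape: `brif (clz x)`.
fn clz_cond(x: u32) -> bool {
    brif_taken(x.leading_zeros())
}

/// Rewritten shape: `(sge x 0)`, a signed comparison against zero.
fn clz_rewritten(x: u32) -> bool {
    (x as i32) >= 0
}

/// Compare original and rewritten conditions on boundary values
/// plus a small xorshift32 pseudo-random sweep.
fn rewrites_agree() -> bool {
    let mut samples = vec![
        0u32, 1, 2, 3, 0x7fff_ffff, 0x8000_0000, 0x8000_0001, u32::MAX,
    ];
    let mut s = 0x1234_5678u32;
    for _ in 0..10_000 {
        s ^= s << 13;
        s ^= s >> 17;
        s ^= s << 5;
        samples.push(s);
    }
    samples
        .iter()
        .all(|&x| ctz_cond(x) == ctz_rewritten(x) && clz_cond(x) == clz_rewritten(x))
}
```

The zero input is the case worth noting: both bit counters return the full bit width there, so both original branches are taken, and both rewritten conditions agree.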

Effect

On the 2-op brif (ctz X) / brif (clz X) patterns:

| platform | input | mid-end-alone lowering |
|---|---|---|
| x86_64 | brif (ctz X) | testl $1, %edi; je ✓ (matches #13334's x64 backend rules) |
| x86_64 | brif (clz X) | testl %edi, %edi; jge ✓ (matches #13336's intent) |
| aarch64 | brif (ctz X) | tbz w0, #0 (single-instruction test-and-branch, tighter than #13336's tst+cmp+b.cc) |

Test

New filetest cranelift/filetests/filetests/egraph/brif-cnt-cond.clif covers ctz/clz over i32/i64 in the 2-op brif-direct form. All 70 cranelift egraph filetests pass.

Supersedes

Future work

The ReplaceBranchCond variant only handles brif. Other side-effectful single-arg branches (trapnz, trapz) would be natural follow-ups for the same kind of cond simplification. A broader extension that allows Replace of a brif with another brif (validating same-successor invariant) is also possible but unnecessary for the cases this PR targets.

Wasmtime GitHub notifications bot (May 12 2026 at 21:38):

ggreif requested cfallin for a review on PR #13343.

Wasmtime GitHub notifications bot (May 12 2026 at 21:38):

ggreif requested wasmtime-compiler-reviewers for a review on PR #13343.

Wasmtime GitHub notifications bot (May 12 2026 at 21:51):

ggreif commented on PR #13343:

Cross-backend confirmation that this mid-end change is a strict win on every target wasmtime supports — no per-backend rules needed:

| backend | brif (ctz v0) | brif (clz v0) |
|---|---|---|
| x86_64 | testl $1, %edi; je | testl %edi, %edi; jge |
| aarch64 | tbz w0, #0 (single op) | cmp w0, #0; b.ge |
| riscv64 | andi a0, a0, 1; sext.w a0, a0; beqz a0, … | sext.w a0, a0; bgez a0, … |
| s390x | nilf %r2, 1; clfi %r2, 0; jge | chi %r2, 0; jghe |

Notes:

So the cost-vs-benefit picture for landing this PR vs the two backend PRs (#13334 / #13336): one ~50-line mid-end change covers 4 ISAs simultaneously, with the aarch64 case getting strictly better code than dedicated backend rules would produce.

Wasmtime GitHub notifications bot (May 12 2026 at 21:54):

:thumbs_up: cfallin submitted PR review:

Looks fine -- thanks!

Wasmtime GitHub notifications bot (May 12 2026 at 21:55):

cfallin commented on PR #13343:

@ggreif it looks like you'll need to re-bless a test; and also run cargo fmt to ensure all source is properly formatted. Happy to merge once that's done. Thanks!

Wasmtime GitHub notifications bot (May 12 2026 at 22:00):

ggreif updated PR #13343.

Wasmtime GitHub notifications bot (May 12 2026 at 22:01):

ggreif updated PR #13343.

Wasmtime GitHub notifications bot (May 12 2026 at 22:12):

cfallin commented on PR #13343:

(I saw your push to fix the formatting; the test-blessing failure is here and should be fixable with WASMTIME_TEST_BLESS=1 cargo test --test disas)

Wasmtime GitHub notifications bot (May 12 2026 at 22:14):

ggreif requested pchickey for a review on PR #13343.

Wasmtime GitHub notifications bot (May 12 2026 at 22:14):

ggreif updated PR #13343.

Wasmtime GitHub notifications bot (May 12 2026 at 22:14):

ggreif requested wasmtime-core-reviewers for a review on PR #13343.

Wasmtime GitHub notifications bot (May 12 2026 at 22:15):

cfallin commented on PR #13343:

Ah, and now there's a merge conflict -- sorry for the merging troubles, @ggreif! If you could fix that, I'll be happy to merge.

Wasmtime GitHub notifications bot (May 12 2026 at 22:20):

ggreif updated PR #13343.

Wasmtime GitHub notifications bot (May 12 2026 at 22:21):

cfallin has enabled auto merge for PR #13343.

Wasmtime GitHub notifications bot (May 12 2026 at 22:23):

ggreif edited PR #13343:

Motivation

PR #13332 landed mid-end rules that fold (eq/ne (ctz/clz X) 0) icmp shapes into direct bit tests on X. Those rules hinge on an icmp interposed between the bit-counter and its consumer — i.e. the wasm 3-op pattern i32.ctz; i32.eqz; br_if.

Frontends that emit the 2-op form i32.ctz; br_if (with no i32.eqz between them; e.g. Motoko's moc, after its byte-size peephole and 1; eqz; br_if → ctz; br_if) feed (brif (ctz X)) into cranelift, with no icmp for the existing rules to match. #13334 (x64) and #13336 (aarch64) added backend lowering rules to cover that gap. As @cfallin pointed out in #13336, the backend is the wrong place, both for SWE reasons (rule duplication per ISA) and because we want these simplifications to compose with other mid-end opts.

Approach

This PR extends simplify_skeleton to rewrite the condition operand of an existing brif in place. The CFG is preserved by construction: the opcode and successor blocks stay; only argument 0 changes.

Concretely:

  1. New SkeletonInstSimplification::ReplaceBranchCond(Value) variant in prelude_opt.isle — a narrow rewrite that carries just the new cond value.
  2. Driver patch in cranelift/codegen/src/egraph/mod.rs: handle the new variant — in the cost-loop, accept it eagerly (no cost ranking against opcode-preserving rewrites); in the apply site, swap argument 0 in place via inst_args_mut. Composes with #13267's existing branch-simplification machinery; no guard relaxation needed.
  3. replace_branch_cond constructor in prelude_opt.isle.
  4. Two ISLE rules in opts/icmp.isle:

    (rule (simplify_skeleton (brif (ctz x_ty X) _ _))
          (replace_branch_cond
            (eq $I8 (band x_ty X (iconst_u x_ty 1)) (iconst_u x_ty 0))))

    (rule (simplify_skeleton (brif (clz x_ty X) _ _))
          (replace_branch_cond (sge $I8 X (iconst_u x_ty 0))))
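To illustrate why an in-place condition swap preserves the CFG, here is a toy Rust model of the apply step. It is only a sketch under stated assumptions: the Brif struct, the SkeletonSimplification enum, and the apply function are hypothetical stand-ins and do not mirror the real types or signatures in cranelift/codegen/src/egraph/mod.rs.

```rust
// Toy stand-ins; none of these are Cranelift's real types.
type Value = u32;
type Block = u32;

/// A conditional branch: args[0] is the condition, followed by two successors.
#[derive(Debug, Clone, PartialEq)]
struct Brif {
    args: Vec<Value>,
    then_block: Block,
    else_block: Block,
}

/// Hypothetical mirror of the narrow rewrite described above:
/// it carries only the new condition value.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SkeletonSimplification {
    ReplaceBranchCond(Value),
}

/// Apply the rewrite by overwriting argument 0.
/// The opcode and both successor blocks are never touched.
fn apply(inst: &mut Brif, simp: SkeletonSimplification) {
    match simp {
        SkeletonSimplification::ReplaceBranchCond(new_cond) => {
            inst.args[0] = new_cond;
        }
    }
}
```

Because the variant has no way to name a successor block, a rule that returns it cannot change control flow in this model, which is the property the PR description relies on.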

Effect

On the 2-op brif (ctz X) / brif (clz X) patterns:

| platform | input | mid-end-alone lowering |
|---|---|---|
| x86_64 | brif (ctz X) | testl $1, %edi; je ✓ (matches #13334's x64 backend rules) |
| x86_64 | brif (clz X) | testl %edi, %edi; jge ✓ (matches #13336's intent) |
| aarch64 | brif (ctz X) | tbz w0, #0 (single-instruction test-and-branch, tighter than #13336's tst+cmp+b.cc) |

Test

New filetest cranelift/filetests/filetests/egraph/brif-cnt-cond.clif covers ctz/clz over i32/i64 in the 2-op brif-direct form. All cranelift egraph filetests pass; tests/disas/ctz-clz-bool-condition.wat re-blessed (the bare-form cases now collapse to the optimal 2-instruction shape).

Supersedes

Future work

The ReplaceBranchCond variant covers the in-place cond swap; #13267 covers full brif-to-jump rewrites for constant conditions. A natural follow-up is extending the same cond-only rewrite shape to trapnz / trapz so e.g. (trapnz (ctz x) code) collapses to the bit-test form.

Wasmtime GitHub notifications bot (May 12 2026 at 22:35):

cfallin added PR #13343 cranelift: fold ctz/clz directly into brif cond via simplify_skeleton to the merge queue

Wasmtime GitHub notifications bot (May 12 2026 at 22:43):

ggreif commented on PR #13343:

Sidebar / future work for riscv64: per the cross-backend table above, the mid-end rewrite leaves a sext.w in the riscv64 lowering:

brif (ctz v0):  andi a0, a0, 1; sext.w a0, a0; beqz a0, ...
brif (clz v0):  sext.w a0, a0; bgez a0, ...

For the ctz form the sext.w is unconditionally redundant: andi with a non-negative immediate (here 1) zeroes the upper 32 bits, so the subsequent beqz (which tests the full 64-bit register against x0) reads the same value with or without the sext — the LSB-zero-ness is preserved either way. So andi a0, a0, 1; beqz a0, ... is the optimal 2-op form.
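The redundancy argument for the ctz form can be checked with a small register-level model. This is a sketch in Rust, assuming a 64-bit register held in a u64; the helper names are made up for illustration and the functions only model the arithmetic effect of the instructions.

```rust
/// Model of `sext.w`: sign-extend the low 32 bits into the 64-bit register.
fn sext_w(reg: u64) -> u64 {
    (reg as u32 as i32) as i64 as u64
}

/// Model of `andi rd, rs, 1`: the result is always 0 or 1.
fn andi_1(reg: u64) -> u64 {
    reg & 1
}

/// Model of `beqz`: compares the full 64-bit register against zero.
fn beqz_taken(reg: u64) -> bool {
    reg == 0
}

/// True when dropping `sext.w` after `andi ..., 1` changes neither the
/// register value nor the branch outcome.
fn sext_is_redundant_after_andi(reg: u64) -> bool {
    let masked = andi_1(reg);
    sext_w(masked) == masked && beqz_taken(sext_w(masked)) == beqz_taken(masked)
}
```

The check holds even when the upper 32 bits of the input register contain arbitrary data, since the mask leaves only bit 0.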

For the clz form bgez does depend on the 64-bit sign bit; the sext.w is only redundant when X is already known-canonical i32 in its register slot (e.g. via a known-extending producer like lw/sext.w/icmp result). Narrower peephole there.
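A concrete counterexample shows why the clz form cannot drop sext.w in general. The Rust sketch below models a 64-bit register as a u64; the helper names are hypothetical and only capture the arithmetic behaviour of the instructions.

```rust
/// Model of `sext.w`: sign-extend the low 32 bits into the 64-bit register.
fn sext_w(reg: u64) -> u64 {
    (reg as u32 as i32) as i64 as u64
}

/// Model of `bgez`: tests the sign bit of the full 64-bit register.
fn bgez_taken(reg: u64) -> bool {
    (reg as i64) >= 0
}

/// True when the branch outcome is the same with and without `sext.w`.
fn sext_is_redundant_before_bgez(reg: u64) -> bool {
    bgez_taken(sext_w(reg)) == bgez_taken(reg)
}

/// A register is "canonical i32" here when it already equals the
/// sign extension of its own low 32 bits.
fn is_canonical_i32(reg: u64) -> bool {
    sext_w(reg) == reg
}
```

With the register holding 0x0000_0000_8000_0000 (an i32 with its sign bit set but not sign-extended), bgez is taken without sext.w and not taken with it. Once the register is canonical, for example 0xffff_ffff_8000_0000, the two outcomes coincide.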

@alexcrichton — flagging since you're touching riscv64 currently; the patterns above are stable shapes the mid-end now emits unconditionally for any brif (ctz x) / brif (clz x) consumer.

Wasmtime GitHub notifications bot (May 12 2026 at 23:01):

:check: cfallin merged PR #13343.

Wasmtime GitHub notifications bot (May 12 2026 at 23:01):

cfallin removed PR #13343 cranelift: fold ctz/clz directly into brif cond via simplify_skeleton from the merge queue

Wasmtime GitHub notifications bot (May 14 2026 at 08:25):

ggreif commented on PR #13343:

Follow-up: verified locally on alexcrichton/wasmtime#riscv64-opts (PR #13350, commit fd259583a). It closes the ctz side of the sidebar above:

| backend | brif (ctz X) | brif (clz X) |
|---|---|---|
| riscv64 (before #13350) | andi a0, a0, 1; sext.w a0, a0; beqz a0, … | sext.w a0, a0; bgez a0, … |
| riscv64 (with #13350) | andi a0, a0, 1; bnez a0, … ✓ | sext.w a0, a0; bgez a0, … (unchanged) |

So riscv64 ctz now lands on the optimal 2-op shape across all bit-widths (i32 and i64 both elide the sext.w post-andi 1). The clz i32 case is unaffected — that's the narrower producer-tracking peephole (known-canonical i32 in its slot) which #13350 explicitly does not target.

Closing this loop — thanks @alexcrichton.


Last updated: Jun 01 2026 at 09:49 UTC