ggreif opened PR #13343 from ggreif:gabor/brif-cond-simplify to bytecodealliance:main:
Motivation
PR #13332 landed mid-end rules that fold
(eq/ne (ctz/clz X) 0)icmp shapes into direct bit tests onX. Those rules hinge on anicmpinterposed between the bit-counter and its consumer — i.e. the wasm 3-op patterni32.ctz; i32.eqz; br_if.Frontends that emit the 2-op form
i32.ctz; br_if(with noi32.eqzbetween them — e.g. Motoko'smoc, after itsand 1; eqz; br_if→ctz; br_ifbyte-size peephole) feed(brif (ctz X))into cranelift, with no icmp for the existing rules to match. #13334 (x64) and #13336 (aarch64) added backend lowering rules to cover that gap. As @cfallin pointed out in #13336, the backend is the wrong place — both for SWE reasons (rule duplication per ISA) and because we want these simplifications to compose with other mid-end opts.Approach
This PR extends
simplify_skeletonto rewrite the condition operand of an existingbrifin place. The CFG is preserved by construction: the opcode and successor blocks stay; only argument 0 changes.Concretely:
- New
SkeletonInstSimplification::ReplaceBranchCond(Value)variant inprelude_opt.isle— a narrow rewrite that carries just the new cond value.- Driver patch in
cranelift/codegen/src/egraph/mod.rs: allowOpcode::Brifthrough the previously-blanketis_branch()skip; applyReplaceBranchCondby writing throughinst_args_mut. Other branches (jump, br_table, return, trap) still skip — their rewrites would change CFG.replace_branch_condconstructor inprelude_opt.isle.Two ISLE rules in
opts/icmp.isle:
isle (rule (simplify_skeleton (brif (ctz x_ty X) _ _)) (replace_branch_cond (eq $I8 (band x_ty X (iconst_u x_ty 1)) (iconst_u x_ty 0)))) (rule (simplify_skeleton (brif (clz x_ty X) _ _)) (replace_branch_cond (sge $I8 X (iconst_u x_ty 0))))Effect
On the 2-op
brif (ctz X)/brif (clz X)patterns:
platform input mid-end-alone lowering x86_64 brif (ctz X)testl $1, %edi; je✓ (matches #13334's x64 backend rules)x86_64 brif (clz X)testl %edi, %edi; jge✓ (matches #13336's intent)aarch64 brif (ctz X)tbz w0, #0— single-instruction test-and-branch, tighter than #13336'stst+cmp+b.ccTest
New filetest
cranelift/filetests/filetests/egraph/brif-cnt-cond.clifcovers ctz/clz over i32/i64 in the 2-opbrif-direct form. All 70 cranelift egraph filetests pass.Supersedes
- Closes #13336 — this PR yields strictly better aarch64 code (
tbzvstst+cmp+b.cc).- Supersedes #13334 — will close once review discussion migrates here.
Future work
The
ReplaceBranchCondvariant only handlesbrif. Other side-effectful single-arg branches (trapnz,trapz) would be natural follow-ups for the same kind of cond simplification. A broader extension that allowsReplaceof abrifwith anotherbrif(validating same-successor invariant) is also possible but unnecessary for the cases this PR targets.
ggreif requested cfallin for a review on PR #13343.
ggreif requested wasmtime-compiler-reviewers for a review on PR #13343.
ggreif commented on PR #13343:
Cross-backend confirmation that this mid-end change is a strict win on every target wasmtime supports — no per-backend rules needed:
backend brif (ctz v0)brif (clz v0)x86_64 testl $1, %edi; jetestl %edi, %edi; jgeaarch64 tbz w0, #0(single op)cmp w0, #0; b.geriscv64 andi a0, a0, 1; sext.w a0, a0; beqz a0, …sext.w a0, a0; bgez a0, …s390x nilf %r2, 1; clfi %r2, 0; jgechi %r2, 0; jgheNotes:
- aarch64 ctz gets a single-instruction
tbz(test-bit-and-branch) — tighter than #13336'stst+cmp+b.ccform.- s390x (and the other two) fall in via the existing per-backend
icmp+briffusion path — the mid-end rewrite hands them a shape they already lower optimally, so no new backend rules anywhere.- The
sext.woverhead on riscv64 is a separate i32→i64 register-width peephole gap, independent of this PR; the bit-counter is still elided.So the cost-vs-benefit picture for landing this PR vs the two backend PRs (#13334 / #13336): one ~50-line mid-end change covers 4 ISAs simultaneously, with the aarch64 case getting strictly better code than dedicated backend rules would produce.
:thumbs_up: cfallin submitted PR review:
Looks fine -- thanks!
cfallin commented on PR #13343:
@ggreif it looks like you'll need to re-bless a test; and also run
cargo fmtto ensure all source is properly formatted. Happy to merge once that's done. Thanks!
ggreif updated PR #13343.
ggreif updated PR #13343.
cfallin commented on PR #13343:
(I saw your push to fix the formatting; the test-blessing failure is here and should be fixable with
WASMTIME_TEST_BLESS=1 cargo test --test disas)
ggreif requested pchickey for a review on PR #13343.
ggreif updated PR #13343.
ggreif requested wasmtime-core-reviewers for a review on PR #13343.
cfallin commented on PR #13343:
Ah, and now there's a merge conflict -- sorry for the merging troubles, @ggreif! If you could fix that, I'll be happy to merge.
ggreif updated PR #13343.
cfallin has enabled auto merge for PR #13343.
ggreif edited PR #13343:
Motivation
PR #13332 landed mid-end rules that fold
(eq/ne (ctz/clz X) 0)icmp shapes into direct bit tests onX. Those rules hinge on anicmpinterposed between the bit-counter and its consumer — i.e. the wasm 3-op patterni32.ctz; i32.eqz; br_if.Frontends that emit the 2-op form
i32.ctz; br_if(with noi32.eqzbetween them — e.g. Motoko'smoc, after itsand 1; eqz; br_if→ctz; br_ifbyte-size peephole) feed(brif (ctz X))into cranelift, with no icmp for the existing rules to match. #13334 (x64) and #13336 (aarch64) added backend lowering rules to cover that gap. As @cfallin pointed out in #13336, the backend is the wrong place — both for SWE reasons (rule duplication per ISA) and because we want these simplifications to compose with other mid-end opts.Approach
This PR extends
simplify_skeletonto rewrite the condition operand of an existingbrifin place. The CFG is preserved by construction: the opcode and successor blocks stay; only argument 0 changes.Concretely:
- New
SkeletonInstSimplification::ReplaceBranchCond(Value)variant inprelude_opt.isle— a narrow rewrite that carries just the new cond value.- Driver patch in
cranelift/codegen/src/egraph/mod.rs: handle the new variant — in the cost-loop, accept it eagerly (no cost ranking against opcode-preserving rewrites); in the apply site, swap argument 0 in place viainst_args_mut. Composes with #13267's existing branch-simplification machinery; no guard relaxation needed.replace_branch_condconstructor inprelude_opt.isle.Two ISLE rules in
opts/icmp.isle:
isle (rule (simplify_skeleton (brif (ctz x_ty X) _ _)) (replace_branch_cond (eq $I8 (band x_ty X (iconst_u x_ty 1)) (iconst_u x_ty 0)))) (rule (simplify_skeleton (brif (clz x_ty X) _ _)) (replace_branch_cond (sge $I8 X (iconst_u x_ty 0))))Effect
On the 2-op
brif (ctz X)/brif (clz X)patterns:
platform input mid-end-alone lowering x86_64 brif (ctz X)testl $1, %edi; je✓ (matches #13334's x64 backend rules)x86_64 brif (clz X)testl %edi, %edi; jge✓ (matches #13336's intent)aarch64 brif (ctz X)tbz w0, #0— single-instruction test-and-branch, tighter than #13336'stst+cmp+b.ccTest
New filetest
cranelift/filetests/filetests/egraph/brif-cnt-cond.clifcovers ctz/clz over i32/i64 in the 2-opbrif-direct form. All cranelift egraph filetests pass;tests/disas/ctz-clz-bool-condition.watre-blessed (the bare-form cases now collapse to the optimal 2-instruction shape).Supersedes
- Closes #13336 — this PR yields strictly better aarch64 code (
tbzvstst+cmp+b.cc).- Supersedes #13334 — closed once review discussion migrated here.
Future work
The
ReplaceBranchCondvariant covers the in-place cond swap; #13267 covers fullbrif-to-jumprewrites for constant conditions. A natural follow-up is extending the same cond-only rewrite shape totrapnz/trapzso e.g.(trapnz (ctz x) code)collapses to the bit-test form.
cfallin added PR #13343 cranelift: fold ctz/clz directly into brif cond via simplify_skeleton to the merge queue
ggreif commented on PR #13343:
Sidebar / future work for riscv64: per the cross-backend table above, the mid-end rewrite leaves a
sext.win the riscv64 lowering:brif (ctz v0): andi a0, a0, 1; sext.w a0, a0; beqz a0, ... brif (clz v0): sext.w a0, a0; bgez a0, ...For the ctz form the
sext.wis unconditionally redundant:andiwith a non-negative immediate (here1) zeroes the upper 32 bits, so the subsequentbeqz(which tests the full 64-bit register againstx0) reads the same value with or without the sext — the LSB-zero-ness is preserved either way. Soandi a0, a0, 1; beqz a0, ...is the optimal 2-op form.For the clz form
bgezdoes depend on the 64-bit sign bit; thesext.wis only redundant whenXis already known-canonical i32 in its register slot (e.g. via a known-extending producer likelw/sext.w/icmp result). Narrower peephole there.@alexcrichton — flagging since you're touching riscv64 currently; the patterns above are stable shapes the mid-end now emits unconditionally for any
brif (ctz x)/brif (clz x)consumer.
:check: cfallin merged PR #13343.
cfallin removed PR #13343 cranelift: fold ctz/clz directly into brif cond via simplify_skeleton from the merge queue
ggreif commented on PR #13343:
Follow-up: verified locally on
alexcrichton/wasmtime#riscv64-opts(PR #13350, commitfd259583a). It closes the ctz side of the sidebar above:
backend brif (ctz X)brif (clz X)riscv64 (before #13350) andi a0, a0, 1; sext.w a0, a0; beqz a0, …sext.w a0, a0; bgez a0, …riscv64 (with #13350) andi a0, a0, 1; bnez a0, …:check:sext.w a0, a0; bgez a0, …(unchanged)So riscv64 ctz now lands on the optimal 2-op shape across all bit-widths (i32 and i64 both elide the
sext.wpost-andi 1). The clz i32 case is unaffected — that's the narrower producer-tracking peephole (known-canonical i32 in its slot) which #13350 explicitly does not target.Closing this loop — thanks @alexcrichton.
Last updated: Jun 01 2026 at 09:49 UTC