ggreif edited PR #13334.
ggreif edited PR #13334:
Summary
Follow-up to #13332. That PR added egraph rules collapsing
(eq (ctz X) 0)/(ne (ctz X) 0)/(eq (clz X) 0)/(ne (clz X) 0)to direct LSB / sign-bit tests — but only when the comparison is mediated by an expliciticmp. The wasm front-end translateswasm if (ctz X)tobrif (ireduce.i32 (ctz.i64 X))directly (noicmp), so the egraph rules don't fire on the wasm-natural shape.This PR closes the gap by specialising
is_nonzeroin the x64 backend — the helper that allbrif/select/trapiflowerings funnel through.Rules
In
cranelift/codegen/src/isa/x64/inst.isle:(rule 3 (is_nonzero (ctz (ty_32_or_64 ty) val)) (CondResult.CC (x64_test ty val (RegMemImm.Imm 1)) (CC.Z))) (rule 3 (is_nonzero (ireduce _ (ctz (ty_32_or_64 ty) val))) (CondResult.CC (x64_test ty val (RegMemImm.Imm 1)) (CC.Z))) (rule 3 (is_nonzero (clz (ty_32_or_64 ty) val)) (let ((gpr Gpr val)) (CondResult.CC (x64_test ty gpr gpr) (CC.NS)))) (rule 3 (is_nonzero (ireduce _ (clz (ty_32_or_64 ty) val))) (let ((gpr Gpr val)) (CondResult.CC (x64_test ty gpr gpr) (CC.NS))))The
ireducevariant catches the wasm front-end'si32.wrap_i64over a 64-bitctz/clz— a no-op on values in [0, bitwidth].Test deltas (
tests/disas/ctz-clz-bool-condition.wat)
consumer before after if_ctz_bare_i325 insns ( bsfl + cmovel + test + jne)2 ( testl $1, %edx; je)if_ctz_bare_i645 insns ( bsfq + cmovq + test + jne)2 ( testq $1, %rdx; je)if_clz_bare_i327 insns ( bsr + cmov + sub + test + jne)2 ( testl + jns)The icmp-mediated cases (collapsed by #13332's egraph rules) are unchanged. The numeric-comparison negative test (
(ctz X) == 4) stays untouched.Motivation
Motoko's
moccodegen emitsi64.ctz X; i32.wrap_i64; iffor compactness/sign tests in the EOP backend (see caffeinelabs/motoko#6103). Before this PR, that lowers to 5 native instructions per dispatch; after, 2.A concrete idiomatic example: in Motoko, the
let-elsepattern overResultlet #ok payload = queryProp(...) else return defaultValue;desugars to a 2-arm refutable variant match (
#okvs#err). The variant-tag hashes arehash("ok") = 0x611C(LSB 0) andhash("err") = 0x4D0765(LSB 1) — they differ exactly at the LSB. The planned variant-switchBitTestdispatch (caffeinelabs/motoko'sgabor/variant-switch) recognizes this and emits a single LSB-test for the dispatch; combined with this PR, the entire let-else lowers toload hash; testq $1, ...; jccon x64 — three instructions for a pattern match. EveryResult-returning API + everylet-else-style early return collapses to this shape.Aggregated across hot paths (variant-switch dispatch, GC compact/heap discriminator, sign tests, …) this is meaningful.
Follow-ups (not in this PR)
- aarch64, riscv64, s390x analogues — separate PRs once x64 reviewer feedback lands.
select-consumer variant —selectalready routes throughis_nonzero_cmp→is_nonzero, so this PR's rules cover it too without extra work.
ggreif has marked PR #13334 as ready for review.
ggreif requested uweigand for a review on PR #13334.
ggreif requested wasmtime-compiler-reviewers for a review on PR #13334.
ggreif requested pchickey for a review on PR #13334.
ggreif requested wasmtime-core-reviewers for a review on PR #13334.
ggreif edited PR #13334.
github-actions[bot] added the label cranelift on PR #13334.
github-actions[bot] added the label cranelift:area:x64 on PR #13334.
alexcrichton unassigned uweigand from PR #13334 cranelift(x64): lower bare ctz/clz boolean tests via test+CC.
alexcrichton commented on PR #13334:
@cfallin or @fitzgen do y'all have any ideas about how to sort of deduplicate this with the optimization rules landed in https://github.com/bytecodealliance/wasmtime/pull/13332? It feels a bit unfortunate that we need basically the same rules twice, once for general expressions and once because
cond != 0is implicit in all conditional-y locations (brif,select,trapnz, etc). Without redefining how those instructions work I'm not sure how to avoid the duplication myself, but figured I'd ask if y'all had ideas. Another option might be to only have these rules in backends as opposed to also the mid-end, but that also doesn't feel great, the duplication probably isn't so bad.
cfallin commented on PR #13334:
Left a comment over here; putting these rules in the backend is definitely the wrong place IMHO, as aside from the software-engineering aspects (repetition) we want these simplifications to compose with other mid-end opts when possible.
I agree it'd be great to find a way to factor out simplifications for all "non-zero test" cases in the mid-end. I suppose we could define a helper
simplify_nonzeroand then invoke that with toplevelsimplify_skeletonrules onbrif,trapnz, etc...
ggreif commented on PR #13334:
Following on from @cfallin's
(brif (clz x))→(brif (band x (iconst 1)))sketch in #13336: I think the LSB extract pairs withctz, notclz. There are actually four cases — two per producer — and the bit-extract direction is determined by the producer:
pattern tests mid-end simplification (brif (eqz (ctz x)))LSB set (brif (band x 1))or(brif (shl x N-1))(brif (ctz x))LSB clear inverse of either (brif (eqz (clz x)))HSB set (brif (ushr x N-1))(brif (clz x))HSB clear (brif (eqz (ushr x N-1)))For the LSB case
band x 1andshl x (N-1)have the same wasm byte cost (small-const + op). For the HSB caseushr x (N-1)is strictly cheaper than the equivalentbandwith1 << (N-1): small consts like31/63are 1 byte LEB128, but0x80000000is 5 bytes — so prefer the shift form for HSB.I'll try the
simplify_skeleton-on-brifroute locally with these rules and report back. If it lets me drop the backend duplication in this PR (and #13336), I'll push; otherwise I'll close and rethink.
ggreif commented on PR #13334:
Update — the
simplify_skeletonextension does work, and it lets us retire both this PR and #13336.Approach (~50 lines total):
- New
SkeletonInstSimplification::ReplaceBranchCond(Value)variant — narrow rewrite that swaps argument 0 of an existingbrifwithout touching its opcode or successors. CFG-safe by construction.- Driver patch in
cranelift/codegen/src/egraph/mod.rs: allowOpcode::Brifthrough the previously-blanketis_branch()skip; applyReplaceBranchCondby writing throughinst_args_mut.prelude_opt.isleadds areplace_branch_condconstructor.- Two new rules in
opts/icmp.isle:
isle (rule (simplify_skeleton (brif (ctz x_ty X) _ _)) (replace_branch_cond (eq $I8 (band x_ty X (iconst_u x_ty 1)) (iconst_u x_ty 0)))) (rule (simplify_skeleton (brif (clz x_ty X) _ _)) (replace_branch_cond (sge $I8 X (iconst_u x_ty 0))))Results on the 2-op
brif (ctz x)/brif (clz x)patterns this PR targets:
platform producer mid-end alone (no backend rules) x86_64 brif (ctz v0)testl $1, %edi; je✓ (matches what this PR's x64 rules produce)x86_64 brif (clz v0)testl %edi, %edi; jge✓ (matches #13336's intent)aarch64 brif (ctz v0)tbz w0, #0— single-instruction test-and-branch, tighter than #13336'stst+cmp+b.ccAll 70 cranelift egraph filetests pass. The
replace_branch_condmechanism composes with future similarbrif-cond simplifications (any value-level rewrite whose result is "0/non-0-equivalent to the original" qualifies).I'll push the mid-end branch as a new PR, with revert commits for this PR and #13336. Closing once that lands cleanly.
ggreif commented on PR #13334:
Closing in favor of #13343 — the mid-end
simplify_skeleton-on-brifextension subsumes this PR's x64 backend rules, the aarch64 rules in #13336, and covers riscv64 and s390x at the same time without any per-backend rule. Cross-backend comparison table is in the PR comment.
:cross_mark: ggreif closed without merge PR #13334.
Last updated: Jun 01 2026 at 09:49 UTC