afonso360 opened PR #5854 from riscv64-ctlz-improvments to main:
:wave: Hey,
This PR started by fixing the lowering rules for
ctz. The base rule had(if-let $false (has_zbb)), but the rule fori8/i16depended on bothZbbandZbs. Which meant that if we only hadZbbwe had no valid rule for loweringi8/i16. The fuzzer found this on https://github.com/bytecodealliance/wasmtime/pull/5844#issuecomment-1439854777.I also ended up improving codegen for these instructions in a number of cases. We try to avoid the base case as much as possible since it involves a loop.
We can now avoid requiring
Zbsin all cases by falling back to anoriororinstruction instead to set a single bit.It also moves some constructors around and other misc stuff since we have a bunch of rules all over the files. I'm trying to collect the ones I find at the top of the file.
afonso360 edited PR #5854 from riscv64-ctlz-improvments to main:
:wave: Hey,
This PR started by fixing the lowering rules for
ctz. The base rule had(if-let $false (has_zbb)), but the rule fori8/i16depended on bothZbbandZbs. Which meant that if we only hadZbbwe had no valid rule for loweringi8/i16. The fuzzer found this on https://github.com/bytecodealliance/wasmtime/pull/5844#issuecomment-1439854777.I also ended up improving codegen for these instructions in a number of cases. We try to avoid the base case as much as possible since it involves a loop.
We can now avoid requiring
Zbsin all cases by falling back to anoriororinstruction instead to set a single bit.It also moves some constructors around and other misc stuff since we have a bunch of rules all over the files. I'm trying to collect the ones I find at the top of the file.
It's been fuzzing for about an hour, I'll update this if it finds anything.
afonso360 updated PR #5854 from riscv64-ctlz-improvments to main.
afonso360 has marked PR #5854 as ready for review.
jameysharp submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
I want to note that the uses of
fits_in_64in the "Restrict lowering rules for ctz/clz" commit should be harmless but shouldn't be necessary. Since the rule for$I128has higher priority, and since ctz/clz/cls accept only integer types, the only types which could reach the general rule are I8-I64.It's definitely good to be consistent across rules for whether we're explicit about checking
fits_in_64or not. And it's probably good to be explicit since it serves as a kind of documentation. Currently, doing this causes ISLE to generate slightly worse code, but it's probably not measurably worse, and I hope to fix it eventually anyway.So you don't need to do anything about this comment. I just wanted to mention that this change shouldn't change any lowering behavior.
jameysharp created PR review comment:
I'd kind of like to see
bsetieither fully rely on priorities or fully rely on pattern matches, but it's currently a mix of both.Because
u64_lteqis partial, the first and second rule need different priorities. If that constructor returnedboolinstead ofOption<()>, then all three rules could fully express their preconditions and none of them would need priorities.On the other hand, because the third rule has higher priority than the other two, you could remove
(if-let $false (has_zbs))from the lower-priority rules and rely on all the rules having priorities.
jameysharp created PR review comment:
The immediate here being 16 when I knew the constant needed to be 2^16 confused me so much. But of course 16 is 2^4, and
luishifts its immediate operand left by 12 bits, and 4+12 is 16, so this is correct.
jameysharp created PR review comment:
This comment should say "all bits are 0" for the
clzlowering, right?
jameysharp created PR review comment:
Should
tmp2select the sign-extendedtmpinstead of the originalr?
jameysharp created PR review comment:
I think this comment is a little misleading. Maybe: "We count leading zeros in a full 64-bit register, but only want the count within the low 8 or 16 bits, so subtract the number of extra bits."
Also I guess it doesn't make any difference but I suppose you could do this with
clzwand subtract 32 instead.
jameysharp created PR review comment:
Possible future work: Maybe make
MInst::Selectaccept anIntCCetc, so it can replaceMInst::SelectReg, and so this lowering doesn't have to generate two separate branches for the same condition.
jameysharp created PR review comment:
I don't understand this TODO comment. As far as I understand we can't use
noton the input without checking if it was negative. But isn't that exactly what the rule forlower_clsalready does?
jameysharp created PR review comment:
I don't know what style to use for ISLE source most of the time, but I think there should at least be a space after
(let.
jameysharp created PR review comment:
Wouldn't this generate better code if we checked whether
(value_regs_get x 1)is zero, rather than checking if it has 64 leading zeros? We'd avoid loading a constant which should remove one instruction, right?
afonso360 updated PR #5854 from riscv64-ctlz-improvments to main.
afonso360 submitted PR review.
afonso360 created PR review comment:
Oops, I think I messed up slightly here. I looked at rust's output for
u32::leading_ones, but that has slightly different semantics. LLVM's output for the equivalent operation is still better than our lowering, I'll update the comment with the godbolt link.example::cls_u64: bgez a0, .LBB4_2 not a0, a0 .LBB4_2: clz a0, a0 retUnfortunately it requires a branch, which sort of prevents us from easly implementing this in ISLE :disappointed: .
afonso360 edited PR review comment.
afonso360 edited PR review comment.
afonso360 edited PR review comment.
afonso360 submitted PR review.
afonso360 created PR review comment:
Yeah that's not a great explanation of that piece of code... I've updated it with your comment, which is much better, thanks!
Yeah, we should be able to use
clzw, here I mostly kept it as it was before.
afonso360 edited PR review comment.
afonso360 submitted PR review.
afonso360 created PR review comment:
That's a good find! I've also applied that to the
ctzlowering, which is fairly similar.
afonso360 submitted PR review.
afonso360 created PR review comment:
I have a slight preference for using pattern matching wherever possible, plus, it means we don't actually need
u64_lteqwe already have au64_lewhich I didn't notice the first time around. Thanks!
afonso360 updated PR #5854 from riscv64-ctlz-improvments to main.
afonso360 has marked PR #5854 as ready for review.
jameysharp submitted PR review.
afonso360 updated PR #5854 from riscv64-ctlz-improvments to main.
afonso360 has enabled auto merge for PR #5854.
afonso360 merged PR #5854.
Last updated: Dec 06 2025 at 06:05 UTC