afonso360 opened PR #5854 from riscv64-ctlz-improvments
to main
:
:wave: Hey,
This PR started by fixing the lowering rules for
ctz
. The base rule had(if-let $false (has_zbb))
, but the rule fori8
/i16
depended on bothZbb
andZbs
. Which meant that if we only hadZbb
we had no valid rule for loweringi8
/i16
. The fuzzer found this on https://github.com/bytecodealliance/wasmtime/pull/5844#issuecomment-1439854777.I also ended up improving codegen for these instructions in a number of cases. We try to avoid the base case as much as possible since it involves a loop.
We can now avoid requiring
Zbs
in all cases by falling back to anori
oror
instruction instead to set a single bit.It also moves some constructors around and other misc stuff since we have a bunch of rules all over the files. I'm trying to collect the ones I find at the top of the file.
afonso360 edited PR #5854 from riscv64-ctlz-improvments
to main
:
:wave: Hey,
This PR started by fixing the lowering rules for
ctz
. The base rule had(if-let $false (has_zbb))
, but the rule fori8
/i16
depended on bothZbb
andZbs
. Which meant that if we only hadZbb
we had no valid rule for loweringi8
/i16
. The fuzzer found this on https://github.com/bytecodealliance/wasmtime/pull/5844#issuecomment-1439854777.I also ended up improving codegen for these instructions in a number of cases. We try to avoid the base case as much as possible since it involves a loop.
We can now avoid requiring
Zbs
in all cases by falling back to anori
oror
instruction instead to set a single bit.It also moves some constructors around and other misc stuff since we have a bunch of rules all over the files. I'm trying to collect the ones I find at the top of the file.
It's been fuzzing for about an hour, I'll update this if it finds anything.
afonso360 updated PR #5854 from riscv64-ctlz-improvments
to main
.
afonso360 has marked PR #5854 as ready for review.
jameysharp submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
I want to note that the uses of
fits_in_64
in the "Restrict lowering rules for ctz/clz" commit should be harmless but shouldn't be necessary. Since the rule for$I128
has higher priority, and since ctz/clz/cls accept only integer types, the only types which could reach the general rule are I8-I64.It's definitely good to be consistent across rules for whether we're explicit about checking
fits_in_64
or not. And it's probably good to be explicit since it serves as a kind of documentation. Currently, doing this causes ISLE to generate slightly worse code, but it's probably not measurably worse, and I hope to fix it eventually anyway.So you don't need to do anything about this comment. I just wanted to mention that this change shouldn't change any lowering behavior.
jameysharp created PR review comment:
I'd kind of like to see
bseti
either fully rely on priorities or fully rely on pattern matches, but it's currently a mix of both.Because
u64_lteq
is partial, the first and second rule need different priorities. If that constructor returnedbool
instead ofOption<()>
, then all three rules could fully express their preconditions and none of them would need priorities.On the other hand, because the third rule has higher priority than the other two, you could remove
(if-let $false (has_zbs))
from the lower-priority rules and rely on all the rules having priorities.
jameysharp created PR review comment:
The immediate here being 16 when I knew the constant needed to be 2^16 confused me so much. But of course 16 is 2^4, and
lui
shifts its immediate operand left by 12 bits, and 4+12 is 16, so this is correct.
jameysharp created PR review comment:
This comment should say "all bits are 0" for the
clz
lowering, right?
jameysharp created PR review comment:
Should
tmp2
select the sign-extendedtmp
instead of the originalr
?
jameysharp created PR review comment:
I think this comment is a little misleading. Maybe: "We count leading zeros in a full 64-bit register, but only want the count within the low 8 or 16 bits, so subtract the number of extra bits."
Also I guess it doesn't make any difference but I suppose you could do this with
clzw
and subtract 32 instead.
jameysharp created PR review comment:
Possible future work: Maybe make
MInst::Select
accept anIntCC
etc, so it can replaceMInst::SelectReg
, and so this lowering doesn't have to generate two separate branches for the same condition.
jameysharp created PR review comment:
I don't understand this TODO comment. As far as I understand we can't use
not
on the input without checking if it was negative. But isn't that exactly what the rule forlower_cls
already does?
jameysharp created PR review comment:
I don't know what style to use for ISLE source most of the time, but I think there should at least be a space after
(let
.
jameysharp created PR review comment:
Wouldn't this generate better code if we checked whether
(value_regs_get x 1)
is zero, rather than checking if it has 64 leading zeros? We'd avoid loading a constant which should remove one instruction, right?
afonso360 updated PR #5854 from riscv64-ctlz-improvments
to main
.
afonso360 submitted PR review.
afonso360 created PR review comment:
Oops, I think I messed up slightly here. I looked at rust's output for
u32::leading_ones
, but that has slightly different semantics. LLVM's output for the equivalent operation is still better than our lowering, I'll update the comment with the godbolt link.example::cls_u64: bgez a0, .LBB4_2 not a0, a0 .LBB4_2: clz a0, a0 ret
Unfortunately it requires a branch, which sort of prevents us from easly implementing this in ISLE :disappointed: .
afonso360 edited PR review comment.
afonso360 edited PR review comment.
afonso360 edited PR review comment.
afonso360 submitted PR review.
afonso360 created PR review comment:
Yeah that's not a great explanation of that piece of code... I've updated it with your comment, which is much better, thanks!
Yeah, we should be able to use
clzw
, here I mostly kept it as it was before.
afonso360 edited PR review comment.
afonso360 submitted PR review.
afonso360 created PR review comment:
That's a good find! I've also applied that to the
ctz
lowering, which is fairly similar.
afonso360 submitted PR review.
afonso360 created PR review comment:
I have a slight preference for using pattern matching wherever possible, plus, it means we don't actually need
u64_lteq
we already have au64_le
which I didn't notice the first time around. Thanks!
afonso360 updated PR #5854 from riscv64-ctlz-improvments
to main
.
afonso360 has marked PR #5854 as ready for review.
jameysharp submitted PR review.
afonso360 updated PR #5854 from riscv64-ctlz-improvments
to main
.
afonso360 has enabled auto merge for PR #5854.
afonso360 merged PR #5854.
Last updated: Dec 23 2024 at 12:05 UTC