Stream: git-wasmtime

Topic: wasmtime / PR #5854 riscv64: Improve `ctz`/`clz`/`cls` co...


view this post on Zulip Wasmtime GitHub notifications bot (Feb 22 2023 at 20:14):

afonso360 opened PR #5854 from riscv64-ctlz-improvments to main:

:wave: Hey,

This PR started by fixing the lowering rules for ctz. The base rule had (if-let $false (has_zbb)), but the rule for i8/i16 depended on both Zbb and Zbs. Which meant that if we only had Zbb we had no valid rule for lowering i8/i16. The fuzzer found this on https://github.com/bytecodealliance/wasmtime/pull/5844#issuecomment-1439854777.

I also ended up improving codegen for these instructions in a number of cases. We try to avoid the base case as much as possible since it involves a loop.

We can now avoid requiring Zbs in all cases by falling back to an ori or or instruction instead to set a single bit.

It also moves some constructors around and other misc stuff since we have a bunch of rules all over the files. I'm trying to collect the ones I find at the top of the file.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 22 2023 at 20:15):

afonso360 edited PR #5854 from riscv64-ctlz-improvments to main:

:wave: Hey,

This PR started by fixing the lowering rules for ctz. The base rule had (if-let $false (has_zbb)), but the rule for i8/i16 depended on both Zbb and Zbs. Which meant that if we only had Zbb we had no valid rule for lowering i8/i16. The fuzzer found this on https://github.com/bytecodealliance/wasmtime/pull/5844#issuecomment-1439854777.

I also ended up improving codegen for these instructions in a number of cases. We try to avoid the base case as much as possible since it involves a loop.

We can now avoid requiring Zbs in all cases by falling back to an ori or or instruction instead to set a single bit.

It also moves some constructors around and other misc stuff since we have a bunch of rules all over the files. I'm trying to collect the ones I find at the top of the file.

It's been fuzzing for about an hour, I'll update this if it finds anything.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2023 at 12:51):

afonso360 updated PR #5854 from riscv64-ctlz-improvments to main.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2023 at 12:52):

afonso360 has marked PR #5854 as ready for review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2023 at 22:34):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2023 at 22:34):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2023 at 22:34):

jameysharp created PR review comment:

I want to note that the uses of fits_in_64 in the "Restrict lowering rules for ctz/clz" commit should be harmless but shouldn't be necessary. Since the rule for $I128 has higher priority, and since ctz/clz/cls accept only integer types, the only types which could reach the general rule are I8-I64.

It's definitely good to be consistent across rules for whether we're explicit about checking fits_in_64 or not. And it's probably good to be explicit since it serves as a kind of documentation. Currently, doing this causes ISLE to generate slightly worse code, but it's probably not measurably worse, and I hope to fix it eventually anyway.

So you don't need to do anything about this comment. I just wanted to mention that this change shouldn't change any lowering behavior.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2023 at 22:34):

jameysharp created PR review comment:

I'd kind of like to see bseti either fully rely on priorities or fully rely on pattern matches, but it's currently a mix of both.

Because u64_lteq is partial, the first and second rule need different priorities. If that constructor returned bool instead of Option<()>, then all three rules could fully express their preconditions and none of them would need priorities.

On the other hand, because the third rule has higher priority than the other two, you could remove (if-let $false (has_zbs)) from the lower-priority rules and rely on all the rules having priorities.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2023 at 22:34):

jameysharp created PR review comment:

The immediate here being 16 when I knew the constant needed to be 2^16 confused me so much. But of course 16 is 2^4, and lui shifts its immediate operand left by 12 bits, and 4+12 is 16, so this is correct.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2023 at 22:34):

jameysharp created PR review comment:

This comment should say "all bits are 0" for the clz lowering, right?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2023 at 22:34):

jameysharp created PR review comment:

Should tmp2 select the sign-extended tmp instead of the original r?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2023 at 22:34):

jameysharp created PR review comment:

I think this comment is a little misleading. Maybe: "We count leading zeros in a full 64-bit register, but only want the count within the low 8 or 16 bits, so subtract the number of extra bits."

Also I guess it doesn't make any difference but I suppose you could do this with clzw and subtract 32 instead.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2023 at 22:34):

jameysharp created PR review comment:

Possible future work: Maybe make MInst::Select accept an IntCC etc, so it can replace MInst::SelectReg, and so this lowering doesn't have to generate two separate branches for the same condition.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2023 at 22:34):

jameysharp created PR review comment:

I don't understand this TODO comment. As far as I understand we can't use not on the input without checking if it was negative. But isn't that exactly what the rule for lower_cls already does?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2023 at 22:34):

jameysharp created PR review comment:

I don't know what style to use for ISLE source most of the time, but I think there should at least be a space after (let.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2023 at 22:34):

jameysharp created PR review comment:

Wouldn't this generate better code if we checked whether (value_regs_get x 1) is zero, rather than checking if it has 64 leading zeros? We'd avoid loading a constant which should remove one instruction, right?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 17 2023 at 15:12):

afonso360 updated PR #5854 from riscv64-ctlz-improvments to main.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 17 2023 at 15:47):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 17 2023 at 15:47):

afonso360 created PR review comment:

Oops, I think I messed up slightly here. I looked at rust's output for u32::leading_ones, but that has slightly different semantics. LLVM's output for the equivalent operation is still better than our lowering, I'll update the comment with the godbolt link.

example::cls_u64:
        bgez    a0, .LBB4_2
        not     a0, a0
.LBB4_2:
        clz     a0, a0
        ret

Unfortunately it requires a branch, which sort of prevents us from easly implementing this in ISLE :disappointed: .

view this post on Zulip Wasmtime GitHub notifications bot (Mar 17 2023 at 15:47):

afonso360 edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 17 2023 at 15:53):

afonso360 edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 17 2023 at 15:57):

afonso360 edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 17 2023 at 16:13):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 17 2023 at 16:13):

afonso360 created PR review comment:

Yeah that's not a great explanation of that piece of code... I've updated it with your comment, which is much better, thanks!

Yeah, we should be able to use clzw, here I mostly kept it as it was before.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 17 2023 at 16:14):

afonso360 edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 21 2023 at 14:21):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 21 2023 at 14:21):

afonso360 created PR review comment:

That's a good find! I've also applied that to the ctz lowering, which is fairly similar.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 21 2023 at 14:43):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 21 2023 at 14:43):

afonso360 created PR review comment:

I have a slight preference for using pattern matching wherever possible, plus, it means we don't actually need u64_lteq we already have a u64_le which I didn't notice the first time around. Thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Mar 21 2023 at 14:45):

afonso360 updated PR #5854 from riscv64-ctlz-improvments to main.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 21 2023 at 14:45):

afonso360 has marked PR #5854 as ready for review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 21 2023 at 19:18):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 21 2023 at 19:54):

afonso360 updated PR #5854 from riscv64-ctlz-improvments to main.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 21 2023 at 19:55):

afonso360 has enabled auto merge for PR #5854.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 21 2023 at 23:50):

afonso360 merged PR #5854.


Last updated: Dec 23 2024 at 12:05 UTC