afonso360 opened PR #6087 from riscv-zba
to main
:
:wave: Hey,
This PR adds the instructions present in the Zba extension to the RISC-V 64 backend. The Zba extension contains instructions for address generation.
Here's a quick summary:
| Mnemonic | Description |
| ------------ | ------------ |
|add.uw rd, rs1, rs2
| Add unsigned word |
|sh1add rd, rs1, rs2
| Shift left by 1 and add |
|sh1add.uw rd, rs1, rs2
| Shift unsigned word left by 1 and add |
|sh2add rd, rs1, rs2
| Shift left by 2 and add |
|sh2add.uw rd, rs1, rs2
| Shift unsigned word left by 2 and add |
|sh3add rd, rs1, rs2
| Shift left by 3 and add |
|sh3add.uw rd, rs1, rs2
| Shift unsigned word left by 3 and add |
|slli.uw rd, rs1, imm
| Shift-left unsigned word (Immediate) |Besides directly matching the above, we also add the
zext.w
mnemnoic which is essentiallyadd.uw rd, rs, zero
.I've left this fuzzing overnight on a riscv64 machine and all seems ok so far.
afonso360 updated PR #6087 from riscv-zba
to main
.
jameysharp submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
A
uextend
can't have a result type of the same width as its operand, so(ty_32_or_64 ty)
can be replaced with$I64
in these rules.
jameysharp created PR review comment:
I believe this can be at priority 2 without overlapping with the
zext.h
rule above, since this rule only extends from I32 and that rule only extends from I16.Then I think you can undo all the priority changes in the remaining
extend
rules below.That said, I like the idea of putting both the I128 rules at the same priority, so it's clear that the only difference between them is the
ExtendOp
. (That may improve the ISLE compiler's output by making it clear that the other pattern matches can all be shared.) So I'd be in favor of still bumping the sign-extend-to-128 case up to priority 3.
jameysharp created PR review comment:
Nice catch, noticing that this rule could never fire because the
fits_in_64
case would always match first. I had extended the overlap checker to catch that sort of bug, but since it doesn't know how extractors likefits_in_64
andfits_in_32
are related, it couldn't find this particular issue.That said, is there any benefit to treating 32-bit
iadd
differently than 64-bitiadd
? The upper bits shouldn't matter on either input or output.I see that
addw
sign-extends the 32-bit result to 64-bits, so using it would make sense in a rule matching(has_type $I64 (sextend (has_type $I32 (iadd x y))))
.
jameysharp created PR review comment:
This is another case where you can replace
(fits_in_64 ty)
with$I64
, because there are no other types which satisfy both that pattern and the constraints onuextend
from$I32
.Possible future work: remove all uses of
maybe_uextend
fromishl
lowering rules, and add egraph rules to remove anyuextend
/sextend
/ireduce
from the second operand toishl
/ushr
/sshr
. Since Cranelift's shifts all accept any integer type as the second operand, no matter what the controlling type is, and because shift amounts are always masked to at most 7 bits, it's always safe to remove both widening and narrowing conversions.
jameysharp created PR review comment:
ISLE will generate slightly better Rust if you use the infallible
u64_from_imm64
extractor instead of the fallibleuimm8
extractor. In both cases the patterns will match the same inputs, but withuimm8
it'll test that the input is in the range 0..256 before checking against the specific constant values.
jameysharp created PR review comment:
I think that, like
add.uw
, theshnadd.uw
instructions will only give the correct result if we're zero-extending from exactly$I32
to exactly$I64
.I'd also request a VERY LOUD comment here :sweat_smile: that these instructions can only implement
(ishl (uextend _) _)
, _not_(uextend (ishl _ _))
. Getting that wrong on x86 was basically the cause of our last CVE. I believe you've gotten them right here, if I read the instruction-set manual correctly, but I was worried for a moment.
jameysharp created PR review comment:
Let's add more runtests which are like
sh1add_uext
but with theuextend
on the result of theishl
instead of on its first operand. These additional tests should verify that whenv1
has its most significant N bits set, those bits aren't added into the upper half ofv0
.
jameysharp created PR review comment:
What would you think of this replacement, and similarly in
sh3add_uext
doing the same but with 8 instead of 4? It's slightly easier for me to do the math in my head that way.; run: %sh2add_uext(4, 0xFFFFFFFF) == 0x400000000
I'm also thinking something like these tests would be good:
; run: %sh1add_uext(0x100000000, 0x80000000) == 0x200000000 ; run: %sh2add_uext(0x100000000, 0x80000000) == 0x300000000 ; run: %sh3add_uext(0x100000000, 0x80000000) == 0x500000000
afonso360 submitted PR review.
afonso360 created PR review comment:
That said, is there any benefit to treating 32-bit iadd differently than 64-bit iadd? The upper bits shouldn't matter on either input or output.
Not really other than perhaps it being a hint that we only care about the 32bits for whoever is reading the ASM? Not sure that warrants a different rule.
I see that addw sign-extends the 32-bit result to 64-bits, so using it would make sense in a rule matching (has_type $I64 (sextend (has_type $I32 (iadd x y)))).
Yeah that's a good suggestion, I'll add that!
afonso360 edited PR review comment.
afonso360 submitted PR review.
afonso360 created PR review comment:
That seems correct, and I think since we don't force
uextend
here to be i32, it's wrong for i8/i16.I also double checked against LLVM, and they also manually sign extend i16/i8 before using
shnadd
.
afonso360 submitted PR review.
afonso360 created PR review comment:
This is another case where you can replace (fits_in_64 ty) with $I64, because there are no other types which satisfy both that pattern and the constraints on uextend from $I32.
:+1:
Possible future work: remove all uses of maybe_uextend from ishl lowering rules, and add egraph rules to remove any uextend/sextend/ireduce from the second operand to ishl/ushr/sshr. Since Cranelift's shifts all accept any integer type as the second operand, no matter what the controlling type is, and because shift amounts are always masked to at most 7 bits, it's always safe to remove both widening and narrowing conversions.
That seems like a good idea!
jameysharp submitted PR review.
jameysharp created PR review comment:
I think that's showing zero-extend rather than sign-extend, but yeah, it looks like LLVM doesn't try to do anything special for this pattern for u8 or u16.
afonso360 updated PR #6087 from riscv-zba
to main
.
jameysharp submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
Let's just sneak a little space in here:
(rule 3 (lower (has_type $I64 (ishl (uextend x @ (value_type $I32)) (maybe_uextend (imm12_from_value y)))))
afonso360 updated PR #6087 from riscv-zba
to main
.
afonso360 updated PR #6087 from riscv-zba
to main
.
afonso360 merged PR #6087.
Last updated: Jan 24 2025 at 00:11 UTC