Stream: git-wasmtime

Topic: wasmtime / PR #6087 riscv64: Add `Zba` extension instruct...


view this post on Zulip Wasmtime GitHub notifications bot (Mar 22 2023 at 11:59):

afonso360 opened PR #6087 from riscv-zba to main:

:wave: Hey,

This PR adds the instructions present in the Zba extension to the RISC-V 64 backend. The Zba extension contains instructions for address generation.

Here's a quick summary:
| Mnemonic | Description |
| ------------ | ------------ |
| add.uw rd, rs1, rs2 | Add unsigned word |
| sh1add rd, rs1, rs2 | Shift left by 1 and add |
| sh1add.uw rd, rs1, rs2 | Shift unsigned word left by 1 and add |
| sh2add rd, rs1, rs2 | Shift left by 2 and add |
| sh2add.uw rd, rs1, rs2 | Shift unsigned word left by 2 and add |
| sh3add rd, rs1, rs2 | Shift left by 3 and add |
| sh3add.uw rd, rs1, rs2 | Shift unsigned word left by 3 and add |
| slli.uw rd, rs1, imm | Shift-left unsigned word (Immediate) |

Besides directly matching the above, we also add the zext.w mnemnoic which is essentially add.uw rd, rs, zero.

I've left this fuzzing overnight on a riscv64 machine and all seems ok so far.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 22 2023 at 12:46):

afonso360 updated PR #6087 from riscv-zba to main.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 22 2023 at 22:16):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 22 2023 at 22:16):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 22 2023 at 22:16):

jameysharp created PR review comment:

A uextend can't have a result type of the same width as its operand, so (ty_32_or_64 ty) can be replaced with $I64 in these rules.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 22 2023 at 22:16):

jameysharp created PR review comment:

I believe this can be at priority 2 without overlapping with the zext.h rule above, since this rule only extends from I32 and that rule only extends from I16.

Then I think you can undo all the priority changes in the remaining extend rules below.

That said, I like the idea of putting both the I128 rules at the same priority, so it's clear that the only difference between them is the ExtendOp. (That may improve the ISLE compiler's output by making it clear that the other pattern matches can all be shared.) So I'd be in favor of still bumping the sign-extend-to-128 case up to priority 3.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 22 2023 at 22:16):

jameysharp created PR review comment:

Nice catch, noticing that this rule could never fire because the fits_in_64 case would always match first. I had extended the overlap checker to catch that sort of bug, but since it doesn't know how extractors like fits_in_64 and fits_in_32 are related, it couldn't find this particular issue.

That said, is there any benefit to treating 32-bit iadd differently than 64-bit iadd? The upper bits shouldn't matter on either input or output.

I see that addw sign-extends the 32-bit result to 64-bits, so using it would make sense in a rule matching (has_type $I64 (sextend (has_type $I32 (iadd x y)))).

view this post on Zulip Wasmtime GitHub notifications bot (Mar 22 2023 at 22:16):

jameysharp created PR review comment:

This is another case where you can replace (fits_in_64 ty) with $I64, because there are no other types which satisfy both that pattern and the constraints on uextend from $I32.

Possible future work: remove all uses of maybe_uextend from ishl lowering rules, and add egraph rules to remove any uextend/sextend/ireduce from the second operand to ishl/ushr/sshr. Since Cranelift's shifts all accept any integer type as the second operand, no matter what the controlling type is, and because shift amounts are always masked to at most 7 bits, it's always safe to remove both widening and narrowing conversions.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 22 2023 at 22:16):

jameysharp created PR review comment:

ISLE will generate slightly better Rust if you use the infallible u64_from_imm64 extractor instead of the fallible uimm8 extractor. In both cases the patterns will match the same inputs, but with uimm8 it'll test that the input is in the range 0..256 before checking against the specific constant values.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 22 2023 at 22:16):

jameysharp created PR review comment:

I think that, like add.uw, the shnadd.uw instructions will only give the correct result if we're zero-extending from exactly $I32 to exactly $I64.

I'd also request a VERY LOUD comment here :sweat_smile: that these instructions can only implement (ishl (uextend _) _), _not_ (uextend (ishl _ _)). Getting that wrong on x86 was basically the cause of our last CVE. I believe you've gotten them right here, if I read the instruction-set manual correctly, but I was worried for a moment.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 22 2023 at 22:16):

jameysharp created PR review comment:

Let's add more runtests which are like sh1add_uext but with the uextend on the result of the ishl instead of on its first operand. These additional tests should verify that when v1 has its most significant N bits set, those bits aren't added into the upper half of v0.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 22 2023 at 22:16):

jameysharp created PR review comment:

What would you think of this replacement, and similarly in sh3add_uext doing the same but with 8 instead of 4? It's slightly easier for me to do the math in my head that way.

; run: %sh2add_uext(4, 0xFFFFFFFF) == 0x400000000

I'm also thinking something like these tests would be good:

; run: %sh1add_uext(0x100000000, 0x80000000) == 0x200000000
; run: %sh2add_uext(0x100000000, 0x80000000) == 0x300000000
; run: %sh3add_uext(0x100000000, 0x80000000) == 0x500000000

view this post on Zulip Wasmtime GitHub notifications bot (Mar 23 2023 at 14:36):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 23 2023 at 14:36):

afonso360 created PR review comment:

That said, is there any benefit to treating 32-bit iadd differently than 64-bit iadd? The upper bits shouldn't matter on either input or output.

Not really other than perhaps it being a hint that we only care about the 32bits for whoever is reading the ASM? Not sure that warrants a different rule.

I see that addw sign-extends the 32-bit result to 64-bits, so using it would make sense in a rule matching (has_type $I64 (sextend (has_type $I32 (iadd x y)))).

Yeah that's a good suggestion, I'll add that!

view this post on Zulip Wasmtime GitHub notifications bot (Mar 23 2023 at 14:38):

afonso360 edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 23 2023 at 15:32):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 23 2023 at 15:32):

afonso360 created PR review comment:

That seems correct, and I think since we don't force uextend here to be i32, it's wrong for i8/i16.

I also double checked against LLVM, and they also manually sign extend i16/i8 before using shnadd.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 23 2023 at 15:36):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 23 2023 at 15:36):

afonso360 created PR review comment:

This is another case where you can replace (fits_in_64 ty) with $I64, because there are no other types which satisfy both that pattern and the constraints on uextend from $I32.

:+1:

Possible future work: remove all uses of maybe_uextend from ishl lowering rules, and add egraph rules to remove any uextend/sextend/ireduce from the second operand to ishl/ushr/sshr. Since Cranelift's shifts all accept any integer type as the second operand, no matter what the controlling type is, and because shift amounts are always masked to at most 7 bits, it's always safe to remove both widening and narrowing conversions.

That seems like a good idea!

view this post on Zulip Wasmtime GitHub notifications bot (Mar 23 2023 at 15:38):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 23 2023 at 15:38):

jameysharp created PR review comment:

I think that's showing zero-extend rather than sign-extend, but yeah, it looks like LLVM doesn't try to do anything special for this pattern for u8 or u16.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 23 2023 at 18:30):

afonso360 updated PR #6087 from riscv-zba to main.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 23 2023 at 18:47):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 23 2023 at 18:47):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 23 2023 at 18:47):

jameysharp created PR review comment:

Let's just sneak a little space in here:

(rule 3 (lower (has_type $I64 (ishl (uextend x @ (value_type $I32)) (maybe_uextend (imm12_from_value y)))))

view this post on Zulip Wasmtime GitHub notifications bot (Mar 23 2023 at 18:52):

afonso360 updated PR #6087 from riscv-zba to main.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 23 2023 at 19:12):

afonso360 updated PR #6087 from riscv-zba to main.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 23 2023 at 20:36):

afonso360 merged PR #6087.


Last updated: Oct 23 2024 at 20:03 UTC