afonso360 opened PR #5844 from riscv-extend
to main
:
:wave: Hey,
This PR improves codegen for signed and zero extensions on the riscv64 backend.
The current status is that we have two lowerings. Either we do a
shl + shr
, or we do anandi, reg, 255
for zero exending I8's.This PR moves the extension logic into ISLE and adds a few more optimizations.
For the base instruction set we have the following:
shl + shr
as a generic fallbackandi, reg, 255
for zero extendingI8
'saddiw, reg, 0
for sign extendingI32
's intoI64
's (this is also known assext.w
)With the
zbb
extension we get a few more lowerings:
sext.b
for sign extendingI8
'ssext.h
for sign extendingI16
'szext.h
for zero extendingI16
'sAdditionally we also use
zbkb
for the remaining cases:
packw
for zero extendingI16
'spack
for zero extendingI32
'sWith the right extensions we are now guaranteed to have a single instruction extension (for <= 64 bits) and at most two for I128, which is nice.
I've also cleaned up the I128 codegen cases a little bit.
Unfortunately I can't remove the
Inst::Extend
since its used in some ABI code and as part of the atomic lowerings.I don't think we have a way to call into the ISLE lowerings in these cases right?
Notes about the test diffs:
Capstone does not recognize most of the
zbb
orzbkb
instructions.
Inst::Extend
used to report for examplesext.b
, but actually always lowers asshl + shr
, which makes it seem that in some cases we are lowering more instructions. This is not the case, its just that we were printing the wrong instruction.The capstone disassembly diff is the correct one in these cases.
zbkb
is available from qemu 7.1.0 onwards. Although the testing situation here is quite tricky since we don't have a solid way of detecting these extensions automatically.I've tested these manually by force enabling the extensions and running the runtests.
afonso360 edited PR #5844 from riscv-extend
to main
:
:wave: Hey,
This PR improves codegen for signed and zero extensions on the riscv64 backend.
The current status is that we have two lowerings. Either we do a
shl + shr
, or we do anandi, reg, 255
for zero exending I8's.This PR moves the extension logic into ISLE and adds a few more optimizations.
For the base instruction set we have the following:
shl + shr
as a generic fallbackandi, reg, 255
for zero extendingI8
'saddiw, reg, 0
for sign extendingI32
's intoI64
's (this is also known assext.w
)With the
zbb
extension we get a few more lowerings:
sext.b
for sign extendingI8
'ssext.h
for sign extendingI16
'szext.h
for zero extendingI16
'sAdditionally we also use
zbkb
for the remaining cases:
packw
for zero extendingI16
'spack
for zero extendingI32
'sWith the right extensions we are now guaranteed to have a single instruction extension for <= 64 bits and at most two for I128, which is nice.
I've also cleaned up the I128 codegen cases a little bit.
Unfortunately I can't remove the
Inst::Extend
since its used in some ABI code and as part of the atomic lowerings.I don't think we have a way to call into the ISLE lowerings in these cases right?
Notes about the test diffs:
Capstone does not recognize most of the
zbb
orzbkb
instructions.
Inst::Extend
used to report for examplesext.b
, but actually always lowers asshl + shr
, which makes it seem that in some cases we are lowering more instructions. This is not the case, its just that we were printing the wrong instruction.The capstone disassembly diff is the correct one in these cases.
zbkb
is available from qemu 7.1.0 onwards. Although the testing situation here is quite tricky since we don't have a solid way of detecting these extensions automatically.I've tested these manually by force enabling the extensions and running the runtests.
afonso360 edited PR #5844 from riscv-extend
to main
:
:wave: Hey,
This PR improves codegen for signed and zero extensions on the riscv64 backend.
The current status is that we have two lowerings. Either we do a
shl + shr
, or we do anandi, reg, 255
for zero exending I8's.This PR moves the extension logic into ISLE and adds a few more optimizations.
For the base instruction set we have the following:
shl + shr
as a generic fallbackandi, reg, 255
for zero extendingI8
'saddiw, reg, 0
for sign extendingI32
's intoI64
's (this is also known assext.w
)With the
zbb
extension we get a few more lowerings:
sext.b
for sign extendingI8
'ssext.h
for sign extendingI16
'szext.h
for zero extendingI16
'sAdditionally we also use
zbkb
for the remaining cases:
packw
for zero extendingI16
'spack
for zero extendingI32
'sWith the right extensions we are now guaranteed to have a single instruction extension for <= 64 bits and at most two for I128, which is nice.
I've also cleaned up the I128 codegen cases a little bit.
Unfortunately I can't remove the
Inst::Extend
since its used in some ABI code and is part of the atomic lowerings.I don't think we have a way to call into the ISLE lowerings in these cases right?
Notes about the test diffs:
Capstone does not recognize most of the
zbb
orzbkb
instructions.
Inst::Extend
used to report for examplesext.b
, but actually always lowers asshl + shr
, which makes it seem that in some cases we are lowering more instructions. This is not the case, its just that we were printing the wrong instruction.The capstone disassembly diff is the correct one in these cases.
zbkb
is available from qemu 7.1.0 onwards. Although the testing situation here is quite tricky since we don't have a solid way of detecting these extensions automatically.I've tested these manually by force enabling the extensions and running the runtests.
afonso360 updated PR #5844 from riscv-extend
to main
.
jameysharp submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
Just noting for completeness: This term could safely be made
(decl pure ...)
. It's not necessary though since the constructor is only used on the RHS of an impure term. It's also a nice motivating example for #5771.
jameysharp created PR review comment:
I don't think any of these rules overlap with each other, so it should be safe to place them all at the same priority. If I'm wrong, ISLE will tell you so :grin:
jameysharp created PR review comment:
A thought:
rule 9
here could usefits_in_64
and still be correct in isolation. That has the advantage of matching the structure ofrule 8
(the signed case), which is nice for reasoning; but the disadvantage that it does a recursive call toextend
just to discover that both type arguments are the same and return the input unchanged.If you did make that change though you could delete
rule 10
, becauserule 9
would produce the same result without the special case. I think that would be slightly nicer.Also, rules 8 and 9 can share the same priority since they differ on the
ExtendOp
argument. They do currently need a different priority than all the other rules though since ISLE can't yet figure out that$I128
doesn't overlap withfits_in_64
. I want to fix that sooner or later...
afonso360 submitted PR review.
afonso360 created PR review comment:
Yeah, that worked! I had the idea that isle wasn't able to reason very well past things like
fits_in_64
and would start throwing overlap errors. I guess I was wrong!
afonso360 submitted PR review.
afonso360 created PR review comment:
Having something like a match would be really nice! Or at least being able to scope these rules so that they don't accidentally get used by other rules.
afonso360 updated PR #5844 from riscv-extend
to main
.
afonso360 submitted PR review.
afonso360 created PR review comment:
I think that ended up being a bit cleaner! I like having the signed and unsigned rules being "visually" equal. Thanks!
They do currently need a different priority than all the other rules though since ISLE can't yet figure out that $I128 doesn't overlap with fits_in_64. I want to fix that sooner or later...
Oh, I think that might be why I started thinking that ISLE wouldn't reason past
fits_in_64
, good to know that its only in I128's.
afonso360 merged PR #5844.
jameysharp submitted PR review.
jameysharp created PR review comment:
That's not quite it. The issue is that ISLE doesn't know anything about when an extractor like
fits_in_64
will match. If you have two patterns where one is$I64
and the other is$I32
, it can see that those are different. If the patterns are bothfits_in_64
, it knows that both patterns will produce the same result on the same input so it can check deeper in the tree of patterns. But if the patterns arefits_in_64
versusfits_in_32
, or versus$I16
, ISLE can't draw any conclusions about whether those overlap or not.I want to fix this by making ISLE understand or-patterns, then defining
fits_in_64
as the equivalent of$I8 | $I16 | ...
. By making more semantics visible to ISLE, we'll get better pattern-matching code out as well as more precise overlap checks. Someday...
afonso360 submitted PR review.
afonso360 created PR review comment:
Yeah, makes sense, Thanks for explaining that!
Last updated: Nov 22 2024 at 17:03 UTC