afonso360 opened PR #5844 from riscv-extend to main:
:wave: Hey,
This PR improves codegen for signed and zero extensions on the riscv64 backend.
The current status is that we have two lowerings. Either we do a
shl + shr, or we do anandi, reg, 255for zero exending I8's.This PR moves the extension logic into ISLE and adds a few more optimizations.
For the base instruction set we have the following:
shl + shras a generic fallbackandi, reg, 255for zero extendingI8'saddiw, reg, 0for sign extendingI32's intoI64's (this is also known assext.w)With the
zbbextension we get a few more lowerings:
sext.bfor sign extendingI8'ssext.hfor sign extendingI16'szext.hfor zero extendingI16'sAdditionally we also use
zbkbfor the remaining cases:
packwfor zero extendingI16'spackfor zero extendingI32'sWith the right extensions we are now guaranteed to have a single instruction extension (for <= 64 bits) and at most two for I128, which is nice.
I've also cleaned up the I128 codegen cases a little bit.
Unfortunately I can't remove the
Inst::Extendsince its used in some ABI code and as part of the atomic lowerings.I don't think we have a way to call into the ISLE lowerings in these cases right?
Notes about the test diffs:
Capstone does not recognize most of the
zbborzbkbinstructions.
Inst::Extendused to report for examplesext.b, but actually always lowers asshl + shr, which makes it seem that in some cases we are lowering more instructions. This is not the case, its just that we were printing the wrong instruction.The capstone disassembly diff is the correct one in these cases.
zbkbis available from qemu 7.1.0 onwards. Although the testing situation here is quite tricky since we don't have a solid way of detecting these extensions automatically.I've tested these manually by force enabling the extensions and running the runtests.
afonso360 edited PR #5844 from riscv-extend to main:
:wave: Hey,
This PR improves codegen for signed and zero extensions on the riscv64 backend.
The current status is that we have two lowerings. Either we do a
shl + shr, or we do anandi, reg, 255for zero exending I8's.This PR moves the extension logic into ISLE and adds a few more optimizations.
For the base instruction set we have the following:
shl + shras a generic fallbackandi, reg, 255for zero extendingI8'saddiw, reg, 0for sign extendingI32's intoI64's (this is also known assext.w)With the
zbbextension we get a few more lowerings:
sext.bfor sign extendingI8'ssext.hfor sign extendingI16'szext.hfor zero extendingI16'sAdditionally we also use
zbkbfor the remaining cases:
packwfor zero extendingI16'spackfor zero extendingI32'sWith the right extensions we are now guaranteed to have a single instruction extension for <= 64 bits and at most two for I128, which is nice.
I've also cleaned up the I128 codegen cases a little bit.
Unfortunately I can't remove the
Inst::Extendsince its used in some ABI code and as part of the atomic lowerings.I don't think we have a way to call into the ISLE lowerings in these cases right?
Notes about the test diffs:
Capstone does not recognize most of the
zbborzbkbinstructions.
Inst::Extendused to report for examplesext.b, but actually always lowers asshl + shr, which makes it seem that in some cases we are lowering more instructions. This is not the case, its just that we were printing the wrong instruction.The capstone disassembly diff is the correct one in these cases.
zbkbis available from qemu 7.1.0 onwards. Although the testing situation here is quite tricky since we don't have a solid way of detecting these extensions automatically.I've tested these manually by force enabling the extensions and running the runtests.
afonso360 edited PR #5844 from riscv-extend to main:
:wave: Hey,
This PR improves codegen for signed and zero extensions on the riscv64 backend.
The current status is that we have two lowerings. Either we do a
shl + shr, or we do anandi, reg, 255for zero exending I8's.This PR moves the extension logic into ISLE and adds a few more optimizations.
For the base instruction set we have the following:
shl + shras a generic fallbackandi, reg, 255for zero extendingI8'saddiw, reg, 0for sign extendingI32's intoI64's (this is also known assext.w)With the
zbbextension we get a few more lowerings:
sext.bfor sign extendingI8'ssext.hfor sign extendingI16'szext.hfor zero extendingI16'sAdditionally we also use
zbkbfor the remaining cases:
packwfor zero extendingI16'spackfor zero extendingI32'sWith the right extensions we are now guaranteed to have a single instruction extension for <= 64 bits and at most two for I128, which is nice.
I've also cleaned up the I128 codegen cases a little bit.
Unfortunately I can't remove the
Inst::Extendsince its used in some ABI code and is part of the atomic lowerings.I don't think we have a way to call into the ISLE lowerings in these cases right?
Notes about the test diffs:
Capstone does not recognize most of the
zbborzbkbinstructions.
Inst::Extendused to report for examplesext.b, but actually always lowers asshl + shr, which makes it seem that in some cases we are lowering more instructions. This is not the case, its just that we were printing the wrong instruction.The capstone disassembly diff is the correct one in these cases.
zbkbis available from qemu 7.1.0 onwards. Although the testing situation here is quite tricky since we don't have a solid way of detecting these extensions automatically.I've tested these manually by force enabling the extensions and running the runtests.
afonso360 updated PR #5844 from riscv-extend to main.
jameysharp submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
Just noting for completeness: This term could safely be made
(decl pure ...). It's not necessary though since the constructor is only used on the RHS of an impure term. It's also a nice motivating example for #5771.
jameysharp created PR review comment:
I don't think any of these rules overlap with each other, so it should be safe to place them all at the same priority. If I'm wrong, ISLE will tell you so :grin:
jameysharp created PR review comment:
A thought:
rule 9here could usefits_in_64and still be correct in isolation. That has the advantage of matching the structure ofrule 8(the signed case), which is nice for reasoning; but the disadvantage that it does a recursive call toextendjust to discover that both type arguments are the same and return the input unchanged.If you did make that change though you could delete
rule 10, becauserule 9would produce the same result without the special case. I think that would be slightly nicer.Also, rules 8 and 9 can share the same priority since they differ on the
ExtendOpargument. They do currently need a different priority than all the other rules though since ISLE can't yet figure out that$I128doesn't overlap withfits_in_64. I want to fix that sooner or later...
afonso360 submitted PR review.
afonso360 created PR review comment:
Yeah, that worked! I had the idea that isle wasn't able to reason very well past things like
fits_in_64and would start throwing overlap errors. I guess I was wrong!
afonso360 submitted PR review.
afonso360 created PR review comment:
Having something like a match would be really nice! Or at least being able to scope these rules so that they don't accidentally get used by other rules.
afonso360 updated PR #5844 from riscv-extend to main.
afonso360 submitted PR review.
afonso360 created PR review comment:
I think that ended up being a bit cleaner! I like having the signed and unsigned rules being "visually" equal. Thanks!
They do currently need a different priority than all the other rules though since ISLE can't yet figure out that $I128 doesn't overlap with fits_in_64. I want to fix that sooner or later...
Oh, I think that might be why I started thinking that ISLE wouldn't reason past
fits_in_64, good to know that its only in I128's.
afonso360 merged PR #5844.
jameysharp submitted PR review.
jameysharp created PR review comment:
That's not quite it. The issue is that ISLE doesn't know anything about when an extractor like
fits_in_64will match. If you have two patterns where one is$I64and the other is$I32, it can see that those are different. If the patterns are bothfits_in_64, it knows that both patterns will produce the same result on the same input so it can check deeper in the tree of patterns. But if the patterns arefits_in_64versusfits_in_32, or versus$I16, ISLE can't draw any conclusions about whether those overlap or not.I want to fix this by making ISLE understand or-patterns, then defining
fits_in_64as the equivalent of$I8 | $I16 | .... By making more semantics visible to ISLE, we'll get better pattern-matching code out as well as more precise overlap checks. Someday...
afonso360 submitted PR review.
afonso360 created PR review comment:
Yeah, makes sense, Thanks for explaining that!
Last updated: Dec 13 2025 at 19:03 UTC