Stream: git-wasmtime

Topic: wasmtime / PR #5844 riscv64: Improve signed and zero exte...


view this post on Zulip Wasmtime GitHub notifications bot (Feb 20 2023 at 21:21):

afonso360 opened PR #5844 from riscv-extend to main:

:wave: Hey,

This PR improves codegen for signed and zero extensions on the riscv64 backend.

The current status is that we have two lowerings. Either we do a shl + shr, or we do an andi, reg, 255 for zero exending I8's.

This PR moves the extension logic into ISLE and adds a few more optimizations.

For the base instruction set we have the following:

With the zbb extension we get a few more lowerings:

Additionally we also use zbkb for the remaining cases:

With the right extensions we are now guaranteed to have a single instruction extension (for <= 64 bits) and at most two for I128, which is nice.

I've also cleaned up the I128 codegen cases a little bit.


Unfortunately I can't remove the Inst::Extend since its used in some ABI code and as part of the atomic lowerings.

I don't think we have a way to call into the ISLE lowerings in these cases right?


Notes about the test diffs:

Capstone does not recognize most of the zbb or zbkb instructions.

Inst::Extend used to report for example sext.b, but actually always lowers as shl + shr, which makes it seem that in some cases we are lowering more instructions. This is not the case, its just that we were printing the wrong instruction.

The capstone disassembly diff is the correct one in these cases.


zbkb is available from qemu 7.1.0 onwards. Although the testing situation here is quite tricky since we don't have a solid way of detecting these extensions automatically.

I've tested these manually by force enabling the extensions and running the runtests.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 20 2023 at 21:22):

afonso360 edited PR #5844 from riscv-extend to main:

:wave: Hey,

This PR improves codegen for signed and zero extensions on the riscv64 backend.

The current status is that we have two lowerings. Either we do a shl + shr, or we do an andi, reg, 255 for zero exending I8's.

This PR moves the extension logic into ISLE and adds a few more optimizations.

For the base instruction set we have the following:

With the zbb extension we get a few more lowerings:

Additionally we also use zbkb for the remaining cases:

With the right extensions we are now guaranteed to have a single instruction extension for <= 64 bits and at most two for I128, which is nice.

I've also cleaned up the I128 codegen cases a little bit.


Unfortunately I can't remove the Inst::Extend since its used in some ABI code and as part of the atomic lowerings.

I don't think we have a way to call into the ISLE lowerings in these cases right?


Notes about the test diffs:

Capstone does not recognize most of the zbb or zbkb instructions.

Inst::Extend used to report for example sext.b, but actually always lowers as shl + shr, which makes it seem that in some cases we are lowering more instructions. This is not the case, its just that we were printing the wrong instruction.

The capstone disassembly diff is the correct one in these cases.


zbkb is available from qemu 7.1.0 onwards. Although the testing situation here is quite tricky since we don't have a solid way of detecting these extensions automatically.

I've tested these manually by force enabling the extensions and running the runtests.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 21 2023 at 00:01):

afonso360 edited PR #5844 from riscv-extend to main:

:wave: Hey,

This PR improves codegen for signed and zero extensions on the riscv64 backend.

The current status is that we have two lowerings. Either we do a shl + shr, or we do an andi, reg, 255 for zero exending I8's.

This PR moves the extension logic into ISLE and adds a few more optimizations.

For the base instruction set we have the following:

With the zbb extension we get a few more lowerings:

Additionally we also use zbkb for the remaining cases:

With the right extensions we are now guaranteed to have a single instruction extension for <= 64 bits and at most two for I128, which is nice.

I've also cleaned up the I128 codegen cases a little bit.


Unfortunately I can't remove the Inst::Extend since its used in some ABI code and is part of the atomic lowerings.

I don't think we have a way to call into the ISLE lowerings in these cases right?


Notes about the test diffs:

Capstone does not recognize most of the zbb or zbkb instructions.

Inst::Extend used to report for example sext.b, but actually always lowers as shl + shr, which makes it seem that in some cases we are lowering more instructions. This is not the case, its just that we were printing the wrong instruction.

The capstone disassembly diff is the correct one in these cases.


zbkb is available from qemu 7.1.0 onwards. Although the testing situation here is quite tricky since we don't have a solid way of detecting these extensions automatically.

I've tested these manually by force enabling the extensions and running the runtests.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 21 2023 at 12:30):

afonso360 updated PR #5844 from riscv-extend to main.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 21 2023 at 23:35):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 21 2023 at 23:35):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 21 2023 at 23:36):

jameysharp created PR review comment:

Just noting for completeness: This term could safely be made (decl pure ...). It's not necessary though since the constructor is only used on the RHS of an impure term. It's also a nice motivating example for #5771.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 21 2023 at 23:36):

jameysharp created PR review comment:

I don't think any of these rules overlap with each other, so it should be safe to place them all at the same priority. If I'm wrong, ISLE will tell you so :grin:

view this post on Zulip Wasmtime GitHub notifications bot (Feb 21 2023 at 23:36):

jameysharp created PR review comment:

A thought: rule 9 here could use fits_in_64 and still be correct in isolation. That has the advantage of matching the structure of rule 8 (the signed case), which is nice for reasoning; but the disadvantage that it does a recursive call to extend just to discover that both type arguments are the same and return the input unchanged.

If you did make that change though you could delete rule 10, because rule 9 would produce the same result without the special case. I think that would be slightly nicer.

Also, rules 8 and 9 can share the same priority since they differ on the ExtendOp argument. They do currently need a different priority than all the other rules though since ISLE can't yet figure out that $I128 doesn't overlap with fits_in_64. I want to fix that sooner or later...

view this post on Zulip Wasmtime GitHub notifications bot (Feb 22 2023 at 11:09):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 22 2023 at 11:09):

afonso360 created PR review comment:

Yeah, that worked! I had the idea that isle wasn't able to reason very well past things like fits_in_64 and would start throwing overlap errors. I guess I was wrong!

view this post on Zulip Wasmtime GitHub notifications bot (Feb 22 2023 at 11:11):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 22 2023 at 11:11):

afonso360 created PR review comment:

Having something like a match would be really nice! Or at least being able to scope these rules so that they don't accidentally get used by other rules.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 22 2023 at 11:20):

afonso360 updated PR #5844 from riscv-extend to main.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 22 2023 at 11:22):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 22 2023 at 11:22):

afonso360 created PR review comment:

I think that ended up being a bit cleaner! I like having the signed and unsigned rules being "visually" equal. Thanks!

They do currently need a different priority than all the other rules though since ISLE can't yet figure out that $I128 doesn't overlap with fits_in_64. I want to fix that sooner or later...

Oh, I think that might be why I started thinking that ISLE wouldn't reason past fits_in_64, good to know that its only in I128's.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 22 2023 at 18:30):

afonso360 merged PR #5844.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 22 2023 at 18:42):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 22 2023 at 18:42):

jameysharp created PR review comment:

That's not quite it. The issue is that ISLE doesn't know anything about when an extractor like fits_in_64 will match. If you have two patterns where one is $I64 and the other is $I32, it can see that those are different. If the patterns are both fits_in_64, it knows that both patterns will produce the same result on the same input so it can check deeper in the tree of patterns. But if the patterns are fits_in_64 versus fits_in_32, or versus $I16, ISLE can't draw any conclusions about whether those overlap or not.

I want to fix this by making ISLE understand or-patterns, then defining fits_in_64 as the equivalent of $I8 | $I16 | .... By making more semantics visible to ISLE, we'll get better pattern-matching code out as well as more precise overlap checks. Someday...

view this post on Zulip Wasmtime GitHub notifications bot (Feb 22 2023 at 19:04):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 22 2023 at 19:04):

afonso360 created PR review comment:

Yeah, makes sense, Thanks for explaining that!


Last updated: Oct 23 2024 at 20:03 UTC