sstangl opened PR #1298 from rex2
to master
:
- [x] This has been discussed in https://github.com/bytecodealliance/cranelift/pull/1173. The original approach was to avoid using Templates, but it was extremely complicated. This approach is much better to reduce the number of recipes.
- [x] This patch adds a third mode for templates: REX inference is requestable at template instantiation time. This reduces the number of recipes by removing rex()/nonrex() redundancy for many instructions. In the common case of
i32_i64
andb32_b64
encodings, this inferrable-REX mode is used.- [x] This PR contains test cases, if meaningful. (It's covered by the filetests.)
- [x] A reviewer from the core maintainer team has been assigned for this PR.
I understand that this is work will likely be overridden by what Chris is working on. However, I believe this is still a good idea to land, because:
- It is a strict decrease in the number of recipes.
- It makes more of the REX-related code explicit, and therefore visible. Removing implicit logic makes the code easier to reason about, and therefore easier to refactor later.
- Because the recipe count decreases, there should be a small performance improvement due to faster encoding -- there are fewer options from which to select.
- The transformation is fairly mechanical and I believe not difficult to review.
This invalidates the work in https://github.com/bytecodealliance/cranelift/pull/1173. That approach did not work because it tried to avoid using Templates. Removing templates at the same time as making this change resulted in too many moving parts -- it was overly complicated and error-prone.
sstangl requested bnjbvr for a review on PR #1298.
iximeow created PR Review Comment:
small quibble: this PR doesn't touch REX prefixes for instructions with memory operands, but this is also a correct check for if the index register in a memory operand needs
X
. How do you feel about naming this simplyis_extended_reg
?
iximeow submitted PR Review.
iximeow created PR Review Comment:
This applies to all the
size_with_inferred_rex*
here - how do you feel about structuring bodies like:let needs_rex = (bits.rex_w() != 0) || // it'd be handy if `rex_w` could be obtained as a bool, alas test_input(0, inst, divert, func, needs_rex_rb) || test_input(1, inst, divert, func, needs_rex_rb); sizing.base_size + if needs_rex { 1 } else { 0 }? Where
input_test
andoutput_test
areadditional_size_if
but returning only the predicate? Admittedly I'm not a huge fan ofadditional_size_if
because it's a bit eager in turning "we need a prefix" into "it will be so-and-so many bytes" (which would be complicating with @abrown's mention of VEX/EVEX!), so having a generic "is this true for this operand" which we can use to later decide "we do want a REX prefix, and that's one byte", seems nice. I also personally like avoiding the early returns with hardcoded sizes, YMMV!
iximeow submitted PR Review.
iximeow created PR Review Comment:
Same here, but for
div
's GPR argument. Also thought that the index was off by one, but2
made me go check the use first :)
iximeow created PR Review Comment:
Can this comment be adjusted to indicate that this refers specifically to the second operand,
inreg1
, formulx
? Reading this independently had me thinking the operand index was off by one
iximeow submitted PR Review.
iximeow submitted PR Review.
sstangl submitted PR Review.
sstangl created PR Review Comment:
I like it. Will make that change.
sstangl updated PR #1298 from rex2
to master
:
- [x] This has been discussed in https://github.com/bytecodealliance/cranelift/pull/1173. The original approach was to avoid using Templates, but it was extremely complicated. This approach is much better to reduce the number of recipes.
- [x] This patch adds a third mode for templates: REX inference is requestable at template instantiation time. This reduces the number of recipes by removing rex()/nonrex() redundancy for many instructions. In the common case of
i32_i64
andb32_b64
encodings, this inferrable-REX mode is used.- [x] This PR contains test cases, if meaningful. (It's covered by the filetests.)
- [x] A reviewer from the core maintainer team has been assigned for this PR.
I understand that this is work will likely be overridden by what Chris is working on. However, I believe this is still a good idea to land, because:
- It is a strict decrease in the number of recipes.
- It makes more of the REX-related code explicit, and therefore visible. Removing implicit logic makes the code easier to reason about, and therefore easier to refactor later.
- Because the recipe count decreases, there should be a small performance improvement due to faster encoding -- there are fewer options from which to select.
- The transformation is fairly mechanical and I believe not difficult to review.
This invalidates the work in https://github.com/bytecodealliance/cranelift/pull/1173. That approach did not work because it tried to avoid using Templates. Removing templates at the same time as making this change resulted in too many moving parts -- it was overly complicated and error-prone.
iximeow submitted PR Review.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
nit: here and below, can you expand "in-regs" to "input regs" in comments, please? (and if it's pre-existing, replace it in other places in this file too)
bnjbvr created PR Review Comment:
ditto "two_in_regs" => "inreg_0_1" or "2inregs"? (or conversely, use the longer form for all if you think it's more readable)
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
nit: for naming consistency with the other functions, how about
size_with_inferred_rex_for_inreg0
?
bnjbvr created PR Review Comment:
nit: please expand "out-reg" in comments
bnjbvr created PR Review Comment:
:+1: Can you open an issue for this and write the issue number here please?
bnjbvr created PR Review Comment:
oops, preexisting: these are called Templates since the migration to Rust, not TailRecipes anymore. (a few times below too)
sstangl merged PR #1298.
sstangl updated PR #1298 from rex2
to master
:
- [x] This has been discussed in https://github.com/bytecodealliance/cranelift/pull/1173. The original approach was to avoid using Templates, but it was extremely complicated. This approach is much better to reduce the number of recipes.
- [x] This patch adds a third mode for templates: REX inference is requestable at template instantiation time. This reduces the number of recipes by removing rex()/nonrex() redundancy for many instructions. In the common case of
i32_i64
andb32_b64
encodings, this inferrable-REX mode is used.- [x] This PR contains test cases, if meaningful. (It's covered by the filetests.)
- [x] A reviewer from the core maintainer team has been assigned for this PR.
I understand that this is work will likely be overridden by what Chris is working on. However, I believe this is still a good idea to land, because:
- It is a strict decrease in the number of recipes.
- It makes more of the REX-related code explicit, and therefore visible. Removing implicit logic makes the code easier to reason about, and therefore easier to refactor later.
- Because the recipe count decreases, there should be a small performance improvement due to faster encoding -- there are fewer options from which to select.
- The transformation is fairly mechanical and I believe not difficult to review.
This invalidates the work in https://github.com/bytecodealliance/cranelift/pull/1173. That approach did not work because it tried to avoid using Templates. Removing templates at the same time as making this change resulted in too many moving parts -- it was overly complicated and error-prone.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
Doc says 0F (as emitted), yet compared with 04.
bjorn3 created PR Review Comment:
*REZ.X
bjorn3 submitted PR Review.
bjorn3 edited PR Review Comment.
sstangl created PR Review Comment:
That assertion is copied from 'put_rexopt2()
. These assertions should definitely not open-code the bit-patterns, but should be changed to use the new
EncodingBits` struct.
sstangl submitted PR Review.
Last updated: Jan 24 2025 at 00:11 UTC