sstangl opened PR #1298 from rex2 to master:
- [x] This has been discussed in https://github.com/bytecodealliance/cranelift/pull/1173. The original approach was to avoid using Templates, but it was extremely complicated. This approach is much better to reduce the number of recipes.
- [x] This patch adds a third mode for templates: REX inference is requestable at template instantiation time. This reduces the number of recipes by removing rex()/nonrex() redundancy for many instructions. In the common case of
i32_i64andb32_b64encodings, this inferrable-REX mode is used.- [x] This PR contains test cases, if meaningful. (It's covered by the filetests.)
- [x] A reviewer from the core maintainer team has been assigned for this PR.
I understand that this is work will likely be overridden by what Chris is working on. However, I believe this is still a good idea to land, because:
- It is a strict decrease in the number of recipes.
- It makes more of the REX-related code explicit, and therefore visible. Removing implicit logic makes the code easier to reason about, and therefore easier to refactor later.
- Because the recipe count decreases, there should be a small performance improvement due to faster encoding -- there are fewer options from which to select.
- The transformation is fairly mechanical and I believe not difficult to review.
This invalidates the work in https://github.com/bytecodealliance/cranelift/pull/1173. That approach did not work because it tried to avoid using Templates. Removing templates at the same time as making this change resulted in too many moving parts -- it was overly complicated and error-prone.
sstangl requested bnjbvr for a review on PR #1298.
iximeow created PR Review Comment:
small quibble: this PR doesn't touch REX prefixes for instructions with memory operands, but this is also a correct check for if the index register in a memory operand needs
X. How do you feel about naming this simplyis_extended_reg?
iximeow submitted PR Review.
iximeow created PR Review Comment:
This applies to all the
size_with_inferred_rex*here - how do you feel about structuring bodies like:let needs_rex = (bits.rex_w() != 0) || // it'd be handy if `rex_w` could be obtained as a bool, alas test_input(0, inst, divert, func, needs_rex_rb) || test_input(1, inst, divert, func, needs_rex_rb); sizing.base_size + if needs_rex { 1 } else { 0 }? Where
input_testandoutput_testareadditional_size_ifbut returning only the predicate? Admittedly I'm not a huge fan ofadditional_size_ifbecause it's a bit eager in turning "we need a prefix" into "it will be so-and-so many bytes" (which would be complicating with @abrown's mention of VEX/EVEX!), so having a generic "is this true for this operand" which we can use to later decide "we do want a REX prefix, and that's one byte", seems nice. I also personally like avoiding the early returns with hardcoded sizes, YMMV!
iximeow submitted PR Review.
iximeow created PR Review Comment:
Same here, but for
div's GPR argument. Also thought that the index was off by one, but2made me go check the use first :)
iximeow created PR Review Comment:
Can this comment be adjusted to indicate that this refers specifically to the second operand,
inreg1, formulx? Reading this independently had me thinking the operand index was off by one
iximeow submitted PR Review.
iximeow submitted PR Review.
sstangl submitted PR Review.
sstangl created PR Review Comment:
I like it. Will make that change.
sstangl updated PR #1298 from rex2 to master:
- [x] This has been discussed in https://github.com/bytecodealliance/cranelift/pull/1173. The original approach was to avoid using Templates, but it was extremely complicated. This approach is much better to reduce the number of recipes.
- [x] This patch adds a third mode for templates: REX inference is requestable at template instantiation time. This reduces the number of recipes by removing rex()/nonrex() redundancy for many instructions. In the common case of
i32_i64andb32_b64encodings, this inferrable-REX mode is used.- [x] This PR contains test cases, if meaningful. (It's covered by the filetests.)
- [x] A reviewer from the core maintainer team has been assigned for this PR.
I understand that this is work will likely be overridden by what Chris is working on. However, I believe this is still a good idea to land, because:
- It is a strict decrease in the number of recipes.
- It makes more of the REX-related code explicit, and therefore visible. Removing implicit logic makes the code easier to reason about, and therefore easier to refactor later.
- Because the recipe count decreases, there should be a small performance improvement due to faster encoding -- there are fewer options from which to select.
- The transformation is fairly mechanical and I believe not difficult to review.
This invalidates the work in https://github.com/bytecodealliance/cranelift/pull/1173. That approach did not work because it tried to avoid using Templates. Removing templates at the same time as making this change resulted in too many moving parts -- it was overly complicated and error-prone.
iximeow submitted PR Review.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
nit: here and below, can you expand "in-regs" to "input regs" in comments, please? (and if it's pre-existing, replace it in other places in this file too)
bnjbvr created PR Review Comment:
ditto "two_in_regs" => "inreg_0_1" or "2inregs"? (or conversely, use the longer form for all if you think it's more readable)
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
nit: for naming consistency with the other functions, how about
size_with_inferred_rex_for_inreg0?
bnjbvr created PR Review Comment:
nit: please expand "out-reg" in comments
bnjbvr created PR Review Comment:
:+1: Can you open an issue for this and write the issue number here please?
bnjbvr created PR Review Comment:
oops, preexisting: these are called Templates since the migration to Rust, not TailRecipes anymore. (a few times below too)
sstangl merged PR #1298.
sstangl updated PR #1298 from rex2 to master:
- [x] This has been discussed in https://github.com/bytecodealliance/cranelift/pull/1173. The original approach was to avoid using Templates, but it was extremely complicated. This approach is much better to reduce the number of recipes.
- [x] This patch adds a third mode for templates: REX inference is requestable at template instantiation time. This reduces the number of recipes by removing rex()/nonrex() redundancy for many instructions. In the common case of
i32_i64andb32_b64encodings, this inferrable-REX mode is used.- [x] This PR contains test cases, if meaningful. (It's covered by the filetests.)
- [x] A reviewer from the core maintainer team has been assigned for this PR.
I understand that this is work will likely be overridden by what Chris is working on. However, I believe this is still a good idea to land, because:
- It is a strict decrease in the number of recipes.
- It makes more of the REX-related code explicit, and therefore visible. Removing implicit logic makes the code easier to reason about, and therefore easier to refactor later.
- Because the recipe count decreases, there should be a small performance improvement due to faster encoding -- there are fewer options from which to select.
- The transformation is fairly mechanical and I believe not difficult to review.
This invalidates the work in https://github.com/bytecodealliance/cranelift/pull/1173. That approach did not work because it tried to avoid using Templates. Removing templates at the same time as making this change resulted in too many moving parts -- it was overly complicated and error-prone.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
Doc says 0F (as emitted), yet compared with 04.
bjorn3 created PR Review Comment:
*REZ.X
bjorn3 submitted PR Review.
bjorn3 edited PR Review Comment.
sstangl created PR Review Comment:
That assertion is copied from 'put_rexopt2()
. These assertions should definitely not open-code the bit-patterns, but should be changed to use the newEncodingBits` struct.
sstangl submitted PR Review.
Last updated: Dec 06 2025 at 05:03 UTC