cfallin opened PR #1865 from aarch64-amode-reg-reg-extend to master:
When a load/store instruction needs an address of the form
v0 + uextend(v1) or v0 + sextend(v1) (or the commuted forms thereof), we
currently generate a separate zero/sign-extend operation and then use a
plain [rA, rB] addressing mode. This patch extends lower_address()
to look at both addends of an address if it has two addends and a zero
offset, recognize extension operations, and incorporate them directly
into a [rA, rB, UXTW] or [rA, rB, SXTW] form. This should improve
our performance on WebAssembly workloads, at least, because we often see
a 64-bit linear memory base indexed by a 32-bit (Wasm) pointer value.
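To make the equivalence behind this change concrete, here is a minimal sketch of what the two extended addressing forms compute, assuming no shift is applied to the index. The names `Extend` and `effective_address` are hypothetical and exist only for this illustration; they are not taken from the PR or from Cranelift.

```rust
/// Hypothetical model of the extend operator attached to the index register.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Extend {
    /// Zero-extend the low 32 bits of the index ([rA, rB, UXTW]).
    Uxtw,
    /// Sign-extend the low 32 bits of the index ([rA, rB, SXTW]).
    Sxtw,
}

/// Effective address of `[base, index, ext]` with a shift amount of zero.
/// Only the low 32 bits of `index` are used; the upper bits are ignored,
/// which is why no separate extend instruction is needed beforehand.
fn effective_address(base: u64, index: u64, ext: Extend) -> u64 {
    let low = index as u32;
    let extended = match ext {
        Extend::Uxtw => low as u64,
        Extend::Sxtw => low as i32 as i64 as u64,
    };
    base.wrapping_add(extended)
}
```

For a Wasm-style access, `base` would be the 64-bit linear memory base and `index` the 32-bit pointer, so `effective_address(base, ptr, Extend::Uxtw)` gives the same result as zero-extending `ptr` first and then using the plain [rA, rB] form.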
cfallin requested julian-seward1 for a review on PR #1865.
julian-seward1 submitted PR Review.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
These 4 clauses computing parts are a bit duplication-ful. If the logic gets extended at some point to deal with 16- or 8-bit indices, it'll be even worse. Hence: is there some way to reduce the level of duplication, without making it harder to follow?
cfallin updated PR #1865 from aarch64-amode-reg-reg-extend to master.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Sure, factored out (i) the signed/unsigned duality, with a new helper maybe_input_insn_multi; and (ii) the two commutative cases, with a for-loop. Thanks!
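The shape of that refactoring can be sketched roughly as follows. This is a toy model, not the code from the PR: the `Value`, `Opcode`, `ExtendOp`, and `AMode` types are hypothetical, the signature of `maybe_input_insn_multi` is an assumption (only its name appears above), and every extend is assumed to be 32-to-64-bit.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Opcode { Uextend, Sextend, Iadd }

#[derive(Clone, Copy, Debug, PartialEq)]
enum ExtendOp { Uxtw, Sxtw }

/// Toy SSA value: the register holding it, plus (optionally) the opcode
/// that defined it and that instruction's single input register.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Value { reg: u8, def: Option<(Opcode, u8)> }

#[derive(Debug, PartialEq)]
enum AMode {
    RegReg(u8, u8),
    RegExtended(u8, u8, ExtendOp),
}

/// Assumed behavior: if `v` was produced by one of `candidates`,
/// return that opcode together with the instruction's input register.
fn maybe_input_insn_multi(v: Value, candidates: &[Opcode]) -> Option<(Opcode, u8)> {
    v.def.filter(|(op, _)| candidates.contains(op))
}

/// Lower a two-addend, zero-offset address. One helper call covers both
/// signednesses, and the loop covers both operand orders.
fn lower_address(addends: [Value; 2]) -> AMode {
    let orders = [(addends[0], addends[1]), (addends[1], addends[0])];
    for (base, index) in orders {
        let exts = [Opcode::Uextend, Opcode::Sextend];
        if let Some((op, narrow)) = maybe_input_insn_multi(index, &exts) {
            let ext = match op {
                Opcode::Uextend => ExtendOp::Uxtw,
                Opcode::Sextend => ExtendOp::Sxtw,
                Opcode::Iadd => continue, // not an extend; filtered out above
            };
            return AMode::RegExtended(base.reg, narrow, ext);
        }
    }
    // No extend found on either side: fall back to the plain [rA, rB] form.
    AMode::RegReg(addends[0].reg, addends[1].reg)
}
```

With this structure, supporting another extension kind would mean adding an opcode to the candidate list and one match arm, rather than two more near-identical clauses.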
cfallin updated PR #1865 from aarch64-amode-reg-reg-extend to master.
cfallin merged PR #1865.
Last updated: Jan 24 2025 at 00:11 UTC