julian-seward1 opened PR #2355 from arm64-simd-loadzero to main:
…ons.
This patch implements, for aarch64, the following wasm SIMD extensions.
v128.load32_zero and v128.load64_zero instructions
https://github.com/WebAssembly/simd/pull/237
The changes are straightforward:
- No new CLIF instructions. They are translated into an existing CLIF scalar load followed by a CLIF scalar_to_vector.
- The comment/specification for CLIF scalar_to_vector has been changed to match the actual intended semantics, per consultation with Andrew Brown.
- Translation from scalar_to_vector to the obvious aarch64 insns.
- Special-case zero in lower_constant_f128 in order to avoid a potentially slow call to Inst::load_fp_constant128.
- Once "Allow loads to merge into other operations during instruction selection in MachInst backends" (https://github.com/bytecodealliance/wasmtime/issues/2340) lands, we can use that functionality to pattern match the two-CLIF pair and emit a single AArch64 instruction.
- There is no testcase in this commit, because that is a separate repo. The implementation has been tested, nevertheless.
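To make the two-step translation concrete, here is a minimal Rust sketch of the semantics for the 32-bit case. It is a model only, not Cranelift code: the function names `load_u32`, `scalar_to_vector_i32x4` and `load32_zero` are hypothetical, and it assumes that the intended scalar_to_vector semantics place the scalar in lane 0 and zero the remaining lanes, as the review discussion about zeroed upper lanes suggests.

```rust
/// Hypothetical model of a scalar load: read a little-endian u32 from a
/// byte slice standing in for linear memory. Returns None when the access
/// is out of bounds (a real engine would trap instead).
fn load_u32(mem: &[u8], addr: usize) -> Option<u32> {
    let end = addr.checked_add(4)?;
    let s = mem.get(addr..end)?;
    Some(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

/// Hypothetical model of scalar_to_vector for a 32-bit scalar, ASSUMING
/// the scalar goes into lane 0 and all other lanes are zero.
fn scalar_to_vector_i32x4(x: u32) -> [u32; 4] {
    [x, 0, 0, 0]
}

/// v128.load32_zero modelled as the pair described in the PR text:
/// a scalar load followed by scalar_to_vector.
fn load32_zero(mem: &[u8], addr: usize) -> Option<[u32; 4]> {
    load_u32(mem, addr).map(scalar_to_vector_i32x4)
}
```

Under these assumptions, loading from a buffer whose first four bytes are 78 56 34 12 yields a vector whose lane 0 is 0x12345678 and whose other three lanes are zero. The 64-bit variant would follow the same shape with a u64 scalar and two lanes.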
julian-seward1 updated PR #2355 from arm64-simd-loadzero to main.
abrown submitted PR Review.
julian-seward1 requested yurydelendik for a review on PR #2355.
cfallin submitted PR Review.
akirilov-arm submitted PR Review.
akirilov-arm submitted PR Review.
akirilov-arm created PR Review Comment:
This should simply be Inst::MovToFpu, and then there is no need for lower_constant_f128().
BTW if the original load is a FP load, then this could be a move.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
From reading of FMOV (general), I don't see anything that implies that lanes 1 and above of the destination register are zeroed. I may well have missed it though; can you clarify?
akirilov-arm submitted PR Review.
akirilov-arm created PR Review Comment:
The pseudocode for FMOV (general) uses the assignment form of Vpart[], which zero-extends the written value if part is 0, as the comments in the pseudocode state.
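The zero-extension behaviour described in this comment can be sketched as a small Rust model. This is not the Arm pseudocode: `VReg` and `write_vpart` are hypothetical names, the 128-bit register is represented as two 64-bit halves, and the part 1 behaviour (replace the upper half, keep the lower half) is an assumption that the thread itself does not state.

```rust
/// Hypothetical model of a 128-bit vector register as two 64-bit halves.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct VReg {
    lo: u64,
    hi: u64,
}

/// Rough model of assigning `value` to one part of a vector register.
/// part 0: the value is zero-extended to the full 128 bits, so whatever
///         was in the upper half is cleared (the case discussed above).
/// part 1: ASSUMED to replace only the upper half and keep the lower half.
fn write_vpart(reg: VReg, part: u8, value: u64) -> VReg {
    match part {
        0 => VReg { lo: value, hi: 0 },
        1 => VReg { lo: reg.lo, hi: value },
        _ => panic!("part must be 0 or 1"),
    }
}
```

In this model, writing 0x1234 to part 0 of a register whose bits are all set leaves lo = 0x1234 and hi = 0, which is why a single move into the low part is enough to produce a vector with zeroed upper lanes.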
julian-seward1 updated PR #2355 from arm64-simd-loadzero to main.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
I rewrote it to generate FMOV only.
akirilov-arm submitted PR Review.
akirilov-arm submitted PR Review.
akirilov-arm created PR Review Comment:
BTW I don't mind the comment at all, but this operation is not special - virtually any instruction that operates on S or D registers (e.g. Inst::FpuRR) has exactly the same behaviour.
julian-seward1 requested cfallin and yurydelendik for a review on PR #2355.
cfallin submitted PR Review.
akirilov-arm edited PR Review Comment.
julian-seward1 merged PR #2355.
Last updated: Dec 23 2024 at 12:05 UTC