alexcrichton opened PR #5892 from relaxed-simd-maybe
to main
:
This commit is an implementation of the [relaxed SIMD proposal][proposal] which is expected to soon move to phase 4. This is a relatively small proposal which adds 20 new instructions operating over the preexisting
v128
type. The purpose of this proposal is to expose instructions that are not guaranteed to be deterministic across host platforms. For example the instructions added have different behavior for some inputs on x86_64 and aarch64. I believe the intention is that many real-world use cases for these instructions have inputs which "avoid the edge cases" to have the same behavior across platforms.Additionally as part of the specification of the relaxed SIMD proposal, however, there are annotated deterministic semantics for each instruction. In other words while each instruction is allowed to have different semantics across platforms there is still one listed canonical set of semantics. This commit implements this knobs as a
Config
option in Wasmtime. By default all relaxed simd instructions use their host-defined semantics (the most efficient ones) but a newConfig::relaxed_simd_deterministic
knob can be used to alter the codegen and force deterministic semantics.An example of this is that the
f32x4.relaxed_madd
instruction, a fused multiply and add instruction, can either be a fused operation or separate operations. By default Wasmtime will select the most efficient one, namely using two operations on x86_64 when thefma
feature isn't available. If therelaxed_simd_deterministic
option is enabled, however, then regardless of whetherfma
is available the fused semantics will be implemented as they're the deterministic semantics for this instruction.This commit includes an implementation of these instructions for the x64 and the aarch64 backend. The aarch64 implementation ended up being trivial since all of its instructions match the deterministic semantics of the proposal. Only support for encoding the
fmls
instruction was added since it wasn't already implemented. On x86_64 the implementation is much more subtle and exploits various edge cases in the proposal (leading me to the conclusion that the relaxed simd proposal is basically the "let's acknowledge the nonstandard, but dominant, semantics of x86_64 proposal").Implementation-wise this commit adds a number of new instructions to Cranelift all prefixed with
arch_*
. I've tried to document that the intended equivalence to a deterministic instruction and additionally document the edge cases (currently they've all got edge cases for x86_64 and none for aarch64, which I believe represents the current state of things). There may be better methods to do this, however, since this strategy does not seems scalable into the future if dozens-to-hundreds more instructions get proposed and implemented.There are a number of more optimal AVX512-based lowerings for the x86_64 backend for these instructions, but I haven't implemented any of them in this PR. All the support for x86_64 is based on AVX and below.
Procedurally the spec tests were copied from the upstream proposal into the Wasmtime repository here since the WebAssembly/testsuite repo is somewhat inactive right now and it was the easiest way to integrate them. When the upstream repository is updated the local copies here can be deleted in favor of the upstream implementations.
Finally I have not done rigorous testing of this PR yet. I've tried to be meticulous with rationalizing the spec-defined semantics of these wasm instructions against the architecture-specific behavior of each instruction. This has resulted in a number of PRs to the upstream specification to align with the intended instructions or clarify some certain lowerings. For example x86_64's lowering of
i16x8.relaxed_laneselect
is not correct with respect to the current specification but there is an issue open for this upstream. There is unfortunately no great way to fuzz these instructions inherently due to their nondeterministic semantics. Without adifferential
fuzzer all we can really do is verify that they compile which doesn't really provide much of a guarantee. My hope is that this can be considered more carefully before any decision to enable this proposal by default.<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
alexcrichton updated PR #5892 from relaxed-simd-maybe
to main
.
alexcrichton updated PR #5892 from relaxed-simd-maybe
to main
.
alexcrichton updated PR #5892 from relaxed-simd-maybe
to main
.
alexcrichton updated PR #5892 from relaxed-simd-maybe
to main
.
alexcrichton updated PR #5892 from relaxed-simd-maybe
to main
.
abrown submitted PR review.
abrown submitted PR review.
abrown created PR review comment:
This is a good note; can you create an issue to track this or let me know and I will create one? To me this comment boils down to "a better lowering is possible" and that seems like a good thing to remember to do.
abrown created PR review comment:
Maybe this English version helps?
This instruction will permute the 8-bit lanes of `x` with the indices specified in `y`. Each lane in the mask, `y`, uses the bottom four bits for selecting the lane from `x` unless the most significant bit is set, in which case the lane is zeroed. The output vector will have the following contents when the element of `y` is in these ranges: * `[0, 127]` -> `x[y[i] % 16]` * `[128, 255]` -> 0
abrown created PR review comment:
Should this instead be an
AND
condition (same below)?if environ.relaxed_simd_deterministic() && environ.has_native_fma() {
If someone turned off the relaxed SIMD feature but then their system had FMA available, they might get unexpectedly-rounded results? By turning off the relaxed SIMD feature, aren't they explicitly saying they want the double rounding?
abrown created PR review comment:
Maybe? (same below):
if !environ.relaxed_simd_deterministic() || !environ.is_x86() {
abrown created PR review comment:
I now see the copied-in WAST files down below and it might be good to track this task in an issue so someone else can pick it up once the testsuite is updated. Have you created a PR to upstreamed the changes that were necessary here to the relaxed SIMD testsuite?
alexcrichton updated PR #5892 from relaxed-simd-maybe
to main
.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
This is currently intentional because I believe the "deterministic" sematnics for fma are the fused semantics, which the
fma
CLIF instruction guarantees. It's only the case where determinism isn't required and fma isn't available that the otherwise fastest lowering of two operations is used.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Good idea, I opened https://github.com/bytecodealliance/wasmtime/issues/5914 for this. I created https://github.com/WebAssembly/testsuite/pull/65 but it sat for awhile with no response and is now outdated due to upstream changes, and I'm not sure who, if anyone, is watching that repository.
alexcrichton created PR review comment:
This is currently intentional where if determinism is required it takes the first branch which has the deterministic lowering in terms of CLIF instruction semantics. The branch isn't taken though for x86 when speed is desired b/c x86 has a faster lowering.
If you have a suggestion though to restructure this to more obviously communicate this intent it's probably good to implement though since the current
if
is pretty subtle and I think easy to lose track of.
alexcrichton submitted PR review.
alexcrichton updated PR #5892 from relaxed-simd-maybe
to main
.
abrown submitted PR review.
abrown created PR review comment:
Ah, ok... yeah, these conditions are tricky to think through.
abrown submitted PR review.
abrown created PR review comment:
Ok, makes sense. In terms of communication, I guess the missing piece of information for me was that I couldn't remember off the top of my head which lowering had the deterministic semantics. Now, obviously, I should have trusted that the
if environ.relaxed_simd_deterministic()
side of the branch was the deterministic lowering but in this review I was starting to wonder which lowering was which. This may just be a review issue, but if you wanted to make it extra clear the deterministic lowering could get a comment like:// This is the deterministic lowering for ..., see https://github.com/WebAssembly/relaxed-simd/...
.
alexcrichton updated PR #5892 from relaxed-simd-maybe
to main
.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
I think the spec isn't quite in the state where there's a URL to link to unfortunately. I don't think that the deterministic semantics are actually written down anywhere other than the spec draft currently. I'll try to remember to update things here though once that's update.
Otherwise though I went ahead and added comments to all the deterministic cases at least english-explaining what the intention was.
cfallin submitted PR review.
jameysharp submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
I think I'd prefer not to have a default implementation of
has_native_fma
, or it should default tofalse
. That seems to me like something that a backend needs to opt in to, not opt out of.
jameysharp created PR review comment:
This comment seems wrong since only one possible outcome is asserted.
jameysharp created PR review comment:
I'm not sure what this sentence was supposed to say. Maybe delete "the fastest", or replace "this difference in" with something else?
alexcrichton updated PR #5892 from relaxed-simd-maybe
to main
.
alexcrichton has enabled auto merge for PR #5892.
alexcrichton created PR review comment:
Ah this is just copied from the upstream spec which I think is just a copy/paste artifact, but if you'd like I think they're pretty accepting of PRs.
alexcrichton submitted PR review.
alexcrichton merged PR #5892.
Last updated: Nov 22 2024 at 16:03 UTC