wasmtime / PR #5892 Implement the relaxed SIMD proposal · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #5892 Implement the relaxed SIMD proposal

Wasmtime GitHub notifications bot (Feb 27 2023 at 23:11):

alexcrichton opened PR #5892 from relaxed-simd-maybe to main:

This commit is an implementation of the [relaxed SIMD proposal][proposal] which is expected to soon move to phase 4. This is a relatively small proposal which adds 20 new instructions operating over the preexisting v128 type. The purpose of this proposal is to expose instructions that are not guaranteed to be deterministic across host platforms. For example the instructions added have different behavior for some inputs on x86_64 and aarch64. I believe the intention is that many real-world use cases for these instructions have inputs which "avoid the edge cases" to have the same behavior across platforms.

Additionally as part of the specification of the relaxed SIMD proposal, however, there are annotated deterministic semantics for each instruction. In other words while each instruction is allowed to have different semantics across platforms there is still one listed canonical set of semantics. This commit implements this knobs as a Config option in Wasmtime. By default all relaxed simd instructions use their host-defined semantics (the most efficient ones) but a new Config::relaxed_simd_deterministic knob can be used to alter the codegen and force deterministic semantics.

An example of this is that the f32x4.relaxed_madd instruction, a fused multiply and add instruction, can either be a fused operation or separate operations. By default Wasmtime will select the most efficient one, namely using two operations on x86_64 when the fma feature isn't available. If the relaxed_simd_deterministic option is enabled, however, then regardless of whether fma is available the fused semantics will be implemented as they're the deterministic semantics for this instruction.

This commit includes an implementation of these instructions for the x64 and the aarch64 backend. The aarch64 implementation ended up being trivial since all of its instructions match the deterministic semantics of the proposal. Only support for encoding the fmls instruction was added since it wasn't already implemented. On x86_64 the implementation is much more subtle and exploits various edge cases in the proposal (leading me to the conclusion that the relaxed simd proposal is basically the "let's acknowledge the nonstandard, but dominant, semantics of x86_64 proposal").

Implementation-wise this commit adds a number of new instructions to Cranelift all prefixed with arch_*. I've tried to document that the intended equivalence to a deterministic instruction and additionally document the edge cases (currently they've all got edge cases for x86_64 and none for aarch64, which I believe represents the current state of things). There may be better methods to do this, however, since this strategy does not seems scalable into the future if dozens-to-hundreds more instructions get proposed and implemented.

There are a number of more optimal AVX512-based lowerings for the x86_64 backend for these instructions, but I haven't implemented any of them in this PR. All the support for x86_64 is based on AVX and below.

Procedurally the spec tests were copied from the upstream proposal into the Wasmtime repository here since the WebAssembly/testsuite repo is somewhat inactive right now and it was the easiest way to integrate them. When the upstream repository is updated the local copies here can be deleted in favor of the upstream implementations.

Finally I have not done rigorous testing of this PR yet. I've tried to be meticulous with rationalizing the spec-defined semantics of these wasm instructions against the architecture-specific behavior of each instruction. This has resulted in a number of PRs to the upstream specification to align with the intended instructions or clarify some certain lowerings. For example x86_64's lowering of i16x8.relaxed_laneselect is not correct with respect to the current specification but there is an issue open for this upstream. There is unfortunately no great way to fuzz these instructions inherently due to their nondeterministic semantics. Without a differential fuzzer all we can really do is verify that they compile which doesn't really provide much of a guarantee. My hope is that this can be considered more carefully before any decision to enable this proposal by default.

Wasmtime GitHub notifications bot (Feb 28 2023 at 23:31):

alexcrichton updated PR #5892 from relaxed-simd-maybe to main.

Wasmtime GitHub notifications bot (Mar 01 2023 at 00:15):

alexcrichton updated PR #5892 from relaxed-simd-maybe to main.

Wasmtime GitHub notifications bot (Mar 01 2023 at 15:15):

alexcrichton updated PR #5892 from relaxed-simd-maybe to main.

Wasmtime GitHub notifications bot (Mar 01 2023 at 20:23):

alexcrichton updated PR #5892 from relaxed-simd-maybe to main.

Wasmtime GitHub notifications bot (Mar 01 2023 at 20:23):

alexcrichton updated PR #5892 from relaxed-simd-maybe to main.

Wasmtime GitHub notifications bot (Mar 02 2023 at 00:41):

abrown submitted PR review.

Wasmtime GitHub notifications bot (Mar 02 2023 at 00:41):

abrown submitted PR review.

Wasmtime GitHub notifications bot (Mar 02 2023 at 00:41):

abrown created PR review comment:

This is a good note; can you create an issue to track this or let me know and I will create one? To me this comment boils down to "a better lowering is possible" and that seems like a good thing to remember to do.

Wasmtime GitHub notifications bot (Mar 02 2023 at 00:41):

abrown created PR review comment:

Maybe this English version helps?

        This instruction will permute the 8-bit lanes of `x` with the indices
        specified in `y`. Each lane in the mask, `y`, uses the bottom four
        bits for selecting the lane from `x` unless the most significant bit
        is set, in which case the lane is zeroed. The output vector will have
        the following contents when the element of `y` is in these ranges:

        * `[0, 127]` -> `x[y[i] % 16]`
        * `[128, 255]` -> 0

Wasmtime GitHub notifications bot (Mar 02 2023 at 00:41):

abrown created PR review comment:

Should this instead be an AND condition (same below)?
                if environ.relaxed_simd_deterministic() && environ.has_native_fma() {
If someone turned off the relaxed SIMD feature but then their system had FMA available, they might get unexpectedly-rounded results? By turning off the relaxed SIMD feature, aren't they explicitly saying they want the double rounding?

Wasmtime GitHub notifications bot (Mar 02 2023 at 00:41):

abrown created PR review comment:

Maybe? (same below):

                if !environ.relaxed_simd_deterministic() || !environ.is_x86() {

Wasmtime GitHub notifications bot (Mar 02 2023 at 00:41):

abrown created PR review comment:

I now see the copied-in WAST files down below and it might be good to track this task in an issue so someone else can pick it up once the testsuite is updated. Have you created a PR to upstreamed the changes that were necessary here to the relaxed SIMD testsuite?

Wasmtime GitHub notifications bot (Mar 02 2023 at 19:07):

alexcrichton updated PR #5892 from relaxed-simd-maybe to main.

Wasmtime GitHub notifications bot (Mar 02 2023 at 19:12):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Mar 02 2023 at 19:12):

alexcrichton created PR review comment:

This is currently intentional because I believe the "deterministic" sematnics for fma are the fused semantics, which the fma CLIF instruction guarantees. It's only the case where determinism isn't required and fma isn't available that the otherwise fastest lowering of two operations is used.

Wasmtime GitHub notifications bot (Mar 02 2023 at 19:16):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Mar 02 2023 at 19:16):

alexcrichton created PR review comment:

Good idea, I opened https://github.com/bytecodealliance/wasmtime/issues/5914 for this. I created https://github.com/WebAssembly/testsuite/pull/65 but it sat for awhile with no response and is now outdated due to upstream changes, and I'm not sure who, if anyone, is watching that repository.

Wasmtime GitHub notifications bot (Mar 02 2023 at 19:17):

alexcrichton created PR review comment:

This is currently intentional where if determinism is required it takes the first branch which has the deterministic lowering in terms of CLIF instruction semantics. The branch isn't taken though for x86 when speed is desired b/c x86 has a faster lowering.

If you have a suggestion though to restructure this to more obviously communicate this intent it's probably good to implement though since the current if is pretty subtle and I think easy to lose track of.

Wasmtime GitHub notifications bot (Mar 02 2023 at 19:17):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Mar 02 2023 at 19:17):

alexcrichton updated PR #5892 from relaxed-simd-maybe to main.

Wasmtime GitHub notifications bot (Mar 03 2023 at 17:23):

abrown submitted PR review.

Wasmtime GitHub notifications bot (Mar 03 2023 at 17:23):

abrown created PR review comment:

Ah, ok... yeah, these conditions are tricky to think through.

Wasmtime GitHub notifications bot (Mar 03 2023 at 17:28):

abrown submitted PR review.

Wasmtime GitHub notifications bot (Mar 03 2023 at 17:28):

abrown created PR review comment:

Ok, makes sense. In terms of communication, I guess the missing piece of information for me was that I couldn't remember off the top of my head which lowering had the deterministic semantics. Now, obviously, I should have trusted that the if environ.relaxed_simd_deterministic() side of the branch was the deterministic lowering but in this review I was starting to wonder which lowering was which. This may just be a review issue, but if you wanted to make it extra clear the deterministic lowering could get a comment like: // This is the deterministic lowering for ..., see https://github.com/WebAssembly/relaxed-simd/....

Wasmtime GitHub notifications bot (Mar 06 2023 at 15:17):

alexcrichton updated PR #5892 from relaxed-simd-maybe to main.

Wasmtime GitHub notifications bot (Mar 06 2023 at 15:18):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Mar 06 2023 at 15:18):

alexcrichton created PR review comment:

I think the spec isn't quite in the state where there's a URL to link to unfortunately. I don't think that the deterministic semantics are actually written down anywhere other than the spec draft currently. I'll try to remember to update things here though once that's update.

Otherwise though I went ahead and added comments to all the deterministic cases at least english-explaining what the intention was.

Wasmtime GitHub notifications bot (Mar 06 2023 at 22:23):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Mar 06 2023 at 22:32):

jameysharp submitted PR review.

Wasmtime GitHub notifications bot (Mar 06 2023 at 22:32):

jameysharp submitted PR review.

Wasmtime GitHub notifications bot (Mar 06 2023 at 22:32):

jameysharp created PR review comment:

I think I'd prefer not to have a default implementation of has_native_fma, or it should default to false. That seems to me like something that a backend needs to opt in to, not opt out of.

Wasmtime GitHub notifications bot (Mar 06 2023 at 22:32):

jameysharp created PR review comment:

This comment seems wrong since only one possible outcome is asserted.

Wasmtime GitHub notifications bot (Mar 06 2023 at 22:32):

jameysharp created PR review comment:

I'm not sure what this sentence was supposed to say. Maybe delete "the fastest", or replace "this difference in" with something else?

Wasmtime GitHub notifications bot (Mar 07 2023 at 15:19):

alexcrichton updated PR #5892 from relaxed-simd-maybe to main.

Wasmtime GitHub notifications bot (Mar 07 2023 at 15:20):

alexcrichton has enabled auto merge for PR #5892.

Wasmtime GitHub notifications bot (Mar 07 2023 at 15:20):

alexcrichton created PR review comment:

Ah this is just copied from the upstream spec which I think is just a copy/paste artifact, but if you'd like I think they're pretty accepting of PRs.

Wasmtime GitHub notifications bot (Mar 07 2023 at 15:20):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Mar 07 2023 at 16:33):

alexcrichton merged PR #5892.

Last updated: Apr 17 2025 at 14:03 UTC