wasmtime / issue #5892 Implement the relaxed SIMD proposal · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / issue #5892 Implement the relaxed SIMD proposal

Wasmtime GitHub notifications bot (Feb 27 2023 at 23:30):

This is a substantial bit of work (regardless of the "small" proposal -- thanks for this!

I have some thoughts on the Cranelift side below. To preface these thoughts: this is a good change, and we need to support this proposal. However, determinism has been a significant discussion point in CLIF and has significant implications for testing and verification, so I want to suggest that we might consider some tweaks to the approach. So:

Implementation-wise this commit adds a number of new instructions to Cranelift all prefixed with arch_*. I've tried to document that the intended equivalence to a deterministic instruction and additionally document the edge cases (currently they've all got edge cases for x86_64 and none for aarch64, which I believe represents the current state of things). There may be better methods to do this, however, since this strategy does not seems scalable into the future if dozens-to-hundreds more instructions get proposed and implemented.

Skimming the added instructions' definitions, my main concern is basically a mirror of the determinism concern at the Wasm standards discussion level. In CLIF, we've had a lot of discussion around determism as it relates to endianness (#3369 and subsequent discussions and PRs), which became especially relevant as we built the CLIF interpreter and started to fuzz against it. If we let CLIF semantics be partly defined by architecture-specific details, then we can't possibly define an interpreter behavior that matches all backends. Our solution in the endianness case was to reify endian as an explicit detail in the IR, let loads/stores be tagged as such. Ultimately in that case one can support big+little endian on all machines with the help of bswaps (and s390x actually does, so Wasm lowerings work), but that's a separate and additional step no different from any other "support operand combinations" enhancement.

This is important for verification efforts too: we shouldn't have CLIF whose meaning depends on the compilation target. We could adopt a refinement semantics approach ("result may be one of the values in this set" and then compiled code refines those sets) but that's harder to reason about in general.

I think a similar approach as we had with endianness might work here: define a Cranelift setting that is something like "arch_* behavior", or more granular settings for each ("insertlane behavior", ...), set them as appropriate for the expected target, and implement the appropriate lowerings (and all lowerings in the interpreter).

There are some issues with that -- namely, we now need to plumb "expected native settings for target T" up to the CLIF producer, and the actual CLIF will now differ on x86-64 and aarch64 for programs that use these opcodes -- but the semantics are at least defined fully and machine-independently for any CLIF.

Thoughts? Also, the Cranelift meeting on Wednesday would probably be a good venue to get a wider set of thoughts on this :-)

Wasmtime GitHub notifications bot (Feb 27 2023 at 23:32):

cfallin edited a comment on issue #5892:

This is a substantial bit of work (regardless of the "small" proposal) -- thanks for this!

I have some thoughts on the Cranelift side below. To preface these thoughts: this is a good change, and we need to support this proposal. However, determinism has been a significant discussion point in CLIF and has significant implications for testing and verification, so I want to suggest that we might consider some tweaks to the approach. So:

Implementation-wise this commit adds a number of new instructions to Cranelift all prefixed with arch_*. I've tried to document that the intended equivalence to a deterministic instruction and additionally document the edge cases (currently they've all got edge cases for x86_64 and none for aarch64, which I believe represents the current state of things). There may be better methods to do this, however, since this strategy does not seems scalable into the future if dozens-to-hundreds more instructions get proposed and implemented.

Skimming the added instructions' definitions, my main concern is basically a mirror of the determinism concern at the Wasm standards discussion level. In CLIF, we've had a lot of discussion around determism as it relates to endianness (#3369 and subsequent discussions and PRs), which became especially relevant as we built the CLIF interpreter and started to fuzz against it. If we let CLIF semantics be partly defined by architecture-specific details, then we can't possibly define an interpreter behavior that matches all backends. Our solution in the endianness case was to reify endian as an explicit detail in the IR, and let loads/stores be tagged as such. Ultimately in that case one can support big+little endian on all machines with the help of bswaps (and s390x actually does, so Wasm lowerings work), but that's a separate and additional step no different from any other "support operand combinations" enhancement.

This is important for verification efforts too: we shouldn't have CLIF whose meaning depends on the compilation target. We could adopt a refinement semantics approach ("result may be one of the values in this set" and then compiled code refines those sets) but that's harder to reason about in general.

I think a similar approach as we had with endianness might work here: define a Cranelift setting that is something like "arch_* behavior", or more granular settings for each ("insertlane behavior", ...), set them as appropriate for the expected target, and implement the appropriate lowerings (and all lowerings in the interpreter).

There are some issues with that -- namely, we now need to plumb "expected native settings for target T" up to the CLIF producer, and the actual CLIF will now differ on x86-64 and aarch64 for programs that use these opcodes -- but the semantics are at least defined fully and machine-independently for any CLIF.

Thoughts? Also, the Cranelift meeting on Wednesday would probably be a good venue to get a wider set of thoughts on this :-)

Wasmtime GitHub notifications bot (Feb 27 2023 at 23:35):

cfallin commented on issue #5892:

(An addendum to the above: in the endianness case we have a third endian, "native", which lets us avoid the plumbing problem of "which endian is faster / even supported" in the CLIF producer; but IMHO that's a bit of a wart, which I'd like to remove eventually for all of the reasons above, to fully capture the meaning in the IR. So I'd prefer we bite the bullet and plumb this through with settings and "machine info" of some sort.)

Wasmtime GitHub notifications bot (Feb 27 2023 at 23:43):

jameysharp commented on issue #5892:

If we want to offer the option of using the deterministic semantics for the relaxed SIMD proposal anyway, what if we start by shipping only that version? Then we can support the new wasm instructions now, while taking some more time to think through how best to expose architecture-specific instructions in Cranelift.

Wasmtime GitHub notifications bot (Feb 27 2023 at 23:45):

github-actions[bot] commented on issue #5892:

Subscribe to Label Action

cc @peterhuene

<details>
This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:area:x64", "cranelift:meta", "cranelift:wasm", "wasmtime:api", "wasmtime:config"

Thus the following users have been cc'd because of the following labels:

peterhuene: wasmtime:api

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

Learn more.
</details>

Wasmtime GitHub notifications bot (Feb 27 2023 at 23:45):

github-actions[bot] commented on issue #5892:

Label Messager: wasmtime:config

It looks like you are changing Wasmtime's configuration options. Make sure to
complete this check list:

[ ] If you added a new Config method, you wrote extensive documentation for
it.

<details>

Our documentation should be of the following form:

```text
Short, simple summary sentence.

More details. These details can be multiple paragraphs. There should be
information about not just the method, but its parameters and results as
well.

Is this method fallible? If so, when can it return an error?

Can this method panic? If so, when does it panic?

Example

Optional example here.
```

</details>

[ ] If you added a new Config method, or modified an existing one, you
ensured that this configuration is exercised by the fuzz targets.

<details>

For example, if you expose a new strategy for allocating the next instance
slot inside the pooling allocator, you should ensure that at least one of our
fuzz targets exercises that new strategy.

Often, all that is required of you is to ensure that there is a knob for this
configuration option in [wasmtime_fuzzing::Config][fuzzing-config] (or one
of its nested structs).

Rarely, this may require authoring a new fuzz target to specifically test this
configuration. See [our docs on fuzzing][fuzzing-docs] for more details.

</details>

[ ] If you are enabling a configuration option by default, make sure that it
has been fuzzed for at least two weeks before turning it on by default.

[fuzzing-config]: https://github.com/bytecodealliance/wasmtime/blob/ca0e8d0a1d8cefc0496dba2f77a670571d8fdcab/crates/fuzzing/src/generators.rs#L182-L194
[fuzzing-docs]: https://docs.wasmtime.dev/contributing-fuzzing.html

<details>

To modify this label's message, edit the <code>.github/label-messager/wasmtime-config.md</code> file.

To add new label messages or remove existing label messages, edit the
<code>.github/label-messager.json</code> configuration file.

Learn more.

</details>

Wasmtime GitHub notifications bot (Feb 27 2023 at 23:54):

alexcrichton commented on issue #5892:

Definitely happy to discuss more at a Cranelift meeting! Sorry should have clarified that in the PR description, I figured the precise method of implementing this would be contentious and want to emphasize that this current iteration is just one point in the design space and I'm happy to adjust as preferred. My motivation for working on this was I wanted to add the instructions to the Rust standard library, but doing that required having a runtime that implements them, and since Wasmtime is currently used to validate simd intrinsics for Rust I figured I may as well give it a stab to implement here. In that sense I'm in no rush to get something in, and I would prefer to not land anything unless there's a path forward to getting the fast lowerings of these instructions since I suspect it's not really that useful of a proposal if only the deterministic lowerings are offered.

One alternative is to basically just implement new x86_* instructions for CLIF which aren't implemented for any backend other than x86 (unless someone really really wants to do that) and are defined in terms of the actual x86 instructions. I don't know how scalable this is though in the sense that this PR only implements AArch64 and x86 lowerings for the relaxed simd instructions and while I don't believe any aarch64_* instructions would be necessary I don't know if CLIF would need to grow riscv64_* or s390x_* instructions.

Another alternative would be to take inspiration which has intrinsics of the form @llvm.x86.* where any backend could have any number of intrinsics which are well-defined for each backend. CLIF AFAIK isn't really set up for this right now, though, but would provide more easy extensibility while retaining "everything can in theory be implemented on every platform for every other platform simultaneously".

These alternatives, as noted, do indeed lift the complexity into the producer but that's not really the end of the world if the main producer here for these instructions is Wasmtime.

Wasmtime GitHub notifications bot (Feb 28 2023 at 00:42):

cfallin commented on issue #5892:

One alternative is to basically just implement new x86_* instructions for CLIF which aren't implemented for any backend other than x86 (unless someone really really wants to do that) and are defined in terms of the actual x86 instructions. I don't know how scalable this is though in the sense that this PR only implements AArch64 and x86 lowerings for the relaxed simd instructions and while I don't believe any aarch64_* instructions would be necessary I don't know if CLIF would need to grow riscv64_* or s390x_* instructions.

I actually kind of like this too, weirdly enough: it's dead-simple, and it doesn't create instructions whose semantics are "parameterized" on a Cranelift setting. If anyone were perplexed by "x86" appearing in a machine-dependent IR, we could expand it to x86like_insertlane or whatever -- the ISA name is the inspiration for the semantics but the semantics are machine-independent once we define this.

Wasmtime GitHub notifications bot (Feb 28 2023 at 01:23):

jameysharp commented on issue #5892:

A WebAssembly module can't select which code path to use based on which wasm extensions are available in the host, right? In that case I think there is value in having any implementation at all of the relaxed SIMD instructions, even if for some reason we never get around to making them fast on x86, so we can at least run modules that use them. I also think it's good to merge the uncontroversial parts of this first while we sort out the path to making it fast, so later we're iterating on a smaller PR.

Wasmtime GitHub notifications bot (Feb 28 2023 at 02:10):

cfallin edited a comment on issue #5892:

One alternative is to basically just implement new x86_* instructions for CLIF which aren't implemented for any backend other than x86 (unless someone really really wants to do that) and are defined in terms of the actual x86 instructions. I don't know how scalable this is though in the sense that this PR only implements AArch64 and x86 lowerings for the relaxed simd instructions and while I don't believe any aarch64_* instructions would be necessary I don't know if CLIF would need to grow riscv64_* or s390x_* instructions.

I actually kind of like this too, weirdly enough: it's dead-simple, and it doesn't create instructions whose semantics are "parameterized" on a Cranelift setting. If anyone were perplexed by "x86" appearing in a machine-independent IR, we could expand it to x86like_insertlane or whatever -- the ISA name is the inspiration for the semantics but the semantics are machine-independent once we define this.

Wasmtime GitHub notifications bot (Feb 28 2023 at 23:33):

alexcrichton commented on issue #5892:

I've split out https://github.com/bytecodealliance/wasmtime/pull/5895, https://github.com/bytecodealliance/wasmtime/pull/5896, and https://github.com/bytecodealliance/wasmtime/pull/5897, rebased this PR on top of those, and then split the bulk into two commits -- the first adds all the scaffolding/support/deterministic lowerings and the second adds all the x86-specific goop. I went ahead and switched to x86_* style instructions in CLIF, but I'd still like to discuss this in tomorrow's Cranelift meeting.

Wasmtime GitHub notifications bot (Mar 06 2023 at 15:19):

alexcrichton commented on issue #5892:

@cfallin or @jameysharp did y'all want to take another pass over this? If not I can go ahead and hit the button too

Last updated: Apr 17 2025 at 11:03 UTC