wasmtime / PR #5795 x64: Add rudimentary support for some... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #5795 x64: Add rudimentary support for some...

Wasmtime GitHub notifications bot (Feb 16 2023 at 00:12):

alexcrichton opened PR #5795 from better-avx-supoprt to main:

I was poking around Spidermonkey's wasm backend and saw that the various assembler functions used are all v*-prefixed which look like they're intended for use with AVX instructions. I looked at Cranelift and it currently doesn't have support for many AVX-based instructions, so I figured I'd take a crack at it!

The support added here is a bit of a mishmash when viewed alone, but my general goal was to take a single instruction from the SIMD proposal for WebAssembly and migrate all of its component instructions to AVX. I, by random chance, picked a pretty complicated instruction of f32x4.min. This wasm instruction is implemented on x64 with 4 unique SSE instructions and ended up being a pretty good candidate.

Further digging about AVX-vs-SSE shows that there should be two major benefits to using AVX over SSE:

Primarily AVX instructions largely use a three-operand form where two input registers are operated with and an output register is also specified. This is in contrast to SSE's predominant one-register-is-input-but-also-output pattern. This should help free up the register allocator a bit and additionally remove the need for movement between registers.

As #4767 notes the memory-based operations of VEX-encoded instructions (aka AVX instructions) do not have strict alignment requirements which means we would be able to sink loads and stores into individual instructions instead of having separate instructions.

So I set out on my journey to implement the instructions used by f32x4.min. The first few were fairly easy. The machinst backends are already of the shape "take these inputs and compute the output" where the x86 requirement of a register being both input and output is postprocessed in. This means that the inst.isle creation helpers for SSE instructions were already of the correct form to use AVX. I chose to add new rule branches for the instruction creation helpers, for example x64_andnps. The new rule conditionally only runs if AVX is enabled and emits an AVX instruction instead of an SSE instruction for achieving the same goal. This means that no lowerings of clif instructions were modified, instead just new instructions are being generated.

The VEX encoding was previously not heavily used in Cranelift. The only current user are the FMA-style instructions that Cranelift has at this time. These FMA instructions have one extra operand than vandnps, for example, so I split the existing XmmRmRVex into a few more variants to fit the shape of the instructions that needed generating for f32x4.min. This was accompanied then with more AVX opcode definitions, more emission support, etc.

Upon implementing all of this it turned out that the test suite was failing on my machine due to the memory-operand encodings of VEX instructions not being supported. I didn't explicitly add those in myself but some preexisting RIP-relative addressing was leaking into the new instructions with existing tests. I opted to go ahead and fill out the memory addressing modes of VEX encoding to get the tests passing again.

All-in-all this PR adds new instructions to the x64 backend for a number of AVX instructions, updates 5 existing instruction producers to use AVX instructions conditionally, implements VEX memory operands, and adds some simple tests for the new output of f32x4.min. The existing runtest for f32x.min caught a few intermediate bugs along the way and I additionally added a plain target x86_64 to that runtest to ensure that it executes with and without AVX to test the various lowerings. I'll also note that this, and future support, should be well-fuzzed through Wasmtime's fuzzing which may explicitly disable AVX support despite the machine having access to AVX, so non-AVX lowerings should be well-tested into the future.

It's also worth mentioning that I am not an AVX or VEX or x64 expert. Implementing the memory operand part for VEX was the hardest part of this PR and while I think it should be good someone else should definitely double-check me. Additionally I haven't added many instructions to the x64 backend yet so I may have missed obvious places to tests or such, so am happy to follow-up with anything to be more thorough if necessary.

Finally I should note that this is just the tip of the iceberg when it comes to AVX. My hope is to get some of the idioms sorted out to make it easier for future PRs to add one-off instruction lowerings or such.

Wasmtime GitHub notifications bot (Feb 16 2023 at 07:09):

afonso360 submitted PR review.

Wasmtime GitHub notifications bot (Feb 16 2023 at 07:09):

afonso360 submitted PR review.

Wasmtime GitHub notifications bot (Feb 16 2023 at 07:09):

afonso360 created PR review comment:

This (and the other fma) decl's can be updated now that we support XmmMem right?

Wasmtime GitHub notifications bot (Feb 16 2023 at 07:24):

afonso360 submitted PR review.

Wasmtime GitHub notifications bot (Feb 16 2023 at 07:24):

afonso360 submitted PR review.

Wasmtime GitHub notifications bot (Feb 16 2023 at 07:24):

afonso360 created PR review comment:

I would have expected that in this case the load could have been merged into vorps, any idea why that didn't happen?

Wasmtime GitHub notifications bot (Feb 16 2023 at 11:30):

afonso360 submitted PR review.

Wasmtime GitHub notifications bot (Feb 16 2023 at 11:30):

afonso360 submitted PR review.

Wasmtime GitHub notifications bot (Feb 16 2023 at 11:30):

afonso360 created PR review comment:

;; Helper for creating `MInst.XmmRmiRVex` instructions.

Wasmtime GitHub notifications bot (Feb 16 2023 at 11:30):

afonso360 created PR review comment:

I don't think we can update this to XmmMem since minps requires alignment, but its unfortunate since it also prevents loads sinking into the AVX version.

Is there another way around it? I suspect we are going to run into more cases like this in the future.

My thoughts would be to do something like:
(decl load_me_maybe (XmmMem) Xmm)

(decl x64_minps (Xmm XmmMem) Xmm)
(rule 0 (x64_minps x y)
      (xmm_rm_r (SseOpcode.Minps) x (load_me_maybe y)))
But I'm not sure how good an idea that is, and we definitely don't need to do it in this PR.

Wasmtime GitHub notifications bot (Feb 16 2023 at 15:06):

afonso360 created PR review comment:

It looks like for loads to be sunk we need a bunch of additional stuff. Which for some reason I had the idea wasn't necessary before.

https://github.com/bytecodealliance/wasmtime/blob/d30ce3192b416010fe3d6fe3a573cad743b7a2b7/cranelift/codegen/src/isa/x64/lower.isle#L3607-L3610

So that probably explains why this particular case didn't get sunk. bor doesen't have a special case for sinking loads for floats (but does have it for integers).

Wasmtime GitHub notifications bot (Feb 16 2023 at 15:06):

afonso360 submitted PR review.

Wasmtime GitHub notifications bot (Feb 16 2023 at 15:23):

alexcrichton updated PR #5795 from better-avx-supoprt to main.

Wasmtime GitHub notifications bot (Feb 16 2023 at 15:27):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Feb 16 2023 at 15:27):

alexcrichton created PR review comment:

It should yeah! As you've seen though I think more intrusive handling is going to be required to get fused memory loads/stores into these operations as well. For now I think it might be best to defer switching everything over to XmmMem for a future PR to holistically try to handle everything in one go. The main reason I added memory encodings for VEX instructions was just because a test case ended up requiring it in Wasmtime which I wasn't expecting!

Wasmtime GitHub notifications bot (Feb 16 2023 at 15:28):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Feb 16 2023 at 15:28):

alexcrichton created PR review comment:

Ah yeah I think we'll have to add all the sinkable_load business to the rules for all the lowerings here. That shouldn't be too too bad as it's basically the same as all the other arithmetic instructions.

Wasmtime GitHub notifications bot (Feb 16 2023 at 15:29):

afonso360 submitted PR review.

Wasmtime GitHub notifications bot (Feb 16 2023 at 15:29):

afonso360 created PR review comment:

Yeah I agree, It's probably better to do it all in a future PR! :+1:

Wasmtime GitHub notifications bot (Feb 16 2023 at 15:29):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Feb 16 2023 at 15:29):

alexcrichton created PR review comment:

One idea I had was that in the sinkable_load extractor a helper function is_mergeable_load is called which currently says false for all SIMD-based loads if they aren't flagged as aligned. I was thinking that if we have complete AVX support we might be able to make that function conditional where if AVX is enabled then it allows unaligned loads but if not it disallows unaligned loads. That's still somewhat brittle, though, so I don't honestly know the best way to handle this at this time.

Wasmtime GitHub notifications bot (Feb 16 2023 at 15:31):

afonso360 submitted PR review.

Wasmtime GitHub notifications bot (Feb 16 2023 at 15:31):

afonso360 created PR review comment:

Similar to the above comments, we should probably do load sinking in a separate PR since it is a bit more advanced.

Wasmtime GitHub notifications bot (Feb 16 2023 at 15:31):

afonso360 edited PR review comment.

Wasmtime GitHub notifications bot (Feb 16 2023 at 15:37):

afonso360 edited PR review comment.

Wasmtime GitHub notifications bot (Feb 16 2023 at 16:45):

fitzgen submitted PR review.

Wasmtime GitHub notifications bot (Feb 16 2023 at 16:45):

fitzgen submitted PR review.

Wasmtime GitHub notifications bot (Feb 16 2023 at 16:45):

fitzgen created PR review comment:

Is this an actual TODO or is that arm just unreachable given the set of AVX instructions we support right now?

Wasmtime GitHub notifications bot (Feb 16 2023 at 16:45):

fitzgen created PR review comment:

Nitpick: doc comment is slightly outdated now.

Wasmtime GitHub notifications bot (Feb 16 2023 at 16:45):

fitzgen created PR review comment:

That's still somewhat brittle, though, so I don't honestly know the best way to handle this at this time.

We could pass the opcode to is_mergeable_load as well and have an allow list for AVX instructions that don't barf on unaligned loads.

Wasmtime GitHub notifications bot (Feb 16 2023 at 17:14):

elliottt submitted PR review.

Wasmtime GitHub notifications bot (Feb 16 2023 at 17:14):

elliottt created PR review comment:

I like that idea! We've already got a separate path for sinking loads for sse operations (put_in_xmm_mem) which would be a great place to add this information. The only downside I can see with changing the signature of that function is that we'd no longer be able to use it as an implicit conversion. I think that might actually be a great change though, as it would force us to be a bit more thoughtful about where we're allowing loads to be sunk for sse instructions.

Wasmtime GitHub notifications bot (Feb 16 2023 at 20:01):

alexcrichton updated PR #5795 from better-avx-supoprt to main.

Wasmtime GitHub notifications bot (Feb 16 2023 at 23:03):

alexcrichton updated PR #5795 from better-avx-supoprt to main.

Wasmtime GitHub notifications bot (Feb 16 2023 at 23:03):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Feb 16 2023 at 23:03):

alexcrichton created PR review comment:

Ah just an unreachable arm, I've updated the panic invocation to indicate that

Wasmtime GitHub notifications bot (Feb 16 2023 at 23:24):

afonso360 submitted PR review.

Wasmtime GitHub notifications bot (Feb 16 2023 at 23:37):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Feb 16 2023 at 23:37):

cfallin created PR review comment:

Alternate idea, for future consideration: an instruction that takes a R/M arg but requires alignment could encode that in the type, via e.g. an XmmMemAligned arg. (If we currently have a case where we use the same MInst enum variant for an alignment-required opcode and not, we should split into separate variants.) Then if we made a pass over all instructions to ensure we got the types right, we can be much more permissive with the automatic conversions everywhere, without having to worry about continued vigilance with explicit put_in_xmm_mem, etc.

Wasmtime GitHub notifications bot (Feb 17 2023 at 02:12):

alexcrichton merged PR #5795.

Last updated: Apr 18 2025 at 04:04 UTC