Stream: git-wasmtime

Topic: wasmtime / PR #5841 x64: Enable load-coalescing for SSE/A...


view this post on Zulip Wasmtime GitHub notifications bot (Feb 20 2023 at 16:20):

alexcrichton opened PR #5841 from avx-unalign to main:

This commit unlocks the ability to fold loads into operands of SSE and AVX instructions. This is beneficial for both function size when it happens in addition to being able to reduce register pressure. Previously this was not done because most SSE instructions require memory to be aligned. AVX instructions, however, do not have alignment requirements.

The solution implemented here is one recommended by Chris which is to add a new XmmMemAligned newtype wrapper around XmmMem. All SSE instructions are now annotated as requiring an XmmMemAligned operand except for a new new instruction styles used specifically for instructions that don't require alignment (e.g. movdqu, *sd, and *ss instructions). All existing instruction helpers continue to take XmmMem, however. This way if an AVX lowering is chosen it can be used as-is. If an SSE lowering is chosen, however, then an automatic conversion from XmmMem to XmmMemAligned kicks in. This automatic conversion only fails for unaligned addresses in which case a load instruction is emitted and the operand becomes a temporary register instead. A number of prior Xmm arguments have now been converted to XmmMem as well.

One change from this commit is that loading an unaligned operand for an SSE instruction previously would use the "correct type" of load, e.g. movups for f32x4 or movup for f64x2, but now the loading happens in a context without type information so the movdqu instruction is generated. According to [this stack overflow question][question] it looks like modern processors won't penalize this "wrong" choice of type when the operand is then used for f32 or f64 oriented instructions.

Finally this commit improves some reuse of logic in the put_in_*_mem* helper to share code with sinkable_load and avoid duplication. With this in place some various ISLE rules have been updated as well.

In the tests it can be seen that AVX-instructions are now automatically load-coalesced and use memory operands in a few cases.

[question]: https://stackoverflow.com/questions/40854819/is-there-any-situation-where-using-movdqu-and-movupd-is-better-than-movups

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

view this post on Zulip Wasmtime GitHub notifications bot (Feb 20 2023 at 16:51):

alexcrichton updated PR #5841 from avx-unalign to main.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 20 2023 at 17:25):

alexcrichton updated PR #5841 from avx-unalign to main.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 21 2023 at 18:34):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 21 2023 at 18:34):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 21 2023 at 18:34):

cfallin created PR review comment:

Noting here also a comment I left below where this is designated an implicit conversion: it definitely looks like something else at first sight (my comment on my first pass through this PR here was "can we call this assert_xmm_mem_aligned to make it apparent what this conversion is doing?"). It implicitly emits a load if the arg is actually in memory, which is why it can trivially guarantee alignment of any returned memory op (because it never returns a memory op). We should leave a comment about this I think.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 21 2023 at 18:34):

cfallin created PR review comment:

Here we should have a comment I think -- it looks wrong at first sight, as in general we shouldn't be able to take an unaligned thing and coerce it to aligned. The key is that the constructor actually emits a load and returns a register-kind value instead, so we (trivially) avoid unalignment by just never carrying through a memory operand.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 21 2023 at 18:34):

cfallin created PR review comment:

No aligned:true on the imm variant?

view this post on Zulip Wasmtime GitHub notifications bot (Feb 21 2023 at 18:51):

alexcrichton updated PR #5841 from avx-unalign to main.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 21 2023 at 18:52):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 21 2023 at 18:52):

alexcrichton created PR review comment:

I... don't know how the macro previously compiled. It refers to the $aligned variable which was in an entirely different repetition clause and somehow worked? Anyway added!

view this post on Zulip Wasmtime GitHub notifications bot (Feb 21 2023 at 18:53):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 21 2023 at 18:53):

alexcrichton created PR review comment:

Good point! I've updated the comment here and on the (convert ...) items added too.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 21 2023 at 18:56):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 21 2023 at 18:57):

cfallin has enabled auto merge for PR #5841.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 21 2023 at 20:30):

cfallin merged PR #5841.


Last updated: Oct 23 2024 at 20:03 UTC