Stream: git-wasmtime

Topic: wasmtime / PR #2905 x64: implement vselect with variable ...


view this post on Zulip Wasmtime GitHub notifications bot (May 14 2021 at 03:12):

abrown opened PR #2905 from fix-2754 to main:

This change implements vselect using SSE4.1's BLENDVPS, BLENDVPD, and PBLENDVB to fix #2754. vselect is a lane-selecting instruction that is used by simple_preopt.rs to lower bitselect to a single x86 instruction when the condition mask is known to be boolean (all 1s or 0s, e.g., from a conversion). This is better than bitselect in general, which lowers to 4-5 instructions. The old backend had the vselect lowering; this simply introduces it to the new backend.

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

view this post on Zulip Wasmtime GitHub notifications bot (May 14 2021 at 15:49):

abrown requested alexcrichton for a review on PR #2905.

view this post on Zulip Wasmtime GitHub notifications bot (May 14 2021 at 17:16):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 14 2021 at 17:16):

alexcrichton created PR review comment:

How does regalloc know that xmm0 is live until the xmm_rm_r instruction below?

view this post on Zulip Wasmtime GitHub notifications bot (May 14 2021 at 17:16):

alexcrichton created PR review comment:

It looks like this requires SSE 4.1, but how does the code generator know if that's enabled or not? Should this assert that SSE 4.1 is enabled?

view this post on Zulip Wasmtime GitHub notifications bot (May 14 2021 at 17:16):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 14 2021 at 17:16):

alexcrichton created PR review comment:

I was curious after seeing two of these get mapped to the same instruction, while the above two types have different instructions. Reading the description, couldn't pblendvb be used for all 4 types instead of just these two smaller ones?

view this post on Zulip Wasmtime GitHub notifications bot (May 14 2021 at 20:55):

abrown submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 14 2021 at 20:55):

abrown created PR review comment:

@cfallin can correct me if I get this wrong but every generated instruction tells regalloc how it is using each register using the RegUsageMapper trait. Well, maybe it is the RegUsageCollector. In any case, regalloc should know that a move to that virtual register (hard-coded to XMM0) is a def and that that register should be left alone until it is used.

view this post on Zulip Wasmtime GitHub notifications bot (May 14 2021 at 21:00):

abrown created PR review comment:

We haven't been feature-checking SSE 4.1 instructions in the lowerings since this is a Wasm SIMD baseline (though IIRC Julian would have liked us to have several tiers of lowerings, e.g., SSE2 -> SSE 4.2 -> ...). Optional lowerings are feature-checked, e.g. the AVX512 instructions I used. But all of that applies only to the lowering phase. There's another check right before the instructions are actually emitted that will panic if the target does not support this instruction.

view this post on Zulip Wasmtime GitHub notifications bot (May 14 2021 at 21:00):

abrown submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 14 2021 at 21:23):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 14 2021 at 21:23):

cfallin created PR review comment:

Indeed, the regalloc.rs semantics are such that we compute live-ranges for both real regs and virtual regs; and a real reg is reserved for the extent of its virtual ranges. As long as we properly have xmm0 has a use below the move then everything works as expected!

view this post on Zulip Wasmtime GitHub notifications bot (May 14 2021 at 21:24):

cfallin edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (May 14 2021 at 21:32):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 14 2021 at 21:32):

alexcrichton created PR review comment:

Yeah it makes sense to me that it's use/def based and this is a def of xmm0, but I wasn't able to find the use of xmm0 (I'm probably missing something though?) This looks like the relevant block but it doesn't seem have anything xmm0-related there, though?

(I'd sort of expect a match on the opcode to add the use of xmm0 in some cases, like the ones used below)

view this post on Zulip Wasmtime GitHub notifications bot (May 14 2021 at 21:33):

alexcrichton created PR review comment:

Ah ok, so that just means that this'll panic later on if sse4.1 is disabled, and the fix would be to update the lowering here? That sounds reasonable to me!

view this post on Zulip Wasmtime GitHub notifications bot (May 14 2021 at 21:33):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 17 2021 at 16:16):

abrown submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 17 2021 at 16:16):

abrown created PR review comment:

This is purely a preference thing like I've done previously on moves, e.g.: it adds a negligible overhead to compilation but it matches the original intent of the instruction more closely, which I feel helps during debugging and sometimes can help with latency/throughput. In this case, the latency/throughput differences are very minor (see IA Optimization Manual, appendix D; depending on the arch family, 1 in some cases and 2 in others).

view this post on Zulip Wasmtime GitHub notifications bot (May 17 2021 at 16:18):

abrown submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 17 2021 at 16:18):

abrown created PR review comment:

Yup, actually you're right. In talking more to @cfallin I actually came to the same conclusion. Let me fix that.

view this post on Zulip Wasmtime GitHub notifications bot (May 17 2021 at 16:24):

abrown updated PR #2905 from fix-2754 to main.

view this post on Zulip Wasmtime GitHub notifications bot (May 17 2021 at 16:32):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 17 2021 at 16:32):

alexcrichton created PR review comment:

Out of curiosity, what's the failure mode for a situation like this? The register allocator sees a "def" of xmm0 but it never sees a use. Does that mean that the live range is considered infinite? Or is the live range immediately "dead" after the def?

(this seems like a worrying thing to me and easy to overlook, so curious if we could bolster up checks one way or another to prevent this from happening again)

view this post on Zulip Wasmtime GitHub notifications bot (May 17 2021 at 16:33):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 17 2021 at 16:33):

alexcrichton created PR review comment:

Ah ok makes sense! Presumably there's no dedicated instruction for 16-bit types?

Could you leave a comment for how pblendvb would be correct for all of these but more specialized versions are used for slight optimizations?

view this post on Zulip Wasmtime GitHub notifications bot (May 17 2021 at 16:33):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 17 2021 at 16:39):

abrown submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 17 2021 at 16:39):

abrown created PR review comment:

@cfallin? :grinning_face_with_smiling_eyes:

view this post on Zulip Wasmtime GitHub notifications bot (May 17 2021 at 16:40):

abrown created PR review comment:

Yup, no 16-bit, at least in SSE-land (VPBLENDM* exists up in AVX512). Yeah, I'll add the comment.

view this post on Zulip Wasmtime GitHub notifications bot (May 17 2021 at 16:40):

abrown submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 17 2021 at 16:46):

abrown updated PR #2905 from fix-2754 to main.

view this post on Zulip Wasmtime GitHub notifications bot (May 17 2021 at 17:46):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 17 2021 at 17:46):

cfallin created PR review comment:

Yeah, the failure mode isn't great: the real-reg liverange ends at the last mention (the def) and so the register is liable to be reused for something else, clobbering the value.

Always looking for ways to foolproof this; it's a bit tricky as the question is: what is the ground-truth we can check against? If we forget to mention a register-use in the metadata we provide to the allocator, the only other way of seeing that would be to disassemble the resulting machine code independently and check its register-mentions. We could certainly do that, perhaps as another oracle during fuzzing; but it's a big project (correlating the disassembly to the VCode especially with multiple-instruction lowerings would take some care).

view this post on Zulip Wasmtime GitHub notifications bot (May 17 2021 at 17:54):

alexcrichton created PR review comment:

Oh nah I wouldn't want to go so all-out just yet. I would imagine something much more simple which is that if something is defined and never used, it's considered either live forever or invalid, causing a panic. In the "live forever" case we'd in theory one day ask why our code was so slow and fix this by limiting the live range, and in the latter case we'd catch the panic real fast and fix it.

I'm not sure if it's common, though, for values to be defined and never used. If that happens pretty normally then there's probably not much we can do about this other than being vigilant for now.

view this post on Zulip Wasmtime GitHub notifications bot (May 17 2021 at 17:54):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 17 2021 at 18:23):

abrown merged PR #2905.

view this post on Zulip Wasmtime GitHub notifications bot (May 17 2021 at 18:30):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 17 2021 at 18:30):

cfallin created PR review comment:

Interesting thoughts in any case, thanks for the ideas here!

Keeping a real-reg liverange open forever would ensure safety here but could potentially have forward-progress implications: it's essentially a "leak" of one register, and if we reserve too many real regs in this way, then we could hit a point where we don't have enough registers to satisfy constraints and finish allocation. That's less bad than clobbering but still bad, and would be somewhat hard to debug.

In theory, dead defs shouldn't happen at the CLIF level, because we lower based on demand, though it's possible at the VCode level depending on the lowering pattern. There's also the issue that we use dead defs as a way to encode clobbers (at calls, for example), so we'd need a different category for those, but it could be doable.

We can think more about this -- I'll add the disallow-dead-defs idea to my running list of "things to look at next time I focus on fuzzing infra" :-)


Last updated: Nov 22 2024 at 16:03 UTC