Stream: git-wasmtime

Topic: wasmtime / PR #1759 Legalize imul.i64x2 for both AVX and ...


view this post on Zulip Wasmtime GitHub notifications bot (May 26 2020 at 16:55):

abrown opened PR #1759 from i64x2-mul to master:

The convert_i64x2_imul custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizes imul.i64x2 to an x86_pmullq in this case; if not, it uses a lengthy SSE2-compatible instruction sequence. For this logic to work, we need:

view this post on Zulip Wasmtime GitHub notifications bot (May 26 2020 at 16:58):

abrown updated PR #1759 from i64x2-mul to master:

The convert_i64x2_imul custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizes imul.i64x2 to an x86_pmullq in this case; if not, it uses a lengthy SSE2-compatible instruction sequence. For this logic to work, we need:

view this post on Zulip Wasmtime GitHub notifications bot (May 26 2020 at 17:11):

abrown updated PR #1759 from i64x2-mul to master:

The convert_i64x2_imul custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizes imul.i64x2 to an x86_pmullq in this case; if not, it uses a lengthy SSE2-compatible instruction sequence. For this logic to work, we need:

view this post on Zulip Wasmtime GitHub notifications bot (May 26 2020 at 17:47):

abrown has marked PR #1759 as ready for review.

view this post on Zulip Wasmtime GitHub notifications bot (May 26 2020 at 17:48):

abrown requested bnjbvr for a review on PR #1759.

view this post on Zulip Wasmtime GitHub notifications bot (May 26 2020 at 17:55):

abrown updated PR #1759 from i64x2-mul to master:

The convert_i64x2_imul custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizes imul.i64x2 to an x86_pmullq in this case; if not, it uses a lengthy SSE2-compatible instruction sequence. For this logic to work, we need:

view this post on Zulip Wasmtime GitHub notifications bot (May 26 2020 at 18:40):

abrown updated PR #1759 from i64x2-mul to master:

The convert_i64x2_imul custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizes imul.i64x2 to an x86_pmullq in this case; if not, it uses a lengthy SSE2-compatible instruction sequence. For this logic to work, we need:

view this post on Zulip Wasmtime GitHub notifications bot (May 27 2020 at 15:58):

abrown updated PR #1759 from i64x2-mul to master:

The convert_i64x2_imul custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizes imul.i64x2 to an x86_pmullq in this case; if not, it uses a lengthy SSE2-compatible instruction sequence. For this logic to work, we need:

view this post on Zulip Wasmtime GitHub notifications bot (May 27 2020 at 16:00):

abrown updated PR #1759 from i64x2-mul to master:

The convert_i64x2_imul custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizes imul.i64x2 to an x86_pmullq in this case; if not, it uses a lengthy SSE2-compatible instruction sequence. For this logic to work, we need:

view this post on Zulip Wasmtime GitHub notifications bot (May 30 2020 at 03:14):

abrown updated PR #1759 from i64x2-mul to master:

The convert_i64x2_imul custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizes imul.i64x2 to an x86_pmullq in this case; if not, it uses a lengthy SSE2-compatible instruction sequence. For this logic to work, we need:

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2020 at 16:25):

bnjbvr submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2020 at 16:25):

bnjbvr submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2020 at 16:25):

bnjbvr created PR Review Comment:

nit: each

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2020 at 16:25):

bnjbvr created PR Review Comment:

nit: double "the" here

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2020 at 16:25):

bnjbvr created PR Review Comment:

Could this not be expressed as a standard legalization, with an ISA predicate on these two flags, in the meta-language? Then, the SSE2 expansion would just be put thereafter, so that isel would use the AVX legalization if it's there, or the SSE2 otherwise.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2020 at 16:25):

bnjbvr created PR Review Comment:

If this comment is right, could we safely put an unreachable! statement here? (Otherwise, this custom legalization will be silently triggered and do nothing for other imul variants, resulting in correctness issues, if i remember correctly)

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2020 at 16:25):

bnjbvr created PR Review Comment:

It would be nice to comment a bit how this works, especially since pmuludq does a multiplication of the first and third double-word in each operand and puts the result in the first and second quadwords, if I understand correctly Intel's documentation.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2020 at 19:49):

abrown submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2020 at 19:49):

abrown created PR Review Comment:

I don't see how this is possible with the current API in xform.rs. If we added a way to guard legalizations with predicates then, yes, but it looks to me like this doesn't exist yet.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 03 2020 at 07:39):

lars-t-hansen submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 03 2020 at 07:39):

lars-t-hansen created PR Review Comment:

This doesn't look quite right to me but it may be possible I'm misunderstanding because I don't know the Cranelift IR. The result of addhigh looks right, it looks like it has the results for the high dwords of the result qwords in its low two dwords, ie the upper two dwords are garbage. But the value of 'high' isn't right I think, at least I can't make either psllq or pslldq work here - some kind of shuffle might work. The computation of low computes the two low dwords of the result qwords correctly in the two low dwords, but simply adding them in won't work even if high was correct, they again need to be shuffled into their correct places. Instead of shift and add it looks like an unpack might work (this is untested):

  // lhsDest = D C B A
  // rhs = H G F E
  mov(lhsDest, high0);       // high0 = D C B A
  psrlq(32, high0);          // high0 = 0 D 0 B
  pmuludq(rhs, high0);       // high0 = 0 D CG BE
  mov(rhs, high1);           // high1 = H G F E
  psrlq(32, high1);          // high1 = 0 H 0 F
  pmuludq(lhsDest, high1);   // high1 = 0 H CH AF
  paddq(high0, high1);       // high1 = 0 D+H CG+CH BE+AF
  pmuludq(rhs, lhsDest);     // lhsDest = D C CG AE
  punpckldq(high1, lhsDest); // lhsDest = CG+CH CG BE+AF AE

view this post on Zulip Wasmtime GitHub notifications bot (Jun 03 2020 at 07:47):

lars-t-hansen edited PR Review Comment.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 03 2020 at 07:59):

lars-t-hansen deleted PR Review Comment.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 03 2020 at 08:02):

lars-t-hansen submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 03 2020 at 08:02):

lars-t-hansen created PR Review Comment:

Benjamin asked me to take a look. Initially I had a comment here where I indicated I thought this was wrong, but I see I misunderstood something, and I now think this looks right to me.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 03 2020 at 14:05):

bnjbvr created PR Review Comment:

I thought about encodings (which can be guarded against isa predicates), so this code is correct.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 03 2020 at 14:05):

bnjbvr submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 03 2020 at 14:08):

bnjbvr submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 03 2020 at 16:27):

abrown updated PR #1759 from i64x2-mul to master:

The convert_i64x2_imul custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizes imul.i64x2 to an x86_pmullq in this case; if not, it uses a lengthy SSE2-compatible instruction sequence. For this logic to work, we need:

view this post on Zulip Wasmtime GitHub notifications bot (Jun 03 2020 at 22:13):

abrown updated PR #1759 from i64x2-mul to master:

The convert_i64x2_imul custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizes imul.i64x2 to an x86_pmullq in this case; if not, it uses a lengthy SSE2-compatible instruction sequence. For this logic to work, we need:

view this post on Zulip Wasmtime GitHub notifications bot (Jun 03 2020 at 23:27):

abrown merged PR #1759.


Last updated: Dec 23 2024 at 12:05 UTC