abrown opened PR #1759 from i64x2-mul
to master
:
The
convert_i64x2_imul
custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizesimul.i64x2
to anx86_pmullq
in this case; if not, it uses a lengthy SSE2-compatible instruction sequence. For this logic to work, we need:
- the AVX512 instruction to be defined as a separate Cranelift instruction,
x86_pmullq
(this additional instruction would go away in the new backend)- a mechanism for accessing the x86
TargetIsa
so that we can inspect its flags during legalization- a new SSE2 instruction,
x86_pmuludq
for implementing the SSE2-compatible instruction sequence
abrown updated PR #1759 from i64x2-mul
to master
:
The
convert_i64x2_imul
custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizesimul.i64x2
to anx86_pmullq
in this case; if not, it uses a lengthy SSE2-compatible instruction sequence. For this logic to work, we need:
- the AVX512 instruction to be defined as a separate Cranelift instruction,
x86_pmullq
(this additional instruction would go away in the new backend)- a mechanism for accessing the x86
TargetIsa
so that we can inspect its flags during legalization- a new SSE2 instruction,
x86_pmuludq
for implementing the SSE2-compatible instruction sequence
abrown updated PR #1759 from i64x2-mul
to master
:
The
convert_i64x2_imul
custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizesimul.i64x2
to anx86_pmullq
in this case; if not, it uses a lengthy SSE2-compatible instruction sequence. For this logic to work, we need:
- the AVX512 instruction to be defined as a separate Cranelift instruction,
x86_pmullq
(this additional instruction would go away in the new backend)- a mechanism for accessing the x86
TargetIsa
so that we can inspect its flags during legalization- a new SSE2 instruction,
x86_pmuludq
for implementing the SSE2-compatible instruction sequence
abrown has marked PR #1759 as ready for review.
abrown requested bnjbvr for a review on PR #1759.
abrown updated PR #1759 from i64x2-mul
to master
:
The
convert_i64x2_imul
custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizesimul.i64x2
to anx86_pmullq
in this case; if not, it uses a lengthy SSE2-compatible instruction sequence. For this logic to work, we need:
- the AVX512 instruction to be defined as a separate Cranelift instruction,
x86_pmullq
(this additional instruction would go away in the new backend)- a mechanism for accessing the x86
TargetIsa
so that we can inspect its flags during legalization- a new SSE2 instruction,
x86_pmuludq
for implementing the SSE2-compatible instruction sequence
abrown updated PR #1759 from i64x2-mul
to master
:
The
convert_i64x2_imul
custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizesimul.i64x2
to anx86_pmullq
in this case; if not, it uses a lengthy SSE2-compatible instruction sequence. For this logic to work, we need:
- the AVX512 instruction to be defined as a separate Cranelift instruction,
x86_pmullq
(this additional instruction would go away in the new backend)- a mechanism for accessing the x86
TargetIsa
so that we can inspect its flags during legalization- a new SSE2 instruction,
x86_pmuludq
for implementing the SSE2-compatible instruction sequence
abrown updated PR #1759 from i64x2-mul
to master
:
The
convert_i64x2_imul
custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizesimul.i64x2
to anx86_pmullq
in this case; if not, it uses a lengthy SSE2-compatible instruction sequence. For this logic to work, we need:
- the AVX512 instruction to be defined as a separate Cranelift instruction,
x86_pmullq
(this additional instruction would go away in the new backend)- a mechanism for accessing the x86
TargetIsa
so that we can inspect its flags during legalization- a new SSE2 instruction,
x86_pmuludq
for implementing the SSE2-compatible instruction sequence
abrown updated PR #1759 from i64x2-mul
to master
:
The
convert_i64x2_imul
custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizesimul.i64x2
to anx86_pmullq
in this case; if not, it uses a lengthy SSE2-compatible instruction sequence. For this logic to work, we need:
- the AVX512 instruction to be defined as a separate Cranelift instruction,
x86_pmullq
(this additional instruction would go away in the new backend)- a mechanism for accessing the x86
TargetIsa
so that we can inspect its flags during legalization- a new SSE2 instruction,
x86_pmuludq
for implementing the SSE2-compatible instruction sequence
abrown updated PR #1759 from i64x2-mul
to master
:
The
convert_i64x2_imul
custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizesimul.i64x2
to anx86_pmullq
in this case; if not, it uses a lengthy SSE2-compatible instruction sequence. For this logic to work, we need:
- the AVX512 instruction to be defined as a separate Cranelift instruction,
x86_pmullq
(this additional instruction would go away in the new backend)- a mechanism for accessing the x86
TargetIsa
so that we can inspect its flags during legalization- a new SSE2 instruction,
x86_pmuludq
for implementing the SSE2-compatible instruction sequence
bnjbvr submitted PR Review.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
nit: each
bnjbvr created PR Review Comment:
nit: double "the" here
bnjbvr created PR Review Comment:
Could this not be expressed as a standard legalization, with an ISA predicate on these two flags, in the meta-language? Then, the SSE2 expansion would just be put thereafter, so that isel would use the AVX legalization if it's there, or the SSE2 otherwise.
bnjbvr created PR Review Comment:
If this comment is right, could we safely put an
unreachable!
statement here? (Otherwise, this custom legalization will be silently triggered and do nothing for other imul variants, resulting in correctness issues, if i remember correctly)
bnjbvr created PR Review Comment:
It would be nice to comment a bit how this works, especially since pmuludq does a multiplication of the first and third double-word in each operand and puts the result in the first and second quadwords, if I understand correctly Intel's documentation.
abrown submitted PR Review.
abrown created PR Review Comment:
I don't see how this is possible with the current API in
xform.rs
. If we added a way to guard legalizations with predicates then, yes, but it looks to me like this doesn't exist yet.
lars-t-hansen submitted PR Review.
lars-t-hansen created PR Review Comment:
This doesn't look quite right to me but it may be possible I'm misunderstanding because I don't know the Cranelift IR. The result of addhigh looks right, it looks like it has the results for the high dwords of the result qwords in its low two dwords, ie the upper two dwords are garbage. But the value of 'high' isn't right I think, at least I can't make either psllq or pslldq work here - some kind of shuffle might work. The computation of low computes the two low dwords of the result qwords correctly in the two low dwords, but simply adding them in won't work even if high was correct, they again need to be shuffled into their correct places. Instead of shift and add it looks like an unpack might work (this is untested):
// lhsDest = D C B A // rhs = H G F E mov(lhsDest, high0); // high0 = D C B A psrlq(32, high0); // high0 = 0 D 0 B pmuludq(rhs, high0); // high0 = 0 D CG BE mov(rhs, high1); // high1 = H G F E psrlq(32, high1); // high1 = 0 H 0 F pmuludq(lhsDest, high1); // high1 = 0 H CH AF paddq(high0, high1); // high1 = 0 D+H CG+CH BE+AF pmuludq(rhs, lhsDest); // lhsDest = D C CG AE punpckldq(high1, lhsDest); // lhsDest = CG+CH CG BE+AF AE
lars-t-hansen edited PR Review Comment.
lars-t-hansen deleted PR Review Comment.
lars-t-hansen submitted PR Review.
lars-t-hansen created PR Review Comment:
Benjamin asked me to take a look. Initially I had a comment here where I indicated I thought this was wrong, but I see I misunderstood something, and I now think this looks right to me.
bnjbvr created PR Review Comment:
I thought about encodings (which can be guarded against isa predicates), so this code is correct.
bnjbvr submitted PR Review.
bnjbvr submitted PR Review.
abrown updated PR #1759 from i64x2-mul
to master
:
The
convert_i64x2_imul
custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizesimul.i64x2
to anx86_pmullq
in this case; if not, it uses a lengthy SSE2-compatible instruction sequence. For this logic to work, we need:
- the AVX512 instruction to be defined as a separate Cranelift instruction,
x86_pmullq
(this additional instruction would go away in the new backend)- a mechanism for accessing the x86
TargetIsa
so that we can inspect its flags during legalization- a new SSE2 instruction,
x86_pmuludq
for implementing the SSE2-compatible instruction sequence
abrown updated PR #1759 from i64x2-mul
to master
:
The
convert_i64x2_imul
custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizesimul.i64x2
to anx86_pmullq
in this case; if not, it uses a lengthy SSE2-compatible instruction sequence. For this logic to work, we need:
- the AVX512 instruction to be defined as a separate Cranelift instruction,
x86_pmullq
(this additional instruction would go away in the new backend)- a mechanism for accessing the x86
TargetIsa
so that we can inspect its flags during legalization- a new SSE2 instruction,
x86_pmuludq
for implementing the SSE2-compatible instruction sequence
abrown merged PR #1759.
Last updated: Nov 22 2024 at 16:03 UTC