abrown opened issue #3945:
#3886 ports CLIF
icmp
operations to ISLE but in the process added new, unnecessary moves to the lowering:The two initial
movdqa
instructions are not needed with the proper use of registers; my current thinking is that ISLE's move elision may not be working as it should but this could be due to re-ordering of operands in a lowering rule that ISLE cannot resolve.
abrown commented on issue #3945:
This may be similar to #3744.
alexcrichton labeled issue #3945:
#3886 ports CLIF
icmp
operations to ISLE but in the process added new, unnecessary moves to the lowering:The two initial
movdqa
instructions are not needed with the proper use of registers; my current thinking is that ISLE's move elision may not be working as it should but this could be due to re-ordering of operands in a lowering rule that ISLE cannot resolve.
alexcrichton labeled issue #3945:
#3886 ports CLIF
icmp
operations to ISLE but in the process added new, unnecessary moves to the lowering:The two initial
movdqa
instructions are not needed with the proper use of registers; my current thinking is that ISLE's move elision may not be working as it should but this could be due to re-ordering of operands in a lowering rule that ISLE cannot resolve.
alexcrichton labeled issue #3945:
#3886 ports CLIF
icmp
operations to ISLE but in the process added new, unnecessary moves to the lowering:The two initial
movdqa
instructions are not needed with the proper use of registers; my current thinking is that ISLE's move elision may not be working as it should but this could be due to re-ordering of operands in a lowering rule that ISLE cannot resolve.
alexcrichton commented on issue #3945:
With the new regalloc the current test looks like:
I'm not familiar enough with these instructions to know whether this is fixed though (there's still a single movdqa)
abrown commented on issue #3945:
I mean... this just looks wrong: I can never quite remember Cranelift's order of operands when using the x64 backend but our use of
%
makes this look like AT&T syntax (so destination on the right).%xmm0
and%xmm1
are probably used for ABI arguments so it looks like we copy%xmm0
to%xmm5
in order to use the value twice--so far so good. But then the problem: I think what we need to do is to a) find the maximum between%xmm5
and%xmm1
and put that into%xmm5
and b) equality-compare%xmm0
and%xmm5
and put that into%xmm0
for the return. but the last two instructions there don't quite say that! Looks like it should be:pmaxsw %xmm5, %xmm1, %xmm5 pcmpeqw %xmm0, %xmm5, %xmm0
But even this is a bit silly because those SIMD instructions aren't really being used in their three-operand form.
@alexcrichton, I think this issue is fixed though :grinning_face_with_smiling_eyes:. The extra move is gone but we should resolve this debug output question...
abrown edited a comment on issue #3945:
I mean... this just looks wrong: I can never quite remember Cranelift's order of operands when using the x64 backend but our use of
%
makes this look like AT&T syntax (i.e., destination on the right).%xmm0
and%xmm1
are probably used for ABI arguments so it looks like we copy%xmm0
to%xmm5
in order to use the value twice--so far so good. But then the problem: I think what we need to do is to a) find the maximum between%xmm5
and%xmm1
and put that into%xmm5
and b) equality-compare%xmm0
and%xmm5
and put that into%xmm0
for the return. but the last two instructions there don't quite say that! Looks like it should be:pmaxsw %xmm5, %xmm1, %xmm5 pcmpeqw %xmm0, %xmm5, %xmm0
But even this is a bit silly because those SIMD instructions aren't really being used in their three-operand form.
@alexcrichton, I think this issue is fixed though :grinning_face_with_smiling_eyes:. The extra move is gone but we should resolve this debug output question...
cfallin commented on issue #3945:
@abrown the output of
clif-util compile
on the function is:Disassembly of 21 bytes: 0: 55 push rbp 1: 48 89 e5 mov rbp, rsp 4: 66 0f 6f e8 movdqa xmm5, xmm0 8: 66 0f ee e9 pmaxsw xmm5, xmm1 c: 66 0f 75 c5 pcmpeqw xmm0, xmm5 10: 48 89 ec mov rsp, rbp 13: 5d pop rbp 14: c3 ret
so this looks right in actuality (result is put in
xmm0
). I think it's a pretty-printing bug (argument swap) actually -- PR incoming to fix that.The extra move is indeed elided now, so I think we can close this particular issue.
cfallin closed issue #3945:
#3886 ports CLIF
icmp
operations to ISLE but in the process added new, unnecessary moves to the lowering:The two initial
movdqa
instructions are not needed with the proper use of registers; my current thinking is that ISLE's move elision may not be working as it should but this could be due to re-ordering of operands in a lowering rule that ISLE cannot resolve.
Last updated: Dec 23 2024 at 12:05 UTC