cfallin opened Issue #2545:
From a SIMD debugging adventure with @abrown, we discovered two related issues:
The x64 backend's movss and movsd instructions currently allow a reg/reg form where the source register is a GPR (e.g., rax). This is an illegal combination that the hardware does not support, and worse, we silently convert the index of the GPR (e.g., rax -> 0) into the index of an XMM register (e.g., xmm0), leading to incorrect machine code. We should add an assert in emission or in the instruction constructor (or both) to ensure we don't emit this combination.
The register use/def/mod info provided to regalloc for movss/movsd is not correct in all cases. These instructions' xmmreg, xmmreg form do an "insert lane" operation that depends on the destination's old value. We sometimes use them just as narrow moves, and sometimes as insertlane operators. We need to ensure we report a def (not mod) in the former case, and a mod (not def) in the latter; we might see regalloc verification errors or (worse) missing reloads if we get this wrong in either direction.
A possibly simpler fix to the second issue is to always use movss/movsd R-R form as an insertlane, and always report it as such, and just use full-width XMM register move instructions for normal reg-to-reg moves. I'm not sure if this has any performance implications (@abrown could say for sure?) but it would certainly simplify our backend and make reasoning about correctness a little easier.
bnjbvr labeled Issue #2545:
From a SIMD debugging adventure with @abrown, we discovered two related issues:
The x64 backend's movss and movsd instructions currently allow a reg/reg form where the source register is a GPR (e.g., rax). This is an illegal combination that the hardware does not support, and worse, we silently convert the index of the GPR (e.g., rax -> 0) into the index of an XMM register (e.g., xmm0), leading to incorrect machine code. We should add an assert in emission or in the instruction constructor (or both) to ensure we don't emit this combination.
The register use/def/mod info provided to regalloc for movss/movsd is not correct in all cases. These instructions' xmmreg, xmmreg form do an "insert lane" operation that depends on the destination's old value. We sometimes use them just as narrow moves, and sometimes as insertlane operators. We need to ensure we report a def (not mod) in the former case, and a mod (not def) in the latter; we might see regalloc verification errors or (worse) missing reloads if we get this wrong in either direction.
A possibly simpler fix to the second issue is to always use movss/movsd R-R form as an insertlane, and always report it as such, and just use full-width XMM register move instructions for normal reg-to-reg moves. I'm not sure if this has any performance implications (@abrown could say for sure?) but it would certainly simplify our backend and make reasoning about correctness a little easier.
bnjbvr labeled Issue #2545:
From a SIMD debugging adventure with @abrown, we discovered two related issues:
The x64 backend's movss and movsd instructions currently allow a reg/reg form where the source register is a GPR (e.g., rax). This is an illegal combination that the hardware does not support, and worse, we silently convert the index of the GPR (e.g., rax -> 0) into the index of an XMM register (e.g., xmm0), leading to incorrect machine code. We should add an assert in emission or in the instruction constructor (or both) to ensure we don't emit this combination.
The register use/def/mod info provided to regalloc for movss/movsd is not correct in all cases. These instructions' xmmreg, xmmreg form do an "insert lane" operation that depends on the destination's old value. We sometimes use them just as narrow moves, and sometimes as insertlane operators. We need to ensure we report a def (not mod) in the former case, and a mod (not def) in the latter; we might see regalloc verification errors or (worse) missing reloads if we get this wrong in either direction.
A possibly simpler fix to the second issue is to always use movss/movsd R-R form as an insertlane, and always report it as such, and just use full-width XMM register move instructions for normal reg-to-reg moves. I'm not sure if this has any performance implications (@abrown could say for sure?) but it would certainly simplify our backend and make reasoning about correctness a little easier.
Last updated: Jan 24 2025 at 00:11 UTC