cfallin opened Issue #2689:
As noted by @abrown in #2682, our 128-bit shift sequences on x86-64 could make use of `PSLLDQ` and `PSRLDQ` to do the 128-bit operation in one go, rather than an open-coded combination of 64-bit shifts with conditional moves, etc. It's likely that this would be faster even with moves to the XMM register file. It's possible that there are better SSE alternatives for some of our other operations as well.
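For reference, here is a minimal Rust sketch of what the open-coded approach computes, next to a simple model of `PSLLDQ`. The function names `shl128_open_coded` and `pslldq_model` are hypothetical and are not taken from Cranelift; the first is only an illustration of combining 64-bit halves, not the actual lowering. Note that `PSLLDQ` and `PSRLDQ` shift by whole bytes with an immediate count, so on their own they would only cover constant shift amounts that are multiples of 8.

```rust
/// Hypothetical helper: 128-bit left shift built from two 64-bit halves.
/// Illustrative only; this is not Cranelift's actual lowering.
fn shl128_open_coded(lo: u64, hi: u64, amt: u32) -> (u64, u64) {
    let amt = amt & 127; // wrap the shift amount to 0..=127
    if amt == 0 {
        (lo, hi)
    } else if amt < 64 {
        // Bits shifted out of the low half carry into the high half.
        (lo << amt, (hi << amt) | (lo >> (64 - amt)))
    } else {
        // The low half moves entirely into the high half.
        (0, lo << (amt - 64))
    }
}

/// Hypothetical model of PSLLDQ: shifts left by whole bytes.
/// A byte count above 15 clears the register.
fn pslldq_model(x: u128, bytes: u8) -> u128 {
    if bytes > 15 {
        0
    } else {
        x << (8 * bytes as u32)
    }
}
```

Calling `shl128_open_coded(lo, hi, amt)` on the two halves of a `u128` should agree with the native `<<` for any `amt` below 128, while `pslldq_model(x, n)` matches `x << (8 * n)` only at byte granularity.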
cfallin labeled Issue #2689.
Last updated: Jan 24 2025 at 00:11 UTC