MaxGraey opened issue #4686:
Feature
Not all architectures have a fast 64-bit imul + imm. Even on modern ones such as the SnB family and AMD Ryzen it has 3-cycle latency and 1c throughput, which is not always faster than a lea + shl / add combination. So I propose lowering to lea + shl / add for non-power-of-two constants, at least for imm < 400, and falling back to the 64-bit imul only when that lowering is not possible. Similar to GCC:
bjorn3 commented on issue #4686:
for non-power of two constants at least for imm < 400
Should this check for a low hamming weight rather than a max value?
MaxGraey commented on issue #4686:
low hamming weight rather than a max value?
Yeah, perhaps this will be better
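To make the hamming-weight idea concrete, here is a minimal sketch in Rust of how such a check could pick between a shift-and-add expansion and a plain imul. It is only an illustration under stated assumptions: `ShiftAdd`, `MAX_HAMMING_WEIGHT`, `plan_mul_by_const` and `eval_plan` are hypothetical names, not Cranelift APIs, and the cutoff of 2 set bits is an arbitrary placeholder rather than a measured threshold. On x86-64, shifts of 1, 2 or 3 can be folded into an lea scale factor, so a plan like x * 10 = (x << 1) + (x << 3) needs very few instructions.

```rust
/// One term of a shift-and-add expansion: contributes `x << shift`.
/// Hypothetical helper type for illustration; not a Cranelift API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ShiftAdd {
    pub shift: u32,
}

/// Assumed cutoff: expand only constants with at most this many set bits.
/// The value 2 is a placeholder, not a benchmarked threshold.
pub const MAX_HAMMING_WEIGHT: u32 = 2;

/// Returns a shift-and-add plan for `x * imm`, or `None` when the
/// multiplication should be left alone (zero, a power of two that is
/// already a single shift, or a constant with too many set bits).
pub fn plan_mul_by_const(imm: u64) -> Option<Vec<ShiftAdd>> {
    let weight = imm.count_ones();
    if weight < 2 || weight > MAX_HAMMING_WEIGHT {
        return None;
    }
    let mut steps = Vec::new();
    let mut bits = imm;
    while bits != 0 {
        // Position of the lowest set bit becomes one shifted term.
        steps.push(ShiftAdd { shift: bits.trailing_zeros() });
        // Clear the lowest set bit.
        bits &= bits - 1;
    }
    Some(steps)
}

/// Evaluates a plan with wrapping arithmetic, so the result matches
/// a 64-bit multiply modulo 2^64.
pub fn eval_plan(x: u64, steps: &[ShiftAdd]) -> u64 {
    steps
        .iter()
        .fold(0u64, |acc, s| acc.wrapping_add(x << s.shift))
}
```

With this sketch, `plan_mul_by_const(10)` yields shifts 1 and 3, while `plan_mul_by_const(8)` and `plan_mul_by_const(7)` both return `None`. The second case shows a limit of a pure hamming-weight test: 7 has three set bits, yet x * 7 can also be written as (x << 3) - x, so a real heuristic might want to consider subtraction forms as well.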
MaxGraey edited issue #4686:
Feature
Not all architectures have a fast 64-bit imul + imm. Even on modern ones such as the SnB family and AMD Ryzen it has 3-cycle latency and 1c throughput, which is not always faster than a lea + shl / add combination. So I propose lowering to lea + shl / add for non-power-of-two constants with low hamming weight, and falling back to the 64-bit imul only when that lowering is not possible. Similar to GCC:
akirilov-arm labeled issue #4686:
Feature
Not all architectures have a fast 64-bit imul + imm. Even on modern ones such as the SnB family and AMD Ryzen it has 3-cycle latency and 1c throughput, which is not always faster than a lea + shl / add combination. So I propose lowering to lea + shl / add for non-power-of-two constants with low hamming weight, and falling back to the 64-bit imul only when that lowering is not possible. Similar to GCC:
Last updated: Nov 22 2024 at 16:03 UTC