Stream: git-wasmtime

Topic: wasmtime / issue #4686 [cranelift] Avoid 64-bit imul_imm ...


Wasmtime GitHub notifications bot (Aug 11 2022 at 08:50):

MaxGraey opened issue #4686:

Feature

Not all architectures have a fast 64-bit imul with an immediate operand. Even on modern cores such as the SnB family and AMD Ryzen, it has a 3-cycle latency and 1-cycle throughput, which is not always faster than a lea + shl / add combination. So I propose lowering to lea + shl / add for non-power-of-two constants, at least for imm < 400, and using 64-bit imul only where that lowering is not possible. Similar to GCC:

https://godbolt.org/z/aG7bPer9v
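
To make the proposed lowering concrete, here is a minimal Rust sketch of the underlying identity: a multiply by a constant can be rewritten as one shift-and-add per set bit of that constant. The function name `mul_by_shift_add` is hypothetical and this is not Cranelift code; a real backend would also fold some steps into lea and choose between sequences based on cost.

```rust
/// Hypothetical helper (not part of Cranelift): multiply `x` by the constant
/// `imm` using only shifts and adds, one step per set bit of `imm`.
/// Arithmetic wraps modulo 2^64, matching a 64-bit integer multiply.
fn mul_by_shift_add(x: u64, imm: u64) -> u64 {
    let mut acc: u64 = 0;
    let mut bits = imm;
    while bits != 0 {
        // Position of the lowest set bit; always < 64 because `bits` is non-zero.
        let shift = bits.trailing_zeros();
        acc = acc.wrapping_add(x << shift);
        // Clear the lowest set bit.
        bits &= bits - 1;
    }
    acc
}
```

For example, `mul_by_shift_add(7, 10)` computes `(7 << 1) + (7 << 3)`, which equals `7 * 10`. The number of steps grows with the number of set bits in the constant, so such a sequence only pays off when that count is small.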

Wasmtime GitHub notifications bot (Aug 11 2022 at 09:33):

bjorn3 commented on issue #4686:

for non-power of two constants at least for imm < 400

Should this check for a low hamming weight rather than a max value?

Wasmtime GitHub notifications bot (Aug 11 2022 at 09:34):

MaxGraey commented on issue #4686:

low hamming weight rather than a max value?

Yeah, perhaps that would be better
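
As an illustration of the difference between the two checks, here is a minimal Rust sketch of a Hamming-weight test. The function name `prefer_shift_add` and the `max_set_bits` threshold are assumptions made for this example; neither comes from the issue or from Cranelift.

```rust
/// Hypothetical heuristic (not Cranelift's actual rule): prefer a shift/add
/// sequence over imul when the constant is not a power of two and has at
/// most `max_set_bits` bits set (its Hamming weight).
fn prefer_shift_add(imm: u64, max_set_bits: u32) -> bool {
    // `is_power_of_two` is false for 0, so zero is excluded explicitly.
    imm != 0 && !imm.is_power_of_two() && imm.count_ones() <= max_set_bits
}
```

With a threshold of 2, this accepts 10 (binary 1010) and also a very large constant such as `0x8000_0000_0000_0001`, but rejects 255 (eight set bits) even though 255 is below 400. A pure magnitude check would make the opposite call in both of the last two cases.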

Wasmtime GitHub notifications bot (Aug 11 2022 at 09:38):

MaxGraey edited issue #4686:

Feature

Not all architectures have a fast 64-bit imul with an immediate operand. Even on modern cores such as the SnB family and AMD Ryzen, it has a 3-cycle latency and 1-cycle throughput, which is not always faster than a lea + shl / add combination. So I propose lowering to lea + shl / add for non-power-of-two constants, at least for imm < 400 with low Hamming weight, and using 64-bit imul only where that lowering is not possible. Similar to GCC:

https://godbolt.org/z/aG7bPer9v

Wasmtime GitHub notifications bot (Sep 02 2022 at 15:41):

akirilov-arm labeled issue #4686:

Feature

Not all architectures have a fast 64-bit imul with an immediate operand. Even on modern cores such as the SnB family and AMD Ryzen, it has a 3-cycle latency and 1-cycle throughput, which is not always faster than a lea + shl / add combination. So I propose lowering to lea + shl / add for non-power-of-two constants, at least for imm < 400 with low Hamming weight, and using 64-bit imul only where that lowering is not possible. Similar to GCC:

https://godbolt.org/z/aG7bPer9v


Last updated: Nov 22 2024 at 16:03 UTC