afonso360 opened issue #4462:
:wave: Hey,
It looks like we don't have any lowering for the fma instruction when used with SIMD types.
.clif Test Case
    function %fma_f32x4(f32x4, f32x4, f32x4) -> f32x4 {
    block0(v0: f32x4, v1: f32x4, v2: f32x4):
        v3 = fma v0, v1, v2
        return v3
    }
Steps to Reproduce
cargo run -- compile --target x86_64 ./the-above.clif
Expected Results
A successful compilation.
Actual Results
We don't have this implemented
thread 'main' panicked at 'internal error: entered unreachable code: implemented in ISLE: inst = `v3 = fma.f32x4 v0, v1, v2`, type = `Some(types::F32X4)`', cranelift\codegen\src\isa\x64\lower.rs:808:9 n
Versions and Environment
Cranelift version or commit: main
Operating system: Windows
Architecture: x86_64
Extra Info
In #4460 I tried to implement this using the vfmadd231ps instruction, but I had issues implementing emit for it and I'd like some help.
It looks like that instruction is only available in VEX encoding (or EVEX for avx512 but I don't have a machine with that). Is our EvexInstruction encoder suitable for emitting VEX instructions, or are they completely different? Do we have a way to emit VEX instructions?
I'm not too familiar with x86 but I'd like to pick this up with some help.
cc: @abrown
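For readers unfamiliar with the instruction, here is a rough sketch of what fma on f32x4 is expected to compute: each lane is a[i] * b[i] + c[i] with a single rounding. This is plain Rust, not Cranelift code; the function names fma_f32x4 and mul_then_add_f32x4 are made up for illustration, and only f32::mul_add is a real standard-library call.

```rust
/// Lane-wise fused multiply-add over four f32 lanes.
/// Each lane computes a[i] * b[i] + c[i] with one rounding step.
/// (Hypothetical helper, named after the test case in the issue.)
pub fn fma_f32x4(a: [f32; 4], b: [f32; 4], c: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    for i in 0..4 {
        // f32::mul_add is the standard library's fused multiply-add.
        out[i] = a[i].mul_add(b[i], c[i]);
    }
    out
}

/// The unfused variant, for comparison: the product is rounded to f32
/// before the addition, so the result can differ in the last bits.
pub fn mul_then_add_f32x4(a: [f32; 4], b: [f32; 4], c: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    for i in 0..4 {
        out[i] = a[i] * b[i] + c[i];
    }
    out
}
```

The difference between the two shows up when the exact product needs more bits than an f32 holds, which is why a lowering cannot simply substitute a separate multiply and add.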
afonso360 labeled issue #4462:
afonso360 labeled issue #4462:
afonso360 edited issue #4462:
:wave: Hey,
It looks like we don't have any lowering for the fma instruction when used with SIMD types.
.clif Test Case
    function %fma_f32x4(f32x4, f32x4, f32x4) -> f32x4 {
    block0(v0: f32x4, v1: f32x4, v2: f32x4):
        v3 = fma v0, v1, v2
        return v3
    }
Steps to Reproduce
cargo run -- compile --target x86_64 ./the-above.clif
Expected Results
A successful compilation.
Actual Results
We don't have this implemented
thread 'main' panicked at 'internal error: entered unreachable code: implemented in ISLE: inst = `v3 = fma.f32x4 v0, v1, v2`, type = `Some(types::F32X4)`', cranelift\codegen\src\isa\x64\lower.rs:808:9 n
Versions and Environment
Cranelift version or commit: main
Operating system: Windows
Architecture: x86_64
Extra Info
In #4460 I tried to implement this using the vfmadd231ps instruction, but I had issues encoding the instruction and I'd like some help.
It looks like that instruction is only available in VEX encoding (or EVEX for avx512 but I don't have a machine with that). Is our EvexInstruction encoder suitable for emitting VEX instructions, or are they completely different? Do we have a way to emit VEX instructions?
I'm not too familiar with x86 but I'd like to pick this up with some help.
cc: @abrown
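As an aside on the "I don't have a machine with that" point, one way to check which encodings a development machine could actually execute is Rust's runtime feature detection. The sketch below is an illustration only and is unrelated to how Cranelift itself tracks ISA flags; FmaSupport, detect_fma_support, and describe are hypothetical names, while std::arch::is_x86_feature_detected! is the real standard-library macro.

```rust
/// Which single-instruction FMA encodings the host CPU could run.
/// (Hypothetical type for illustration.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FmaSupport {
    /// VEX-encoded FMA (the "fma" CPU feature).
    pub vex_fma: bool,
    /// EVEX-encoded forms (the "avx512f" CPU feature).
    pub evex_avx512f: bool,
}

/// Query the host CPU on x86 and x86_64.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn detect_fma_support() -> FmaSupport {
    FmaSupport {
        vex_fma: std::arch::is_x86_feature_detected!("fma"),
        evex_avx512f: std::arch::is_x86_feature_detected!("avx512f"),
    }
}

/// On other architectures neither encoding applies.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn detect_fma_support() -> FmaSupport {
    FmaSupport { vex_fma: false, evex_avx512f: false }
}

/// Human-readable summary of what could be tested locally.
pub fn describe(s: FmaSupport) -> &'static str {
    match (s.vex_fma, s.evex_avx512f) {
        (true, true) => "VEX and EVEX forms can both be run locally",
        (true, false) => "only the VEX form can be run locally",
        (false, true) => "only the EVEX form can be run locally",
        (false, false) => "neither form can be run locally",
    }
}
```

Calling describe(detect_fma_support()) on the development machine gives a quick answer before deciding which encoding to implement first.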
abrown commented on issue #4462:
Hm, so here are some thoughts:
- ideally we would have implementations of this for every possible machine, e.g., your libcall implementation in #4460, an AVX implementation for machines that have that, an AVX512 implementation for machines that have that, etc. That is a lot of work, however, so I would propose we only do two of those here: a) the libcall implementation for very old machines and b) either an AVX or AVX512 implementation for newer machines
- To decide "b," it helps to understand that (apologies if you know all this) AVX instructions use a VEX encoding, AVX512 instructions use an EVEX encoding, and that currently in Cranelift we have only really implemented EVEX. Adding a VEX encoding implementation would be nice, but it's a bit of work.
- That particular instruction is available for 128-bit vectors (XMMs) in both the VEX encoding under the FMA flag and in the EVEX encoding under the AVX512VL and AVX512F flags. Those AVX512 flags are usually only available on server-class CPUs but the FMA flag is likely available on most recent x86 machines. So, there seems to be a decent justification for implementing VEX encodings: if we add it, we can lower fma to a single instruction on more machines than with AVX512. (On the flip side, you could implement vfmadd231ps with the EVEX encoding but then not be able to run it locally.)
- To implement VEX encodings, we would need to fill in codegen/src/isa/x64/encoding/vex.rs in much the same fashion as I did for evex.rs. In this case, however, we could follow the AVX instruction format guide in section 2.3 of Intel's Software Developer's Manual, Volume 2. With that in place, we would probably need to think through how to emit AVX instructions; e.g., something like Avx512Opcode but perhaps we want to be able to decide on the VEX/EVEX encoding at a later time (?). Finally, we would need a has_fma flag in codegen/meta/src/isa/x86.rs and to plumb that through in a few places.
Hopefully that information helps. Ping me on Zulip if you want to have a more "live" discussion.
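To make the VEX discussion above concrete, here is a minimal, standalone sketch of the three-byte VEX prefix for the register-to-register form of vfmadd231ps (VEX.128.66.0F38.W0 B8 /r in Intel's manual). It is not the contents of vex.rs and does not use any Cranelift API; encode_vex3_rr and vfmadd231ps_xmm are hypothetical names, and memory operands, the two-byte VEX form, and 256-bit registers are deliberately left out.

```rust
/// Encode a three-byte-VEX, register-to-register instruction.
/// Layout: C4 | ~R ~X ~B mmmmm | W ~vvvv L pp | opcode | ModRM.
/// (Hypothetical helper; register numbers are 0..=15.)
pub fn encode_vex3_rr(
    opcode_map: u8, // mmmmm: 1 = 0F, 2 = 0F38, 3 = 0F3A
    w: bool,        // VEX.W
    l256: bool,     // VEX.L: false = 128-bit, true = 256-bit
    pp: u8,         // implied prefix: 0 = none, 1 = 66, 2 = F3, 3 = F2
    opcode: u8,
    reg: u8,  // ModRM.reg operand
    vvvv: u8, // extra source operand, stored inverted
    rm: u8,   // ModRM.rm operand
) -> [u8; 5] {
    assert!(reg < 16 && vvvv < 16 && rm < 16);
    assert!(opcode_map < 32 && pp < 4);

    // R, X and B extend the register numbers and are stored inverted.
    let r_inv = (!reg >> 3) & 1;
    let x_inv = 1; // no index register in the register-to-register form
    let b_inv = (!rm >> 3) & 1;

    let byte1 = (r_inv << 7) | (x_inv << 6) | (b_inv << 5) | opcode_map;
    let byte2 = ((w as u8) << 7) | ((!vvvv & 0xF) << 3) | ((l256 as u8) << 2) | pp;
    // mod = 0b11 selects register-direct addressing.
    let modrm = 0xC0 | ((reg & 7) << 3) | (rm & 7);

    [0xC4, byte1, byte2, opcode, modrm]
}

/// vfmadd231ps dst, src1, src2 on XMM registers: dst = src1 * src2 + dst.
pub fn vfmadd231ps_xmm(dst: u8, src1: u8, src2: u8) -> [u8; 5] {
    encode_vex3_rr(0b00010, false, false, 0b01, 0xB8, dst, src1, src2)
}
```

With these assumptions, vfmadd231ps_xmm(0, 1, 2) yields the bytes C4 E2 71 B8 C2. The three-byte prefix is needed here because the opcode lives in the 0F38 map, which the two-byte C5 form cannot express.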
cfallin closed issue #4462:
Last updated: Dec 23 2024 at 12:05 UTC