afonso360 opened PR #4474 from x86-vex to main:
:wave: Hey,
Following the discussion in #4462, this PR adds a VEX encoder and a lowering for SIMD fma using FMA extension instructions.
The VEX encoder has an API similar to the EVEX encoder. However, instead of writing out the instruction as the caller sets each field, we store all the fields and write the full instruction later in encode. This is because the VEX instruction format is much less predictable than EVEX and I couldn't find a way to cleanly write the bit fields, since at any time we can switch from 2 to 3 byte prefixes.
As @abrown mentioned in #4462, there are potentially some improvements that we may want to make:
"With that in place, we would probably need to think through how to emit AVX instructions; e.g., something like Avx512Opcode but perhaps we want to be able to decide on the VEX/EVEX encoding at a later time (?)"
Right now this implements AvxOpcode and always lowers it in VEX encoding. However, I like the idea of choosing the encoding based on what is available and most efficient. Suggestions here would be appreciated!
Fixes #4462
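To make the 2-byte versus 3-byte prefix point concrete, here is a minimal sketch of a "store the fields, then encode" builder. The struct and field names are hypothetical and are not the encoder added in this PR; only the prefix bit layout follows the VEX format itself, and the sketch handles register-direct operands only.

```rust
/// Hypothetical field bag for a register-direct VEX instruction.
/// Names are illustrative only; this is not Cranelift's encoder API.
struct Vex {
    length_256: bool, // VEX.L: false = 128-bit, true = 256-bit
    pp: u8,           // implied SIMD prefix: 0 = none, 1 = 66, 2 = F3, 3 = F2
    map: u8,          // opcode map: 1 = 0F, 2 = 0F38, 3 = 0F3A
    w: bool,          // VEX.W
    opcode: u8,
    reg: u8,          // ModRM.reg register number, 0..=15
    vvvv: u8,         // second source register number, 0..=15
    rm: u8,           // ModRM.rm register number, 0..=15
}

impl Vex {
    /// Write the whole instruction at once. The prefix length is only known
    /// here, after every field has been set.
    fn encode(&self) -> Vec<u8> {
        let r = (self.reg >> 3) & 1; // high bit of ModRM.reg
        let b = (self.rm >> 3) & 1; // high bit of ModRM.rm
        let x = 0u8; // no index register in register-direct form
        let inv_vvvv = !self.vvvv & 0xF; // vvvv is stored inverted
        let l = self.length_256 as u8;
        let pp = self.pp & 0b11;

        let mut out = Vec::with_capacity(5);
        // The 2-byte form cannot express X, B, W, or any map other than 0F.
        let two_byte = self.map == 1 && !self.w && x == 0 && b == 0;
        if two_byte {
            out.push(0xC5);
            out.push(((r ^ 1) << 7) | (inv_vvvv << 3) | (l << 2) | pp);
        } else {
            out.push(0xC4);
            out.push(((r ^ 1) << 7) | ((x ^ 1) << 6) | ((b ^ 1) << 5) | (self.map & 0x1F));
            out.push(((self.w as u8) << 7) | (inv_vvvv << 3) | (l << 2) | pp);
        }
        out.push(self.opcode);
        // ModRM with mod = 0b11 (register-direct).
        out.push(0xC0 | ((self.reg & 7) << 3) | (self.rm & 7));
        out
    }
}

/// vfmadd213pd xmm0, xmm1, xmm2 (VEX.128.66.0F38.W1 A8 /r): needs the 3-byte prefix.
fn vfmadd213pd_example() -> Vec<u8> {
    Vex { length_256: false, pp: 1, map: 2, w: true, opcode: 0xA8, reg: 0, vvvv: 1, rm: 2 }.encode()
}

/// vaddps xmm0, xmm1, rm (VEX.128.0F.WIG 58 /r): fits the 2-byte prefix
/// unless `rm` is one of xmm8..xmm15.
fn vaddps_example(rm: u8) -> Vec<u8> {
    Vex { length_256: false, pp: 0, map: 1, w: false, opcode: 0x58, reg: 0, vvvv: 1, rm }.encode()
}
```

Changing only the rm register from xmm2 to xmm10 in vaddps_example moves the instruction from the 2-byte to the 3-byte prefix, which is why deferring the write until all fields are known is convenient.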
afonso360 edited PR #4474 from x86-vex to main:
:wave: Hey,
Following the discussion in #4462, this PR adds a VEX encoder and a lowering for SIMD fma using FMA extension instructions.
The VEX encoder has an API similar to the EVEX encoder. However, instead of writing out the instruction as the caller sets each field, we store all the fields and write the full instruction later in encode. This is because the VEX instruction format is much less predictable than EVEX and I couldn't find a way to cleanly write the bit fields, since at any time we can switch from 2 to 3 byte prefixes.
As @abrown mentioned in #4462, there are potentially some improvements that we may want to make:
"With that in place, we would probably need to think through how to emit AVX instructions; e.g., something like Avx512Opcode but perhaps we want to be able to decide on the VEX/EVEX encoding at a later time (?)"
Right now this implements AvxOpcode and always lowers it in VEX encoding. However, I like the idea of choosing the encoding based on what is available and most efficient. Suggestions on how we would implement this would be appreciated!
Fixes #4462
afonso360 updated PR #4474 from x86-vex to main.
afonso360 updated PR #4474 from x86-vex to main.
afonso360 updated PR #4474 from x86-vex to main.
abrown created PR review comment:
How come we need these moves?
abrown submitted PR review.
abrown submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Ah, I suspect this may be due to using a "mod" operand rather than a separate src and dest with dest tied to src via a reuse-constraint. @afonso360 would you mind matching how the other instructions work in that way? Then you can assert that src1 == dst during emission and the emission code remains unchanged, but it makes the usage here simpler.
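A toy model of that suggestion may help. The instruction carries a separate src1 and dst, the def is tied to the first use, and emission only asserts that the two ended up in the same register. All types and functions below are hypothetical stand-ins, not Cranelift's or regalloc2's real API, and the "allocator" is deliberately trivial.

```rust
// Toy model only: a register is just a number.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Reg(u8);

/// Hypothetical operand description handed to a register allocator.
#[derive(Debug, PartialEq, Eq)]
enum Operand {
    /// A plain read of a register.
    Use(Reg),
    /// A def that must receive the same register as the use at `index`.
    ReuseDef { dst: Reg, index: usize },
}

/// FMA with a separate destination instead of a single "mod" operand.
struct Fma {
    src1: Reg,
    src2: Reg,
    src3: Reg,
    dst: Reg,
}

impl Fma {
    fn operands(&self) -> Vec<Operand> {
        vec![
            Operand::Use(self.src1),
            Operand::Use(self.src2),
            Operand::Use(self.src3),
            // dst is tied to operand 0 (src1).
            Operand::ReuseDef { dst: self.dst, index: 0 },
        ]
    }
}

/// Trivial stand-in for an allocator: uses keep their number (mod 16),
/// reuse-defs copy the allocation of the operand they are tied to.
fn allocate(ops: &[Operand]) -> Vec<Reg> {
    let mut out: Vec<Reg> = Vec::new();
    for op in ops {
        let assigned = match *op {
            Operand::Use(v) => Reg(v.0 % 16),
            Operand::ReuseDef { index, .. } => out[index],
        };
        out.push(assigned);
    }
    out
}

/// Emission stays a two-address-style instruction and just checks the tie held.
fn emit_fma(allocs: &[Reg]) -> String {
    let (src1, src2, src3, dst) = (allocs[0], allocs[1], allocs[2], allocs[3]);
    assert_eq!(src1, dst, "reuse constraint should force src1 == dst");
    format!("vfmadd213pd xmm{}, xmm{}, xmm{}", dst.0, src2.0, src3.0)
}

/// Small usage helper: x is only ever read; the result lands in the tied dst.
fn lower_example() -> String {
    let fma = Fma { src1: Reg(4), src2: Reg(5), src3: Reg(6), dst: Reg(7) };
    emit_fma(&allocate(&fma.operands()))
}
```

With this shape the lowering never has to produce a writable copy of x by hand; any copy needed to satisfy the tie would be the allocator's responsibility.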
afonso360 created PR review comment:
They aren't actually emitted, but x isn't writable, so we need a reg that is, and since vfmadd213pd modifies the first reg we also need the value of x to be in that register. I'm not sure if this is the best way to achieve that.
afonso360 submitted PR review.
afonso360 edited PR review comment.
afonso360 updated PR #4474 from x86-vex to main.
afonso360 updated PR #4474 from x86-vex to main.
cfallin submitted PR review.
cfallin created PR review comment:
The order of these uses of allocs needs to match the order in which registers are given to the OperandCollector in get_operands below. (I know, it's a terrible footgun... we hope to make this a bit more ergonomic later!)
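The footgun can be illustrated with a small stand-alone model. The Collector and Allocs types below are hypothetical simplifications and not the real OperandCollector or allocation consumer. Allocations come back as a plain sequence in registration order, so reading them in a different order silently swaps registers rather than failing.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct VReg(u32); // virtual register, before allocation
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct PReg(u8); // physical register, after allocation

/// Hypothetical collector: remembers operands in the order they are reported.
#[derive(Default)]
struct Collector {
    operands: Vec<VReg>,
}

impl Collector {
    fn reg_use(&mut self, r: VReg) {
        self.operands.push(r);
    }
}

/// Hypothetical allocation consumer: hands back one allocation per call,
/// in exactly the order the operands were collected.
struct Allocs {
    inner: std::vec::IntoIter<PReg>,
}

impl Allocs {
    fn next_reg(&mut self) -> PReg {
        self.inner.next().expect("more reads than collected operands")
    }
}

struct Inst {
    src2: VReg,
    src3: VReg,
}

/// Registration order: src2 first, then src3.
fn get_operands(inst: &Inst, c: &mut Collector) {
    c.reg_use(inst.src2);
    c.reg_use(inst.src3);
}

/// Trivial stand-in for register allocation: vreg N maps to preg N mod 16.
fn assign(c: &Collector) -> Allocs {
    let pregs: Vec<PReg> = c.operands.iter().map(|v| PReg((v.0 % 16) as u8)).collect();
    Allocs { inner: pregs.into_iter() }
}

/// Correct: reads allocations in the same order as get_operands.
fn emit_matching(allocs: &mut Allocs) -> (PReg, PReg) {
    let src2 = allocs.next_reg();
    let src3 = allocs.next_reg();
    (src2, src3)
}

/// Wrong: reads src3 first, so the two registers are silently swapped.
fn emit_mismatched(allocs: &mut Allocs) -> (PReg, PReg) {
    let src3 = allocs.next_reg();
    let src2 = allocs.next_reg();
    (src2, src3)
}

/// Runs collection, assignment, and one of the emit functions end to end.
fn run(emit: fn(&mut Allocs) -> (PReg, PReg)) -> (PReg, PReg) {
    let inst = Inst { src2: VReg(1), src3: VReg(2) };
    let mut collector = Collector::default();
    get_operands(&inst, &mut collector);
    emit(&mut assign(&collector))
}
```

Nothing in the mismatched version panics or fails to compile, which is what makes the ordering requirement easy to get wrong.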
cfallin submitted PR review.
afonso360 updated PR #4474 from x86-vex to main.
afonso360 updated PR #4474 from x86-vex to main.
afonso360 created PR review comment:
Thanks! I'm still not too comfortable with this.
afonso360 submitted PR review.
cfallin submitted PR review.
cfallin has enabled auto merge for PR #4474.
cfallin merged PR #4474.
Last updated: Dec 23 2024 at 12:05 UTC