Stream: git-wasmtime

Topic: wasmtime / PR #4474 x64: Implement SIMD `fma`


view this post on Zulip Wasmtime GitHub notifications bot (Jul 20 2022 at 11:38):

afonso360 opened PR #4474 from x86-vex to main:

:wave: Hey,

Following the discussion in #4462, this PR adds a VEX encoder and lowering for SIMD fma with FMA extension instructions.

The VEX Encoder has an API similar to the EVEX encoder, however instead of writing the instruction as the caller is calling the fields, we store them all and later write the full instruction in encode.

This is because the VEX instruction format is much less predictable than EVEX and I couldn't find a way to cleanly write the bit fields, since at any time we can switch from 2 to 3 byte prefixes.

@abrown mentioned in #4462, there are potentially some improvements that we may want to do.

With that in place, we would probably need think through how to emit AVX instructions; e.g., something like Avx512Opcode but perhaps we want to be able to decide on the VEX/EVEX encoding at a later time (?)

Right now this implements AvxOpcode and always lowers it in VEX encoding. However I like the idea of choosing the encoding based on what is available and most efficient. Suggestions here would be appreciated!

Fixes #4462

view this post on Zulip Wasmtime GitHub notifications bot (Jul 20 2022 at 11:38):

afonso360 edited PR #4474 from x86-vex to main:

:wave: Hey,

Following the discussion in #4462, this PR adds a VEX encoder and lowering for SIMD fma with FMA extension instructions.

The VEX Encoder has an API similar to the EVEX encoder, however instead of writing the instruction as the caller is calling the fields, we store them all and later write the full instruction in encode.

This is because the VEX instruction format is much less predictable than EVEX and I couldn't find a way to cleanly write the bit fields, since at any time we can switch from 2 to 3 byte prefixes.

@abrown mentioned in #4462, there are potentially some improvements that we may want to do.

With that in place, we would probably need think through how to emit AVX instructions; e.g., something like Avx512Opcode but perhaps we want to be able to decide on the VEX/EVEX encoding at a later time (?)

Right now this implements AvxOpcode and always lowers it in VEX encoding. However I like the idea of choosing the encoding based on what is available and most efficient. Suggestions on how we would implement this would be appreciated!

Fixes #4462

view this post on Zulip Wasmtime GitHub notifications bot (Jul 20 2022 at 11:48):

afonso360 updated PR #4474 from x86-vex to main.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 20 2022 at 12:28):

afonso360 updated PR #4474 from x86-vex to main.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 20 2022 at 12:29):

afonso360 updated PR #4474 from x86-vex to main.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 25 2022 at 18:25):

abrown created PR review comment:

How come we need these moves?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 25 2022 at 18:25):

abrown submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 25 2022 at 18:25):

abrown submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 25 2022 at 18:29):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 25 2022 at 18:29):

cfallin created PR review comment:

Ah, I suspect this may be due to using a "mod" operand rather than a separate src and dest with dest tied to src via a reuse-constraint. @afonso360 would you mind matching how the other instructions work in that way? Then you can assert that src1 == dst during emission and the emission code remains unchanged, but it makes the usage here simpler.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 25 2022 at 18:30):

afonso360 created PR review comment:

They aren't actually emitted, but x isn't writable, so we need a reg that is, and since vfmadd213pd modifies the first reg we also need the value of x to be in that register. I'm not sure if this is the best way to achieve that.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 25 2022 at 18:30):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 25 2022 at 18:32):

afonso360 edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 25 2022 at 19:16):

afonso360 updated PR #4474 from x86-vex to main.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 25 2022 at 19:25):

afonso360 updated PR #4474 from x86-vex to main.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 25 2022 at 20:42):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 25 2022 at 20:42):

cfallin created PR review comment:

The order of these uses of allocs needs to match the order in which registers are given to the OperandCollector in get_operands below. (I know, it's a terrible footgun... we hope to make this a bit more ergonomic later!)

view this post on Zulip Wasmtime GitHub notifications bot (Jul 25 2022 at 20:42):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 25 2022 at 20:55):

afonso360 updated PR #4474 from x86-vex to main.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 25 2022 at 20:56):

afonso360 updated PR #4474 from x86-vex to main.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 25 2022 at 20:57):

afonso360 created PR review comment:

Thanks! I'm still not too comfortable with this.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 25 2022 at 20:57):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 25 2022 at 21:27):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 25 2022 at 21:30):

cfallin has enabled auto merge for PR #4474.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 25 2022 at 22:01):

cfallin merged PR #4474.


Last updated: Jan 24 2025 at 00:11 UTC