abrown opened PR #1765 from i32x4-to-f32x4
to master
:
This converts an
i32x4
into an f32x4
with some rounding either by using an AVX512VL/F instruction--VCVTUDQ2PS--or a long sequence of SSE4.1-compatible instructions. It is still a draft as it depends on some commits in #1759.
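For illustration only: VCVTUDQ2PS converts packed unsigned 32-bit integers, while SSE4.1 only offers the signed CVTDQ2PS, so a fallback has to build the unsigned conversion out of signed ones. The sketch below shows one well-known way to do that, written per lane in scalar Rust. It is an assumption about the general technique, not the instruction sequence from this PR, and the function names are hypothetical.

```rust
/// Convert one unsigned 32-bit lane to f32 using only signed
/// i32 -> f32 conversions (hypothetical helper, not code from the PR).
fn u32_to_f32_via_signed(x: u32) -> f32 {
    // Low 16 bits: always non-negative as i32 and exactly representable.
    let lo = x & 0xFFFF;
    // High 16 bits, still in place: a multiple of 65536.
    let hi = x - lo;

    // Signed conversion of the low half is exact.
    let lo_f = lo as i32 as f32;
    // Halve the high part so it fits in a non-negative i32, convert it
    // (exact, only 16 significant bits), then double it again (exact).
    let hi_half = (hi >> 1) as i32 as f32;
    let hi_f = hi_half + hi_half;

    // The only rounding happens in this final addition.
    hi_f + lo_f
}

/// Apply the per-lane conversion to four lanes, mirroring a 128-bit vector.
fn u32x4_to_f32x4(v: [u32; 4]) -> [f32; 4] {
    v.map(u32_to_f32_via_signed)
}
```

Because both halves convert exactly, the result is rounded once, in the final addition, using whatever rounding mode that addition uses (round-to-nearest-even in ordinary Rust code).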
abrown updated PR #1765 from i32x4-to-f32x4
to master
:
This converts an
i32x4
into an f32x4
with some rounding either by using an AVX512VL/F instruction--VCVTUDQ2PS--or a long sequence of SSE4.1-compatible instructions. It is still a draft as it depends on some commits in #1759.
abrown updated PR #1765 from i32x4-to-f32x4
to master
:
This converts an
i32x4
into an f32x4
with some rounding either by using an AVX512VL/F instruction--VCVTUDQ2PS--or a long sequence of SSE4.1-compatible instructions. It is still a draft as it depends on some commits in #1759.
abrown has marked PR #1765 as ready for review.
abrown edited PR #1765 from i32x4-to-f32x4
to master
:
This converts an
i32x4
into an f32x4
with some rounding either by using an AVX512VL/F instruction--VCVTUDQ2PS--or a long sequence of SSE4.1-compatible instructions.
It is still a draft as it depends on some commits in #1759.
abrown requested bnjbvr for a review on PR #1765.
bnjbvr requested bnjbvr and julian-seward1 for a review on PR #1765.
abrown updated PR #1765 from i32x4-to-f32x4
to master
:
This converts an
i32x4
into an f32x4
with some rounding either by using an AVX512VL/F instruction--VCVTUDQ2PS--or a long sequence of SSE4.1-compatible instructions.
It is still a draft as it depends on some commits in #1759.
julian-seward1 submitted PR Review.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
What rounding behavior does
VCVTUDQ2PS
have? Is it +inf, -inf, to-zero, to-nearest, or is it MXCSR-dependent? Please mention this in the comment.
julian-seward1 created PR Review Comment:
Could you write out the sequence of SSE4.1 insns in a comment? It's hard to infer them from the sequence of
let
s below.
julian-seward1 created PR Review Comment:
As per previous review comment, please document the rounding behaviour here.
abrown updated PR #1765 from i32x4-to-f32x4
to master
:
This converts an
i32x4
into an f32x4
with some rounding either by using an AVX512VL/F instruction--VCVTUDQ2PS--or a long sequence of SSE4.1-compatible instructions.
It is still a draft as it depends on some commits in #1759.
abrown submitted PR Review.
abrown created PR Review Comment:
MXCSR by default, overridden by static rounding control in EVEX.L'L; per the manual, sec. 2.6.8, the static rounding control only applies to scalar and 512-bit instructions, so our use of the 128-bit vector length will rely on MXCSR.
abrown submitted PR Review.
abrown created PR Review Comment:
Documented in this description.
abrown updated PR #1765 from i32x4-to-f32x4
to master
:
This converts an
i32x4
into an f32x4
with some rounding either by using an AVX512VL/F instruction--VCVTUDQ2PS--or a long sequence of SSE4.1-compatible instructions.
It is still a draft as it depends on some commits in #1759.
abrown merged PR #1765.
Last updated: Nov 22 2024 at 17:03 UTC