cranelift / PR #1370 Add initial support for EVEX encodings · git-cranelift

Stream: git-cranelift

Topic: cranelift / PR #1370 Add initial support for EVEX encodings

GitHub (Jan 29 2020 at 20:02):

abrown opened PR #1370 from evex to master:

[x] This has been discussed in issue #1244.

[x] A short description of what this does, why it is needed: since the Wasm SIMD spec has re-added instructions (e.g. i64x2.mul) that are optimally encoded in VEX/EVEX, it seems like we should support these encoding formats. For this PR, only EVEX was added (and not every addressing form) in order to implement i64x2 multiplication. I haven't yet test-run this on an AVX512DQ-compatible CPU but will soon; while implementing this, I discovered that there is no good way to indicate an encoding that is predicated on either of two settings flags (see TODO in encodings.rs).

[x] This PR contains test cases, if meaningful.

[x] A reviewer from the core maintainer team has been assigned for this PR (I'm tagging people that might be interested in this).

GitHub (Jan 29 2020 at 20:02):

abrown requested bnjbvr, sunfishcode, and julian-seward1 for a review on PR #1370.

GitHub (Jan 29 2020 at 20:02):

abrown requested bnjbvr, sunfishcode, and julian-seward1 for a review on PR #1370.

GitHub (Jan 29 2020 at 20:02):

abrown requested bnjbvr, sunfishcode, and julian-seward1 for a review on PR #1370.

GitHub (Jan 29 2020 at 20:05):

abrown created PR Review Comment:

I don't think I understand how RegUnit is being packed, I'm afraid. EVEX (and VEX) support larger numbers of registers so I attempted to extract the next bit to use in the bits that EVEX uses to extend the set of possible registers (e.g. XRBR'V')--this seems to give the wrong answer. How would I tell Cranelift that EVEX has more registers available?

GitHub (Jan 29 2020 at 20:05):

abrown submitted PR Review.

GitHub (Jan 29 2020 at 20:07):

abrown submitted PR Review.

GitHub (Jan 29 2020 at 20:07):

abrown created PR Review Comment:

The [wikipedia article] is also helpful if you don't want to crack open the Intel manual; maybe I will link to it in the documentation.

[wikipedia article]: https://en.wikipedia.org/wiki/EVEX_prefix

GitHub (Feb 05 2020 at 17:44):

abrown updated PR #1370 from evex to master:

[x] This has been discussed in issue #1244.

[x] A short description of what this does, why it is needed: since the Wasm SIMD spec has re-added instructions (e.g. i64x2.mul) that are optimally encoded in VEX/EVEX, it seems like we should support these encoding formats. For this PR, only EVEX was added (and not every addressing form) in order to implement i64x2 multiplication. I haven't yet test-run this on an AVX512DQ-compatible CPU but will soon; while implementing this, I discovered that there is no good way to indicate an encoding that is predicated on either of two settings flags (see TODO in encodings.rs).

[x] This PR contains test cases, if meaningful.

[x] A reviewer from the core maintainer team has been assigned for this PR (I'm tagging people that might be interested in this).

GitHub (Feb 13 2020 at 01:05):

abrown updated PR #1370 from evex to master:

[x] This has been discussed in issue #1244.

[x] A short description of what this does, why it is needed: since the Wasm SIMD spec has re-added instructions (e.g. i64x2.mul) that are optimally encoded in VEX/EVEX, it seems like we should support these encoding formats. For this PR, only EVEX was added (and not every addressing form) in order to implement i64x2 multiplication. I haven't yet test-run this on an AVX512DQ-compatible CPU but will soon; while implementing this, I discovered that there is no good way to indicate an encoding that is predicated on either of two settings flags (see TODO in encodings.rs).

[x] This PR contains test cases, if meaningful.

[x] A reviewer from the core maintainer team has been assigned for this PR (I'm tagging people that might be interested in this).

GitHub (Feb 13 2020 at 01:11):

abrown submitted PR Review.

GitHub (Feb 13 2020 at 01:11):

abrown created PR Review Comment:

@bnjbvr, this (and the .units(32) above) is causing test failures that I don't understand:

isa::x86::registers::tests::unit_encodings is failing because now the GPRs miraculously have more units so %xmm0 starts at Some(32)

similarly the unit_names test thinks the reg index 16 is invalid

What is going on?

GitHub (Feb 13 2020 at 02:14):

iximeow submitted PR Review.

GitHub (Feb 13 2020 at 02:14):

iximeow created PR Review Comment:

I think you're running afoul of https://github.com/bytecodealliance/cranelift/blob/5602d1564c4121e4703843f23cc083dfb29c5592/cranelift-codegen/meta/src/cdsl/regs.rs#L240-L247 which bumps the start of FPR to be aligned with its size. In theory you could add the expanded FPR first, then GPR, and everything should be okay?

GitHub (Feb 13 2020 at 02:31):

abrown submitted PR Review.

GitHub (Feb 13 2020 at 02:31):

abrown created PR Review Comment:

Aha!

GitHub (Feb 13 2020 at 18:13):

abrown submitted PR Review.

GitHub (Feb 13 2020 at 18:13):

abrown created PR Review Comment:

Hm, too quick to celebrate: I tried moving the RegBankBuilder::new("FloatRegs"... with the newly-expanded 32 units up above the RegBankBuilder::new("IntRegs"... but now the meta build crashes due to r15 with a backtrace of:

thread 'main' panicked at 'attempt to subtract with overflow', cranelift-codegen/meta/src/cdsl/regs.rs:60:25
stack backtrace:
...
  15: cranelift_codegen_meta::cdsl::regs::RegBank::unit_by_name
             at cranelift-codegen/meta/src/cdsl/regs.rs:60
  16: cranelift_codegen_meta::cdsl::regs::IsaRegs::regunit_by_name
             at cranelift-codegen/meta/src/cdsl/regs.rs:410
  17: cranelift_codegen_meta::isa::x86::recipes::define
             at cranelift-codegen/meta/src/isa/x86/recipes.rs:430
  18: cranelift_codegen_meta::isa::x86::define
             at cranelift-codegen/meta/src/isa/x86/mod.rs:61
  19: cranelift_codegen_meta::isa::define::{{closure}}
             at cranelift-codegen/meta/src/isa/mod.rs:62
...

GitHub (Feb 13 2020 at 18:40):

iximeow submitted PR Review.

GitHub (Feb 13 2020 at 18:40):

iximeow created PR Review Comment:

I think the failing assert is in error, f.ex let reg_xmm1 = Register::new(fpr, regs.regunit_by_name(fpr, "xmm1")); currently asserts for the same reason on master. It looks like unit_by_name shouldn't even trying to subtract from the register number (15 in r15), and happens to work out so long as the specified reg class starts at 0 - subtract by 0 won't underflow.

I think instead unit_by_name ought to be checking just that as_num < self.units, and return self.first_unit + as_num like it would below?

GitHub (Feb 14 2020 at 00:12):

abrown updated PR #1370 from evex to master:

[x] This has been discussed in issue #1244.

[x] A short description of what this does, why it is needed: since the Wasm SIMD spec has re-added instructions (e.g. i64x2.mul) that are optimally encoded in VEX/EVEX, it seems like we should support these encoding formats. For this PR, only EVEX was added (and not every addressing form) in order to implement i64x2 multiplication. I haven't yet test-run this on an AVX512DQ-compatible CPU but will soon; while implementing this, I discovered that there is no good way to indicate an encoding that is predicated on either of two settings flags (see TODO in encodings.rs).

[x] This PR contains test cases, if meaningful.

[x] A reviewer from the core maintainer team has been assigned for this PR (I'm tagging people that might be interested in this).

GitHub (Feb 14 2020 at 00:18):

abrown submitted PR Review.

GitHub (Feb 14 2020 at 00:18):

abrown created PR Review Comment:

You're right. I pushed b0ee482 to fix that and b52fbdd to show what happens when I move the XMM registers first:

failures in unit_names and unit_encodings because, e.g., rax is no longer the first register; these can be fixed and are not a real problem

failures with unwinding (test_*_alloc and windows_fastcall_x64_unwind.clif): @peterhuene, it looks like the unwind info uses the register index directly; if I change a bunch of register indexes, does this completely break unwinding?

a failure in simd-arithmetic-binemit.clif is expected; that's what I'm currently working on

a failure in pinned-reg.clif like panicked in worker #1: Couldn't clear fixed output interference for v1; @bnjbvr, do you have any comment on this? Will changing the register indexes break regalloc?

GitHub (Feb 14 2020 at 01:20):

iximeow submitted PR Review.

GitHub (Feb 14 2020 at 01:20):

iximeow created PR Review Comment:

I was afraid there'd be some breakages like that :(

I noticed Windows unwinding using the regunit as an integer directly when fixing XMM preservation. Changing indices around will break it, but straightforward to fix: something like this but using GPR.first?

To the regalloc point, pinned_reg also accidentally depends on GPRs being first: is_pinned_reg tests against pinned_reg directly, where the ISA builder stores the provided pinned_reg without translation and when building the x86 ISA description it's passed the literal 15. pinned_reg in the generated ISA should probably be offset from the start of its RegClass, so reg_bank.pinned_reg ought to be reg_bank.pinned_reg + reg_bank.first_unit + reg_class.start.

I think fixing dependence on GPR.first == 0 would be a good PR independent of EVEX encoding, though I dunno how much of this will be replaced with New Backend Plans. (this is where stop talking and wait for Benjamin :D)

GitHub (Feb 14 2020 at 01:55):

peterhuene created PR Review Comment:

@iximeow I think your proposed fix for the GPRs in the Windows unwind information makes sense.

GitHub (Feb 14 2020 at 01:55):

peterhuene submitted PR Review.

GitHub (Feb 14 2020 at 19:46):

abrown updated PR #1370 from evex to master:

[x] This has been discussed in issue #1244.

[x] A short description of what this does, why it is needed: since the Wasm SIMD spec has re-added instructions (e.g. i64x2.mul) that are optimally encoded in VEX/EVEX, it seems like we should support these encoding formats. For this PR, only EVEX was added (and not every addressing form) in order to implement i64x2 multiplication. I haven't yet test-run this on an AVX512DQ-compatible CPU but will soon; while implementing this, I discovered that there is no good way to indicate an encoding that is predicated on either of two settings flags (see TODO in encodings.rs).

[x] This PR contains test cases, if meaningful.

[x] A reviewer from the core maintainer team has been assigned for this PR (I'm tagging people that might be interested in this).

GitHub (Feb 14 2020 at 21:24):

abrown updated PR #1370 from evex to master:

[x] This has been discussed in issue #1244.

[x] A short description of what this does, why it is needed: since the Wasm SIMD spec has re-added instructions (e.g. i64x2.mul) that are optimally encoded in VEX/EVEX, it seems like we should support these encoding formats. For this PR, only EVEX was added (and not every addressing form) in order to implement i64x2 multiplication. I haven't yet test-run this on an AVX512DQ-compatible CPU but will soon; while implementing this, I discovered that there is no good way to indicate an encoding that is predicated on either of two settings flags (see TODO in encodings.rs).

[x] This PR contains test cases, if meaningful.

[x] A reviewer from the core maintainer team has been assigned for this PR (I'm tagging people that might be interested in this).

GitHub (Feb 14 2020 at 21:24):

abrown updated PR #1370 from evex to master:

[x] This has been discussed in issue #1244.

[x] A short description of what this does, why it is needed: since the Wasm SIMD spec has re-added instructions (e.g. i64x2.mul) that are optimally encoded in VEX/EVEX, it seems like we should support these encoding formats. For this PR, only EVEX was added (and not every addressing form) in order to implement i64x2 multiplication. I haven't yet test-run this on an AVX512DQ-compatible CPU but will soon; while implementing this, I discovered that there is no good way to indicate an encoding that is predicated on either of two settings flags (see TODO in encodings.rs).

[x] This PR contains test cases, if meaningful.

[x] A reviewer from the core maintainer team has been assigned for this PR (I'm tagging people that might be interested in this).

GitHub (Feb 14 2020 at 21:25):

abrown submitted PR Review.

GitHub (Feb 14 2020 at 21:25):

abrown created PR Review Comment:

I've figured this out... finally :big_smile:.

GitHub (Feb 14 2020 at 21:40):

abrown updated PR #1370 from evex to master:

[x] This has been discussed in issue #1244.

[x] A short description of what this does, why it is needed: since the Wasm SIMD spec has re-added instructions (e.g. i64x2.mul) that are optimally encoded in VEX/EVEX, it seems like we should support these encoding formats. For this PR, only EVEX was added (and not every addressing form) in order to implement i64x2 multiplication. I haven't yet test-run this on an AVX512DQ-compatible CPU but will soon; while implementing this, I discovered that there is no good way to indicate an encoding that is predicated on either of two settings flags (see TODO in encodings.rs).

[x] This PR contains test cases, if meaningful.

[x] A reviewer from the core maintainer team has been assigned for this PR (I'm tagging people that might be interested in this).

GitHub (Feb 18 2020 at 17:41):

abrown updated PR #1370 from evex to master:

[x] This has been discussed in issue #1244.

[x] A short description of what this does, why it is needed: since the Wasm SIMD spec has re-added instructions (e.g. i64x2.mul) that are optimally encoded in VEX/EVEX, it seems like we should support these encoding formats. For this PR, only EVEX was added (and not every addressing form) in order to implement i64x2 multiplication. I haven't yet test-run this on an AVX512DQ-compatible CPU but will soon; while implementing this, I discovered that there is no good way to indicate an encoding that is predicated on either of two settings flags (see TODO in encodings.rs).

[x] This PR contains test cases, if meaningful.

[x] A reviewer from the core maintainer team has been assigned for this PR (I'm tagging people that might be interested in this).

GitHub (Mar 05 2020 at 15:13):

bnjbvr closed without merge PR #1370.

Last updated: Apr 14 2025 at 10:03 UTC