Stream: git-wasmtime

Topic: wasmtime / Issue #1612 Legalize 64 bit shifts on x86_32 u...


view this post on Zulip Wasmtime GitHub notifications bot (Apr 28 2020 at 02:07):

github-actions[bot] commented on Issue #1612:

Subscribe to Label Action

cc @bnjbvr

<details>
This issue or pull request has been labeled: "cranelift"

Thus the following users have been cc'd because of the following labels:

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

Learn more.
</details>

view this post on Zulip Wasmtime GitHub notifications bot (Apr 28 2020 at 09:52):

whitequark commented on Issue #1612:

if you have a need for working x86_32 sooner than that work might be done

I believe I do have something close to working x86_32 codegen, at least, it doesn't crash in the compiler and it seems to emit reasonably looking code. It doesn't actually work if you call it, of course.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 28 2020 at 09:53):

whitequark commented on Issue #1612:

I do notice that you test ishl in all six, did you mean to have half ishl and half ushr?

Oops.

I think this is entirely reasonable.

Actually, I think this PR, while it might be an entirely reasonable transformation in isolation, isn't sufficient on its own, because it causes all wide shifts to be legalized into SIMD operations and you have to pass --enable-simd for the compiler to accept it.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 28 2020 at 09:53):

whitequark edited a comment on Issue #1612:

I do notice that you test ishl in all six, did you mean to have half ishl and half ushr?

Oops.

I think this is entirely reasonable.

Actually, I think this PR, while it might be an entirely reasonable transformation in isolation, isn't sufficient on its own, because it causes all wide shifts to be legalized into SIMD operations and you have to pass --enable-simd for the compiler to accept it. Apparently PEXTR is a part of SSE4.1.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 28 2020 at 09:54):

whitequark edited a comment on Issue #1612:

I do notice that you test ishl in all six, did you mean to have half ishl and half ushr?

Oops.

I think this is entirely reasonable.

Actually, I think this PR, while it might be an entirely reasonable transformation in isolation, isn't sufficient on its own, because it causes all wide shifts to be legalized into SIMD operations and you have to pass --enable-simd for the compiler to accept it. Apparently PEXTR and PINSR are a part of SSE4.1.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 28 2020 at 10:07):

whitequark commented on Issue #1612:

The split load.i32 is wrong. The legalizer split the load.i64 into two load.i32 with eight bytes between them!

Oh, I actually noticed that and never figured out what the hell was going on! I eventually disregarded it under the assumption that I'm misreading something. Wasn't expecting it to be a bug.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 28 2020 at 12:12):

whitequark commented on Issue #1612:

@iximeow And we have liftoff!
![Screenshot_20200428_120942](https://user-images.githubusercontent.com/54771/80485537-24b9dd80-8949-11ea-94bc-6603a6302674.png)

This is without stack overflow checks since I find it utterly incomprehensible how one would emit a mov %scratch, 0(%esp) with Cranelift, but other than that it's basically ready for upstream.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 28 2020 at 20:49):

iximeow commented on Issue #1612:

And we have liftoff!
:tada: :tada:

because it causes all wide shifts to be legalized into SIMD operations
It shouldn't... for x86_64 there are legal encodings for (I64, _) so they shouldn't require legalization. A test to confirm that wouldn't hurt, though.

Apparently PEXTR and PINSR are a part of SSE4.1.
well.. most of them. PEXTRW and PINSRW exist back in SSE2. That still requires --enable-simd, so a legalization using only integer registers would still be an improvement.

how one would emit a mov %scratch, 0(%esp) with Cranelift
Something to the tune of stack_store with an appropriate stack slot should generate this, but I think "appropriate stack slot" is the harder part there. I'm actually not sure if Cranelift emits stack canaries anywhere, now that you mention it..

view this post on Zulip Wasmtime GitHub notifications bot (Apr 28 2020 at 20:49):

iximeow edited a comment on Issue #1612:

And we have liftoff!

:tada: :tada:

because it causes all wide shifts to be legalized into SIMD operations

It shouldn't... for x86_64 there are legal encodings for (I64, _) so they shouldn't require legalization. A test to confirm that wouldn't hurt, though.

Apparently PEXTR and PINSR are a part of SSE4.1.

well.. most of them. PEXTRW and PINSRW exist back in SSE2. That still requires --enable-simd, so a legalization using only integer registers would still be an improvement.

how one would emit a mov %scratch, 0(%esp) with Cranelift

Something to the tune of stack_store with an appropriate stack slot should generate this, but I think "appropriate stack slot" is the harder part there. I'm actually not sure if Cranelift emits stack canaries anywhere, now that you mention it..

view this post on Zulip Wasmtime GitHub notifications bot (Apr 29 2020 at 03:59):

whitequark commented on Issue #1612:

It shouldn't... for x86_64 there are legal encodings for (I64, _) so they shouldn't require legalization. A test to confirm that wouldn't hurt, though.

I mean only on x86_32. There are lots of wasm files that shift 64-bit values and as is, on x86_32 Cranelift cannot emit them in any way other than by using SIMD.

view this post on Zulip Wasmtime GitHub notifications bot (May 06 2020 at 02:29):

iximeow commented on Issue #1612:

After noodling over how legalizations fit together today I noticed the failing test cases can be made to pass; your two failing cases with 64-bit shift amounts can be legalized by a different approach:

diff --git a/cranelift/codegen/meta/src/isa/x86/legalize.rs b/cranelift/codegen/meta/src/isa/x86/legalize.rs
index 6bcc2b94f..e2ff2150e 100644
--- a/cranelift/codegen/meta/src/isa/x86/legalize.rs
+++ b/cranelift/codegen/meta/src/isa/x86/legalize.rs
@@ -55,6 +55,26 @@ pub(crate) fn define(shared: &mut SharedDefinitions, x86_instructions: &Instruct

     let imm = &shared.imm;


+    // Left-shift by a 64-bit amount is equivalent to a shift by that amount mod 32, so we can
+    // reduce the size of the shift amount. This is useful for x86_32, where an I64 shift amount is
+    // not encodable.
+    let a = var("a");
+    let x = var("x");
+    let y = var("y");
+    let z = var("z");
+

+    let ishl = insts.by_name("ishl");
+    let ireduce = insts.by_name("ireduce");
+    let reduction = ireduce.bind(I32).bind(I64);
+    let inst = ishl.bind(I32).bind(I64);
+    group.legalize(
+        def!(a = inst(x, y)),
+        vec![
+            def!(z = reduction(y)),
+            def!(a = ishl(x, z)),
+        ]
+    );
+
     // Division and remainder.
     //
     // The srem expansion requires custom code because srem INT_MIN, -1 is not

The same applies for ushr.

With a rustfmt and test cases (with the second half being ushr rather than ishl) I think this is good to go, i686 is better than no x86_32 at all.

view this post on Zulip Wasmtime GitHub notifications bot (May 09 2020 at 06:57):

whitequark commented on Issue #1612:

I noticed the failing test cases can be made to pass; your two failing cases with 64-bit shift amounts can be legalized by a different approach:

How are you testing this? I can't get them to pass with your patch applied. It seems to me that they won't ever pass no matter how the legalizer is changed while the encoding tables (incorrectly) list that any shift amount is acceptable.

view this post on Zulip Wasmtime GitHub notifications bot (May 09 2020 at 08:32):

iximeow commented on Issue #1612:

How are you testing this?

incorrectly, is how. I had misplaced the tests in a file ending with .clf instead of .clif, so the tests weren't actually run. You're right, x86 encodings needed a tweak for to deny i64 shift amounts fror x86_32. This diff _actually_ results in passing tests, with your tests properly placed in a .clif under filetests:

diff --git a/cranelift/codegen/meta/src/isa/x86/encodings.rs b/cranelift/codegen/meta/src/isa/x86/encodings.rs
index 7aee35bdb..1e395967c 100644
--- a/cranelift/codegen/meta/src/isa/x86/encodings.rs
+++ b/cranelift/codegen/meta/src/isa/x86/encodings.rs
@@ -1488,7 +1488,15 @@ fn define_alu(
         // Cannot use enc_i32_i64 for this pattern because instructions require
         // to bind any.
         e.enc32(

-            inst.bind(I32).bind(Any),
+            inst.bind(I32).bind(I8),
+            rec_rc.opcodes(&ROTATE_CL).rrr(rrr),
+        );
+        e.enc32(
+            inst.bind(I32).bind(I16),
+            rec_rc.opcodes(&ROTATE_CL).rrr(rrr),
+        );
+        e.enc32(
+            inst.bind(I32).bind(I32),
             rec_rc.opcodes(&ROTATE_CL).rrr(rrr),
         );
         e.enc64(
diff --git a/cranelift/codegen/meta/src/isa/x86/legalize.rs b/cranelift/codegen/meta/src/isa/x86/legalize.rs
index 6bcc2b94f..e2ff2150e 100644
--- a/cranelift/codegen/meta/src/isa/x86/legalize.rs
+++ b/cranelift/codegen/meta/src/isa/x86/legalize.rs
@@ -55,6 +55,26 @@ pub(crate) fn define(shared: &mut SharedDefinitions, x86_instructions: &Instruct

     let imm = &shared.imm;


+    // Left-shift by a 64-bit amount is equivalent to a shift by that amount mod 32, so we can
+    // reduce the size of the shift amount. This is useful for x86_32, where an I64 shift amount is
+    // not encodable.
+    let a = var("a");
+    let x = var("x");
+    let y = var("y");
+    let z = var("z");
+

+    let ishl = insts.by_name("ishl");
+    let ireduce = insts.by_name("ireduce");
+    let reduction = ireduce.bind(I32).bind(I64);
+    let inst = ishl.bind(I32).bind(I64);
+    group.legalize(
+        def!(a = inst(x, y)),
+        vec![
+            def!(z = reduction(y)),
+            def!(a = ishl(x, z)),
+        ]
+    );
+
     // Division and remainder.
     //
     // The srem expansion requires custom code because srem INT_MIN, -1 is not

view this post on Zulip Wasmtime GitHub notifications bot (May 09 2020 at 08:59):

whitequark commented on Issue #1612:

This diff _actually_ results in passing tests

Ah wonderful, I see how you changed the encodings; I tried a few things but none seemed to work. Let me see if I can integrate everything nicely.

view this post on Zulip Wasmtime GitHub notifications bot (May 09 2020 at 09:38):

whitequark commented on Issue #1612:

Okay, I think I've addressed all review.

@iximeow Thanks for your help, it was vital for getting this done.


Last updated: Oct 23 2024 at 20:03 UTC