github-actions[bot] commented on Issue #1612:
Subscribe to Label Action
cc @bnjbvr
<details>
This issue or pull request has been labeled: "cranelift"Thus the following users have been cc'd because of the following labels:
- bnjbvr: cranelift
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
whitequark commented on Issue #1612:
if you have a need for working x86_32 sooner than that work might be done
I believe I do have something close to working x86_32 codegen, at least, it doesn't crash in the compiler and it seems to emit reasonably looking code. It doesn't actually work if you call it, of course.
whitequark commented on Issue #1612:
I do notice that you test
ishl
in all six, did you mean to have halfishl
and halfushr
?Oops.
I think this is entirely reasonable.
Actually, I think this PR, while it might be an entirely reasonable transformation in isolation, isn't sufficient on its own, because it causes all wide shifts to be legalized into SIMD operations and you have to pass
--enable-simd
for the compiler to accept it.
whitequark edited a comment on Issue #1612:
I do notice that you test
ishl
in all six, did you mean to have halfishl
and halfushr
?Oops.
I think this is entirely reasonable.
Actually, I think this PR, while it might be an entirely reasonable transformation in isolation, isn't sufficient on its own, because it causes all wide shifts to be legalized into SIMD operations and you have to pass
--enable-simd
for the compiler to accept it. Apparently PEXTR is a part of SSE4.1.
whitequark edited a comment on Issue #1612:
I do notice that you test
ishl
in all six, did you mean to have halfishl
and halfushr
?Oops.
I think this is entirely reasonable.
Actually, I think this PR, while it might be an entirely reasonable transformation in isolation, isn't sufficient on its own, because it causes all wide shifts to be legalized into SIMD operations and you have to pass
--enable-simd
for the compiler to accept it. Apparently PEXTR and PINSR are a part of SSE4.1.
whitequark commented on Issue #1612:
The split
load.i32
is wrong. The legalizer split theload.i64
into twoload.i32
with eight bytes between them!Oh, I actually noticed that and never figured out what the hell was going on! I eventually disregarded it under the assumption that I'm misreading something. Wasn't expecting it to be a bug.
whitequark commented on Issue #1612:
@iximeow And we have liftoff!
![Screenshot_20200428_120942](https://user-images.githubusercontent.com/54771/80485537-24b9dd80-8949-11ea-94bc-6603a6302674.png)This is without stack overflow checks since I find it utterly incomprehensible how one would emit a
mov %scratch, 0(%esp)
with Cranelift, but other than that it's basically ready for upstream.
iximeow commented on Issue #1612:
And we have liftoff!
:tada: :tada:because it causes all wide shifts to be legalized into SIMD operations
It shouldn't... for x86_64 there are legal encodings for(I64, _)
so they shouldn't require legalization. A test to confirm that wouldn't hurt, though.Apparently PEXTR and PINSR are a part of SSE4.1.
well.. most of them.PEXTRW
andPINSRW
exist back in SSE2. That still requires--enable-simd
, so a legalization using only integer registers would still be an improvement.how one would emit a mov %scratch, 0(%esp) with Cranelift
Something to the tune ofstack_store
with an appropriate stack slot should generate this, but I think "appropriate stack slot" is the harder part there. I'm actually not sure if Cranelift emits stack canaries anywhere, now that you mention it..
iximeow edited a comment on Issue #1612:
And we have liftoff!
:tada: :tada:
because it causes all wide shifts to be legalized into SIMD operations
It shouldn't... for x86_64 there are legal encodings for
(I64, _)
so they shouldn't require legalization. A test to confirm that wouldn't hurt, though.Apparently PEXTR and PINSR are a part of SSE4.1.
well.. most of them.
PEXTRW
andPINSRW
exist back in SSE2. That still requires--enable-simd
, so a legalization using only integer registers would still be an improvement.how one would emit a mov %scratch, 0(%esp) with Cranelift
Something to the tune of
stack_store
with an appropriate stack slot should generate this, but I think "appropriate stack slot" is the harder part there. I'm actually not sure if Cranelift emits stack canaries anywhere, now that you mention it..
whitequark commented on Issue #1612:
It shouldn't... for x86_64 there are legal encodings for
(I64, _)
so they shouldn't require legalization. A test to confirm that wouldn't hurt, though.I mean only on x86_32. There are lots of wasm files that shift 64-bit values and as is, on x86_32 Cranelift cannot emit them in any way other than by using SIMD.
iximeow commented on Issue #1612:
After noodling over how legalizations fit together today I noticed the failing test cases can be made to pass; your two failing cases with 64-bit shift amounts can be legalized by a different approach:
diff --git a/cranelift/codegen/meta/src/isa/x86/legalize.rs b/cranelift/codegen/meta/src/isa/x86/legalize.rs index 6bcc2b94f..e2ff2150e 100644 --- a/cranelift/codegen/meta/src/isa/x86/legalize.rs +++ b/cranelift/codegen/meta/src/isa/x86/legalize.rs @@ -55,6 +55,26 @@ pub(crate) fn define(shared: &mut SharedDefinitions, x86_instructions: &Instruct let imm = &shared.imm; + // Left-shift by a 64-bit amount is equivalent to a shift by that amount mod 32, so we can + // reduce the size of the shift amount. This is useful for x86_32, where an I64 shift amount is + // not encodable. + let a = var("a"); + let x = var("x"); + let y = var("y"); + let z = var("z"); + + let ishl = insts.by_name("ishl"); + let ireduce = insts.by_name("ireduce"); + let reduction = ireduce.bind(I32).bind(I64); + let inst = ishl.bind(I32).bind(I64); + group.legalize( + def!(a = inst(x, y)), + vec![ + def!(z = reduction(y)), + def!(a = ishl(x, z)), + ] + ); + // Division and remainder. // // The srem expansion requires custom code because srem INT_MIN, -1 is not
The same applies for
ushr
.With a
rustfmt
and test cases (with the second half beingushr
rather thanishl
) I think this is good to go, i686 is better than no x86_32 at all.
whitequark commented on Issue #1612:
I noticed the failing test cases can be made to pass; your two failing cases with 64-bit shift amounts can be legalized by a different approach:
How are you testing this? I can't get them to pass with your patch applied. It seems to me that they won't ever pass no matter how the legalizer is changed while the encoding tables (incorrectly) list that any shift amount is acceptable.
iximeow commented on Issue #1612:
How are you testing this?
incorrectly, is how. I had misplaced the tests in a file ending with
.clf
instead of.clif
, so the tests weren't actually run. You're right, x86 encodings needed a tweak for to deny i64 shift amounts fror x86_32. This diff _actually_ results in passing tests, with your tests properly placed in a .clif under filetests:diff --git a/cranelift/codegen/meta/src/isa/x86/encodings.rs b/cranelift/codegen/meta/src/isa/x86/encodings.rs index 7aee35bdb..1e395967c 100644 --- a/cranelift/codegen/meta/src/isa/x86/encodings.rs +++ b/cranelift/codegen/meta/src/isa/x86/encodings.rs @@ -1488,7 +1488,15 @@ fn define_alu( // Cannot use enc_i32_i64 for this pattern because instructions require // to bind any. e.enc32( - inst.bind(I32).bind(Any), + inst.bind(I32).bind(I8), + rec_rc.opcodes(&ROTATE_CL).rrr(rrr), + ); + e.enc32( + inst.bind(I32).bind(I16), + rec_rc.opcodes(&ROTATE_CL).rrr(rrr), + ); + e.enc32( + inst.bind(I32).bind(I32), rec_rc.opcodes(&ROTATE_CL).rrr(rrr), ); e.enc64( diff --git a/cranelift/codegen/meta/src/isa/x86/legalize.rs b/cranelift/codegen/meta/src/isa/x86/legalize.rs index 6bcc2b94f..e2ff2150e 100644 --- a/cranelift/codegen/meta/src/isa/x86/legalize.rs +++ b/cranelift/codegen/meta/src/isa/x86/legalize.rs @@ -55,6 +55,26 @@ pub(crate) fn define(shared: &mut SharedDefinitions, x86_instructions: &Instruct let imm = &shared.imm; + // Left-shift by a 64-bit amount is equivalent to a shift by that amount mod 32, so we can + // reduce the size of the shift amount. This is useful for x86_32, where an I64 shift amount is + // not encodable. + let a = var("a"); + let x = var("x"); + let y = var("y"); + let z = var("z"); + + let ishl = insts.by_name("ishl"); + let ireduce = insts.by_name("ireduce"); + let reduction = ireduce.bind(I32).bind(I64); + let inst = ishl.bind(I32).bind(I64); + group.legalize( + def!(a = inst(x, y)), + vec![ + def!(z = reduction(y)), + def!(a = ishl(x, z)), + ] + ); + // Division and remainder. // // The srem expansion requires custom code because srem INT_MIN, -1 is not
whitequark commented on Issue #1612:
This diff _actually_ results in passing tests
Ah wonderful, I see how you changed the encodings; I tried a few things but none seemed to work. Let me see if I can integrate everything nicely.
whitequark commented on Issue #1612:
Okay, I think I've addressed all review.
@iximeow Thanks for your help, it was vital for getting this done.
Last updated: Jan 24 2025 at 00:11 UTC