Stream: cranelift

Topic: isle performance with cg_clif


view this post on Zulip bjorn3 (Dec 16 2021 at 17:53):

I just compared the build performance of cg_clif with cranelift 0.78.0 (which is from before isle, right?) against 0.79.0. The performance difference is within noise.

view this post on Zulip Chris Fallin (Dec 16 2021 at 18:11):

That's great, thanks! That that aligns with earlier measurements from fitzgen on Wasmtime

view this post on Zulip bjorn3 (Dec 16 2021 at 18:16):

On less positive news, it looks like there is a regression: https://github.com/bjorn3/rustc_codegen_cranelift/runs/4551314054?check_suite_focus=true#step:11:3420

Cranelift based backend for rustc. Contribute to bjorn3/rustc_codegen_cranelift development by creating an account on GitHub.

view this post on Zulip Chris Fallin (Dec 16 2021 at 18:18):

Hmm, it's possible that a narrow-value case might have changed in some of the instruction migrations? Our test coverage for this could be better (... thanks for filling in the gaps in the meantime :-/ )

view this post on Zulip Chris Fallin (Dec 16 2021 at 18:18):

if you can reduce it to some CLIF, I can take a look

view this post on Zulip bjorn3 (Dec 16 2021 at 18:20):

sure, working on it.

view this post on Zulip bjorn3 (Dec 16 2021 at 18:37):

For signed integers the following tests fail:

const A: $T = 0b0101100;
const B: $T = 0b0100001;
const C: $T = 0b1111001;

const _0: $T = 0;
const _1: $T = !0;

#[test]
fn test_count_zeros() {
    assert_eq!(A.count_zeros(), $T::BITS - 3);
    assert_eq!(B.count_zeros(), $T::BITS - 2);
    assert_eq!(C.count_zeros(), $T::BITS - 5);
}

#[test]
fn test_rotate() {
    assert_eq!(A.rotate_left(6).rotate_right(2).rotate_right(4), A);
    assert_eq!(B.rotate_left(3).rotate_left(2).rotate_right(5), B);
    assert_eq!(C.rotate_left(6).rotate_right(2).rotate_right(4), C);

    // Rotating these should make no difference
    //
    // We test using 124 bits because to ensure that overlong bit shifts do
    // not cause undefined behaviour. See #10183.
    assert_eq!(_1.rotate_left(124), _1);
}

and for unsigned integers the following:

const A: $T = 0b0101100;
const B: $T = 0b0100001;
const C: $T = 0b1111001;

const _0: $T = 0;
const _1: $T = !0;

#[test]
fn test_count_zeros() {
    assert!(A.count_zeros() == $T::BITS - 3);
    assert!(B.count_zeros() == $T::BITS - 2);
    assert!(C.count_zeros() == $T::BITS - 5);
}

#[test]
fn test_rotate() {
    assert_eq!(A.rotate_left(6).rotate_right(2).rotate_right(4), A);
    assert_eq!(B.rotate_left(3).rotate_left(2).rotate_right(5), B);
    assert_eq!(C.rotate_left(6).rotate_right(2).rotate_right(4), C);

    // Rotating these should make no difference
    //
    // We test using 124 bits because to ensure that overlong bit shifts do
    // not cause undefined behaviour. See #10183.
    assert_eq!(_1.rotate_left(124), _1);
    assert_eq!(_1.rotate_right(124), _1);
}

Some assertions only fail for 8bit or 16bit integers.

view this post on Zulip Chris Fallin (Dec 16 2021 at 18:39):

OK, if you can get it down to a CLIF runtest that fails, it's probably (hopefully) a simple fix in masking or something like that...

view this post on Zulip bjorn3 (Dec 16 2021 at 18:40):

.count_zeros() maps to popcnt on the bitwise inverse of the value. The rotate methods to rotl/rotr.

view this post on Zulip bjorn3 (Dec 16 2021 at 18:46):

I can't find the shift amount masking in the new rotl/rotr implementation anymore. It used to be at https://github.com/bytecodealliance/wasmtime/blob/c1a6a0523dbc59d176f708ea3d04e6edb48480ec/cranelift/codegen/src/isa/x64/lower.rs#L2485-L2488

Standalone JIT-style runtime for WebAssembly, using Cranelift - wasmtime/lower.rs at c1a6a0523dbc59d176f708ea3d04e6edb48480ec · bytecodealliance/wasmtime

view this post on Zulip Chris Fallin (Dec 16 2021 at 18:52):

Hmm, indeed, that seems to have been missed

view this post on Zulip Chris Fallin (Dec 16 2021 at 18:56):

Putting together a fix for this now

view this post on Zulip Chris Fallin (Dec 16 2021 at 19:39):

@bjorn3 let me know if this fixes the issue for you: https://github.com/bytecodealliance/wasmtime/pull/3610

Uncovered by @bjorn3 (thanks!): 8- and 16-bit rotates were not working properly in recent versions of Cranelift with part of the lowering migrated to ISLE. This PR fixes a few issues: 8- and 16-b...

view this post on Zulip bjorn3 (Dec 16 2021 at 19:44):

I will check it out tomorrow.

view this post on Zulip bjorn3 (Dec 17 2021 at 09:48):

<i16>::count_zeros is codegened as

00000000000282dd <_RNvMs_NtCs1K2K8gQcegq_4core3nums11count_zeros>:
   282dd:       55                      push   %rbp
   282de:       48 89 e5                mov    %rsp,%rbp
   282e1:       f7 d7                   not    %edi
   282e3:       f3 0f b8 f7             popcnt %edi,%esi
   282e7:       0f b7 f6                movzwl %si,%esi
   282ea:       48 89 f0                mov    %rsi,%rax
   282ed:       48 89 ec                mov    %rbp,%rsp
   282f0:       5d                      pop    %rbp
   282f1:       c3                      retq

The popcnt %edi, %esi is a 32bit popcount, while it should be 16bit. <i8>::count_zeros is codegened as:

0000000000029ef5 <_RNvMNtCs1K2K8gQcegq_4core3numa11count_zeros>:
   29ef5:       55                      push   %rbp
   29ef6:       48 89 e5                mov    %rsp,%rbp
   29ef9:       f7 d7                   not    %edi
   29efb:       f3 0f b8 f7             popcnt %edi,%esi
   29eff:       40 0f b6 f6             movzbl %sil,%esi
   29f03:       48 89 f0                mov    %rsi,%rax
   29f06:       48 89 ec                mov    %rbp,%rsp
   29f09:       5d                      pop    %rbp
   29f0a:       c3                      retq

view this post on Zulip bjorn3 (Dec 17 2021 at 09:52):

It seems like the popcnt codegen code didn't change beteen cranelift 0.78.0 and 0.79.0 though.

view this post on Zulip bjorn3 (Dec 17 2021 at 09:56):

Looks like before cranelift 0.79.0 the not only applies to 8 or 16bit and the rest of the register happened to be zero by chance:

00000000000282dd <_RNvMs_NtCs1K2K8gQcegq_4core3nums11count_zeros>:
   282dd:       55                      push   %rbp
   282de:       48 89 e5                mov    %rsp,%rbp
   282e1:       66 f7 d7                not    %di
   282e4:       f3 0f b8 f7             popcnt %edi,%esi
   282e8:       0f b7 f6                movzwl %si,%esi
   282eb:       48 89 f0                mov    %rsi,%rax
   282ee:       48 89 ec                mov    %rbp,%rsp
   282f1:       5d                      pop    %rbp
   282f2:       c3                      retq

view this post on Zulip Chris Fallin (Dec 17 2021 at 17:59):

Hmm, OK. I have limited time today then am off for two weeks, unfortunately, so I don't think I can pick this back up today; can you file an issue for this?

view this post on Zulip bjorn3 (Dec 17 2021 at 18:18):

Done: https://github.com/bytecodealliance/wasmtime/issues/3615

.clif Test Case target x86_64 nehalem function u0:9(i16) -> i32 system_v { ; symbol _RNvMs_NtCs1K2K8gQcegq_4core3nums11count_zeros block0(v0: i16): v1 -> v0 jump block1 block1: @0000 v2 = bno...

Last updated: Nov 22 2024 at 17:03 UTC