I just compared the build performance of cg_clif with cranelift 0.78.0 (which is from before isle, right?) against 0.79.0. The performance difference is within noise.
That's great, thanks! That that aligns with earlier measurements from fitzgen on Wasmtime
On less positive news, it looks like there is a regression: https://github.com/bjorn3/rustc_codegen_cranelift/runs/4551314054?check_suite_focus=true#step:11:3420
Hmm, it's possible that a narrow-value case might have changed in some of the instruction migrations? Our test coverage for this could be better (... thanks for filling in the gaps in the meantime :-/ )
if you can reduce it to some CLIF, I can take a look
sure, working on it.
For signed integers the following tests fail:
const A: $T = 0b0101100;
const B: $T = 0b0100001;
const C: $T = 0b1111001;
const _0: $T = 0;
const _1: $T = !0;
#[test]
fn test_count_zeros() {
assert_eq!(A.count_zeros(), $T::BITS - 3);
assert_eq!(B.count_zeros(), $T::BITS - 2);
assert_eq!(C.count_zeros(), $T::BITS - 5);
}
#[test]
fn test_rotate() {
assert_eq!(A.rotate_left(6).rotate_right(2).rotate_right(4), A);
assert_eq!(B.rotate_left(3).rotate_left(2).rotate_right(5), B);
assert_eq!(C.rotate_left(6).rotate_right(2).rotate_right(4), C);
// Rotating these should make no difference
//
// We test using 124 bits because to ensure that overlong bit shifts do
// not cause undefined behaviour. See #10183.
assert_eq!(_1.rotate_left(124), _1);
}
and for unsigned integers the following:
const A: $T = 0b0101100;
const B: $T = 0b0100001;
const C: $T = 0b1111001;
const _0: $T = 0;
const _1: $T = !0;
#[test]
fn test_count_zeros() {
assert!(A.count_zeros() == $T::BITS - 3);
assert!(B.count_zeros() == $T::BITS - 2);
assert!(C.count_zeros() == $T::BITS - 5);
}
#[test]
fn test_rotate() {
assert_eq!(A.rotate_left(6).rotate_right(2).rotate_right(4), A);
assert_eq!(B.rotate_left(3).rotate_left(2).rotate_right(5), B);
assert_eq!(C.rotate_left(6).rotate_right(2).rotate_right(4), C);
// Rotating these should make no difference
//
// We test using 124 bits because to ensure that overlong bit shifts do
// not cause undefined behaviour. See #10183.
assert_eq!(_1.rotate_left(124), _1);
assert_eq!(_1.rotate_right(124), _1);
}
Some assertions only fail for 8bit or 16bit integers.
OK, if you can get it down to a CLIF runtest that fails, it's probably (hopefully) a simple fix in masking or something like that...
.count_zeros()
maps to popcnt
on the bitwise inverse of the value. The rotate methods to rotl
/rotr
.
I can't find the shift amount masking in the new rotl
/rotr
implementation anymore. It used to be at https://github.com/bytecodealliance/wasmtime/blob/c1a6a0523dbc59d176f708ea3d04e6edb48480ec/cranelift/codegen/src/isa/x64/lower.rs#L2485-L2488
Hmm, indeed, that seems to have been missed
Putting together a fix for this now
@bjorn3 let me know if this fixes the issue for you: https://github.com/bytecodealliance/wasmtime/pull/3610
I will check it out tomorrow.
<i16>::count_zeros
is codegened as
00000000000282dd <_RNvMs_NtCs1K2K8gQcegq_4core3nums11count_zeros>:
282dd: 55 push %rbp
282de: 48 89 e5 mov %rsp,%rbp
282e1: f7 d7 not %edi
282e3: f3 0f b8 f7 popcnt %edi,%esi
282e7: 0f b7 f6 movzwl %si,%esi
282ea: 48 89 f0 mov %rsi,%rax
282ed: 48 89 ec mov %rbp,%rsp
282f0: 5d pop %rbp
282f1: c3 retq
The popcnt %edi, %esi
is a 32bit popcount, while it should be 16bit. <i8>::count_zeros
is codegened as:
0000000000029ef5 <_RNvMNtCs1K2K8gQcegq_4core3numa11count_zeros>:
29ef5: 55 push %rbp
29ef6: 48 89 e5 mov %rsp,%rbp
29ef9: f7 d7 not %edi
29efb: f3 0f b8 f7 popcnt %edi,%esi
29eff: 40 0f b6 f6 movzbl %sil,%esi
29f03: 48 89 f0 mov %rsi,%rax
29f06: 48 89 ec mov %rbp,%rsp
29f09: 5d pop %rbp
29f0a: c3 retq
It seems like the popcnt codegen code didn't change beteen cranelift 0.78.0 and 0.79.0 though.
Looks like before cranelift 0.79.0 the not
only applies to 8 or 16bit and the rest of the register happened to be zero by chance:
00000000000282dd <_RNvMs_NtCs1K2K8gQcegq_4core3nums11count_zeros>:
282dd: 55 push %rbp
282de: 48 89 e5 mov %rsp,%rbp
282e1: 66 f7 d7 not %di
282e4: f3 0f b8 f7 popcnt %edi,%esi
282e8: 0f b7 f6 movzwl %si,%esi
282eb: 48 89 f0 mov %rsi,%rax
282ee: 48 89 ec mov %rbp,%rsp
282f1: 5d pop %rbp
282f2: c3 retq
Hmm, OK. I have limited time today then am off for two weeks, unfortunately, so I don't think I can pick this back up today; can you file an issue for this?
Done: https://github.com/bytecodealliance/wasmtime/issues/3615
Last updated: Nov 22 2024 at 17:03 UTC