Stream: git-wasmtime

Topic: wasmtime / issue #3615 Cranelift: Miscompilation of popcn...


view this post on Zulip Wasmtime GitHub notifications bot (Dec 17 2021 at 18:18):

bjorn3 labeled issue #3615:

.clif Test Case

target x86_64 nehalem

function u0:9(i16) -> i32 system_v {
; symbol _RNvMs_NtCs1K2K8gQcegq_4core3nums11count_zeros

                                block0(v0: i16):
                                    v1 -> v0
                                    jump block1

                                block1:
@0000                               v2 = bnot.i16 v1
@000b                               v3 = popcnt v2
                                    v4 -> v3
@000b                               jump block2

                                block2:
@000d                               v5 = uextend.i32 v4
@0011                               return v5
}

Expected Results

The popcnt instruction only takes the 16bit part of v1 into account.

Actual Results

The popcnt instruction takes a 32bit register into account when computing the population count despite the upper half having an undefined value.

0000000000027246 <_RNvMs_NtCs1K2K8gQcegq_4core3nums11count_zeros>:
   27246:       55                      push   %rbp
   27247:       48 89 e5                mov    %rsp,%rbp
   2724a:       f7 d7                   not    %edi
   2724c:       f3 0f b8 f7             popcnt %edi,%esi
   27250:       0f b7 f6                movzwl %si,%esi
   27253:       48 89 f0                mov    %rsi,%rax
   27256:       48 89 ec                mov    %rbp,%rsp
   27259:       5d                      pop    %rbp
   2725a:       c3                      retq

Versions and Environment

Cranelift version or commit: Cranelift 0.79.0

Operating system: Linux

Architecture: x86_64

Extra Info

For reference: https://bytecodealliance.zulipchat.com/#narrow/stream/217117-cranelift/topic/isle.20performance.20with.20cg_clif/near/265273960

view this post on Zulip Wasmtime GitHub notifications bot (Dec 17 2021 at 18:18):

bjorn3 opened issue #3615:

.clif Test Case

target x86_64 nehalem

function u0:9(i16) -> i32 system_v {
; symbol _RNvMs_NtCs1K2K8gQcegq_4core3nums11count_zeros

                                block0(v0: i16):
                                    v1 -> v0
                                    jump block1

                                block1:
@0000                               v2 = bnot.i16 v1
@000b                               v3 = popcnt v2
                                    v4 -> v3
@000b                               jump block2

                                block2:
@000d                               v5 = uextend.i32 v4
@0011                               return v5
}

Expected Results

The popcnt instruction only takes the 16bit part of v1 into account.

Actual Results

The popcnt instruction takes a 32bit register into account when computing the population count despite the upper half having an undefined value.

0000000000027246 <_RNvMs_NtCs1K2K8gQcegq_4core3nums11count_zeros>:
   27246:       55                      push   %rbp
   27247:       48 89 e5                mov    %rsp,%rbp
   2724a:       f7 d7                   not    %edi
   2724c:       f3 0f b8 f7             popcnt %edi,%esi
   27250:       0f b7 f6                movzwl %si,%esi
   27253:       48 89 f0                mov    %rsi,%rax
   27256:       48 89 ec                mov    %rbp,%rsp
   27259:       5d                      pop    %rbp
   2725a:       c3                      retq

Versions and Environment

Cranelift version or commit: Cranelift 0.79.0

Operating system: Linux

Architecture: x86_64

Extra Info

For reference: https://bytecodealliance.zulipchat.com/#narrow/stream/217117-cranelift/topic/isle.20performance.20with.20cg_clif/near/265273960

view this post on Zulip Wasmtime GitHub notifications bot (Dec 17 2021 at 18:18):

bjorn3 labeled issue #3615:

.clif Test Case

target x86_64 nehalem

function u0:9(i16) -> i32 system_v {
; symbol _RNvMs_NtCs1K2K8gQcegq_4core3nums11count_zeros

                                block0(v0: i16):
                                    v1 -> v0
                                    jump block1

                                block1:
@0000                               v2 = bnot.i16 v1
@000b                               v3 = popcnt v2
                                    v4 -> v3
@000b                               jump block2

                                block2:
@000d                               v5 = uextend.i32 v4
@0011                               return v5
}

Expected Results

The popcnt instruction only takes the 16bit part of v1 into account.

Actual Results

The popcnt instruction takes a 32bit register into account when computing the population count despite the upper half having an undefined value.

0000000000027246 <_RNvMs_NtCs1K2K8gQcegq_4core3nums11count_zeros>:
   27246:       55                      push   %rbp
   27247:       48 89 e5                mov    %rsp,%rbp
   2724a:       f7 d7                   not    %edi
   2724c:       f3 0f b8 f7             popcnt %edi,%esi
   27250:       0f b7 f6                movzwl %si,%esi
   27253:       48 89 f0                mov    %rsi,%rax
   27256:       48 89 ec                mov    %rbp,%rsp
   27259:       5d                      pop    %rbp
   2725a:       c3                      retq

Versions and Environment

Cranelift version or commit: Cranelift 0.79.0

Operating system: Linux

Architecture: x86_64

Extra Info

For reference: https://bytecodealliance.zulipchat.com/#narrow/stream/217117-cranelift/topic/isle.20performance.20with.20cg_clif/near/265273960

view this post on Zulip Wasmtime GitHub notifications bot (Dec 17 2021 at 18:35):

cfallin commented on issue #3615:

OK, I was nerdsniped into looking at this briefly; it looks like an issue introduced with #2887 (the initial PR that makes use of popcnt instruction) and masked until recently because the upper bits happened to be zero in the register, as you say.

The basic issue is that this match arm matches on I32, and the impl is intended only for actual 32-bit values; but above that, this code alters ty and computes an ext_spec (extension specification), turning I8/I16 into an I32 with extension first. But this extension only happens with the fallback (non-popcnt-instruction implementation) further down, not the "use popcnt if we can" early-out.

I think we can just move the let (ext_spec, ty) = ... further down, below the use_popcnt case. If you want to put together a PR for that, I'm happy to review!

view this post on Zulip Wasmtime GitHub notifications bot (Dec 17 2021 at 21:15):

cfallin closed issue #3615:

.clif Test Case

target x86_64 nehalem

function u0:9(i16) -> i32 system_v {
; symbol _RNvMs_NtCs1K2K8gQcegq_4core3nums11count_zeros

                                block0(v0: i16):
                                    v1 -> v0
                                    jump block1

                                block1:
@0000                               v2 = bnot.i16 v1
@000b                               v3 = popcnt v2
                                    v4 -> v3
@000b                               jump block2

                                block2:
@000d                               v5 = uextend.i32 v4
@0011                               return v5
}

Expected Results

The popcnt instruction only takes the 16bit part of v1 into account.

Actual Results

The popcnt instruction takes a 32bit register into account when computing the population count despite the upper half having an undefined value.

0000000000027246 <_RNvMs_NtCs1K2K8gQcegq_4core3nums11count_zeros>:
   27246:       55                      push   %rbp
   27247:       48 89 e5                mov    %rsp,%rbp
   2724a:       f7 d7                   not    %edi
   2724c:       f3 0f b8 f7             popcnt %edi,%esi
   27250:       0f b7 f6                movzwl %si,%esi
   27253:       48 89 f0                mov    %rsi,%rax
   27256:       48 89 ec                mov    %rbp,%rsp
   27259:       5d                      pop    %rbp
   2725a:       c3                      retq

Versions and Environment

Cranelift version or commit: Cranelift 0.79.0

Operating system: Linux

Architecture: x86_64

Extra Info

For reference: https://bytecodealliance.zulipchat.com/#narrow/stream/217117-cranelift/topic/isle.20performance.20with.20cg_clif/near/265273960


Last updated: Dec 23 2024 at 12:05 UTC