Stream: git-wasmtime

Topic: wasmtime / issue #5645 Cranelift: missing zero extension ...


view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2023 at 19:10):

T0b1-iOS opened issue #5645:

I have stumbled upon a rather nasty 'bug' this week.
When compiling IR like this:

function u0:0(i64) -> i64 system_v {
    sig2 = (i64, i8) -> i64 system_v
    fn2 = u0:112 sig2

block0(v0: i64):
    v1 = icmp_imm ne v0, 0
    v2 = call fn2(v0, v1)
    return v2
}

you will get assembly that looks like this:

  pushq   %rbp
  unwind PushFrameRegs { offset_upward_to_caller_sp: 16 }
  movq    %rsp, %rbp
  unwind DefineNewFrame { offset_upward_to_caller_sp: 16, offset_downward_to_clobbers: 0 }
block0:
  cmpq    $0, %rdi
  setnz   %sil
  load_ext_name userextname0+0, %r8
  call    *%r8
  movq    %rbp, %rsp
  popq    %rbp
  ret

Note that the icmp_imm gets compiled to a cmp followed by a setcc instruction using sil as a parameter.
This means that the upper bytes of rsi are undefined. Now when fn2 is a C function that looks like this:

void fun(uint64_t x, bool a) {
    do_smth(a ? x : 0);
}

and you compile this with clang the assembly looks like this:

xor     eax, eax
test    esi, esi
cmove   rdi, rax
jmp     do_smth(unsigned long)@PLT

while with gcc you get

xor     eax, eax
test    sil, sil
cmove   rdi, rax
jmp     do_smth(unsigned long)

Note that gcc emits a test sil, sil instruction while clang emits test esi, esi meaning that it relies on the 8bit argument getting zero-extended to at least 32 bit and therefore you will get random behavior depending on whether the upper bytes of rsi were zero or not. (Compiler Explorer link)
According to the people at LLVM, whether arguments should be zero-extended or not is unclear in the ABI and therefore they do not consider it a bug until this issue has been resolved on the ABI level since the major compilers do zero-extend the arguments.

Now my question is, do you consider this a bug in cranelift or at least an issue that should be addressed on the codegen level inside cranelift? (I have fixed this by manually checking if an argument is I8/I16 when emitting a function call and then passed that through a uextend/inarrow combo)
Maybe this could be done when the enable_llvm_abi_extensions flag is set?
And then the question would be where to fix it: When passing a variable to an external function or making sure that when instructions like setcc are emitted the upper parts of the destination register are cleared beforehand?

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2023 at 19:10):

T0b1-iOS labeled issue #5645:

I have stumbled upon a rather nasty 'bug' this week.
When compiling IR like this:

function u0:0(i64) -> i64 system_v {
    sig2 = (i64, i8) -> i64 system_v
    fn2 = u0:112 sig2

block0(v0: i64):
    v1 = icmp_imm ne v0, 0
    v2 = call fn2(v0, v1)
    return v2
}

you will get assembly that looks like this:

  pushq   %rbp
  unwind PushFrameRegs { offset_upward_to_caller_sp: 16 }
  movq    %rsp, %rbp
  unwind DefineNewFrame { offset_upward_to_caller_sp: 16, offset_downward_to_clobbers: 0 }
block0:
  cmpq    $0, %rdi
  setnz   %sil
  load_ext_name userextname0+0, %r8
  call    *%r8
  movq    %rbp, %rsp
  popq    %rbp
  ret

Note that the icmp_imm gets compiled to a cmp followed by a setcc instruction using sil as a parameter.
This means that the upper bytes of rsi are undefined. Now when fn2 is a C function that looks like this:

void fun(uint64_t x, bool a) {
    do_smth(a ? x : 0);
}

and you compile this with clang the assembly looks like this:

xor     eax, eax
test    esi, esi
cmove   rdi, rax
jmp     do_smth(unsigned long)@PLT

while with gcc you get

xor     eax, eax
test    sil, sil
cmove   rdi, rax
jmp     do_smth(unsigned long)

Note that gcc emits a test sil, sil instruction while clang emits test esi, esi meaning that it relies on the 8bit argument getting zero-extended to at least 32 bit and therefore you will get random behavior depending on whether the upper bytes of rsi were zero or not. (Compiler Explorer link)
According to the people at LLVM, whether arguments should be zero-extended or not is unclear in the ABI and therefore they do not consider it a bug until this issue has been resolved on the ABI level since the major compilers do zero-extend the arguments.

Now my question is, do you consider this a bug in cranelift or at least an issue that should be addressed on the codegen level inside cranelift? (I have fixed this by manually checking if an argument is I8/I16 when emitting a function call and then passed that through a uextend/inarrow combo)
Maybe this could be done when the enable_llvm_abi_extensions flag is set?
And then the question would be where to fix it: When passing a variable to an external function or making sure that when instructions like setcc are emitted the upper parts of the destination register are cleared beforehand?

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2023 at 19:10):

T0b1-iOS labeled issue #5645:

I have stumbled upon a rather nasty 'bug' this week.
When compiling IR like this:

function u0:0(i64) -> i64 system_v {
    sig2 = (i64, i8) -> i64 system_v
    fn2 = u0:112 sig2

block0(v0: i64):
    v1 = icmp_imm ne v0, 0
    v2 = call fn2(v0, v1)
    return v2
}

you will get assembly that looks like this:

  pushq   %rbp
  unwind PushFrameRegs { offset_upward_to_caller_sp: 16 }
  movq    %rsp, %rbp
  unwind DefineNewFrame { offset_upward_to_caller_sp: 16, offset_downward_to_clobbers: 0 }
block0:
  cmpq    $0, %rdi
  setnz   %sil
  load_ext_name userextname0+0, %r8
  call    *%r8
  movq    %rbp, %rsp
  popq    %rbp
  ret

Note that the icmp_imm gets compiled to a cmp followed by a setcc instruction using sil as a parameter.
This means that the upper bytes of rsi are undefined. Now when fn2 is a C function that looks like this:

void fun(uint64_t x, bool a) {
    do_smth(a ? x : 0);
}

and you compile this with clang the assembly looks like this:

xor     eax, eax
test    esi, esi
cmove   rdi, rax
jmp     do_smth(unsigned long)@PLT

while with gcc you get

xor     eax, eax
test    sil, sil
cmove   rdi, rax
jmp     do_smth(unsigned long)

Note that gcc emits a test sil, sil instruction while clang emits test esi, esi meaning that it relies on the 8bit argument getting zero-extended to at least 32 bit and therefore you will get random behavior depending on whether the upper bytes of rsi were zero or not. (Compiler Explorer link)
According to the people at LLVM, whether arguments should be zero-extended or not is unclear in the ABI and therefore they do not consider it a bug until this issue has been resolved on the ABI level since the major compilers do zero-extend the arguments.

Now my question is, do you consider this a bug in cranelift or at least an issue that should be addressed on the codegen level inside cranelift? (I have fixed this by manually checking if an argument is I8/I16 when emitting a function call and then passed that through a uextend/inarrow combo)
Maybe this could be done when the enable_llvm_abi_extensions flag is set?
And then the question would be where to fix it: When passing a variable to an external function or making sure that when instructions like setcc are emitted the upper parts of the destination register are cleared beforehand?

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2023 at 19:18):

bjorn3 commented on issue #5645:

You will need to set the zext flag in the function signature (and sext in case of signed ints). Cranelift provides all the building blocks for correctly handling the abi, but it is the responsibility of the frontend to correctly use those building blocks, just like in LLVM. See for example https://github.com/rust-lang/rust/blob/7919ef0ec5776c72dace7fec1c68551a617505ad/compiler/rustc_target/src/abi/call/x86_64.rs#L232 for where rustc specifies that these flags need to be set.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2023 at 19:35):

cfallin commented on issue #5645:

It seems the crux of the matter is how we interpret SysV ABI handling of narrow values; if according to the LLVM bug you linked, zext up to 32 bits is "de-facto the ABI already", then we could implement that, perhaps under a flag. But until/unless we decide to do that, @bjorn3's suggestion is the best: when translating C signatures to Cranelift, add zext to i8 or i16 values.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2023 at 19:35):

T0b1-iOS commented on issue #5645:

I'm not sure where to set it though since setting it on the parameter (that's what the ArgumentExtension doc suggests) like below compiles to the same assembly.

function u0:0(i64) -> i64 system_v {
    sig2 = (i64, i8 uext) -> i64 system_v
    fn2 = u0:112 sig2

block0(v0: i64):
    v1 = icmp_imm ne v0, 0
    v2 = call fn2(v0, v1)
    return v2
}

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2023 at 19:36):

T0b1-iOS edited a comment on issue #5645:

I'm not sure where to set it though since setting it on the parameter (that's what the ArgumentExtension doc suggests) like below compiles to the same assembly. (maybe it's not parsed by cranelift-reader there? Need to check.

function u0:0(i64) -> i64 system_v {
    sig2 = (i64, i8 uext) -> i64 system_v
    fn2 = u0:112 sig2

block0(v0: i64):
    v1 = icmp_imm ne v0, 0
    v2 = call fn2(v0, v1)
    return v2
}

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2023 at 19:37):

T0b1-iOS edited a comment on issue #5645:

I'm not sure where to set it though since setting it on the parameter (that's what the ArgumentExtension doc suggests) like below compiles to the same assembly. (maybe it's not parsed by cranelift-reader there? Need to check.)

function u0:0(i64) -> i64 system_v {
    sig2 = (i64, i8 uext) -> i64 system_v
    fn2 = u0:112 sig2

block0(v0: i64):
    v1 = icmp_imm ne v0, 0
    v2 = call fn2(v0, v1)
    return v2
}

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2023 at 19:43):

cfallin commented on issue #5645:

Ah, that is indeed a bug if uext doesn't actually work! We'll look into this.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2023 at 19:57):

T0b1-iOS commented on issue #5645:

I mean, the comment for ArgumentExtensions mentions

This attribute specifies how an argument or return value should be extended if the platform and ABI require it

and since the SystemV document is 'ambigous' on this issue it's technically not required so no extension needs to be done :P

The culprit seems to be get_ext_mode in X64ABIMachineSpec:

fn get_ext_mode(
    _call_conv: isa::CallConv,
    _specified: ir::ArgumentExtension,
) -> ir::ArgumentExtension {
    ir::ArgumentExtension::None
}

It always ignores the specified extension. So make it honor the specified extension if the calling conv is SysV? Or only when the LLVM ABI Extensions are used...

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2023 at 21:21):

bjorn3 commented on issue #5645:

if according to the LLVM bug you linked, zext up to 32 bits is "de-facto the ABI already", then we could implement that, perhaps under a flag.

As I understand it the de-facto ABI is to zero or sign extend depending on if the type is unsigned or signed. As clif ir doesn't distinguish between the two, explicit zext and sext are the only way to know if it should be zero or sign extended.


Last updated: Jan 24 2025 at 00:11 UTC