Stream: git-wasmtime

Topic: wasmtime / issue #6422 How can I use the compiled functio...


Wasmtime GitHub notifications bot (May 22 2023 at 02:29):

YjyJeff opened issue #6422:

Hi, I want to use SIMD in Cranelift to accelerate computation. I compiled the following function (as logged by Cranelift's tracing output):

function %simd_cmp(i32x4, i32x4) -> i32x4 system_v {
    block0(v0: i32x4, v1: i32x4):
    v2 = icmp eq v0, v1
    return v2
}

Now I want to call it from the Rust side. I enabled enable_simd in the settings and cast the compiled function pointer to fn(std::simd::i32x4, std::simd::i32x4) -> fn(std::simd::i32x4). However, invoking the function from Rust produces an incorrect result:

let func = transmute::<_, fn(std::simd::i32x4, std::simd::i32x4) -> fn(std::simd::i32x4)>(compiled_ptr);
let result = func(std::simd::i32x4::from_array([0, 1, 0, 3]), std::simd::i32x4::from_array([5, 0, 0, 1]));
println!("{:?}", result);

// Output: [2, 3, 32, 0]. Totally wrong here....

To check whether the compiled function is correct, I wrote a .clif file with the following content and ran it with clif-util test; it passes.

test interpret
test run
target x86_64 has_avx has_avx2

function %simd_cmp(i32x4, i32x4) -> i32x4 system_v {
    block0(v0: i32x4, v1: i32x4):
    v2 = icmp eq v0, v1
    return v2
}

; run: %simd_cmp([0 1 0 3], [5 0 0 1]) == [0 0 -1 0]
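For readers unfamiliar with vector comparisons, the expected value in the run line can be reproduced with a scalar model. This is only an illustrative sketch: the function name icmp_eq_i32x4 and the use of plain arrays in place of vector registers are assumptions, and the all-ones (-1) encoding of a true lane is taken from the expected result in the test above.

```rust
/// Scalar model of a lane-wise `icmp eq` on two i32x4 values.
/// Each output lane is -1 (all bits set) when the input lanes are equal
/// and 0 otherwise, matching the expected result in the run directive.
/// Plain arrays stand in for vector registers; this is an illustration,
/// not Cranelift's implementation.
pub fn icmp_eq_i32x4(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    std::array::from_fn(|lane| if a[lane] == b[lane] { -1 } else { 0 })
}
```

Calling icmp_eq_i32x4([0, 1, 0, 3], [5, 0, 0, 1]) gives [0, 0, -1, 0]: only lane 2 holds equal values.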

So I think the problem is in the transmute: calling the compiled function with this signature is incorrect. How can I call this compiled function correctly?

Thanks in advance

Wasmtime GitHub notifications bot (May 22 2023 at 02:38):

cfallin commented on issue #6422:

First, a small thing -- the signature

fn(std::simd::i32x4, std::simd::i32x4) -> fn(std::simd::i32x4)

is I think wrong in a subtle way; the return type should not be fn(i32x4) but just i32x4.
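To make the difference concrete, here is a small sketch comparing the two return types. The alias names (Lanes, Mistyped, Intended) and the helper return_value_sizes are hypothetical, and a plain [i32; 4] array stands in for std::simd::i32x4 so that the snippet builds on stable Rust.

```rust
use std::mem::size_of;

/// Plain-array stand-in for a 128-bit i32x4 value (an assumption made
/// so the sketch does not need the nightly-only `std::simd` module).
pub type Lanes = [i32; 4];

/// The signature from the original post: it returns a function pointer.
pub type Mistyped = fn(Lanes, Lanes) -> fn(Lanes);

/// The signature that was presumably intended: it returns the vector itself.
pub type Intended = fn(Lanes, Lanes) -> Lanes;

/// Returns (size of the mistyped return value, size of the intended one).
pub fn return_value_sizes() -> (usize, usize) {
    (size_of::<fn(Lanes)>(), size_of::<Lanes>())
}
```

The first size is that of a pointer and the second is 16 bytes, so the two signatures describe different return values even before the calling convention comes into play.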

I took a guess that this is perhaps a calling convention mismatch, and this seems to be the case indeed. Using Compiler Explorer I built the function fn add(a: i32x4, b: i32x4) -> i32x4 { a + b } with both Rust's native ABI and the extern "C" ABI:

#![feature(portable_simd)]

pub fn add(a: std::simd::i32x4, b: std::simd::i32x4) -> std::simd::i32x4 {
    a + b
}


pub extern "C" fn add2(a: std::simd::i32x4, b: std::simd::i32x4) -> std::simd::i32x4 {
    a + b
}

And I get

example::add:
        mov     rax, rdi
        movdqa  xmm0, xmmword ptr [rdx]
        paddd   xmm0, xmmword ptr [rsi]
        movdqa  xmmword ptr [rdi], xmm0
        ret

example::add2:
        paddd   xmm0, xmm1
        ret

so it appears that the native Rust convention is to pass SIMD values by pointer, whereas the C ABI follows the SysV spec and passes 128-bit vectors in XMM registers.

Cranelift follows SysV for vector types as well, so if you call the function from Rust as an extern "C" fn(i32x4, i32x4) -> i32x4, it should work. I haven't put together a full repro of your case with Cranelift, but let us know if this doesn't work. Best of luck!
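A minimal sketch of that suggestion, under several assumptions: it targets x86_64 only, it uses the stable std::arch type __m128i in place of the nightly std::simd::i32x4, and simd_cmp_stub is a hypothetical stand-in for the Cranelift-compiled code so that the snippet can run without a JIT. The names SimdCmp and call_compiled are likewise made up for illustration.

```rust
#[cfg(target_arch = "x86_64")]
pub mod sysv_call {
    use std::arch::x86_64::{__m128i, _mm_cmpeq_epi32};
    use std::mem::transmute;

    /// Pointer type using the C ABI, mirroring the CLIF signature
    /// `(i32x4, i32x4) -> i32x4 system_v`.
    pub type SimdCmp = extern "C" fn(__m128i, __m128i) -> __m128i;

    /// Hypothetical stand-in for the JIT-compiled `%simd_cmp`.
    /// The `allow` silences a possible FFI-safety lint on the SIMD types.
    #[allow(improper_ctypes_definitions)]
    pub extern "C" fn simd_cmp_stub(a: __m128i, b: __m128i) -> __m128i {
        // SSE2 is part of the x86_64 baseline, so this intrinsic is available.
        unsafe { _mm_cmpeq_epi32(a, b) }
    }

    /// Casts a raw code pointer to the C-ABI signature and calls it.
    ///
    /// Safety: `code` must point to a live function with exactly the
    /// `SimdCmp` signature and calling convention.
    pub unsafe fn call_compiled(code: *const u8, a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
        unsafe {
            let func = transmute::<*const u8, SimdCmp>(code);
            // [i32; 4] and __m128i are both 16 bytes, so these transmutes
            // only reinterpret the lane data.
            let result = func(
                transmute::<[i32; 4], __m128i>(a),
                transmute::<[i32; 4], __m128i>(b),
            );
            transmute::<__m128i, [i32; 4]>(result)
        }
    }
}
```

With a real JIT, the pointer returned for the finalized function would be passed in place of simd_cmp_stub as *const u8. Keeping the transmute inside one unsafe helper confines the signature assumption to a single place.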

Wasmtime GitHub notifications bot (May 22 2023 at 02:45):

YjyJeff commented on issue #6422:

After changing the signature to extern "C" fn(i32x4, i32x4) -> i32x4, it works now. The problem was caused by the calling convention mismatch. Thanks a lot!

Wasmtime GitHub notifications bot (May 22 2023 at 02:45):

YjyJeff closed issue #6422:

Last updated: Nov 22 2024 at 17:03 UTC