YjyJeff opened issue #6422:
Hi, I want to use SIMD in Cranelift to accelerate computation. I compiled the following function (as logged by Cranelift's tracing output):
```
function %simd_cmp(i32x4, i32x4) -> i32x4 system_v {
block0(v0: i32x4, v1: i32x4):
    v2 = icmp eq v0, v1
    return v2
}
```
Now I want to call it from the Rust side. To do that, I set `enable_simd` in the settings and cast the compiled function pointer to `fn(std::simd::i32x4, std::simd::i32x4) -> fn(std::simd::i32x4)`. However, when I invoke this function from Rust, it produces an incorrect result:
```rust
let func = transmute::<_, fn(std::simd::i32x4, std::simd::i32x4) -> fn(std::simd::i32x4)>(compiled_ptr);
let result = func(std::simd::i32x4::from_array([0, 1, 0, 3]), std::simd::i32x4::from_array([5, 0, 0, 1]));
println!("{:?}", result); // Output: [2, 3, 32, 0]. Totally wrong here....
```
To check whether the compiled function itself is correct, I wrote a `.clif` file with the following content and ran it with `clif-util test`. It passes:
```
test interpret
test run
target x86_64 has_avx has_avx2

function %simd_cmp(i32x4, i32x4) -> i32x4 system_v {
block0(v0: i32x4, v1: i32x4):
    v2 = icmp eq v0, v1
    return v2
}
; run: %simd_cmp([0 1 0 3], [5 0 0 1]) == [0 0 -1 0]
```
Given all of the above, I think the problem lies in the `transmute`: calling the compiled function with this signature is incorrect. How can I call this compiled function correctly? Thanks in advance.
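For reference, the expected result `[0 0 -1 0]` in the `run` directive follows from a lane-wise comparison in which an equal lane becomes all ones (`-1` as an `i32`) and an unequal lane becomes `0`. The following is a minimal scalar sketch of that behaviour; the function name `icmp_eq_lanes` is hypothetical and is not part of Cranelift.

```rust
/// Scalar model of a lane-wise `icmp eq` on two 4-lane i32 vectors.
/// Hypothetical helper for illustration only, not a Cranelift API.
pub fn icmp_eq_lanes(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    // Each output lane is all ones (-1) when the inputs match, otherwise 0.
    std::array::from_fn(|lane| if a[lane] == b[lane] { -1 } else { 0 })
}
```

With the inputs from the issue, only lane 2 matches (`0 == 0`), which is why the garbage output `[2, 3, 32, 0]` cannot be a valid comparison mask.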
cfallin commented on issue #6422:
First, a small thing: the signature `fn(std::simd::i32x4, std::simd::i32x4) -> fn(std::simd::i32x4)` is, I think, wrong in a subtle way; the return type should not be `fn(i32x4)` but just `i32x4`.

Overall though: I took a guess that this is perhaps a calling convention mismatch, and this seems to be the case indeed. Using Compiler Explorer I built the function `fn add(a: i32x4, b: i32x4) -> i32x4 { a + b }` with both Rust's native ABI and the `extern "C"` ABI:
```rust
#![feature(portable_simd)]

pub fn add(a: std::simd::i32x4, b: std::simd::i32x4) -> std::simd::i32x4 {
    a + b
}

pub extern "C" fn add2(a: std::simd::i32x4, b: std::simd::i32x4) -> std::simd::i32x4 {
    a + b
}
```
And I get
```assembly
example::add:
        mov     rax, rdi
        movdqa  xmm0, xmmword ptr [rdx]
        paddd   xmm0, xmmword ptr [rsi]
        movdqa  xmmword ptr [rdi], xmm0
        ret

example::add2:
        paddd   xmm0, xmm1
        ret
```
so it appears that the native Rust convention is to pass SIMD values by pointer, whereas the C ABI follows the SysV spec and passes 128-bit vectors in XMM registers.

Cranelift follows SysV for vector types as well, so if you call the function from Rust as an `extern "C" fn(i32x4, i32x4) -> i32x4`, it should work. I haven't put together a full repro of your case with Cranelift, but let us know if this doesn't work. Best of luck!
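The suggestion above can be sketched roughly as follows. This is not a Cranelift repro: `simd_cmp_stub` is a hypothetical stand-in for the JIT-compiled `%simd_cmp`, and `__m128i` from `core::arch::x86_64` is swapped in for `std::simd::i32x4` so the sketch builds on stable Rust (the assumption being that both are 128-bit vectors passed in XMM registers under the C ABI). The wrapper name `compare_via_c_abi` and the scalar fallback for other targets are also assumptions made for illustration.

```rust
#[cfg(target_arch = "x86_64")]
use core::arch::x86_64::{__m128i, _mm_cmpeq_epi32};

/// C-ABI function pointer type; the important part is `extern "C"`.
#[cfg(target_arch = "x86_64")]
type SimdCmpFn = extern "C" fn(__m128i, __m128i) -> __m128i;

/// Hypothetical stand-in for the compiled `%simd_cmp` (not Cranelift output).
#[cfg(target_arch = "x86_64")]
#[allow(improper_ctypes_definitions)]
extern "C" fn simd_cmp_stub(a: __m128i, b: __m128i) -> __m128i {
    // SSE2 is part of the x86-64 baseline, so this intrinsic is available.
    unsafe { _mm_cmpeq_epi32(a, b) }
}

/// Calls the stand-in through a type-erased pointer, as one would with a
/// pointer obtained from a JIT.
#[cfg(target_arch = "x86_64")]
pub fn compare_via_c_abi(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    use std::mem::transmute;
    // Erase the type to mimic a raw code pointer.
    let compiled_ptr: *const u8 = simd_cmp_stub as *const u8;
    unsafe {
        // Cast back to an `extern "C"` pointer, not a plain Rust-ABI `fn`.
        let func = transmute::<*const u8, SimdCmpFn>(compiled_ptr);
        let out = func(
            transmute::<[i32; 4], __m128i>(a),
            transmute::<[i32; 4], __m128i>(b),
        );
        transmute::<__m128i, [i32; 4]>(out)
    }
}

/// Scalar fallback so the sketch still builds on non-x86-64 targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn compare_via_c_abi(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    std::array::from_fn(|lane| if a[lane] == b[lane] { -1 } else { 0 })
}
```

On nightly with `portable_simd`, the same pattern should apply with `extern "C" fn(i32x4, i32x4) -> i32x4` as the target type of the `transmute`, which is the signature proposed in the comment.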
YjyJeff commented on issue #6422:
After changing the signature to `extern "C" fn(i32x4, i32x4) -> i32x4`, it works now. The problem was caused by the calling convention mismatch. Thanks a lot!
YjyJeff closed issue #6422.
Last updated: Nov 22 2024 at 17:03 UTC