uweigand commented on issue #5147:
s390x: Bswap not implemented
We do have bswap instructions, and they're already implemented in the backend: for
$I32
and$I64
,bswap_reg
can be used; for$I16
, a 32-bit bswap followed by a 16-bit right shift is generally the most efficient solution.See e.g. the use as part of the
bitrev
implementation:(rule (bitrev_bytes $I16 x) (lshr_imm $I32 (bswap_reg $I32 x) 16)) (rule (bitrev_bytes $I32 x) (bswap_reg $I32 x)) (rule (bitrev_bytes $I64 x) (bswap_reg $I64 x))
afonso360 commented on issue #5147:
This is awesome! Thanks! :tada:
I added this to our fuzzer (Feel free to add that to this PR if you want!) and It's been running for over an hour on aarch64 without any issues, however on x86 it reported an interesting failure.
I've tried to minimize this as much as I could but it sometimes throws weird results.
test interpret test run target x86_64 function %a() -> i8, i32, i64 { block0: v5 = iconst.i64 0x9903_5204_d05f_abab v6 = bswap v5 v7 = iconst.i8 0 v8 = iconst.i32 0 return v7, v8, v6 } ; run: %a() == [0, 0, -6076657925176032359]
You can run this testcase from the cranelift directory with
cargo run -- test ./the-above.clif
Removing the v7 or v8 makes the test pass, which is interesting... but otherwise the test segfaults on my machine, most of the time but not always.
Here is the disassembly, which you can get with
cargo run -- compile -D --target x86_64 ./the-above.clif
:Disassembly of 29 bytes: 0: 55 push rbp 1: 48 89 e5 mov rbp, rsp 4: 49 bb ab ab 5f d0 04 52 03 99 movabs r11, 0x99035204d05fabab e: 4c 0f cb bswap rbx 11: 31 c0 xor eax, eax 13: 31 d2 xor edx, edx 15: 4c 89 1f mov qword ptr [rdi], r11 18: 48 89 ec mov rsp, rbp 1b: 5d pop rbp 1c: c3 ret
<details>
<summary>Here's a more reliable example that always fails with a wrong result, but never segfaults.</summary>
test interpret test run target x86_64 function %a(f32, f64, i32, i32, f64) -> i8, i32, i64 { block0(v0: f32, v1: f64, v2: i32, v3: i32, v4: f64): v5 = iconst.i64 0x9903_5204_d05f_abab v6 = bswap v5 v7 = iconst.i8 0 v8 = iconst.i32 0 return v7, v8, v6 } ; run: %a(0.0, 0.0, 0, 0, 0.0) == [0, 0, -6076657925176032359]
Here's the disassembly of the above function
Disassembly of 32 bytes: 0: 55 push rbp 1: 48 89 e5 mov rbp, rsp 4: 49 89 d1 mov r9, rdx 7: 49 b8 ab ab 5f d0 04 52 03 99 movabs r8, 0x99035204d05fabab 11: 4c 0f c8 bswap rax 14: 31 c0 xor eax, eax 16: 31 d2 xor edx, edx 18: 4d 89 01 mov qword ptr [r9], r8 1b: 48 89 ec mov rsp, rbp 1e: 5d pop rbp 1f: c3 ret
</details>
11evan commented on issue #5147:
Thanks for the feedback everyone! I'll work on addressing everything, including adding runtests and investigating the fuzzer failure
11evan commented on issue #5147:
The x86_64 issue the fuzzer found was an incorrect REX encoding - I misunderstood the manual and was using REX.R instead of REX.B when encoding bswap access to r8-r15. Fixed locally and added those cases to the runtests, they pass now.
I'll have an update out sometime in the next couple days once I address the other feedback and add s380x support
11evan edited a comment on issue #5147:
The x86_64 issue the fuzzer found was an incorrect REX encoding - I misunderstood the manual and was using REX.R instead of REX.B when encoding bswap access to r8-r15. Fixed locally and added those cases to the runtests, they pass now.
I'll have an update out sometime in the next couple days once I address the other feedback and add s390x support
11evan commented on issue #5147:
PR updated:
- fixed REX encoding bug found by fuzzer
- added runtests
- added s390x support
- other cleanups from feedback (remove extraneous comments, exclude 8-bit integers, use x64
RexFlags
type, add bswap to fuzzgen)
uweigand commented on issue #5147:
s390x parts LGTM, thanks for adding this. (Seeing the discussion above, if consensus is to add I128 support, you can likewise copy the 128-bit byteswap implementation via vector permute from bitrev.)
11evan commented on issue #5147:
PR Updated - legalized 128-bit bswap, added runtest for interpreter. Didn't actually implement 128-bit swap in any of the backends yet.
11evan commented on issue #5147:
Looks like the CI run failed at
verify-publish
with:Run cd /opt/hostedtoolcache % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 [...] 100 212 100 212 0 0 40 0 0:00:05 0:00:05 --:--:-- 54 gzip: stdin: not in gzip format tar: Child returned status 1 tar: Error is not recoverable: exiting now Error: Process completed with exit code 2.
It looks like an issue with the tooling rather than an issue with my commit - what should I do here, does it just need a retry?
jameysharp commented on issue #5147:
Apparently it did just need a retry. Merged now! :tada:
Last updated: Jan 24 2025 at 00:11 UTC