bjorn3 labeled Issue #1423:
I compiled

```
test compile
set enable_simd
target x86_64-unknown-linux-gnu haswell

function u0:0() -> i32 system_v {
    ss0 = explicit_slot 32
    sig0 = () system_v
    fn0 = colocated u0:2 sig0

block0:
    v0 = stack_addr.i64 ss0
    v1 = load.i32x4 v0
    call fn0()
    v2 = extractlane.i32x4 v1, 1
    return v2
}
```

this results in the following ir after regalloc:

```
function u0:0(i64 fp [%rbp]) -> i32 [%rax], i64 fp [%rbp] system_v {
    ss0 = explicit_slot 32, offset -48
    ss1 = spill_slot 16, offset -64
    ss2 = incoming_arg 16, offset -16
    sig0 = () system_v
    fn0 = colocated u0:2 sig0

                                            block0(v5: i64 [%rbp]):
[RexOp1pushq#50]                                x86_push v5
[RexOp1copysp#8089]                             copy_special %rsp -> %rbp
[RexOp1adjustsp_ib#d083]                        adjust_sp_down_imm 48
[RexOp1spaddr8_id#808d,%rax]                    v0 = stack_addr.i64 ss0
[DynRexOp2fld#410,%xmm0]                        v3 = load.i32x4 v0
[RexOp2fspillSib32#411,ss1]                     v1 = spill v3
[Op1call_id#e8]                                 call fn0()
[RexOp2ffillSib32#410,%xmm15]                   v4 = fill v1
[DynRexMp3r_ib_unsigned_gpr#d16,%rax]           v2 = x86_pextr v4, 1
[RexOp1adjustsp_ib#8083]                        adjust_sp_up_imm 48
[RexOp1popq#58,%rbp]                            v6 = x86_pop.i64
[Op1ret#c3]                                     return v2, v6
}
```

The value passed to `x86_pextr` is stored in `%xmm15`, however the resulting asm expects it in `%xmm7` despite actually being filled to the correct `%xmm15`:

```
 0:  40 55                          push   rbp
 2:  48 89 e5                       mov    rbp, rsp
 5:  48 83 ec 30                    sub    rsp, 0x30
 9:  48 8d 84 24 10 00 00 00        lea    rax, [rsp + 0x10]
11:  0f 10 00                       movups xmm0, xmmword ptr [rax]
14:  40 0f 11 84 24 00 00 00 00     movups xmmword ptr [rsp], xmm0
1d:  e8 00 00 00 00                 call   0x22
22:  44 0f 10 bc 24 00 00 00 00     movups xmm15, xmmword ptr [rsp]
2b:  66 41 0f 3a 16 f8 01           pextrd r8d, xmm7, 1
32:  48 83 c4 30                    add    rsp, 0x30
36:  40 5d                          pop    rbp
38:  c3                             ret
```

(Experimenting with SIMD support for cg_clif)
bjorn3 opened Issue #1423:
abrown commented on Issue #1423:
What version or commit of Cranelift?
bjorn3 commented on Issue #1423:
0d4bde4ab30f202c888888db7a8eb2d905c0119f (from 4 days ago)
abrown commented on Issue #1423:
Hm, so `x86_pextr` should be able to infer that it needs the REX prefix: https://github.com/bytecodealliance/wasmtime/blob/master/cranelift/codegen/meta/src/isa/x86/encodings.rs#L1761-L1778
abrown commented on Issue #1423:
Ah, the operands are flipped, right? `66 41 0f 3a 16 f8 01 pextrd r8d, xmm7, 1` is in Intel syntax (I think) and should actually read `pextrd xmm15, rax, 1`. The REX prefix is there and being applied, but the operands are flipped in the recipe or something like that.
bjorn3 commented on Issue #1423:
This is the default capstone syntax. I just ran `clif-util compile -Dp`. And the output should indeed be in `%rax`. I didn't notice that; I was too focused on the input :)
bjorn3 commented on Issue #1423:
From the encoding:

```rust
modrm_rr(out_reg0, in_reg0, sink); // note the flipped register in the ModR/M byte
```
bjorn3 commented on Issue #1423:
Flipped the `rex2` arguments too and the problem was fixed. Will open a PR in a moment.
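To illustrate why the argument order matters, here is a minimal sketch of the two helpers involved. The function bodies are assumptions based on the general x86 encoding rules, not a copy of Cranelift's implementation. The names `rex2` and `modrm_rr` come from the thread; `encode_pextrd`, its `flipped_rex` switch, and the use of plain hardware register numbers (0 to 15) are hypothetical and exist only for this example.

```rust
/// Base REX prefix (0100WRXB) with all extension bits clear.
const BASE_REX: u8 = 0b0100_0000;

/// Build a REX prefix for a register-register instruction.
/// `rm` is the register placed in the ModR/M r/m field (extended by REX.B, bit 0);
/// `reg` is the register placed in the ModR/M reg field (extended by REX.R, bit 2).
fn rex2(rm: u8, reg: u8) -> u8 {
    let b = (rm >> 3) & 1;
    let r = (reg >> 3) & 1;
    BASE_REX | b | (r << 2)
}

/// Build a register-direct ModR/M byte (mod = 0b11) from the low
/// three bits of each register number.
fn modrm_rr(rm: u8, reg: u8) -> u8 {
    0b1100_0000 | ((reg & 7) << 3) | (rm & 7)
}

/// Hypothetical encoder for `pextrd r32, xmm, imm8` (66 REX 0F 3A 16 /r ib).
/// The destination GPR lives in r/m and the source XMM register in reg.
/// `flipped_rex` reproduces the bug by handing the registers to `rex2`
/// in the opposite order from the one used for `modrm_rr`.
fn encode_pextrd(dst_gpr: u8, src_xmm: u8, lane: u8, flipped_rex: bool) -> [u8; 7] {
    let rex = if flipped_rex {
        rex2(src_xmm, dst_gpr) // mismatched with the ModR/M byte below
    } else {
        rex2(dst_gpr, src_xmm) // same order as the ModR/M byte below
    };
    [0x66, rex, 0x0f, 0x3a, 0x16, modrm_rr(dst_gpr, src_xmm), lane]
}
```

Under these assumptions, `encode_pextrd(0, 15, 1, true)` yields the `66 41 0f 3a 16 f8 01` sequence from the disassembly in the report, while `encode_pextrd(0, 15, 1, false)` sets REX.R instead of REX.B and yields `66 44 0f 3a 16 f8 01`.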
abrown commented on Issue #1423:
So my comment above was incorrect:

> Ah, the operands are flipped, right?

The operands shouldn't be flipped, it should still be `pextrd rax, xmm15, 1` since rax is the write register and thus in the R/M slot. It's the REX bits that need to be flipped: good catch that we need to flip the operands that we pass to `rex2`. Could you add an `x86_pextr` binemit test (or let me know and I can add to that PR)? The REX coverage is thin...
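The point about the R/M slot can be checked by decoding the bytes by hand. The following is a small sketch of a decoder for the register-direct form of `pextrd` only; `Pextrd` and `decode_pextrd` are hypothetical names for this example and are not part of Cranelift or capstone.

```rust
/// Operands recovered from a register-direct `pextrd` encoding.
#[derive(Debug, PartialEq)]
struct Pextrd {
    dst_gpr: u8, // ModR/M r/m field, extended by REX.B
    src_xmm: u8, // ModR/M reg field, extended by REX.R
    lane: u8,    // trailing imm8
}

/// Decode `66 REX 0F 3A 16 /r ib` with REX.W = 0 and mod = 0b11.
/// Returns `None` for any other byte sequence.
fn decode_pextrd(bytes: &[u8]) -> Option<Pextrd> {
    match bytes {
        &[0x66, rex, 0x0f, 0x3a, 0x16, modrm, lane]
            if (rex & 0xf8) == 0x40 && (modrm >> 6) == 0b11 =>
        {
            let rex_r = (rex >> 2) & 1; // extends the reg field
            let rex_b = rex & 1; // extends the r/m field
            Some(Pextrd {
                dst_gpr: (rex_b << 3) | (modrm & 7),
                src_xmm: (rex_r << 3) | ((modrm >> 3) & 7),
                lane,
            })
        }
        _ => None,
    }
}
```

Decoding the reported `66 41 0f 3a 16 f8 01` gives destination register 8 (`r8d`) and source register 7 (`xmm7`), matching the disassembly. With the REX bit moved from B to R (`66 44 0f 3a 16 f8 01`), the same ModR/M byte decodes to destination 0 (`eax`) and source 15 (`xmm15`), which is the intended instruction.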
abrown closed Issue #1423:
Last updated: Dec 13 2025 at 19:03 UTC