cranelift / Issue #1192 Too many raw_bitcasts in SIMD code · git-cranelift

alexcrichton transferred Issue #1192:

What is the feature or code improvement you would like to do in Cranelift?

During translation from Wasm to CLIF, a combination of Wasm's v128 type and Cranelift's current type system forces us to add many raw_bitcast instructions between operations. For example, this Wasm code:

  (func (export "add-sub") (param v128 v128 v128) (result v128)
    (i16x8.add (i16x8.sub (local.get 0) (local.get 1))(local.get 2)))

translates to this CLIF code:

function u0:4(i64 vmctx [%rdi], i8x16 [%xmm0], i8x16 [%xmm1], i64 fp [%rbp]) -> i8x16 [%xmm0], i64 fp [%rbp] system_v {
    ss0 = incoming_arg 16, offset -16

                                ebb0(v0: i64 [%rdi], v1: i8x16 [%xmm0], v2: i8x16 [%xmm1], v12: i64 [%rbp]):
[RexOp1pushq#50]                    x86_push v12
[RexOp1copysp#8089]                 copy_special %rsp -> %rbp
@00a6 [null_fpr#00,%xmm0]           v4 = raw_bitcast.i16x8 v1
@00a6 [Mp2vconst_optimized#5ef,%xmm2] v11 = vconst.i16x8 0x00
@00a6 [Mp2fa#5f9,%xmm2]             v5 = isub v11, v4
@00a6 [null_fpr#00,%xmm2]           v6 = raw_bitcast.i8x16 v5
@00aa [null_fpr#00,%xmm2]           v7 = raw_bitcast.i16x8 v6
@00aa [null_fpr#00,%xmm1]           v8 = raw_bitcast.i16x8 v2
@00aa [Mp2fa#5fd,%xmm2]             v9 = iadd v7, v8
@00aa [null_fpr#00,%xmm2]           v10 = raw_bitcast.i8x16 v9
@00ac [-]                           fallthrough ebb1(v10)

                                ebb1(v3: i8x16 [%xmm2]):
@00ac [Op2frmov#428]                regmove v3, %xmm2 -> %xmm0
[RexOp1popq#58,%rbp]                v13 = x86_pop.i64
@00ac [Op1ret#c3]                   return v3, v13
}

This issue is to discuss if and how to remove these extra bitcasts.
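
To make the source of the casts concrete, here is a minimal sketch of a translator that keeps every v128 value in one canonical type and casts around each lane-typed operation. It is a toy model, not Cranelift's translation code: `Translator`, `VecType`, `binary`, and the choice of i8x16 as the canonical type (inferred from the listing above) are all assumptions made for illustration.

```rust
// Toy model only: these names are illustrative and are not Cranelift APIs.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum VecType {
    I8x16,
    I16x8,
    I32x4,
}

impl VecType {
    fn name(self) -> &'static str {
        match self {
            VecType::I8x16 => "i8x16",
            VecType::I16x8 => "i16x8",
            VecType::I32x4 => "i32x4",
        }
    }
}

// Assumption: every Wasm v128 value is kept as i8x16 between operations.
const CANONICAL: VecType = VecType::I8x16;

pub struct Translator {
    next_value: u32,
    pub insts: Vec<String>,
}

impl Translator {
    // `first_free_value` is the first value number not taken by parameters.
    pub fn new(first_free_value: u32) -> Self {
        Translator { next_value: first_free_value, insts: Vec::new() }
    }

    fn fresh(&mut self) -> u32 {
        let v = self.next_value;
        self.next_value += 1;
        v
    }

    // Emit a raw_bitcast unless the value already has the wanted type.
    fn cast(&mut self, value: u32, from: VecType, to: VecType) -> u32 {
        if from == to {
            return value;
        }
        let out = self.fresh();
        self.insts.push(format!("v{} = raw_bitcast.{} v{}", out, to.name(), value));
        out
    }

    // Translate a lane-typed binary op whose operands and result are
    // canonical v128 values: cast both inputs in, cast the result back out.
    pub fn binary(&mut self, opcode: &str, ty: VecType, a: u32, b: u32) -> u32 {
        let a = self.cast(a, CANONICAL, ty);
        let b = self.cast(b, CANONICAL, ty);
        let result = self.fresh();
        self.insts.push(format!("v{} = {} v{}, v{}", result, opcode, a, b));
        self.cast(result, ty, CANONICAL)
    }

    pub fn raw_bitcast_count(&self) -> usize {
        self.insts.iter().filter(|i| i.contains("raw_bitcast")).count()
    }
}
```

Translating add-sub with this model (parameters v0..v2, then `binary("isub", I16x8, 0, 1)` followed by `binary("iadd", I16x8, diff, 2)`) produces eight instructions, six of which are raw_bitcasts. The result of the subtraction is cast back to i8x16 and immediately cast to i16x8 again, the same back-to-back pair that appears as v6/v7 in the listing.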

What is the value of adding this in Cranelift?

The extra raw_bitcasts emit no machine code, but they are confusing when troubleshooting and add memory and processing overhead during compilation.

Do you have an implementation plan, and/or ideas for data structures or algorithms to use?

Some options:

add types to load and const: https://github.com/WebAssembly/simd/issues/125 was discussed in the Wasm SIMD Sync meeting (https://github.com/WebAssembly/simd/issues/121), and someone pointed out that making load and const typed (e.g. f32x4.load) would allow compilers to attach the correct types to values and retain them through the less strongly typed v128 operations (e.g. xor). That issue discusses the change from a performance point of view, but the addition would also solve this issue.
examine the DFG: another approach would be to look at the DFG to figure out the types of predecessors, as mentioned in https://github.com/WebAssembly/simd/pull/1#issuecomment-295331508. This, however, would have to be extended to cover function signatures: Cranelift would have to look at the instructions in a function to figure out how its v128 parameters are used. In the function add-sub above, with signature (param v128 v128 v128), the addition and subtraction make this clear, but some functions will make this analysis impossible.
add a V128 type to Cranelift: Cranelift's type system could be extended with a V128 type that would include all INxN, FNxN, and BNxN types. The instruction types would stay the same (e.g. iadd should still only accept integers), but type-checking could be relaxed to allow a V128 value to be used as any of its valid subtypes. This opens up a mechanism to get around the type-checking, but arguably one already exists with raw_bitcast. Code that knows its types would remain as-is, but Wasm-to-CLIF translated code could use V128 a bit more naturally than the raw_bitcasts.
do nothing: I brought this up a long time ago when talking to @sunfishcode and that seemed the best thing to do then. I'm opening this issue to discuss whether that is still the case.
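
The relaxed type-checking in the V128 option could look roughly like the sketch below. It is a standalone model, not Cranelift's verifier: the `Type` enum, `operand_ok`, and `check_iadd` are hypothetical names, only a few representative types are listed, and giving the result the instruction's controlling type is an assumption the proposal does not spell out.

```rust
// Hypothetical value types: a scalar, three concrete 128-bit vector types,
// and the proposed catch-all V128. Not Cranelift's real type enum.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Type {
    I32,
    I16x8,
    F32x4,
    B8x16,
    V128,
}

impl Type {
    // Concrete 128-bit vector types that V128 would stand in for.
    pub fn is_vector_128(self) -> bool {
        matches!(self, Type::I16x8 | Type::F32x4 | Type::B8x16)
    }

    // Types that an integer instruction such as iadd accepts.
    pub fn is_int(self) -> bool {
        matches!(self, Type::I32 | Type::I16x8)
    }
}

// Relaxed operand check: accept an exact match, or a V128 value wherever a
// concrete 128-bit vector type is expected.
pub fn operand_ok(expected: Type, actual: Type) -> bool {
    actual == expected || (actual == Type::V128 && expected.is_vector_128())
}

// Check `iadd.<ctrl> a, b`. The instruction stays integer-only; only the
// operand check is relaxed. Assumption: the result has the controlling type.
pub fn check_iadd(ctrl: Type, a: Type, b: Type) -> Result<Type, String> {
    if !ctrl.is_int() {
        return Err(format!("iadd does not accept {:?}", ctrl));
    }
    for actual in [a, b] {
        if !operand_ok(ctrl, actual) {
            return Err(format!("expected {:?}, found {:?}", ctrl, actual));
        }
    }
    Ok(ctrl)
}
```

Under these assumptions `check_iadd(Type::I16x8, Type::V128, Type::V128)` is accepted without any cast, while `check_iadd(Type::F32x4, ...)` is still rejected because iadd remains integer-only, and a V128 operand is never accepted where a scalar is expected.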

Have you considered alternative implementations? If so, how are they better or worse than your proposal?

See above.

Last updated: Apr 14 2025 at 12:05 UTC

GitHub (Feb 28 2020 at 23:27):
