alexcrichton transferred Issue #1192:
What is the feature or code improvement you would like to do in Cranelift?
During translation from Wasm to CLIF, a combination of Wasm's `v128` type and Cranelift's current type system forces us to add many `raw_bitcast` instructions between operations. For example, this Wasm code:

```
(func (export "add-sub") (param v128 v128 v128) (result v128)
  (i16x8.add
    (i16x8.sub (local.get 0) (local.get 1))
    (local.get 2)))
```

translates to this CLIF code:

```
function u0:4(i64 vmctx [%rdi], i8x16 [%xmm0], i8x16 [%xmm1], i64 fp [%rbp]) -> i8x16 [%xmm0], i64 fp [%rbp] system_v {
    ss0 = incoming_arg 16, offset -16

ebb0(v0: i64 [%rdi], v1: i8x16 [%xmm0], v2: i8x16 [%xmm1], v12: i64 [%rbp]):
      [RexOp1pushq#50]                     x86_push v12
      [RexOp1copysp#8089]                  copy_special %rsp -> %rbp
@00a6 [null_fpr#00,%xmm0]                  v4 = raw_bitcast.i16x8 v1
@00a6 [Mp2vconst_optimized#5ef,%xmm2]      v11 = vconst.i16x8 0x00
@00a6 [Mp2fa#5f9,%xmm2]                    v5 = isub v11, v4
@00a6 [null_fpr#00,%xmm2]                  v6 = raw_bitcast.i8x16 v5
@00aa [null_fpr#00,%xmm2]                  v7 = raw_bitcast.i16x8 v6
@00aa [null_fpr#00,%xmm1]                  v8 = raw_bitcast.i16x8 v2
@00aa [Mp2fa#5fd,%xmm2]                    v9 = iadd v7, v8
@00aa [null_fpr#00,%xmm2]                  v10 = raw_bitcast.i8x16 v9
@00ac [-]                                  fallthrough ebb1(v10)

ebb1(v3: i8x16 [%xmm2]):
@00ac [Op2frmov#428]                       regmove v3, %xmm2 -> %xmm0
      [RexOp1popq#58,%rbp]                 v13 = x86_pop.i64
@00ac [Op1ret#c3]                          return v3, v13
}
```

This issue is to discuss if and how to remove these extra bitcasts.
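The pattern in that CLIF (for example `v5` → `v6` → `v7`, a round trip through `i8x16`) appears to follow from representing every Wasm `v128` value as `i8x16` and bitcasting around each lane-typed operation. The Rust sketch below is a toy model of that strategy, not Cranelift's actual translator; `Expr`, `Translator`, and its methods are hypothetical names. It translates the `add-sub` body as written and emits CLIF-like text.

```rust
/// A toy expression tree for lane-typed Wasm SIMD operations.
enum Expr {
    Local(usize),                // local.get N (parameter N is value vN)
    Add16(Box<Expr>, Box<Expr>), // i16x8.add
    Sub16(Box<Expr>, Box<Expr>), // i16x8.sub
}

/// Hypothetical translator state; not Cranelift's real translator.
struct Translator {
    next: usize,        // next free SSA value number
    lines: Vec<String>, // emitted CLIF-like instructions
    bitcasts: usize,    // number of raw_bitcasts emitted
}

impl Translator {
    fn fresh(&mut self) -> usize {
        let v = self.next;
        self.next += 1;
        v
    }

    /// Emit `raw_bitcast.<ty>` of value `v` and return the new value.
    fn bitcast(&mut self, v: usize, ty: &str) -> usize {
        let r = self.fresh();
        self.lines.push(format!("v{} = raw_bitcast.{} v{}", r, ty, v));
        self.bitcasts += 1;
        r
    }

    /// Translate `e`; the returned value is always typed i8x16, mirroring
    /// a translator that represents every Wasm v128 value as i8x16.
    fn translate(&mut self, e: &Expr) -> usize {
        match e {
            Expr::Local(i) => *i,
            Expr::Add16(a, b) | Expr::Sub16(a, b) => {
                let op = if let Expr::Add16(..) = e { "iadd" } else { "isub" };
                let a = self.translate(a);
                let b = self.translate(b);
                // Convert the i8x16 inputs to the lane type the op needs...
                let a16 = self.bitcast(a, "i16x8");
                let b16 = self.bitcast(b, "i16x8");
                let r = self.fresh();
                self.lines.push(format!("v{} = {} v{}, v{}", r, op, a16, b16));
                // ...and the i16x8 result back to the canonical i8x16.
                self.bitcast(r, "i8x16")
            }
        }
    }
}

fn main() {
    use Expr::*;
    // (i16x8.add (i16x8.sub (local.get 0) (local.get 1)) (local.get 2))
    let add_sub = Add16(
        Box::new(Sub16(Box::new(Local(0)), Box::new(Local(1)))),
        Box::new(Local(2)),
    );
    // Parameters occupy v0..v2 in this model, so new values start at v3.
    let mut t = Translator { next: 3, lines: Vec::new(), bitcasts: 0 };
    let result = t.translate(&add_sub);
    for line in &t.lines {
        println!("{}", line);
    }
    println!("return v{} ; {} raw_bitcasts", result, t.bitcasts);
}
```

For this input the model emits six `raw_bitcast`s around two arithmetic instructions, including the same `i16x8` → `i8x16` → `i16x8` round trip between the `isub` and the `iadd`.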
What is the value of adding this in Cranelift?
The extra `raw_bitcast`s emit no machine code, but they are confusing when troubleshooting and add extra memory and processing overhead during compilation.
Do you have an implementation plan, and/or ideas for data structures or algorithms to use?
Some options:
- add types to `load` and `const`: https://github.com/WebAssembly/simd/issues/125 was discussed in the Wasm SIMD Sync meeting (https://github.com/WebAssembly/simd/issues/121), and someone pointed out that making `load` and `const` typed (e.g. `f32x4.load`) would allow compilers to attach the correct types to values and retain them through the less strongly typed `v128` operations (e.g. `xor`). https://github.com/WebAssembly/simd/issues/125 discusses this from a performance point of view, but that addition would also solve this issue.
- examine the DFG: another approach would be to look at the DFG to figure out the types of predecessors, as mentioned in https://github.com/WebAssembly/simd/pull/1#issuecomment-295331508. This, however, would have to be extended to function signatures: Cranelift would have to look at the instructions in a function to figure out how its `v128` parameters are used. In the function `add-sub` above, with signature `(param v128 v128 v128)`, the addition and subtraction make this clear, but some functions will make this analysis impossible (e.g. when a parameter is only passed to untyped operations such as `xor`, or is used with two different lane types).
- add a `V128` type to Cranelift: Cranelift's type system could be extended with a `V128` type that would encompass all `INxN`, `FNxN`, and `BNxN` types. The instructions' types would stay the same (e.g. `iadd` should still only accept integers), but type-checking could be relaxed to allow `V128` to be used in place of any of its valid subtypes. This opens up a way around type-checking, but arguably one already exists with `raw_bitcast`. Code that knows its types would remain as-is, but Wasm-to-CLIF translated code could use `V128` a bit more naturally than `raw_bitcast`s.
- do nothing: I brought this up a long time ago when talking to @sunfishcode, and doing nothing seemed the best option then; I'm opening this issue to discuss whether that is still the case.
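The "examine the DFG" option can be sketched as a use-based inference: collect the lane types required by every instruction that reads a `v128` parameter, and assign a type only when they agree. This is a minimal Rust sketch under simplifying assumptions (only direct uses are inspected, and untyped operations such as `xor` contribute nothing); `Lanes`, `Inst`, and `infer` are hypothetical and unrelated to Cranelift's real DFG API.

```rust
use std::collections::BTreeSet;

/// Lane interpretations of a 128-bit vector (a hypothetical subset).
#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum Lanes {
    I16x8,
    I32x4,
    F32x4,
}

/// A hypothetical, simplified instruction: the lane type it requires of its
/// operands (`None` for bit-level ops such as `v128.xor`) and the values it reads.
struct Inst {
    requires: Option<Lanes>,
    args: Vec<usize>,
}

/// Infer a lane type for value `v` from the instructions that use it.
/// Returns None when no use constrains it or when uses disagree; in those
/// cases the translator would keep using i8x16 plus raw_bitcast.
fn infer(v: usize, body: &[Inst]) -> Option<Lanes> {
    let seen: BTreeSet<Lanes> = body
        .iter()
        .filter(|inst| inst.args.contains(&v))
        .filter_map(|inst| inst.requires)
        .collect();
    if seen.len() == 1 {
        seen.into_iter().next()
    } else {
        None
    }
}

fn main() {
    // add-sub: i16x8.sub reads v0, v1; i16x8.add reads the sub result (v3) and v2.
    // Values 0..=2 are the parameters; results are not modeled explicitly.
    let add_sub = vec![
        Inst { requires: Some(Lanes::I16x8), args: vec![0, 1] },
        Inst { requires: Some(Lanes::I16x8), args: vec![3, 2] },
    ];
    for p in 0..3 {
        println!("add-sub param {}: {:?}", p, infer(p, &add_sub));
    }

    // Parameter 0 is used as both i16x8 and f32x4; parameter 1 only by an
    // untyped xor-style op. Neither can be given a single lane type.
    let mixed = vec![
        Inst { requires: Some(Lanes::I16x8), args: vec![0] },
        Inst { requires: Some(Lanes::F32x4), args: vec![0] },
        Inst { requires: None, args: vec![1] },
    ];
    println!("mixed param 0: {:?}", infer(0, &mixed));
    println!("mixed param 1: {:?}", infer(1, &mixed));
}
```

For `add-sub` all three parameters come out as `I16x8`; for the `mixed` body both parameters yield `None`, which are exactly the cases where the translator would still need `raw_bitcast`.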
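The `V128` option amounts to a subtyping rule in the type checker: a `V128` operand is accepted wherever a concrete 128-bit vector type is expected, while each instruction keeps its own constraint (`iadd` still needs an integer controlling type). The Rust sketch below illustrates that rule under assumed names (`Type`, `accepts`, `check_iadd`); it does not use Cranelift's real type definitions or verifier.

```rust
/// A hypothetical slice of a CLIF-like type set, plus the proposed V128.
#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Type {
    I16x8,
    I32x4,
    F32x4,
    B16x8,
    V128,
    I64,
}

impl Type {
    /// True for every 128-bit vector type, including V128 itself.
    fn is_vector128(self) -> bool {
        matches!(self, Type::I16x8 | Type::I32x4 | Type::F32x4 | Type::B16x8 | Type::V128)
    }

    /// True for concrete integer vector types.
    fn is_int_vector(self) -> bool {
        matches!(self, Type::I16x8 | Type::I32x4)
    }
}

/// Relaxed check: a V128 operand is accepted wherever one of its 128-bit
/// subtypes is expected; concrete types must still match exactly.
fn accepts(expected: Type, actual: Type) -> bool {
    expected == actual || (actual == Type::V128 && expected.is_vector128())
}

/// `iadd` keeps its rule (integer controlling type only), but its operands
/// may be V128 instead of being wrapped in raw_bitcast.
fn check_iadd(ctrl: Type, a: Type, b: Type) -> Result<(), String> {
    if !ctrl.is_int_vector() {
        return Err(format!("iadd.{:?}: controlling type must be an integer vector", ctrl));
    }
    for (i, &t) in [a, b].iter().enumerate() {
        if !accepts(ctrl, t) {
            return Err(format!("iadd.{:?}: operand {} has type {:?}", ctrl, i, t));
        }
    }
    Ok(())
}

fn main() {
    // v128 values flowing straight into an i16x8 add: accepted.
    println!("{:?}", check_iadd(Type::I16x8, Type::V128, Type::V128));
    // Mixing two concrete lane types is still rejected.
    println!("{:?}", check_iadd(Type::I16x8, Type::I32x4, Type::V128));
    // A float controlling type is still rejected for iadd.
    println!("{:?}", check_iadd(Type::F32x4, Type::V128, Type::V128));
}
```

Under this rule the `add-sub` translation could pass `V128` values straight to `isub` and `iadd` with an `i16x8` controlling type instead of wrapping them in `raw_bitcast`s, while code that mixes concrete lane types is checked exactly as before.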
Have you considered alternative implementations? If so, how are they better or worse than your proposal?
See above.
Last updated: Jan 24 2025 at 00:11 UTC