Hey,
A few questions about bitcast and raw_bitcast, since they don't seem to match up exactly with how they're described in the docs.
For bitcast, the docs allow any types which are the same size, but I notice that x86_64 and s390x currently only support F32 <-> I32 and F64 <-> I64, which aligns with the reinterpret instructions that bitcast is used for. Would it be best to redefine this to be stricter on types? The current definition is unclear (given the lack of examples) about how it works for SIMD values; e.g., the verifier currently compares lane_size, which the docs don't suggest. Or is this a similar case to sqmul_round_sat, where the IR definition is more polymorphic than the backends actually support because there has been no pressing need for the extra polymorphism, and in fact an i16x8 -> i8x16 bitcast (currently rejected by the verifier) should theoretically be supported, as the docs describe?
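To make the two readings concrete, here is a minimal sketch contrasting the documented "same total size" rule with a stricter "same lane size" check like the one the verifier is described as doing above. The Ty struct and the same_size_ok/same_lane_ok helpers are hypothetical illustrations, not Cranelift's API, and the model ignores whether a lane is an integer or a float.

```rust
/// Hypothetical stand-in for an IR type: lane width in bits and lane count.
/// (Integer vs. float lanes are not distinguished in this sketch.)
#[derive(Clone, Copy, Debug)]
struct Ty {
    lane_bits: u32,
    lanes: u32,
}

impl Ty {
    const fn new(lane_bits: u32, lanes: u32) -> Self {
        Ty { lane_bits, lanes }
    }

    fn total_bits(self) -> u32 {
        self.lane_bits * self.lanes
    }
}

/// Documented reading: any two types with the same total size.
fn same_size_ok(from: Ty, to: Ty) -> bool {
    from.total_bits() == to.total_bits()
}

/// Stricter reading: total size AND lane size must match.
fn same_lane_ok(from: Ty, to: Ty) -> bool {
    same_size_ok(from, to) && from.lane_bits == to.lane_bits
}

fn main() {
    let i16x8 = Ty::new(16, 8);
    let i8x16 = Ty::new(8, 16);
    // i32x4 and f32x4 have the same shape in this model.
    let i32x4 = Ty::new(32, 4);

    // prints: i16x8 -> i8x16: size-only true, lane-size false
    println!(
        "i16x8 -> i8x16: size-only {}, lane-size {}",
        same_size_ok(i16x8, i8x16),
        same_lane_ok(i16x8, i8x16)
    );
    // prints: i32x4 -> f32x4: lane-size true
    println!("i32x4 -> f32x4: lane-size {}", same_lane_ok(i32x4, i32x4));

    // Scalar F32 -> I32 reinterpret: the bits are unchanged.
    // prints: 0x3f800000
    println!("{:#x}", 1.0f32.to_bits());
}
```

Under the size-only rule the i16x8 -> i8x16 case from the question is legal; under the lane-size rule it is rejected, which is the mismatch being asked about.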
As for raw_bitcast, as far as I can tell it isn't actually "used" for computation and instead mainly benefits the register allocator. Is the existing implementation of it (gen_move on AArch64 and x86_64) of any value, or could it be considered effectively an outright no-op (which seems to be how s390x treats it)?
raw_bitcast is used to convert between vectors of different lane sizes.
Sorry, I meant that I don't believe it actually emits any instructions (according to this issue); it's solely used for Cranelift's internal type system, correct?
Indeed. That is the intention.
@Damian Heaton there was an extensive discussion of raw_bitcast in yesterday's Cranelift meeting following a bunch of thought in https://github.com/bytecodealliance/wasmtime/issues/4566. @Ulrich Weigand is proposing much clearer and more explicit semantics for raw_bitcast, motivated initially by endianness questions (but as a result we'll have a much clearer spec)
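As a rough illustration of why a lane-size-changing reinterpret like raw_bitcast raises endianness questions, the sketch below reinterprets an i16x8 value as i8x16 by writing out each 16-bit lane's bytes. The helper i16x8_to_i8x16 is hypothetical; it assumes lane 0 is laid out first and varies only the byte order within each lane, which is a simplification of how real targets store vector registers.

```rust
/// Hypothetical model: reinterpret eight 16-bit lanes as sixteen 8-bit lanes.
/// Lane 0 is always written first; `little` selects the byte order used
/// inside each 16-bit lane.
fn i16x8_to_i8x16(lanes: [u16; 8], little: bool) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (i, lane) in lanes.iter().enumerate() {
        let bytes = if little { lane.to_le_bytes() } else { lane.to_be_bytes() };
        out[2 * i] = bytes[0];
        out[2 * i + 1] = bytes[1];
    }
    out
}

fn main() {
    let v = [0x0102u16, 0, 0, 0, 0, 0, 0, 0];
    let le = i16x8_to_i8x16(v, true);
    let be = i16x8_to_i8x16(v, false);
    // Same input bits, but the resulting i8 lanes 0 and 1 differ by byte order.
    // prints: [2, 1] [1, 2]
    println!("{:?} {:?}", &le[..2], &be[..2]);
}
```

The point is only that "which i8 lane ends up holding which byte" is not determined by the bits alone, so a precise raw_bitcast spec has to pin down a byte/lane order.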
re: bitcast, given that raw_bitcast will cover "reinterpret vector across lane" cases, I can see it making sense for same-lane-width cases as a straightforward vectorization (i32x4 -> f32x4 for example) but that's it
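A minimal sketch of what "same-lane-width bitcast as a straightforward vectorization" could mean: apply the scalar I32 <-> F32 reinterpret to each lane independently, keeping lane count and width unchanged. The function names are hypothetical, and i32 lanes are modelled as u32 bit patterns because that is what f32::from_bits and f32::to_bits use.

```rust
/// Hypothetical lane-wise bitcast i32x4 -> f32x4: each 32-bit lane is
/// reinterpreted on its own, so no bytes move between lanes.
fn bitcast_i32x4_to_f32x4(v: [u32; 4]) -> [f32; 4] {
    v.map(f32::from_bits)
}

/// The inverse direction, f32x4 -> i32x4.
fn bitcast_f32x4_to_i32x4(v: [f32; 4]) -> [u32; 4] {
    v.map(f32::to_bits)
}

fn main() {
    let ints = [0x3f80_0000u32, 0x4000_0000, 0, 0x8000_0000];
    let floats = bitcast_i32x4_to_f32x4(ints);
    // prints: [1.0, 2.0, 0.0, -0.0]
    println!("{:?}", floats);
    // Round-tripping recovers the original bit patterns exactly.
    assert_eq!(bitcast_f32x4_to_i32x4(floats), ints);
}
```

Because every lane maps to exactly one lane of the same width, this form has no byte-order ambiguity, unlike the across-lane cases left to raw_bitcast.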
so I think the answer is basically "we're working on it" and see above issue :-)
Last updated: Dec 23 2024 at 12:05 UTC