Stream: cranelift

Topic: How to deal with lack of narrow SIMD support?


vx (Jan 01 2026 at 23:39):

Hi! I've recently hit some unimplemented SIMD operations, such as fmul with f32x2 on x64 (should be implemented in ISLE), and I'm wondering how exactly I should deal with it.

I thought about extending my value to an f32x4, but I don't see any instruction that would let me do that easily except, maybe, insertlane and extractlane, and I'm worried this would hurt codegen & performance. Is this really the way to go about it?

I've also considered implementing these lowerings, but I've never contributed to cranelift and don't know how much work that would be. If this is an approachable issue, I'd happily contribute.

Chris Fallin (Jan 02 2026 at 16:13):

Hi @vx -- those are indeed the two general approaches (widen to 128 bits, or fill out the 64-bit lowerings in Cranelift). The former is probably actually pretty reasonable performance-wise as long as you can keep data in registers most of the time -- the widening/narrowing will happen only when you load from or store to your native narrow-vector format, and the nature of SIMD is that all lanes go independently at the same time, so the extra lanes can just be "don't care" bits and won't slow things down.

Ideally we'd also fill out narrow-SIMD lowerings, but that's a big project. IIRC the 64-bit SIMD stuff was originally put in place for aarch64 because it supports these variants, but other ISAs may not, so we effectively need the equivalent of widening/narrowing at the instruction selection level (at least where interfacing with memory via loads/stores) anyway.
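
To make the widening approach described above concrete, here is a minimal sketch in plain Rust. It only models the idea with arrays and does not use the Cranelift API: the type alias `F32x4` and the helpers `widen`, `narrow`, `fmul_f32x4`, and `fmul_f32x2_via_widening` are hypothetical names invented for illustration. The upper two lanes are filled with zero here, but any value would do since they are never read back.

```rust
/// Model of a 128-bit vector register holding four f32 lanes.
/// (Illustrative only; this is not a Cranelift type.)
type F32x4 = [f32; 4];

/// Widen: move a narrow two-lane value into the low half of a
/// four-lane value. The upper lanes are "don't care"; zero is
/// an arbitrary filler.
fn widen(narrow: [f32; 2]) -> F32x4 {
    [narrow[0], narrow[1], 0.0, 0.0]
}

/// Lane-wise multiply on the full-width value. Every lane is
/// computed independently, so the filler lanes cannot affect
/// the two lanes we care about.
fn fmul_f32x4(a: F32x4, b: F32x4) -> F32x4 {
    [a[0] * b[0], a[1] * b[1], a[2] * b[2], a[3] * b[3]]
}

/// Narrow: keep only the low two lanes, e.g. right before
/// storing back to the narrow in-memory format.
fn narrow(wide: F32x4) -> [f32; 2] {
    [wide[0], wide[1]]
}

/// An f32x2 multiply expressed as widen, full-width multiply,
/// then narrow.
fn fmul_f32x2_via_widening(a: [f32; 2], b: [f32; 2]) -> [f32; 2] {
    narrow(fmul_f32x4(widen(a), widen(b)))
}
```

In a longer computation, the `widen` and `narrow` steps would sit only at the loads and stores, with every intermediate value staying in the four-lane form, which is why the extra cost is expected to be small.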


Last updated: Jan 09 2026 at 13:15 UTC