@Afonso Bordado your recent issue about iadd_pairwise got me thinking. When I was looking at perf a while back for SIMD on the x64 and aarch64 backends, one of the critical lowerings I found was the shuffle lowering. Basically, there are some common shuffle patterns which map to single instructions on x64 and aarch64 and are far more performant than the fallback lowering for an arbitrary shuffle.
I saw your comments that what's implemented in Cranelift matches what v8 does, but it might be worth going through the RISC-V instruction set to see if there are shuffle-like instructions which can be pattern matched in the same way as on aarch64 and x64. IIRC v8 represents each specialized shuffle as a distinct opcode, so it may not all show up in the same place in the backend. I also don't know myself whether RISC-V has specialized instructions, but I figured I'd bring this up in case you were curious to look into it.
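For anyone following along, the kind of pattern matching described above can be sketched roughly like this. This is only an illustration, not Cranelift's actual lowering code: `ShufflePattern` and `classify` are hypothetical names, and the mask is assumed to be the usual 16 byte indices where 0..15 select from the first input and 16..31 from the second. A backend would then pick a single instruction for the recognized cases (for example an interleave such as `punpcklbw` on x64 or `zip1` on aarch64) and fall back to the generic lowering otherwise.

```rust
/// Hypothetical classification of a 16-lane byte shuffle mask.
/// Indices 0..=15 select from the first input, 16..=31 from the second.
#[derive(Debug, PartialEq)]
enum ShufflePattern {
    /// Every output lane reads the same input lane (a broadcast).
    Splat(u8),
    /// Interleaves the low 8 bytes of both inputs: 0, 16, 1, 17, ...
    InterleaveLow,
    /// Interleaves the high 8 bytes of both inputs: 8, 24, 9, 25, ...
    InterleaveHigh,
    /// Nothing special matched; use the arbitrary-shuffle fallback.
    Generic,
}

fn classify(mask: &[u8; 16]) -> ShufflePattern {
    // All lanes identical: a splat of a single input byte.
    if mask.iter().all(|&m| m == mask[0]) {
        return ShufflePattern::Splat(mask[0]);
    }

    // Check for an interleave starting at input lane `base`:
    // even output lanes come from the first input, odd ones from the second.
    let interleave = |base: u8| {
        mask.iter().enumerate().all(|(i, &m)| {
            let lane = base + (i as u8) / 2;
            let expected = if i % 2 == 0 { lane } else { lane + 16 };
            m == expected
        })
    };

    if interleave(0) {
        return ShufflePattern::InterleaveLow;
    }
    if interleave(8) {
        return ShufflePattern::InterleaveHigh;
    }
    ShufflePattern::Generic
}
```

The point of classifying first is that the cheap cases are decided purely from the constant mask, so the fallback only runs for shuffles that really are arbitrary.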
Yeah, when I looked into shuffle I noticed we pattern match a lot on the other backends. I found some ideas for optimized shuffles here; those were the most relevant examples I could find when searching for shuffle.
I might also go search through LLVM, since at least to me it looks like v8's RISC-V SIMD backend is fairly new and might not have the best examples yet. But thanks for the suggestion!
Last updated: Dec 23 2024 at 12:05 UTC