akirilov-arm commented on Issue #1175:
Having a specialized
load_splat
Cranelift instruction would be beneficial for the AArch64 backend (so that it could generateLD1R
instructions). I tried the alternative approach quickly, i.e. smarter codegen, which in the AArch64 case means usingmaybe_input_insn()
, but it didn't work (I was gettingNone
as the return value) - the reason was probably because the load might have side effects.
abrown commented on Issue #1175:
Let's do that then; in the past, it's always been ok to create Cranelift instructions that map exactly to Wasm SIMD instructions so that seems like a good path forward. If you add it (somewhere in
meta/src/shared/instructions.rs
) feel free to tag me and I'll review it.
akirilov-arm commented on Issue #1175:
I'd be interested in having a wider discussion; in particular, @cfallin might have a different opinion with respect to the AArch64 backend.
cfallin commented on Issue #1175:
I think it's a reasonably pragmatic solution to have the dedicated instruction; it reduces work overall in the pipeline if we can carry the whole package of operations together as one unit from Wasm to machine code.
There's a possible concern if we have a lot of these; it's certainly nice to only have to worry about literal "load" and "store" instructions when analyzing loads and stores. But perhaps we get around that by having accessors on the instruction ("this inst is a load of some form, with address X and size Y"). Or, in the worst case, the instruction is a black-box with unknown side-effects from the point of view of other optimization passes, so it inhibits some optimization but is still correct.
@akirilov-arm, the reason that your (quite reasonable!) approach attempting to merge the ops in the backend didn't work is indeed because load is side-effecting, so will never be returned by
maybe_input_insn()
. This prevents the backend from arbitrarily sinking a side-effecting op, which could lead to a miscompile (consider if the load-splat were sunk past a store to an aliasing pointer, for example). The "correct" thing to do is to hoist the splat instead and then merge, but the framework isn't really set up to allow that.
cfallin closed Issue #1175:
What is the feature or code improvement you would like to do in Cranelift?
In bytecodealliance/cranelift#1347 I added a temporary lowering for Wasm's
load_splat
to two Cranelift instructions,load + splat
. This generates extra instructions that could be removed by a specialized Craneliftload_splat
instruction or by smarter codegen (e.g. complex addressing on splat).What is the value of adding this in Cranelift?
Fewer instructions produced.
Do you have an implementation plan, and/or ideas for data structures or algorithms to use?
Seeking feedback on which way to proceed: specialized
load_splat
or smarter codegen.Have you considered alternative implementations? If so, how are they better or worse than your proposal?
See above.
Last updated: Jan 24 2025 at 00:11 UTC