afonso360 opened issue #6826:
:wave: Hey,
Feature
In the RISC-V backend we have a bunch of lowering rules that match, for example, `(iadd x (splat y))`. This is a byproduct of some opcodes reading their source from an F or X register and using that value for each element of the operation (essentially every `.vx` or `.vf` opcode).
We have a lot of these opcodes and rules. However, we also have a mid-end rule that transforms `(splat (iconst _))` into `(vconst _)` (#6148), which is a reasonable thing to have and should let us const propagate further. But in our case it causes us to materialize the entire `vconst` when we could avoid it.
We can do better: materialize the repeated element into a register, and use that with the special opcodes.
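To make the `vconst` case concrete, here is a minimal Rust sketch of the check that decides whether a vector constant is just one element repeated. `splat_lane` is a hypothetical helper written for illustration, not an existing Cranelift API; it assumes the constant is available as little-endian bytes and that a lane is at most 8 bytes wide.

```rust
/// Hypothetical helper: if `bytes` is the same `lane_bytes`-wide element
/// repeated across the whole constant, return that element zero-extended
/// to 64 bits. Otherwise return `None`.
fn splat_lane(bytes: &[u8], lane_bytes: usize) -> Option<u64> {
    // Reject lane widths we cannot represent and constants that do not
    // divide evenly into lanes.
    if lane_bytes == 0
        || lane_bytes > 8
        || bytes.is_empty()
        || bytes.len() % lane_bytes != 0
    {
        return None;
    }

    // Every lane must be byte-for-byte identical to the first one.
    let first = &bytes[..lane_bytes];
    if !bytes.chunks_exact(lane_bytes).all(|lane| lane == first) {
        return None;
    }

    // Decode the repeated lane as a little-endian integer.
    let mut buf = [0u8; 8];
    buf[..lane_bytes].copy_from_slice(first);
    Some(u64::from_le_bytes(buf))
}
```

When this returns `Some(value)`, only `value` needs to reach an X or F register; the full 16-byte constant never has to be loaded.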
Benefit
This would allow us to use `.vx` and `.vf` opcodes more broadly when the splatted argument is a constant, which would hopefully reduce the number of constant generation instructions and the memory bandwidth used.
This is only relevant for optimized builds, which run the mid-end. On unoptimized builds we may actually generate better code currently!
Implementation
Essentially I would like to have an extractor that would match either a `splat` or a `vconst` that repeats its elements. This is very similar to #6527 but slightly more applicable, since it works for `.vx` and `.vf` instead of only `.vi` opcodes.
One of the issues here is that we still have to materialize the constant into an X or F register using the regular materialization rules. But extractors should be pure and not have side effects (AFAIK).
I'm not sure if we are allowed to have extractors that have side effects when they are guaranteed to match? That would be the best in terms of ergonomics, since we could use it like any other extractor and pretend that the source X or F register was always there.
Otherwise, I'm not entirely sure. We could split it into an extractor that just matches when this is possible and a separate rule that actually materializes the register. This would definitely work, but would have slightly worse ergonomics. (However, it would be less surprising, which is a big positive.)
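The split between a pure match and a separate materialization step can be sketched in plain Rust as follows. This is only a toy model of the idea, not ISLE and not Cranelift's real types: `Operand`, `Scalar`, `match_splat`, `Lower`, and the `load_imm` pseudo-instruction are all hypothetical names made up for illustration.

```rust
/// Toy stand-in for a vector operand in the IR (hypothetical types).
#[derive(Debug, Clone, PartialEq)]
enum Operand {
    /// `(splat x)` where `x` already lives in the given scalar register.
    Splat(u32),
    /// `(vconst bytes)`, stored as little-endian bytes.
    Vconst(Vec<u8>),
}

/// What the pure matcher reports: an existing register, or a constant
/// that still has to be materialized.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Scalar {
    Reg(u32),
    Imm(u64),
}

/// Pure "extractor": inspects the operand and emits nothing.
fn match_splat(op: &Operand, lane_bytes: usize) -> Option<Scalar> {
    match op {
        Operand::Splat(reg) => Some(Scalar::Reg(*reg)),
        Operand::Vconst(bytes) => {
            if lane_bytes == 0
                || lane_bytes > 8
                || bytes.is_empty()
                || bytes.len() % lane_bytes != 0
            {
                return None;
            }
            // All lanes must equal the first lane.
            let first = &bytes[..lane_bytes];
            if !bytes.chunks_exact(lane_bytes).all(|lane| lane == first) {
                return None;
            }
            let mut buf = [0u8; 8];
            buf[..lane_bytes].copy_from_slice(first);
            Some(Scalar::Imm(u64::from_le_bytes(buf)))
        }
    }
}

/// Toy lowering context that records the instructions it emits.
struct Lower {
    next_reg: u32,
    insts: Vec<String>,
}

impl Lower {
    /// Side-effecting step, kept separate from the matcher: returns the
    /// register holding the scalar, emitting a load only for constants.
    fn materialize(&mut self, s: Scalar) -> u32 {
        match s {
            Scalar::Reg(r) => r,
            Scalar::Imm(v) => {
                let r = self.next_reg;
                self.next_reg += 1;
                self.insts.push(format!("load_imm r{}, {:#x}", r, v));
                r
            }
        }
    }
}
```

A rule would first call `match_splat` on the left-hand side and only call `materialize` on the right-hand side once the match has succeeded. The matcher stays pure, at the cost of one extra step in every rule that uses it.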
Alternatives
I don't think removing the mid-end rules is a very good idea. I don't think we do any other const propagation on `vconst` yet, but it would be nice not to regress that.
We also don't have to do this. Most likely it will save one `vmv.v.x` instruction that would do the splat, which may or may not be a big deal depending on the exact uArch. In the best case this lets us load only 8 bytes instead of 16, which is a slight improvement. (Constants of 4 bytes or fewer should always be materialized with 2 instructions and would fall into `vmv.v.x` territory.)
afonso360 added the cranelift label to Issue #6826.
afonso360 added the cranelift:area:riscv64 label to Issue #6826.
Last updated: Nov 22 2024 at 16:03 UTC