Hello, I am implementing a JIT application and I found an issue with the endianness field of MemFlags being ignored on x86_64 and aarch64. I see this has been discovered before in #3625, but no fix has been made. I would like to take a crack at fixing this, but I am not familiar with Cranelift's architecture. After a quick peek I see that the raw instructions are emitted in (for example) codegen/src/isa/aarch64/inst/emit.rs. I guess a trivial fix would be to emit a rev instruction there after the load, based on the endianness of the ISA and the flags. However, this would be suboptimal in cases like load rx, be ptr_a; store be ptr_b, rx (the byte swaps cancel each other out). I see a relevant bswap instruction got added in #5147, so perhaps I should emit those somewhere at a higher level so they can later be optimised out. Where would be the relevant place to do it, and is this the correct approach in the first place?
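To illustrate why the two swaps in the load/store example cancel, here is a minimal Rust sketch. It is not Cranelift code: the helper names (load_be, store_be, copy_via_be, copy_via_native) are hypothetical and exist only for this illustration.

```rust
/// Read a 32-bit big-endian value. On a little-endian host this is
/// a plain load followed by a byte swap.
fn load_be(src: &[u8; 4]) -> u32 {
    u32::from_be_bytes(*src)
}

/// Write a 32-bit value in big-endian order. On a little-endian host
/// this is a byte swap followed by a plain store.
fn store_be(dst: &mut [u8; 4], value: u32) {
    *dst = value.to_be_bytes();
}

/// Big-endian load immediately followed by a big-endian store.
fn copy_via_be(src: &[u8; 4]) -> [u8; 4] {
    let mut dst = [0u8; 4];
    store_be(&mut dst, load_be(src));
    dst
}

/// The same copy done with native-endian accesses and no swaps.
fn copy_via_native(src: &[u8; 4]) -> [u8; 4] {
    u32::from_ne_bytes(*src).to_ne_bytes()
}
```

Both copies produce the same bytes, so a backend that emits a swap after the load and another before the store does redundant work in this pattern.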
Moreover, on x86_64 there is an instruction, movbe, which is effectively a load and bswap in one. Using it slightly lowers the instruction decode cost, but it's not part of the base instruction set. Intel has had it since Haswell, AMD since Excavator. What is the policy on using such an instruction?
For the last point, you would add a new movbe target flag and enable it for haswell and excavator. You can then check this target flag before emitting the instruction.
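The flag check described above could look roughly like the following sketch. This is not Cranelift's actual API: TargetFlags, has_movbe, Inst and lower_be_load are hypothetical names, and the real backend would express the choice in its lowering rules rather than in a free function.

```rust
/// Hypothetical stand-in for the ISA-specific target flags.
struct TargetFlags {
    /// Assumed flag: true when the target CPU supports movbe.
    has_movbe: bool,
}

/// Hypothetical, heavily simplified machine instructions.
#[derive(Debug, PartialEq)]
enum Inst {
    Mov,
    Bswap,
    Movbe,
}

/// Choose the instruction sequence for a big-endian load on a
/// little-endian x86_64 target.
fn lower_be_load(flags: &TargetFlags) -> Vec<Inst> {
    if flags.has_movbe {
        // One instruction that loads and swaps.
        vec![Inst::Movbe]
    } else {
        // Baseline fallback: plain load, then byte swap.
        vec![Inst::Mov, Inst::Bswap]
    }
}
```

The fallback path keeps the generated code valid on CPUs without the extension, which is why the instruction is gated on a flag instead of being emitted unconditionally.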
@noxim thanks for the interest in this; indeed it is a missing feature on aarch64/x64 at the moment. Emitting byteswap instructions immediately after loads is a perfectly reasonable first implementation; we can introduce optimizations later to rewrite swapped-store-of-swapped-load and byteswap-of-swapped-load.
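As a rough idea of what such a later rewrite might do, the sketch below folds a byte swap into the load it wraps and removes double swaps. It assumes a toy expression tree; Expr, its variants and simplify are hypothetical and do not correspond to Cranelift's IR or its optimizer.

```rust
/// Hypothetical toy expression tree, not Cranelift IR.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// A load with an explicit endianness.
    Load { big_endian: bool, addr: u64 },
    /// Some other value that is not a load.
    Param(u32),
    /// Byte swap of the inner value.
    Bswap(Box<Expr>),
}

/// Fold byte swaps into loads and cancel swap-of-swap.
fn simplify(e: Expr) -> Expr {
    match e {
        Expr::Bswap(inner) => match simplify(*inner) {
            // bswap(load) is a load of the opposite endianness.
            Expr::Load { big_endian, addr } => Expr::Load {
                big_endian: !big_endian,
                addr,
            },
            // bswap(bswap(x)) is just x.
            Expr::Bswap(x) => *x,
            // Nothing to fold: keep the swap.
            other => Expr::Bswap(Box::new(other)),
        },
        other => other,
    }
}
```

With a rewrite of this kind in place, emitting explicit swaps next to every load stays correct while the redundant ones can be cleaned up afterwards.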
To expand on bjorn3's comment: For example, on x64, there's a special case for 2-lane SIMD 64-bit integer multiplies if the AVX-512 instruction set extension is available. That's implemented in the ISLE lowering rules like this:
(rule 3 (lower (has_type (and (avx512vl_enabled $true)
                              (avx512dq_enabled $true)
                              (multi_lane 64 2))
                         (imul x y)))
      (x64_vpmullq x y))
Flags like avx512vl are declared in cranelift/codegen/meta/src/isa/x86.rs along with, unfortunately, several other places.
Last updated: Nov 22 2024 at 16:03 UTC