I'm somewhat confused by the ISLE declaration for ARM's addp instruction, which by the ARM documentation is, "Add Pair of elements (scalar). This instruction adds two vector elements in the source SIMD and FP register and writes the scalar result into the destination SIMD and FP register."
In ISLE, I would expect this to take a single Reg argument and add each pair of adjacent lanes. But it takes two arguments:
(decl addp (Reg Reg VectorSize) Reg)
(rule (addp x y size) (vec_rrr (VecALUOp.Addp) x y size))
In most uses, rules pass the same argument to both x and y, e.g.:
(rule popcnt_16 (lower (has_type $I16 (popcnt x)))
(let ((tmp Reg (mov_to_fpu x (ScalarSize.Size32)))
(nbits Reg (vec_cnt tmp (VectorSize.Size8x8)))
(added Reg (addp nbits nbits (VectorSize.Size8x8)))) ;; <- add the first 2 lanes of nbits?
(mov_from_vec added 0 (ScalarSize.Size8))))
;; Sum the respective high half components.
;; rd = |dg+ch|be+af||dg+ch|be+af|
(sum Reg (addp mul mul (VectorSize.Size32x4)))
But this rule passes different arguments:
(rule -1 (lower (has_type ty (iadd_pairwise x y)))
(addp x y (vector_size ty)))
Should the semantics add pairwise indices, but one from each argument (e.g., [x[0] + y[1], x[2] + y[3], ...])? Or am I missing something?
Ah, oops, that was the scalar doc string! The vector one should be "Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register." So there is a concatenation, then add pairwise.
So, in my weird pseudocode, I think it's [y[0] + y[1], ... y[n-2] + y[n-1], x[0] + x[1], ... x[n-2] + x[n-1]].
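To make the concatenate-then-add-pairwise reading concrete, here is a minimal Python sketch of the vector form. It is a model for illustration only, not Cranelift or ARM code. It assumes lanes are listed in index order (lane 0 first) and that the first source (x) supplies the low half of the concatenation, which is my reading of ARM's `concat = operand2:operand1` pseudocode. It also ignores lane-width wraparound. The function name `addp` is borrowed from the ISLE helper and the sample values are made up.

```python
def addp(x, y):
    """Model of vector ADDP on two equal-length lists of lane values.

    Assumption: lane 0 is listed first, and x fills the low lanes of the
    concatenated vector. Integer wraparound at the lane width is ignored.
    """
    if len(x) != len(y) or len(x) % 2 != 0:
        raise ValueError("operands must have the same, even number of lanes")
    concat = list(x) + list(y)  # lane 0 of x is lane 0 of the concatenation
    # Result lane i is the sum of adjacent lanes 2*i and 2*i + 1.
    return [concat[2 * i] + concat[2 * i + 1] for i in range(len(x))]


# Distinct operands, as in the iadd_pairwise lowering.
print(addp([1, 2, 3, 4], [10, 20, 30, 40]))

# Same operand twice, as in popcnt_16: lane 0 is the sum of the per-byte
# bit counts of the two low bytes (here 5 and 4, hypothetical values).
nbits = [5, 4, 0, 0, 0, 0, 0, 0]
print(addp(nbits, nbits)[0])
```

Under these assumptions the pairs from x land in the low lanes and the pairs from y in the high lanes. Passing the same register twice just duplicates the pairwise sums in both halves, which is why the popcnt rule can read lane 0 afterwards.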
Alexa VanHattum has marked this topic as resolved.
Last updated: Nov 22 2024 at 17:03 UTC