✔ semantics of `aarch64` `addp` in ISLE · cranelift

Alexa VanHattum (Apr 12 2023 at 17:37):

I'm somewhat confused by the ISLE declaration for ARM's addp instruction, which the ARM documentation describes as: "Add Pair of elements (scalar). This instruction adds two vector elements in the source SIMD and FP register and writes the scalar result into the destination SIMD and FP register."

In ISLE, I would expect this to take a single Reg argument and add each pair of adjacent lanes. But it takes two arguments:

(decl addp (Reg Reg VectorSize) Reg)
(rule (addp x y size) (vec_rrr (VecALUOp.Addp) x y size))

In most uses, rules pass the same argument to both x and y, e.g.:

(rule popcnt_16 (lower (has_type $I16 (popcnt x)))
      (let ((tmp Reg (mov_to_fpu x (ScalarSize.Size32)))
            (nbits Reg (vec_cnt tmp (VectorSize.Size8x8)))
            (added Reg (addp nbits nbits (VectorSize.Size8x8)))) ;; <- add the first 2 lanes of nbits?
        (mov_from_vec added 0 (ScalarSize.Size8))))

;; Sum the respective high half components.
;;   rd = |dg+ch|be+af||dg+ch|be+af|
(sum Reg (addp mul mul (VectorSize.Size32x4)))

But this rule passes different args:

(rule -1 (lower (has_type ty (iadd_pairwise x y)))
      (addp x y (vector_size ty)))

Should the semantics add pairwise indices, but one from each argument (e.g., [x[0] + y[1], x[2] + y[3], ...])? Or am I missing something?

Alexa VanHattum (Apr 12 2023 at 17:43):

Ah, oops, that was the scalar doc string! The vector one should be "Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register." So there is a concatenation, then add pairwise.
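
As a rough illustration of the vector description quoted above, here is a minimal Python sketch of the lane-level behaviour. The name `addp_vector` and the list-of-lanes representation are made up for this example and are not Cranelift or ARM APIs. It assumes lane 0 is the least-significant element and that the first source fills the low lanes of the concatenation (the ARM pseudocode writes the concatenation as `operand2:operand1`).

```python
def addp_vector(x, y, lane_bits):
    """Hypothetical model of ADDP (vector) on lists of unsigned lane values.

    x is the first source, y the second; index 0 is the lowest lane.
    """
    assert len(x) == len(y) and len(x) % 2 == 0
    mask = (1 << lane_bits) - 1
    # Assumption: the first source occupies the low lanes of the
    # concatenated vector, the second source the high lanes.
    concat = list(x) + list(y)
    # Each result lane is the wrapping sum of one adjacent pair.
    return [(concat[2 * i] + concat[2 * i + 1]) & mask for i in range(len(x))]


# Different sources, as in the iadd_pairwise lowering:
print(addp_vector([1, 2, 3, 4], [10, 20, 30, 40], 32))  # [3, 7, 30, 70]
# Same source twice, as in `(addp mul mul ...)`: both halves repeat.
print(addp_vector([1, 2, 3, 4], [1, 2, 3, 4], 32))      # [3, 7, 3, 7]
```

Under this model the low half of the result holds the pair sums of the first argument and the high half holds the pair sums of the second, which would explain why the instruction takes two registers.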

Alexa VanHattum (Apr 12 2023 at 17:45):

So, in my weird pseudocode (lane 0 first), I think it's [x[0] + x[1], ..., x[n-2] + x[n-1], y[0] + y[1], ..., y[n-2] + y[n-1]]
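
Applied to the `popcnt_16` rule from the first message, passing the same register twice means lane 0 of the result is just `nbits[0] + nbits[1]`, which is the only lane that rule reads back. The Python sketch below walks through those steps on plain integers. The helper name `popcnt16_via_addp` is hypothetical, and the sketch assumes that the 32-bit move zeroes the remaining bytes of the 8x8 vector and that lane 0 is the lowest byte.

```python
def popcnt16_via_addp(value):
    """Hypothetical model of the popcnt_16 lowering on an integer input."""
    # mov_to_fpu x Size32: only the low 32 bits reach the vector register;
    # the other four byte lanes are assumed to be zero.
    word = value & 0xFFFF_FFFF
    lanes = [(word >> (8 * i)) & 0xFF for i in range(4)] + [0] * 4
    # vec_cnt ... Size8x8: per-byte population count.
    nbits = [bin(b).count("1") for b in lanes]
    # addp nbits nbits Size8x8: concatenate the register with itself and
    # add adjacent pairs, wrapping at 8 bits.
    concat = nbits + nbits
    added = [(concat[2 * i] + concat[2 * i + 1]) & 0xFF for i in range(8)]
    # mov_from_vec added 0 Size8: read lane 0, i.e. nbits[0] + nbits[1].
    return added[0]


print(popcnt16_via_addp(0xFFFF))       # 16
print(popcnt16_via_addp(0x0101))       # 2
print(popcnt16_via_addp(0xFFFF_0001))  # 1 (bits above the low 16 are ignored)
```

Because only lane 0 is extracted, whatever sits in bits 16 and up of the input does not affect the result in this model.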

Notification Bot (Apr 12 2023 at 17:45):

Alexa VanHattum has marked this topic as resolved.
