cfallin edited PR #3848 from isle-bitops
to main
:
Along the way this introduces a few new utility bits to the ISLE bindings, e.g. some constructors to do integer arithmetic on constants (I'm surprised we got this far without it!). The bit-twiddling algorithms are, at least to my eye, a bit more legible in this form, but I'm happy to add more comments here if desired since we're making a pass over everything anyway.
Also updated a few stray places I had missed earlier where we can use implicit conversions (the
def_inst
removals).
abrown submitted PR review.
abrown submitted PR review.
abrown created PR review comment:
It seems weird that these extractors (and above) are in any way related to
Type
... logically confusing, I can see the pragmatic part of it.
abrown created PR review comment:
:+1:
abrown created PR review comment:
The
CMOV
here usesBSR
's flag-setting behavior to work correctly. Just to check: we don't need to useConsumesFlags/ProducesFlags
here because these instructions are back-to-back and emitted in order? If that is the case, is a comment to that effect useful here?
abrown created PR review comment:
The comments below are pretty helpful; any high-level comments here that explain the general gist of how this is counting bits?
abrown created PR review comment:
;; Helper for emitting immediates with an `i64` value. Note that
abrown created PR review comment:
(Also, if this is allowed, then shouldn't the new
CmoveOr
andXmmCmoveOr
sequences have been written in ISLE?)
abrown created PR review comment:
;; For SSE4.2 we use Mula's algorithm (https://arxiv.org/pdf/1611.07612.pdf)
?
fitzgen submitted PR review.
fitzgen created PR review comment:
When I introduced the first of these extractors, I did it this way so that you read them near the type constraints because they are most often related to those, e.g. whether to use AVX512 is related to whether we are lowering a vector type.
/me shrugs
cfallin submitted PR review.
cfallin created PR review comment:
Hmm, yeah, I realize now that this is exactly the same circumstance that led me to suggest
CmoveOr
previously; the output of the bsr is one of the inputs to the cmove. I'm thinking now maybe we can add a new variant toConsumesFlags
for this; I'll play with it a bit, and clean up by removingCmoveOr
at the same time if I can work something out. Sorry for the inconsistency and churn!
cfallin submitted PR review.
cfallin created PR review comment:
I had copied this comment from
lower.rs
but it seems that the instructions used only require SSE 4.2 (right?) so I can update accordingly.
cfallin submitted PR review.
cfallin created PR review comment:
Yup, the advantage is that it lets us use these in a context that's already common. Though I wonder now if it could make sense to have an extractor result value (the input) that is something like a unit type: in other words, the extractor takes nothing as input, but can be used in any context. I'll think more about it; it probably muddies the waters a bit wrt clean/simple semantics; but it could work...
cfallin updated PR #3848 from isle-bitops
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
OK, so I've reworked things so that
CmoveOr
/XmmCmoveOr
no longer exist. Thanks for noticing this! Thei128
case needs a newConsumesFlags
variant that contains four consumer instructions, but this seems reasonable-ish to me.
cfallin submitted PR review.
cfallin created PR review comment:
Done!
cfallin submitted PR review.
cfallin created PR review comment:
Added comments describing how this popcount algorithm works -- it was pretty fun to work out :-)
fitzgen submitted PR review.
fitzgen submitted PR review.
fitzgen created PR review comment:
Does it make sense to duplicate
(imm $i64 0x7777777777777777)
at each use site, rather than reusing the named register, to reduce register pressure?
cfallin submitted PR review.
cfallin created PR review comment:
I thought about this a bit, but I think it's very likely better to load it once and reuse it: the instruction to load an arbitrary 64-bit value (
movabs
) is pretty long, the immediate can't fit in the normalsimm32
so we need a register in any case, and the live-range across 7 instructions is not really any worse than across one (any other value that wants to be live across this sequence will conflict with the register anyway, whether loaded once or loaded three times).
cfallin updated PR #3848 from isle-bitops
to main
.
fitzgen merged PR #3848.
Last updated: Dec 23 2024 at 13:07 UTC