alexcrichton opened PR #5986 from `lea` to `main`:
This commit adds a rule for the lowering of `iadd` to use `lea` for 32-
and 64-bit addition. The theoretical benefit of `lea` over the `add`
instruction is that the `lea` variant can emulate a 3-operand
instruction which doesn't destructively modify one of its operands.
Additionally the `lea` operation can fold in other components such as
constant additions and shifts.

In practice, however, if `lea` is unconditionally used instead of `iadd`
it ends up losing 10% performance on a local `meshoptimizer` benchmark.
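As a rough illustration of the folding mentioned above, the value that `lea` produces for a full addressing mode can be modeled in plain Rust. This is only a sketch: `lea_model`, `separate_ops_model`, and their parameters are hypothetical names and are not taken from Cranelift. The shift is limited to 0..=3 because x64 addressing modes only scale the index by 1, 2, 4, or 8.

```rust
/// Hypothetical model (not Cranelift code) of the value computed by an
/// x64 `lea` with a `base + (index << shift) + disp` addressing mode.
fn lea_model(base: u64, index: u64, shift: u8, disp: i32, is_64bit: bool) -> u64 {
    // x64 addressing modes only allow scaling the index by 1, 2, 4, or 8.
    assert!(shift <= 3, "x64 scale is limited to 1, 2, 4, or 8");
    let sum = base
        .wrapping_add(index << shift)
        // Sign-extend the 32-bit displacement before adding it.
        .wrapping_add(disp as i64 as u64);
    // A 32-bit `lea` keeps only the low 32 bits of the result.
    if is_64bit { sum } else { sum & 0xffff_ffff }
}

/// The same 64-bit value built from separate steps, the way a sequence of
/// two-operand instructions (shift, add, add-immediate) would compute it.
fn separate_ops_model(base: u64, index: u64, shift: u8, disp: i32) -> u64 {
    let shifted = index << shift;
    let partial = base.wrapping_add(shifted);
    partial.wrapping_add(disp as i64 as u64)
}
```

Both functions return the same 64-bit value. The difference on real hardware is that the first corresponds to a single instruction that leaves its inputs intact, while the second corresponds to several instructions that each overwrite an operand.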
My best guess as to what's going on here is that my CPU's dedicated
units for address computation are all overloaded while the ALUs are
basically idle in a memory-intensive loop. Previously, when the ALU was
used for `add` and the address units for stores/loads, it in theory
pipelined things better (most of this is me shooting in the dark). To
prevent the performance loss here I've updated the lowering of `iadd` to
use `lea` in some cases and `add` in others, depending on how
"complicated" the `Amode` is. Simple ones like `a + b` or `a + $imm`
continue to use `add` (and the hypothetical extra `mov` into the result
that it may then need). More complicated ones like `a + b + $imm` or
`a + b << c + $imm` use `lea` as it can remove the need for extra
instructions. Locally at least this fixes the performance loss relative
to unconditionally using `lea`.

One note is that this adds an `OperandSize` argument to the
`MInst::LoadEffectiveAddress` variant to add an encoding for 32-bit
`lea` in addition to the preexisting 64-bit encoding.

Additionally this PR has a prior commit which is a "no functional
changes intended" update to the `Amode` computation in the x64 backend
to rely less on recursion and avoid blowing the stack at compile time
for very long chains of the `iadd` instruction.
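The two ideas in the description (collecting an addressing mode without recursion, then picking `add` or `lea` by how complicated it is) can be sketched roughly as follows. Everything here is hypothetical: `Node`, `Amode`, `collect_amode`, and `choose_lowering` are illustrative names, not Cranelift's actual types or ISLE rules. The description only gives four examples, so treating any shifted index as "complicated" is an assumption of this sketch.

```rust
/// Hypothetical expression nodes stored in an arena and referenced by index.
#[derive(Clone, Copy)]
enum Node {
    Reg(u8),
    Const(i32),
    Shl(usize, u8), // (operand node id, constant shift amount)
    Add(usize, usize),
}

/// Hypothetical addressing mode: base + (index << shift) + disp.
#[derive(Debug, Default, PartialEq)]
struct Amode {
    base: Option<u8>,
    index: Option<(u8, u8)>, // (register, shift)
    disp: i32,
}

#[derive(Debug, PartialEq)]
enum Lowering {
    Add,
    Lea,
}

/// Walks a chain of additions with an explicit worklist instead of
/// recursion, so very long chains cannot overflow the call stack.
/// Returns `None` when the operands do not fit into one addressing mode.
fn collect_amode(nodes: &[Node], root: usize) -> Option<Amode> {
    let mut amode = Amode::default();
    let mut work = vec![root];
    while let Some(id) = work.pop() {
        match nodes[id] {
            Node::Add(a, b) => {
                // Push the right operand first so the left one is visited first.
                work.push(b);
                work.push(a);
            }
            Node::Const(c) => amode.disp = amode.disp.checked_add(c)?,
            Node::Reg(r) => {
                if amode.base.is_none() {
                    amode.base = Some(r);
                } else if amode.index.is_none() {
                    amode.index = Some((r, 0));
                } else {
                    return None;
                }
            }
            Node::Shl(x, s) => match nodes[x] {
                // Only `reg << 0..=3` fits the scaled-index slot.
                Node::Reg(r) if s <= 3 && amode.index.is_none() => {
                    amode.index = Some((r, s));
                }
                _ => return None,
            },
        }
    }
    Some(amode)
}

/// Assumed heuristic: two plain components stay on `add`, while three
/// components or a shifted index use `lea`.
fn choose_lowering(amode: &Amode) -> Lowering {
    let parts = amode.base.is_some() as u32
        + amode.index.is_some() as u32
        + (amode.disp != 0) as u32;
    let shifted = matches!(amode.index, Some((_, s)) if s != 0);
    if parts >= 3 || shifted { Lowering::Lea } else { Lowering::Add }
}
```

With this sketch, `a + b` and `a + $imm` select `Lowering::Add`, while `a + b + $imm` and `a + b << c + $imm` select `Lowering::Lea`. The worklist is a heap-allocated `Vec`, so its depth is bounded by available memory rather than by the call stack.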
alexcrichton updated PR #5986 from `lea` to `main`.
fitzgen submitted PR review.
fitzgen submitted PR review.
fitzgen created PR review comment:
This says "higher-priority" but the actual priority given is less than the other cases. Something doesn't add up here.
fitzgen created PR review comment:
;; instruction to fold multiple operations into one. The actual determination
alexcrichton updated PR #5986 from `lea` to `main`.
alexcrichton created PR review comment:
With a negative priority though I think this is higher than the prior two?
(I can switch to giving all positive priority too)
alexcrichton submitted PR review.
alexcrichton requested fitzgen for a review on PR #5986.
fitzgen submitted PR review.
fitzgen merged PR #5986.
Last updated: Nov 22 2024 at 17:03 UTC