bnjbvr opened PR #2277 from x64-opt
to main
.
bnjbvr edited PR #2277 from x64-opt
to main
:
(I'd like to test it a bit more before considering review, but: this already enhances the code quality a bit on some hot blocks identified with Valgrind.)
bnjbvr updated PR #2277 from x64-opt
to main
:
(I'd like to test it a bit more before considering review, but: this already enhances the code quality a bit on some hot blocks identified with Valgrind.)
bnjbvr requested cfallin for a review on PR #2277.
bnjbvr requested cfallin and julian-seward1 for a review on PR #2277.
bnjbvr has marked PR #2277 as ready for review.
cfallin submitted PR Review.
julian-seward1 submitted PR Review.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
Is this a sufficient test even for the non-const case? For example if
v1 = add.i32 xx yy
?
julian-seward1 created PR Review Comment:
I'm sure this is correct, but .. nevertheless .. it's hard to reason about the correctness of it (in particular, the expression
(cst << shift) >> shift
, given that there there is no type information here. I'm assuming thatcst
is au64
here. Could you please add some type annotation (or something else, that the compiler can check) so as to make that clear to the reader?
julian-seward1 submitted PR Review.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
All 32-bit shifts on Intel (even signed right) will produce bits[63:32] == 0 too. Also, probably a more common case .. a 32-bit load also has this property.
julian-seward1 created PR Review Comment:
Does this require a similar change for loads?
julian-seward1 created PR Review Comment:
nano-nit: this isn't a peephole optimisation in the 1970s meaning of the phrase. It's just that you're increasing the size/complexity of the pattern that you are matching here. Also, nano-nano-nit,
x64
, notx86
, if we are reservingx86
to mean "32-bit".
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
I don't think so, since a load doesn't "recover" information that would have been lost by the store. Here the store really was losing information about the value of high bits, and it was impossible to infer it back from loads.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
Yes, it is an u64 (so the right shift is logical); i'll add an explicit type annotation for uext_cst and shift!
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
This optimization doesn't hold for non-const values (we'd need information about the high puts of a general i32 input value, which would require range analysis iiuc).
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
Added a TODO.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
Thanks! I should definitely read more 1970s' books :-)
bnjbvr updated PR #2277 from x64-opt
to main
:
(I'd like to test it a bit more before considering review, but: this already enhances the code quality a bit on some hot blocks identified with Valgrind.)
bnjbvr merged PR #2277.
Last updated: Jan 24 2025 at 00:11 UTC