afonso360 opened PR #5118 from fix-bmask
to main
:
:wave: Hey,
The fuzzer pointed this out when I added
bmask
, it turns out we are not correctly masking off high bits in ourbmask
implementations. This is not triggered by our current tests since we always either use aiconst
or pass the value in via a register argument, which never has "dirty" high bits.The changes on AArch64 are fairly minimal and they only mostly impact
i8
andi16
.However for RISC-V masking off 32 bits ends up generating a lot of code. I haven't checked but I think on RISC-V this also impacts
brz
andbrnz
since they also usenormalize_value
.cc: @elliottt
afonso360 created PR review comment:
I'm not too familiar with RISCV, is there a better way of masking off the top 32bits? This ends up generating a lot of code, which is unfortunate.
afonso360 submitted PR review.
afonso360 edited PR review comment.
afonso360 edited PR review comment.
afonso360 updated PR #5118 from fix-bmask
to main
.
afonso360 submitted PR review.
afonso360 created PR review comment:
For reference, this generates the following assembly:
; auipc t2,0 ; ld t2,12(t2) ; j 12 ; .8byte 0xffffffff ; and a4,a0,t2
The only other solution I can think of right now would be to shift left 32 and then shift right 32, which would only be 2 instructions.
afonso360 edited PR review comment.
afonso360 edited PR review comment.
afonso360 edited PR review comment.
afonso360 updated PR #5118 from fix-bmask
to main
.
afonso360 edited PR review comment.
elliottt submitted PR review.
elliottt created PR review comment:
Playing around with clang's output on godbolt shows that it does this conversion with
addiw rd, rs, 0
(or thesext.w
pseudo instruction). That will perform a 32-bit add of0
tors
, and then sign-extend the value to 64-bits. As we do appear to be usingsrliw
for a 32-bit right shift, I think this should work?
afonso360 submitted PR review.
afonso360 created PR review comment:
For
normalize_value
I think that is not the correct answer since it always guarantees that the top 32 bits will be clear, which may not always be the case forsext.w
. However I think we can use that forbmask
/brz
/brnz
and we get back to a 1 instruction solution which is nice!
elliottt submitted PR review.
elliottt created PR review comment:
normalize_value
might be overly conservative as it is now. Switching toaddiw rd, rs, 0
for the 32-bit case didn't produce any test failures for me locally, and I suspect that we don't even need that instruction.
afonso360 updated PR #5118 from fix-bmask
to main
.
afonso360 created PR review comment:
I've updated it to use
addiw
and changed the meaning ofnormalize_value
so that it fits the new implementation. Thanks!I suspect that we don't even need that instruction.
That would be even better, if you have any ideas about this let me know!
afonso360 submitted PR review.
afonso360 edited PR review comment.
elliottt submitted PR review.
elliottt submitted PR review.
elliottt created PR review comment:
It would probably be worth making this rule priority 0, and the one that handles the
<= 16
case higher than the others; we'll test the arguments left-to-right, andfits_in_64
will be true for the 16-bit and smaller cases.
elliottt created PR review comment:
Thanks for updating this comment!
afonso360 updated PR #5118 from fix-bmask
to main
.
afonso360 updated PR #5118 from fix-bmask
to main
.
elliottt merged PR #5118.
Last updated: Dec 23 2024 at 12:05 UTC