ghostway0 opened PR #9214 from ghostway0:ghostway/9186
to bytecodealliance:main
:
currently a draft and only riscv64
ghostway0 updated PR #9214.
ghostway0 updated PR #9214.
ghostway0 updated PR #9214.
ghostway0 commented on PR #9214:
@bjorn3 can you take a look? also, I haven't found umul equivalent (or mul with zext*) in s390x. do you know what are their names?
ghostway0 updated PR #9214.
uweigand commented on PR #9214:
Hi @ghostway0, a few comments on the s390x part:
- All the new instruction rules you added seem to provide only a single return, the overflow bit. However, my understanding is that
smul_overflow
and all the other overflow instructions are defined to have two returns, the low-part result and the overflow bit. I think you'll need to use some form ofwith_flags
to construct the pair of results (like x86 and aarch64 already do).- You're simply re-using the same instructions used for the "normal" operation (add/sub/mul) also for the overflow operation. That is correct for 32-bit and 64-bit operations, but not for 8-bit and 16-bit operations. The reason is that the 390x ISA does not actually have any 8-bit or 16-bit arithmetic instructions, so we simply use the 32-bit version also for 8-bit and 16-bit operations. That provides the correct (low-part) result, but any overflow indication would be incorrect.
- There is no unsigned-multiply instruction with overflow indication on our platform. What other compilers do is to use the 32x32->64 or 64x64->128 bit wide multiply instruction, and check whether the high-part of the output is zero.
afonso360 created PR review comment:
Ditto here
afonso360 created PR review comment:
It might be better to use
rv_sltu
here instead of a select between one and zero. The RISC-V comparision functions already return a zero or one, and they are a lot shorter than our current implementation ofselect_xreg
afonso360 created PR review comment:
It might be better to use
rv_snez
here instead of a select between one and zero.
afonso360 created PR review comment:
Same here
afonso360 submitted PR review:
:wave: Hey,
I don't know if this is ready for review yet, but It's a great start!
A few comments for the RISC-V part. I didn't check the lowerings in a lot of detail, mostly just spotting a few things that could be shorter.
Thanks for working on this!
afonso360 created PR review comment:
Ditto here for
sltu
.
afonso360 created PR review comment:
This
one
doesn't seem to be used anywhere, similarly in the rules below there are a fewone
unused instructions.
afonso360 created PR review comment:
This
one
doesn't seem to be used anywhere.
afonso360 created PR review comment:
This could also be a
snez
afonso360 created PR review comment:
This could also be replaced with a
sltu
instruction which is a shorter sequence than a full select.
afonso360 created PR review comment:
Instead of doing madd here, we can just multiply
x_lo
andy_lo
and save one instruction.
afonso360 created PR review comment:
We could replace this with a
snez
instruction
afonso360 created PR review comment:
Here we could use
mul
instead ofmadd
and save one instruction.
afonso360 edited PR review comment.
afonso360 edited PR review comment.
afonso360 submitted PR review:
:wave: Hey,
I don't know if this is ready for review yet, but It's a great start!
A few comments for the RISC-V part. I didn't check the lowerings in a lot of detail, mostly just spotting a few things that could be shorter.
Thanks for working on this!
Edit: I also ran the fuzzer (with these changes) and it pointed out this testcase.
<details>
<summary>Fuzzer testcase</summary>
Testcase:
test interpret test run target riscv64gc target x86_64 function %a(i8) -> i8 { block0(v0: i8): v1, v2 = smul_overflow v0, v0 return v2 } ; run: %a(-15) == 1
Result:
ERROR cranelift_filetests::concurrent > FAIL: run FAIL ./test.clif: run Caused by: Failed test: run: %a(-15) == 1, actual: 0 1 tests Error: 1 failure
</summary>
Last updated: Jan 24 2025 at 00:11 UTC