chc4 commented on issue #2860:
I'm not sure if this issues should be re-opened or I should make a new one, but
iadd_carry
andiadd_ifcarry
are still definitely broken on x86_64.Example Cranelift IR that doesn't compile:
function u0:0(i64, i64, i64) -> i64 system_v { block0(v0: i64, v1: i64, v2: i64): jump block1 block1: v3 = iconst.i64 -5 v4, v5 = isub_ifbout.i64 v2, v3 v6 = iconst.i64 2 v7 = iconst.i64 -1 v8, v9 = iadd_ifcarry v6, v7, v5 return v8 }
(Ignore the weird block0)
Changing it to use
isub_bout
andiadd_carry
doesn't work either.There was a comment asking about what contexts would need this: I'm writing an x86_64 recompilation/partial application engine using Cranelift as a lifting target. That Cranelift snippet is what I am generating for
move |a: usize| { let val: usize; if Wrapping(a) + Wrapping(0x5) < Wrapping(a) { val = 1; } else { val = 2; } val }
or the compiled assembly function of
---- cmp rsi, -0x5 ---- mov eax, 0x2 ---- adc rax, -0x1 ---- ret
Notably, I don't have access to the un-optimized assembly code that is more "natural" and doesn't depend on the CF behavior. I've had several other issues with Cranelift IFLAGS (and the assumption that all ALU operations clobber all flags means this project is probably dead in the water), but this issue (or a similar one) should probably be re-opened to let future users know.
cfallin commented on issue #2860:
Hi @chc4 -- we can definitely talk about how to support this use-case; I'll re-open the issue. (As a general principle, we want Cranelift to be as generally applicable as is practical; so "here's my use-case and I can't figure out how to express it in CLIF" is always a valid issue!)
We've been discussing recently how to clean up the IR (see e.g. #3205 for discussion of booleans, which overlaps somewhat with the discussion of
iflags
values here, and bytecodealliance/rfcs#12 which is proposing to remove the old x86 backend that introduced a lot of these lower-level concepts) so these flags-related ops are sort of in a transition period. The application that most commonly used them before -- wide arithmetic with carries/borrows -- is now lowered directly, so we just have sort of skeletal support for any other use-cases that crop up, until we work out the best long-term design. The general approach that I think is best is to represent specific flags as bools and provide ops as needed, so e.g.iadd_carry
is a completely reasonable operator to have available. If you or someone else has the bandwidth to add this lowering, it's probably ~10 lines of code; I can point out the right spot. Sorry for the incomplete/in-transition nature of things here!
cfallin reopened issue #2860:
Hi! While using Cranelift as a backend for a project, I ran into a panic compiling iadd_carry instructions on x86_64. Please let me know if there is any other information I can provide that would be helpful.
.clif
Test Casefunction u0:0() fast { block0: v1465 = iconst.i32 0 v1467 = bconst.b1 false v1468 = iconst.i32 0 v1469, v1470 = iadd_carry v1465, v1468, v1467 v1476 = bint.i8 v1470 v1356 = iconst.i64 0 v1391 -> v1356 v1423 -> v1356 v1447 -> v1356 v1464 -> v1356 v1494 -> v1356 v1516 -> v1356 v1544 -> v1356 store notrap aligned v1476, v1356+274 trap user0 }
Steps to Reproduce
cargo run -p cranelift-tools -- compile test.clif --target x86_64
Expected Results
Code successfully generates without a panic.
Actual Results
Panic:
thread 'main' panicked at 'ALU+imm and ALU+carry ops should not appear here!', cranelift/codegen/src/isa/x64/lower.rs:5740:13
Versions and Environment
Cranelift version or commit: main HEAD (4a830b at time of writing)
Operating system: Linux
Architecture: x86_64
chc4 commented on issue #2860:
I'm still not sure it would fit my use-case, since I need correct implementation of all other processor flags, not just CF. Using a B1 and codegen'ing it back to iflags at instruction selection would work for CF, but I'd probably also need code motion and disjoint flag tracking on the selection backend so that e.g.
add x, y; lea w, [x+1]; jz ...
doesn't error from an invalid iflags live range. I can lift it asv0, b0 = iadd_cout.i64 x, y; v1 = iadd.i64 x, 1; zf = icmp_imm.i64 v0, 0; br eq ...
, but I don't think Cranelift is currently smart enough to treat the icmp_imm as an optional use for iflage.ZF and select the non-flags-clobbering LEA for iadds with no carry-out if possible, or move the iadd_cout below it if the LEA isn't encodable. This may be fine, but I also need e.g. OF, which I'm not sure how I would even naively emit as a branch case without a direct iflags type.If it is usable that would be great, but I assume I would either have to wait for the iflags design to be stabilized further or write optimization and selection passes myself, which might be a bit too much work for a side-project. My current thought is to switch to dynasm-rs so I can lift instructions to the same flags behavior (and re-implement my own IR and register allocation...)
chc4 edited a comment on issue #2860:
I'm still not sure it would fit my use-case, since I need correct implementation of all other processor flags, not just CF. Using a B1 and codegen'ing it back to iflags at instruction selection would work for CF, but I'd probably also need code motion and disjoint flag tracking on the selection backend so that e.g.
add x, y; lea w, [x+1]; jz ...
doesn't error from an invalid iflags live range. I can lift it asv0, b0 = iadd_cout.i64 x, y; v1 = iadd.i64 x, 1; zf = icmp_imm.i64 v0, 0; br eq ...
, but I don't think Cranelift is currently smart enough to treat the icmp_imm as an optional use for iflage.ZF and select the non-flags-clobbering LEA for iadds with no carry-out if possible, or move the iadd_cout below it if the LEA isn't encodable. This may be fine, but I also need e.g. OF, which I'm not sure how I would even naively emit as a branch case without a direct iflags type.If it is usable that would be great, but I assume I would either have to wait for the iflags design to be stabilized further or write optimization and selection passes myself, which might be a bit too much work for a side-project. My current thought is to switch to dynasm-rs so I can lift instructions to the same flags behavior (and re-implement my own IR and register allocation...)
We can continue this discussion on Zulip to avoid clogging this issue further.
cfallin commented on issue #2860:
We can continue this discussion on Zulip
For others who want to follow along: link
Last updated: Nov 22 2024 at 17:03 UTC