Before I dig into this, does anybody why I'm getting an illegal instruction when using fcvt_to_uint? On aarch64 I'm getting this code:
48: d2e83e09 mov x9, #0x41f0000000000000 // #4751297606875873280
4c: 9e670123 fmov d3, x9
50: 1e6323c0 fcmp d30, d3
54: 5400014a b.ge 0x7c // b.tcont
58: 1e7903cf fcvtzu w15, d30
which looks like a range check, but the jump target is
7c: 0000c11f udf #49439
fcvt_to_uint
traps if the value would overflow the integer type, i suspect that's the bounds check that you are seeing.
We have a non trapping version in fcvt_to_uint_sat
, that insted emits INT_MAX or INT_MIN
fcvt_to_uint
also traps on NaN values
what happend to the trap code? is the undefined instruction a trap instruction or did that code not get linked into the module?
does the saturating version also insert bounds checks (and just saturates instead of traps) or is saturating the native behavior? (I guess that's arch specific)
I think udf is the trap instruction on AArch64, and i would guess #49439 to be the integer overflow trap code, but i'm not 100% sure
49439 is 0xc11f
which looks like "clif"
otherwise holds no significance
does the saturating version also insert bounds checks (and just saturates instead of traps) or is saturating the native behavior? (I guess that's arch specific)
Yeah, that's arch specific, on AArch64 we do emit some extra code for I8 and I16, but not for I32 / I64
Hm, ok, and there's no way to get a "undefined behavior but not traps" semantic? I suppose I could emit the same checks manually and jump past the instruction in case of overflow and then hope cranelift will recognize the trap checks and optimize them away...
We don't have undefined behavior in Cranelift, by design (*this is sometimes only aspirationally true, but it is a key design difference vs. e.g. LLVM)
and likewise we don't optimize away code based on traps (the reasoning chain for that typically relies on having UB and knowing that it "can't" occur, at least in compilers like LLVM)
this may seem like a strange restriction but it actually unlocks a ton of goodness in ability to fuzz/test/verify, and avoids a whole class of bugs, so we're unlikely to loosen it
Alright, thanks for all the feedback, useful and prompt as always.
Design choices make sense, but I wasn't suggesting to optimize code out base on traps but rather, if the generated code for fcvt looks like:
fcvt(value) = if (value in range) {
native_fcvt(value)
} else {
trap
}
I could generate
if (value in range) {
fcvt(value) // expands to the if above
} else {
42
}
to evalutate to 42 on overflow and when the fcvt gets inlined, the inner range check (with the trap in the else block) gets optimized out since it's already in a basic block where the value is in range.
But on closer inspection, it looks like saturating fcvt is what I want anyway, so it's all good
Currently Cranelift doesn't do this kind of analysis, e.g. keeping a known range of values and updating that within blocks after an if
condition, so no there's not currently a way for Cranelift to omit the native bounds checks if they aren't actually necessary
for anyone wanting this, the keyword to track is "sparse conditional constant propagation" -- that's what would do the "x is 1 on this branch, 2 on that branch" style of cprop. We've talked about how to incorporate it but it's nontrivial, so unfortunately not yet
Last updated: Nov 22 2024 at 17:03 UTC