Stream: cranelift

Topic: Illegal instruction with fcvt_to_uint


view this post on Zulip Kristian H. Kristensen (Nov 02 2023 at 14:25):

Before I dig into this, does anybody why I'm getting an illegal instruction when using fcvt_to_uint? On aarch64 I'm getting this code:

  48:   d2e83e09        mov     x9, #0x41f0000000000000         // #4751297606875873280
  4c:   9e670123        fmov    d3, x9
  50:   1e6323c0        fcmp    d30, d3
  54:   5400014a        b.ge    0x7c  // b.tcont
  58:   1e7903cf        fcvtzu  w15, d30

which looks like a range check, but the jump target is

  7c:   0000c11f        udf     #49439

view this post on Zulip Afonso Bordado (Nov 02 2023 at 14:27):

fcvt_to_uint traps if the value would overflow the integer type, i suspect that's the bounds check that you are seeing.

view this post on Zulip Afonso Bordado (Nov 02 2023 at 14:28):

We have a non trapping version in fcvt_to_uint_sat, that insted emits INT_MAX or INT_MIN

view this post on Zulip Afonso Bordado (Nov 02 2023 at 14:29):

fcvt_to_uint also traps on NaN values

view this post on Zulip Kristian H. Kristensen (Nov 02 2023 at 14:31):

what happend to the trap code? is the undefined instruction a trap instruction or did that code not get linked into the module?

view this post on Zulip Kristian H. Kristensen (Nov 02 2023 at 14:33):

does the saturating version also insert bounds checks (and just saturates instead of traps) or is saturating the native behavior? (I guess that's arch specific)

view this post on Zulip Afonso Bordado (Nov 02 2023 at 14:35):

I think udf is the trap instruction on AArch64, and i would guess #49439 to be the integer overflow trap code, but i'm not 100% sure

view this post on Zulip Alex Crichton (Nov 02 2023 at 14:36):

49439 is 0xc11f which looks like "clif"

view this post on Zulip Alex Crichton (Nov 02 2023 at 14:36):

otherwise holds no significance

view this post on Zulip Afonso Bordado (Nov 02 2023 at 14:39):

does the saturating version also insert bounds checks (and just saturates instead of traps) or is saturating the native behavior? (I guess that's arch specific)

Yeah, that's arch specific, on AArch64 we do emit some extra code for I8 and I16, but not for I32 / I64

view this post on Zulip Kristian H. Kristensen (Nov 02 2023 at 15:13):

Hm, ok, and there's no way to get a "undefined behavior but not traps" semantic? I suppose I could emit the same checks manually and jump past the instruction in case of overflow and then hope cranelift will recognize the trap checks and optimize them away...

view this post on Zulip Chris Fallin (Nov 02 2023 at 16:34):

We don't have undefined behavior in Cranelift, by design (*this is sometimes only aspirationally true, but it is a key design difference vs. e.g. LLVM)

view this post on Zulip Chris Fallin (Nov 02 2023 at 16:34):

and likewise we don't optimize away code based on traps (the reasoning chain for that typically relies on having UB and knowing that it "can't" occur, at least in compilers like LLVM)

view this post on Zulip Chris Fallin (Nov 02 2023 at 16:35):

this may seem like a strange restriction but it actually unlocks a ton of goodness in ability to fuzz/test/verify, and avoids a whole class of bugs, so we're unlikely to loosen it

view this post on Zulip Kristian H. Kristensen (Nov 02 2023 at 21:20):

Alright, thanks for all the feedback, useful and prompt as always.
Design choices make sense, but I wasn't suggesting to optimize code out base on traps but rather, if the generated code for fcvt looks like:

fcvt(value) = if (value in range) {
  native_fcvt(value)
} else {
  trap
}

I could generate

if (value in range) {
  fcvt(value) // expands to the if above
} else {
  42
}

to evalutate to 42 on overflow and when the fcvt gets inlined, the inner range check (with the trap in the else block) gets optimized out since it's already in a basic block where the value is in range.

view this post on Zulip Kristian H. Kristensen (Nov 02 2023 at 21:22):

But on closer inspection, it looks like saturating fcvt is what I want anyway, so it's all good

view this post on Zulip Alex Crichton (Nov 02 2023 at 21:24):

Currently Cranelift doesn't do this kind of analysis, e.g. keeping a known range of values and updating that within blocks after an if condition, so no there's not currently a way for Cranelift to omit the native bounds checks if they aren't actually necessary

view this post on Zulip Chris Fallin (Nov 02 2023 at 22:10):

for anyone wanting this, the keyword to track is "sparse conditional constant propagation" -- that's what would do the "x is 1 on this branch, 2 on that branch" style of cprop. We've talked about how to incorporate it but it's nontrivial, so unfortunately not yet


Last updated: Nov 22 2024 at 17:03 UTC