floating point exception flags · cranelift

Stream: cranelift

Topic: floating point exception flags

Kristian H. Kristensen (Aug 03 2023 at 15:01):

Any plans to provide an instruction to read out the floating point exception flags?

Kristian H. Kristensen (Aug 03 2023 at 15:53):

I'd like to get core::arch::asm!("mrs {flags}, fpsr", flags = out(reg) fpsr) (for aarch64) as a cranelift opcode, essentially
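For reference, a minimal runnable sketch of the inline-assembly read quoted above. The function name `read_fp_status` and the x86_64 branch (which stores MXCSR with `stmxcsr`) are assumptions added for illustration and are not part of the original message. The bit layouts differ per architecture, and the compiler may still move or fold the floating point operations around the read, so this shows only the register access, not a reliable way to attribute flags to a given operation.

```rust
/// Reads the raw floating point status word on aarch64 (FPSR).
/// Cumulative exception bits: IOC=0, DZC=1, OFC=2, UFC=3, IXC=4, IDC=7.
#[cfg(target_arch = "aarch64")]
pub fn read_fp_status() -> Option<u64> {
    let fpsr: u64;
    // SAFETY: `mrs` from FPSR only reads a status register.
    unsafe {
        core::arch::asm!(
            "mrs {flags}, fpsr",
            flags = out(reg) fpsr,
            options(nomem, nostack, preserves_flags)
        );
    }
    Some(fpsr)
}

/// Assumed x86_64 counterpart: stores MXCSR to memory and returns it.
/// Exception bits: IE=0, DE=1, ZE=2, OE=3, UE=4, PE=5.
#[cfg(target_arch = "x86_64")]
pub fn read_fp_status() -> Option<u64> {
    let mut mxcsr: u32 = 0;
    // SAFETY: `stmxcsr` writes 4 bytes to the valid local `mxcsr`.
    unsafe {
        core::arch::asm!(
            "stmxcsr dword ptr [{ptr}]",
            ptr = in(reg) &mut mxcsr as *mut u32,
            options(nostack, preserves_flags)
        );
    }
    Some(u64::from(mxcsr))
}

/// Fallback for architectures this sketch does not cover.
#[cfg(not(any(target_arch = "aarch64", target_arch = "x86_64")))]
pub fn read_fp_status() -> Option<u64> {
    None
}
```

Returning `Option` keeps the sketch compilable everywhere; callers on other targets simply get `None`.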

Kristian H. Kristensen (Aug 03 2023 at 16:09):

what's tricky is that even if I implement this using a call to a native function, cranelift reorders it with respect to the floating point operation that I'm trying to get the flags for, because it doesn't model any side effects for those operations

Kristian H. Kristensen (Aug 03 2023 at 16:12):

the fence instruction doesn't - as expected - prevent the reordering... if only there was a fence_all_the_things instruction.

Chris Fallin (Aug 03 2023 at 16:14):

in principle we could design opcodes/abstractions that lower to this; but it's a bit tricky for a few reasons:

- CLIF is architecture-independent and deterministic; a concept such as "aarch64 FP status flags" would need to be locked down into a specific definition ("Z flag set if ...") and we'd have the question of how to polyfill this on other architectures
- We don't have implicit state that flows between opcodes, and relying on this will result in things that only work by accident, or not (as you've seen!). We represent result flow as explicit dataflow, so we'd render flags as a second output of FP instructions, if we built an abstraction for this. Something like fadd_with_flags or thereabouts
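To make the "flags as a second output" idea concrete, here is a rough software model in Rust. It is not Cranelift code and not a proposed CLIF definition: the flag constants and their bit positions are invented for illustration, the flags are derived from the operands and result rather than read from hardware, and signaling NaNs and rare intermediate-overflow corner cases of the error check are ignored. It assumes the default round-to-nearest mode.

```rust
// Hypothetical flag encoding; the thread does not define one.
pub const FLAG_INVALID: u8 = 1 << 0;
pub const FLAG_OVERFLOW: u8 = 1 << 1;
pub const FLAG_INEXACT: u8 = 1 << 2;

/// Software model of a two-result add: returns the sum together with the
/// IEEE 754 exceptions this single addition would raise.
pub fn fadd_with_flags(a: f32, b: f32) -> (f32, u8) {
    let sum = a + b;
    let mut flags = 0u8;

    if sum.is_nan() {
        // Quiet NaN inputs propagate silently; with non-NaN inputs the
        // only way to get NaN from an add is inf + (-inf), which is invalid.
        if !a.is_nan() && !b.is_nan() {
            flags |= FLAG_INVALID;
        }
    } else if sum.is_infinite() {
        // An infinite result from finite inputs means overflow, which
        // also implies the result is inexact.
        if a.is_finite() && b.is_finite() {
            flags |= FLAG_OVERFLOW | FLAG_INEXACT;
        }
    } else {
        // TwoSum error-free transformation: `err` is the rounding error
        // of `a + b`, so a nonzero value means the add was inexact.
        let b_virtual = sum - a;
        let a_virtual = sum - b_virtual;
        let err = (a - a_virtual) + (b - b_virtual);
        if err != 0.0 {
            flags |= FLAG_INEXACT;
        }
    }
    (sum, flags)
}
```

Because the flags are an ordinary return value, any consumer of them has an explicit data dependency on the add, so there is nothing for a code-motion pass to reorder incorrectly.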

Kristian H. Kristensen (Aug 03 2023 at 16:23):

Ah, the aarch64 instruction was just for example, the exceptions are part of IEEE 754: https://www.gnu.org/software/libc/manual/html_node/FP-Exceptions.html - they're available on x86 and risc-v as well

Kristian H. Kristensen (Aug 03 2023 at 16:25):

But the dataflow point is fair... it would be nice if there was a way to prevent the reordering though... I assume putting it in its own basic block won't fool cranelift?

Kristian H. Kristensen (Aug 03 2023 at 16:31):

The best thing I can think of is to write all live floating point variables to a dummy memory location and then issue a fence instruction

Chris Fallin (Aug 03 2023 at 16:33):

"fooling" cranelift is going about things the wrong way here (and fence doesn't mean what you're implying it to mean above -- it's not a fence for all instructions, just for memory ops): depending implicitly on some ordering and picking up register state that just happened to be left by an earlier instruction is asking for disaster, and is invalid CLIF

Jamey Sharp (Aug 03 2023 at 16:33):

yeah, the egraph pass is pretty aggressive about ignoring which basic block a pure instruction was originally specified in. that said, if you pass the result of the floating-point op to your function call, I think it'll usually do what you want; the exception would be if some other side-effecting instruction transitively uses that same result and appears before the function call, but you can ensure that doesn't happen, especially if you have the function call return its argument and only use the value from there. basically you can simulate the two-result fadd_with_flags instruction Chris described by using a function call, except that it will be fully serialized with respect to any other side effects so this impedes optimization. it should be useful for a proof of concept though

Chris Fallin (Aug 03 2023 at 16:34):

I don't think a "real" fadd_with_flags would be much work to do, in fact probably less than the effort to try to fool the compiler and work around the issues you're finding above

Chris Fallin (Aug 03 2023 at 16:35):

basically one would need to add mrs to the assembler library (new MInst variant), fadd_with_flags to CLIF, and a lowering that takes fadd_with_flags to fadd + mrs ...

Jamey Sharp (Aug 03 2023 at 16:36):

I guess it's worth noting that multi-result instructions impede optimization today anyway and behave like side-effecting instructions, so actually I think just calling a library function is a pretty reasonable plan

Kristian H. Kristensen (Aug 03 2023 at 16:40):

No I understand fence is just memory, that's what the "as expected" implied above, but the idea is that by writing all live fp variables to a dummy memory location and then issuing a fence, that would make that memory write depend on the fp operations, which then wouldn't be reordered past the memory fence

Chris Fallin (Aug 03 2023 at 16:41):

yeah, putting both the fadd itself and the flags fetch inside a library function is the most reasonable plan if one doesn't do fadd_with_flags

Chris Fallin (Aug 03 2023 at 16:42):

re: multi-result insts and impeding opts, I... don't know if we have any opts on fadd currently? E.g. we have TODO: fadd (here) in cprop.isle :sweat_smile:

Kristian H. Kristensen (Aug 03 2023 at 16:42):

Just to be clear, I'd definitely not want to fool the compiler, and it's good to hear that it reorders as much as it does

Chris Fallin (Aug 03 2023 at 16:43):

ok, yeah, that store-then-fence scheme would work, but would likely be awfully slow, slower than a library call at least

Jamey Sharp (Aug 03 2023 at 16:46):

hang on, are you trying to check whether a single operation failed, or do you just want to know after a series of operations whether any of them failed?

Kristian H. Kristensen (Aug 03 2023 at 16:46):

I guess an fpfence instruction could be a pragmatic solution: prevents reordering of fp operations the same way fence prevents reordering of loads and stores

Chris Fallin (Aug 03 2023 at 16:47):

eh, that's a very intrusive change to the compiler though

Chris Fallin (Aug 03 2023 at 16:47):

and also breaks our "dataflow goes through values" principle (this is important for verification too, fwiw)

Jamey Sharp (Aug 03 2023 at 16:48):

if you just want to know whether any of the preceding ops failed, then all you need is a data dependency that includes all those ops
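One way to picture "a data dependency that includes all those ops" is to carry the sticky flags along with each value, so the final value's flags cover every operation it depends on. The sketch below is a plain Rust model, not CLIF: the `Tracked` type, its methods, and the two flag constants are hypothetical, and only the invalid and divide-by-zero exceptions are modeled.

```rust
// Hypothetical flag bits, chosen only for this illustration.
pub const INVALID: u8 = 1 << 0;
pub const DIV_BY_ZERO: u8 = 1 << 1;

/// A value plus the union of exceptions raised by every op that produced it.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Tracked {
    pub value: f64,
    pub flags: u8,
}

impl Tracked {
    pub fn new(value: f64) -> Self {
        Tracked { value, flags: 0 }
    }

    /// Division: inherits both operands' flags and adds its own.
    pub fn div(self, rhs: Tracked) -> Tracked {
        let value = self.value / rhs.value;
        let mut flags = self.flags | rhs.flags;
        // Finite nonzero / zero raises divide-by-zero.
        if rhs.value == 0.0 && self.value.is_finite() && self.value != 0.0 {
            flags |= DIV_BY_ZERO;
        }
        // 0/0 and inf/inf produce NaN from non-NaN inputs: invalid.
        if value.is_nan() && !self.value.is_nan() && !rhs.value.is_nan() {
            flags |= INVALID;
        }
        Tracked { value, flags }
    }

    /// Square root: inherits the operand's flags; negative input is invalid.
    pub fn sqrt(self) -> Tracked {
        let value = self.value.sqrt();
        let mut flags = self.flags;
        if value.is_nan() && !self.value.is_nan() {
            flags |= INVALID;
        }
        Tracked { value, flags }
    }
}
```

For example, `Tracked::new(4.0).sqrt().div(Tracked::new(1.0).div(Tracked::new(0.0)))` yields a finite value whose `flags` still report the divide-by-zero from the inner operation, because the flags travel through the same dataflow edges as the values.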

Chris Fallin (Aug 03 2023 at 16:48):

(intrusive because: we don't reason about fences at all when moving code; rather, we just never move loads/stores)

Chris Fallin (Aug 03 2023 at 16:48):

(so the fence's purpose is really to communicate to the microarchitecture)

Kristian H. Kristensen (Aug 03 2023 at 16:48):

@Jamey Sharp yeah, that's what I was trying to describe with my "write all pending fp operations to memory, then fence" idea

Jamey Sharp (Aug 03 2023 at 16:51):

okay, so Chris probably won't like this but a no-op that doesn't emit any code but has the side-effect of ensuring that its operand has been computed would do the trick

Jamey Sharp (Aug 03 2023 at 16:53):

you could also do this by returning all the relevant values from a function and checking the exception status after the return. there are a bunch of ways to serialize things like this

Chris Fallin (Aug 03 2023 at 16:57):

Jamey correctly predicts my status of liking-or-not-liking this idea :-) (but I'll note, I had also suggested a "force" op for something a while back, so it's a perennially tempting tool)

If the goal is to hack together a prototype, it seems workable. For an upstreamable patch I think the multi-output-instruction idea is the one we'd most seriously consider

Kristian H. Kristensen (Aug 03 2023 at 17:05):

Ok, thanks. Good feedback, nothing unexpected, but also no silver bullet... The multi-output instruction doesn't map well to how HW actually works though, and implementing C99 fetestexcept on top of it will be suboptimal.

Chris Fallin (Aug 03 2023 at 17:12):

happy to discuss further in an issue if/when you get to the point of laying out the design options!

Last updated: Apr 08 2025 at 23:03 UTC