Any plans to provide an instruction to read out the floating point exception flags?
I'd like to get core::arch::asm!("mrs {flags}, fpsr", flags = out(reg) fpsr) (for aarch64) as a Cranelift opcode, essentially
what's tricky is that even if I implement this using a call to a native function, Cranelift reorders it with respect to the floating point operations that I'm trying to get the flags for, because it doesn't model any side effects for them
the fence instruction doesn't - as expected - prevent the reordering... if only there was a fence_all_the_things instruction.
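For reference, a minimal sketch of the flag read being asked for, written as a native Rust helper. It assumes aarch64 (FPSR) or x86_64 (MXCSR) and returns None on any other target. The name read_fp_flags and the normalized bit layout are made up for this sketch and are not Cranelift APIs. It only shows the read itself and does nothing about the reordering problem described above.

```rust
// Normalized IEEE 754 exception flag bits. This layout is an assumption of
// the sketch; it happens to match the low five bits of aarch64's FPSR.
pub const INVALID: u32 = 1 << 0;
pub const DIV_BY_ZERO: u32 = 1 << 1;
pub const OVERFLOW: u32 = 1 << 2;
pub const UNDERFLOW: u32 = 1 << 3;
pub const INEXACT: u32 = 1 << 4;

/// Reads the sticky FP exception flags on aarch64.
#[cfg(target_arch = "aarch64")]
pub fn read_fp_flags() -> Option<u32> {
    let fpsr: u64;
    // FPSR bits 0..=4 are IOC, DZC, OFC, UFC, IXC.
    unsafe { core::arch::asm!("mrs {flags}, fpsr", flags = out(reg) fpsr, options(nostack)) };
    Some((fpsr & 0x1f) as u32)
}

/// Reads the sticky FP exception flags on x86_64 (SSE state only).
#[cfg(target_arch = "x86_64")]
pub fn read_fp_flags() -> Option<u32> {
    let mut csr: u32 = 0;
    // stmxcsr stores the 32-bit MXCSR register to memory.
    unsafe {
        core::arch::asm!("stmxcsr [{p}]", p = in(reg) &mut csr as *mut u32, options(nostack));
    }
    // MXCSR bits: 0 invalid, 1 denormal, 2 div-by-zero, 3 overflow,
    // 4 underflow, 5 inexact. Keep bit 0, drop the denormal bit, and
    // shift the remaining four down by one to match the layout above.
    Some((csr & 1) | ((csr >> 1) & 0x1e))
}

/// Fallback for targets this sketch does not cover.
#[cfg(not(any(target_arch = "aarch64", target_arch = "x86_64")))]
pub fn read_fp_flags() -> Option<u32> {
    None
}
```

The flags are sticky: a bit stays set until something clears it, so a single read reports every exception raised since the last clear, not just the most recent operation.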
in principle we could design opcodes/abstractions that lower to this; but it's a bit tricky for a few reasons:
fadd_with_flags or thereabouts
Ah, the aarch64 instruction was just an example; the exceptions are part of IEEE 754: https://www.gnu.org/software/libc/manual/html_node/FP-Exceptions.html - they're available on x86 and risc-v as well
But the dataflow point is fair... it would be nice if there was a way to prevent the reordering though... I assume putting it in its own basic block won't fool cranelift?
The best thing I can think of is to write all live floating point variables to a dummy memory location and then issue a fence instruction
"fooling" cranelift is going about things the wrong way here (and fence doesn't mean what you're implying it to mean above -- it's not a fence for all instructions, just for memory ops): depending implicitly on some ordering and picking up register state that just happened to be left by an earlier instruction is asking for disaster, and is invalid CLIF
yeah, the egraph pass is pretty aggressive about ignoring which basic block a pure instruction was originally specified in. that said, if you pass the result of the floating-point op to your function call, I think it'll usually do what you want; the exception would be if some other side-effecting instruction transitively uses that same result and appears before the function call, but you can ensure that doesn't happen, especially if you have the function call return its argument and only use the value from there. basically you can simulate the two-result fadd_with_flags instruction Chris described by using a function call, except that it will be fully serialized with respect to any other side effects so this impedes optimization. it should be useful for a proof of concept though
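A rough sketch of what such a pass-through helper could look like on the native side. The name fp_status_after, the out-parameter, and the choice to return the raw status word (FPSR on aarch64, MXCSR on x86_64, 0 elsewhere) are all assumptions of this sketch. The point is the shape: the generated code passes the floating-point result in, and afterwards uses only the returned value, so the call carries the data dependency.

```rust
/// Raw FP status word: FPSR on aarch64.
#[cfg(target_arch = "aarch64")]
fn raw_fp_status() -> u64 {
    let fpsr: u64;
    unsafe { core::arch::asm!("mrs {f}, fpsr", f = out(reg) fpsr, options(nostack)) };
    fpsr
}

/// Raw FP status word: MXCSR on x86_64.
#[cfg(target_arch = "x86_64")]
fn raw_fp_status() -> u64 {
    let mut csr: u32 = 0;
    unsafe {
        core::arch::asm!("stmxcsr [{p}]", p = in(reg) &mut csr as *mut u32, options(nostack));
    }
    csr as u64
}

/// Targets not covered by this sketch report no flags.
#[cfg(not(any(target_arch = "aarch64", target_arch = "x86_64")))]
fn raw_fp_status() -> u64 {
    0
}

/// Hypothetical helper to be called from generated code. `value` is the
/// result of the floating-point op of interest; it is returned unchanged so
/// that later uses depend on this call rather than on the original op.
#[inline(never)]
pub extern "C" fn fp_status_after(value: f64, status_out: &mut u64) -> f64 {
    // By the time we get here the caller has already computed `value`,
    // so any exception it raised is visible in the status register.
    *status_out = raw_fp_status();
    value
}
```

Because the flags are sticky, the status word also reflects earlier operations unless the flags were cleared beforehand.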
I don't think a "real" fadd_with_flags would be much work to do, in fact probably less than the effort to try to fool the compiler and work around the issues you're finding above
basically one would need to add mrs to the assembler library (new MInst variant), fadd_with_flags to CLIF, and a lowering that takes fadd_with_flags to fadd + mrs ...
I guess it's worth noting that multi-result instructions impede optimization today anyway and behave like side-effecting instructions, so actually I think just calling a library function is a pretty reasonable plan
No I understand fence is just memory, that's what the "as expected" implied above, but the idea is that by writing all live fp variables to a dummy memory location and then issuing a fence, that would make that memory write depend on the fp operations, which then wouldn't be reordered past the memory fence
yeah, putting both the fadd itself and the flags fetch inside a library function is the most reasonable plan if one doesn't do fadd_with_flags
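A minimal sketch of that library-function plan, assuming aarch64 or x86_64 (other targets compile but always report no flags). Here fadd_with_flags is an ordinary Rust function, not the proposed CLIF opcode, and the fpenv module, the out-parameter, and the normalized flag layout are invented for illustration. The helper clears the sticky flags, performs the add, and reads the flags back, all inside one call.

```rust
use std::hint::black_box;

// Normalized flag bits; this layout is an assumption of the sketch.
pub const INVALID: u32 = 1 << 0;
pub const DIV_BY_ZERO: u32 = 1 << 1;
pub const OVERFLOW: u32 = 1 << 2;
pub const UNDERFLOW: u32 = 1 << 3;
pub const INEXACT: u32 = 1 << 4;

#[cfg(target_arch = "aarch64")]
mod fpenv {
    use core::arch::asm;

    pub fn clear_flags() {
        // Writing zero clears the cumulative exception bits (and the other
        // FPSR status bits).
        unsafe { asm!("msr fpsr, {z}", z = in(reg) 0u64, options(nostack)) };
    }

    pub fn read_flags() -> u32 {
        let fpsr: u64;
        unsafe { asm!("mrs {f}, fpsr", f = out(reg) fpsr, options(nostack)) };
        (fpsr & 0x1f) as u32
    }
}

#[cfg(target_arch = "x86_64")]
mod fpenv {
    use core::arch::asm;

    fn read_mxcsr() -> u32 {
        let mut csr: u32 = 0;
        unsafe { asm!("stmxcsr [{p}]", p = in(reg) &mut csr as *mut u32, options(nostack)) };
        csr
    }

    pub fn clear_flags() {
        // Clear the six sticky flag bits, keep masks and rounding mode.
        let csr = read_mxcsr() & !0x3f;
        unsafe { asm!("ldmxcsr [{p}]", p = in(reg) &csr as *const u32, options(nostack)) };
    }

    pub fn read_flags() -> u32 {
        // Drop the denormal bit (bit 1) and shift the rest down by one.
        let csr = read_mxcsr();
        (csr & 1) | ((csr >> 1) & 0x1e)
    }
}

#[cfg(not(any(target_arch = "aarch64", target_arch = "x86_64")))]
mod fpenv {
    pub fn clear_flags() {}
    pub fn read_flags() -> u32 {
        0
    }
}

/// Adds `a` and `b` and reports the exceptions raised by that add alone.
#[inline(never)]
pub extern "C" fn fadd_with_flags(a: f64, b: f64, flags_out: &mut u32) -> f64 {
    fpenv::clear_flags();
    // black_box keeps the Rust compiler from folding the add or moving it
    // outside the clear/read pair.
    let sum = black_box(black_box(a) + black_box(b));
    *flags_out = fpenv::read_flags();
    sum
}
```

Clearing the flags on every call means flags from earlier operations are lost. A variant that skips the clear would accumulate flags across calls instead, which is closer to what fetestexcept reports.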
re: multi-result insts and impeding opts, I... don't know if we have any opts on fadd currently? E.g. we have TODO: fadd (here) in cprop.isle :sweat_smile:
Just to be clear, I'd definitely not want to fool the compiler, and it's good to hear that it reorders as much as it does
ok, yeah, that store-then-fence scheme would work, but would likely be awfully slow, slower than a library call at least
hang on, are you trying to check whether a single operation failed, or do you just want to know after a series of operations whether any of them failed?
I guess an fpfence instruction could be a pragmatic solution: prevents reordering of fp operations the same way fence prevents reordering of loads and stores
eh, that's a very intrusive change to the compiler though
and also breaks our "dataflow goes through values" principle (this is important for verification too, fwiw)
if you just want to know whether any of the preceding ops failed, then all you need is a data dependency that includes all those ops
(intrusive because: we don't reason about fences at all when moving code; rather, we just never move loads/stores)
(so the fence's purpose is really to communicate to the microarchitecture)
@Jamey Sharp yeah, that's what I was trying to describe with my "write all pending fp operations to memory, then fence" idea
okay, so Chris probably won't like this but a no-op that doesn't emit any code but has the side-effect of ensuring that its operand has been computed would do the trick
you could also do this by returning all the relevant values from a function and checking the exception status after the return. there are a bunch of ways to serialize things like this
Jamey correctly predicts my status of liking-or-not-liking this idea :-) (but I'll note, I had also suggested a "force" op for something a while back, so it's a perennially tempting tool)
If the goal is to hack together a prototype, it seems workable. For an upstreamable patch I think the multi-output-instruction idea is the one we'd most seriously consider
Ok, thanks. Good feedback, nothing unexpected, but also no silver bullet... The multi-output instruction doesn't map well to how HW actually works though, and implementing C99 fetestexcept on top of it will be suboptimal.
happy to discuss further in an issue if/when you get to the point of laying out the design options!
Last updated: Nov 22 2024 at 17:03 UTC