@Chris Fallin continuing from https://github.com/bytecodealliance/wasmtime/issues/2860#issuecomment-922574641, I'm not sure how well Cranelift's processor flag handling maps to my current use-case. I'm open to any thoughts if you think this is feasible, but afaik it's not quite there yet at least.
@chc4 it sounds like what you need is basically "let me use the native processor status flags and track them individually"?
i.e., something like what iflags
does (with the restriction that only one is live at a time and the clobbering tracking), but split by flag?
Essentially. How I sketched it out currently is I have a Context.flags array while I'm doing codegen, and populate with the iflags value from the iadd_ifcout for all the flags that the instruction actually sets. Then a jz
gets Context.flags[Flags::ZF] and does an br eq zf ...
, so the zf value is actual currently live iflags producer from the assembly. Notable I don't actually need discrete CF, ZF, OF, PF output from cranelift, just a bundle iflags value that I can use...but on the instruction selection side, it would probably have to be implemented as keeping track of the split flags, because CF and ZF could come from two seperate instructions, but the second one only setting ZF and keeping CF intact.
Unless it's something like adding an x86add_cout_zout_oout_pout which gives B1 for all the flags, with tracking they are from iflags ranges, I'm not sure how this could map to the B1 design.
it gets a little tricky when one starts to consider how to preserve certain flags, based on a knowledge of what every instruction clobbers, because that's platform-dependent and we'd prefer not to make the IR too platform-dependent, I think
(that matters if one also sticks to a "only one live copy of each flag" constraint, at least)
I can sort of see an alternative where we have the bundled flags value but track which values are actually live, and allow overlapping of disjoint sets
Yeah, it's difficult for an ISA agnostic IR like Cranelift, which is the root of my problem I think. The assembly I'm lifting is dependant on exact processor flags behavior, including live ranges of specific flags, which Cranelift doesn't really expose.
but at this point it's starting to get quite complex and one of the top-level goals in the new backend design, simplification of IR, etc in general is to consume an IR with easily-understandable semantics and that is able to be compiled/processed with minimal complexity; this is why we are looking at iflags with some suspicion (or at least I am)
I think the most straightforward, in some sense, is actually exactly what you said above: an add
that produces N flags outputs
on x86, that maps to exactly the add instruction
on e.g. aarch64 we may have to synthesize some values
in principle I'm not opposed to it -- something like iadd_flags
This is for an x86 meta-JIT framework so I'm actually only doing x86->Cranelift->x86, if that helps any
right, that makes sense and is a really interesting application. I can see how it's desirable to tightly control the assembly here when one knows what it came from
it might be instructive to see what Valgrind does in this case? (cc @Julian Seward , he's sometimes hanging around here too)
Yeah, one of the main problems is also that Cranelift instruction selection isn't currently the best, so if I emit e.g. the icmp_imm zf, 0
snippet I mentioned earlier and that is emitted as worse assembly than I started with in >50% of the cases, that's also a pretty big downside.
we're working on a pattern-matching framework to help us handle a lot of these cases better, but yeah, it's sometimes suboptimal :-)
re: Valgrind, it looks like it computes flags as needed -- at the platform-independent IR level it stores some metadata needed to compute rflags (all or just e.g. carry)
e.g. grep for rflags or carry in: https://sourceware.org/git/?p=valgrind.git;a=blob;f=VEX/priv/guest_amd64_toIR.c;h=c6296f3987d0a0f6e49cd3245461d3565e5c9d30;hb=HEAD
I'm not sure if "compute the flag" can lower into "nop" if it's known to already be in rflags though; more reading needed
valgrind represents flags entirely in generic IR, with no special x86-specific compute-all-flags operations, or any such -- that's way too expensive.
the OSZACP state is has a delayed representation, using 3 words; so they are only computed on demand. Also it's subject to standard dead-code elimination and to a transformation that recovers the actual condition tested (eg, unsigned <=) when there's a flag set and flag use in the same block.
my question then is basically, is this something cranelift even wants to support? and if so, how much of it would I have to write myself as a seperate set of intrinsics, or would it just inform the proposed https://github.com/bytecodealliance/wasmtime/issues/3249 design?
Possibly, yes, we can consider how to support this. I'd need to set aside a bit more time to think through the implications. You're welcome to leave a comment on that issue detailing the needs of your application -- basically a way to wire flags straight through
Hi folks! I got here as it's basically the only search result for iadd_carry
; let me know if elsewhere would be better.
I've been pondering a rustc PR to lower Rust's new fn carrying_add(self, rhs: u64, carry: bool) -> (u64, bool)
functions to iadd_carry
in the cranelift backend, since it seemed like a perfect match.
Is that a bad idea, if you're planning on removing it? It's unclear what the optimal lowering should be, otherwise -- it feels like a shame to lower to multiple iadd_cout
s and bitops to combine the output flags, as it currently does.
iadd_carry isn't implemented in cranelift right now AFAICT.
@scottmcm in principle the approach I think we want to aim for here is to take in bool
s, and produce bool
s, for carry flags; I'm advocating for removing iflags
values but not flags. So eventually we should have a single CLIF opcode that does this, IMHO.
Right now there's no implemented opcode that takes a carry-in, so you'll have to synthesize something, but hopefully that will change soon (the only reason this hasn't progressed further is because our TODO list is massive but at some point I want to spend time burning down my backlog, this issue included!).
Thanks, @Chris Fallin -- taking and returning bool
s is exactly what the (unstable) library API wants, so that sounds great. TBH I couldn't guess what the Value
s for iadd_ifcarry
would be, not that I put material looking into it.
And I completely get being busy; no worries. I'm in no hurry here.
Last updated: Nov 22 2024 at 16:03 UTC