Stream: cranelift

Topic: x86_64 iflags handling


view this post on Zulip chc4 (Sep 20 2021 at 14:57):

@Chris Fallin continuing from https://github.com/bytecodealliance/wasmtime/issues/2860#issuecomment-922574641, I'm not sure how well Cranelift's processor flag handling maps to my current use-case. I'm open to any thoughts if you think this is feasible, but afaik it's not quite there yet at least.

Hi! While using Cranelift as a backend for a project, I ran into a panic compiling iadd_carry instructions on x86_64. Please let me know if there is any other information I can provide that would b...

view this post on Zulip Chris Fallin (Sep 20 2021 at 14:59):

@chc4 it sounds like what you need is basically "let me use the native processor status flags and track them individually"?

view this post on Zulip Chris Fallin (Sep 20 2021 at 15:00):

i.e., something like what iflags does (with the restriction that only one is live at a time and the clobbering tracking), but split by flag?

view this post on Zulip chc4 (Sep 20 2021 at 15:08):

Essentially. How I sketched it out currently is I have a Context.flags array while I'm doing codegen, and populate with the iflags value from the iadd_ifcout for all the flags that the instruction actually sets. Then a jz gets Context.flags[Flags::ZF] and does an br eq zf ..., so the zf value is actual currently live iflags producer from the assembly. Notable I don't actually need discrete CF, ZF, OF, PF output from cranelift, just a bundle iflags value that I can use...but on the instruction selection side, it would probably have to be implemented as keeping track of the split flags, because CF and ZF could come from two seperate instructions, but the second one only setting ZF and keeping CF intact.

view this post on Zulip chc4 (Sep 20 2021 at 15:10):

Unless it's something like adding an x86add_cout_zout_oout_pout which gives B1 for all the flags, with tracking they are from iflags ranges, I'm not sure how this could map to the B1 design.

view this post on Zulip Chris Fallin (Sep 20 2021 at 15:11):

it gets a little tricky when one starts to consider how to preserve certain flags, based on a knowledge of what every instruction clobbers, because that's platform-dependent and we'd prefer not to make the IR too platform-dependent, I think

view this post on Zulip Chris Fallin (Sep 20 2021 at 15:11):

(that matters if one also sticks to a "only one live copy of each flag" constraint, at least)

view this post on Zulip Chris Fallin (Sep 20 2021 at 15:11):

I can sort of see an alternative where we have the bundled flags value but track which values are actually live, and allow overlapping of disjoint sets

view this post on Zulip chc4 (Sep 20 2021 at 15:13):

Yeah, it's difficult for an ISA agnostic IR like Cranelift, which is the root of my problem I think. The assembly I'm lifting is dependant on exact processor flags behavior, including live ranges of specific flags, which Cranelift doesn't really expose.

view this post on Zulip Chris Fallin (Sep 20 2021 at 15:14):

but at this point it's starting to get quite complex and one of the top-level goals in the new backend design, simplification of IR, etc in general is to consume an IR with easily-understandable semantics and that is able to be compiled/processed with minimal complexity; this is why we are looking at iflags with some suspicion (or at least I am)

view this post on Zulip Chris Fallin (Sep 20 2021 at 15:14):

I think the most straightforward, in some sense, is actually exactly what you said above: an add that produces N flags outputs

view this post on Zulip Chris Fallin (Sep 20 2021 at 15:15):

on x86, that maps to exactly the add instruction

view this post on Zulip Chris Fallin (Sep 20 2021 at 15:15):

on e.g. aarch64 we may have to synthesize some values

view this post on Zulip Chris Fallin (Sep 20 2021 at 15:15):

in principle I'm not opposed to it -- something like iadd_flags

view this post on Zulip chc4 (Sep 20 2021 at 15:15):

This is for an x86 meta-JIT framework so I'm actually only doing x86->Cranelift->x86, if that helps any

view this post on Zulip Chris Fallin (Sep 20 2021 at 15:16):

right, that makes sense and is a really interesting application. I can see how it's desirable to tightly control the assembly here when one knows what it came from

view this post on Zulip Chris Fallin (Sep 20 2021 at 15:17):

it might be instructive to see what Valgrind does in this case? (cc @Julian Seward , he's sometimes hanging around here too)

view this post on Zulip chc4 (Sep 20 2021 at 15:18):

Yeah, one of the main problems is also that Cranelift instruction selection isn't currently the best, so if I emit e.g. the icmp_imm zf, 0 snippet I mentioned earlier and that is emitted as worse assembly than I started with in >50% of the cases, that's also a pretty big downside.

view this post on Zulip Chris Fallin (Sep 20 2021 at 15:20):

we're working on a pattern-matching framework to help us handle a lot of these cases better, but yeah, it's sometimes suboptimal :-)

view this post on Zulip Chris Fallin (Sep 20 2021 at 15:21):

re: Valgrind, it looks like it computes flags as needed -- at the platform-independent IR level it stores some metadata needed to compute rflags (all or just e.g. carry)

view this post on Zulip Chris Fallin (Sep 20 2021 at 15:21):

e.g. grep for rflags or carry in: https://sourceware.org/git/?p=valgrind.git;a=blob;f=VEX/priv/guest_amd64_toIR.c;h=c6296f3987d0a0f6e49cd3245461d3565e5c9d30;hb=HEAD

view this post on Zulip Chris Fallin (Sep 20 2021 at 15:22):

I'm not sure if "compute the flag" can lower into "nop" if it's known to already be in rflags though; more reading needed

view this post on Zulip Julian Seward (Sep 20 2021 at 15:40):

valgrind represents flags entirely in generic IR, with no special x86-specific compute-all-flags operations, or any such -- that's way too expensive.

view this post on Zulip Julian Seward (Sep 20 2021 at 15:41):

the OSZACP state is has a delayed representation, using 3 words; so they are only computed on demand. Also it's subject to standard dead-code elimination and to a transformation that recovers the actual condition tested (eg, unsigned <=) when there's a flag set and flag use in the same block.

view this post on Zulip chc4 (Sep 20 2021 at 17:36):

my question then is basically, is this something cranelift even wants to support? and if so, how much of it would I have to write myself as a seperate set of intrinsics, or would it just inform the proposed https://github.com/bytecodealliance/wasmtime/issues/3249 design?

In #3233 we discussed various simplifications we could perform to reduce the number of CLIF opcodes that exist. In general, we should strive for simplicity -- fewer opcodes mean less cost for every...

view this post on Zulip Chris Fallin (Sep 20 2021 at 18:21):

Possibly, yes, we can consider how to support this. I'd need to set aside a bit more time to think through the implications. You're welcome to leave a comment on that issue detailing the needs of your application -- basically a way to wire flags straight through

view this post on Zulip scottmcm (Nov 19 2021 at 04:48):

Hi folks! I got here as it's basically the only search result for iadd_carry; let me know if elsewhere would be better.

I've been pondering a rustc PR to lower Rust's new fn carrying_add(self, rhs: u64, carry: bool) -> (u64, bool) functions to iadd_carry in the cranelift backend, since it seemed like a perfect match.

Is that a bad idea, if you're planning on removing it? It's unclear what the optimal lowering should be, otherwise -- it feels like a shame to lower to multiple iadd_couts and bitops to combine the output flags, as it currently does.

view this post on Zulip bjorn3 (Nov 19 2021 at 17:11):

iadd_carry isn't implemented in cranelift right now AFAICT.

view this post on Zulip Chris Fallin (Nov 19 2021 at 17:24):

@scottmcm in principle the approach I think we want to aim for here is to take in bools, and produce bools, for carry flags; I'm advocating for removing iflags values but not flags. So eventually we should have a single CLIF opcode that does this, IMHO.

Right now there's no implemented opcode that takes a carry-in, so you'll have to synthesize something, but hopefully that will change soon (the only reason this hasn't progressed further is because our TODO list is massive but at some point I want to spend time burning down my backlog, this issue included!).

view this post on Zulip scottmcm (Nov 20 2021 at 03:07):

Thanks, @Chris Fallin -- taking and returning bools is exactly what the (unstable) library API wants, so that sounds great. TBH I couldn't guess what the Values for iadd_ifcarry would be, not that I put material looking into it.

And I completely get being busy; no worries. I'm in no hurry here.


Last updated: Nov 22 2024 at 16:03 UTC