Stream: git-cranelift

Topic: cranelift / PR #1378 initial brush at preserving SIMD reg...


view this post on Zulip GitHub (Feb 06 2020 at 06:10):

iximeow opened PR #1378 from windows-fprs-preservation to master:

This is a draft PR as there are two points where I think I should just ask for help rather than spin my wheels figuring out the Right Thing, I'll add comments below.

@sstangl if you get a chance, I think you might know the right things for the questions to follow!

view this post on Zulip GitHub (Feb 06 2020 at 06:10):

iximeow edited PR #1378 from windows-fprs-preservation to master:

This is a draft PR as there are two points where I think I should just ask for help rather than spin my wheels figuring out the Right Thing, I'll add comments below.

@sstangl if you get a chance, I think you might know the right things for the questions to follow!

view this post on Zulip GitHub (Feb 06 2020 at 06:10):

iximeow edited PR #1378 from windows-fprs-preservation to master:

This is a draft PR as there are two points where I think I should just ask for help rather than spin my wheels figuring out the Right Thing, I'll add comments below.

@sstangl if you get a chance, I think you might know the right things for the questions to follow!

view this post on Zulip GitHub (Feb 06 2020 at 06:10):

iximeow edited PR #1378 from windows-fprs-preservation to master:

This is a draft PR as there are two points where I think I should just ask for help rather than spin my wheels figuring out the Right Thing, I'll add comments below.

@sstangl if you get a chance, I think you might know the right things for the questions to follow!

view this post on Zulip GitHub (Feb 06 2020 at 06:11):

iximeow created PR Review Comment:

assuming this was a typo, RSB doesn't make sense to me as an x86-ism and the other automatically-restored register would be RBP.

view this post on Zulip GitHub (Feb 06 2020 at 06:11):

iximeow submitted PR Review.

view this post on Zulip GitHub (Feb 06 2020 at 06:13):

iximeow submitted PR Review.

view this post on Zulip GitHub (Feb 06 2020 at 06:13):

iximeow created PR Review Comment:

not checking that GPR (and FPR) contain the regunit in question could introduce errors if these RegClasses had differing widths. So far as I can see, x86 RegClass all have the same width of 1, so this happens to work out okay, but I think is strictly speaking a bug independent of SIMD register preservation.

view this post on Zulip GitHub (Feb 06 2020 at 06:14):

iximeow submitted PR Review.

view this post on Zulip GitHub (Feb 06 2020 at 06:14):

iximeow created PR Review Comment:

csrs.iter(FPR).len() will be zero via the above assert, but I want the stack size to be the same expression in the three places we compute it here in abi.rs.

view this post on Zulip GitHub (Feb 06 2020 at 06:18):

iximeow submitted PR Review.

view this post on Zulip GitHub (Feb 06 2020 at 06:18):

iximeow created PR Review Comment:

this is where I'd like input from someone who knows codegen APIs better than me (@sstangl?): SIMD registers can't be x86_push'd nor x86_pop'd (there is no push/pop encoding in the ISA for them, even), so my plan is to know there will be sufficient space reserved for them and simply load/store them. To do that, I would need to use rsp or rbp as a base, with an appropriate offset. Is there a nice way to get function call frame base as a value?

I don't want to load rbp here (even if it were possible) because future frame pointer elision would break this, so either rsp or function call frame base seem like better choices. Computing the offset from here ought to be straightforward.

view this post on Zulip GitHub (Feb 06 2020 at 06:20):

iximeow created PR Review Comment:

The bin expression here is wrong, but I've not gotten sufficiently far in this test to need to fix that yet. An earlier (more cumbersome) approach to preserving SIMD registers confirmed that this does get far enough to try generating stores to preserve values though, so that's a start.

view this post on Zulip GitHub (Feb 06 2020 at 06:20):

iximeow submitted PR Review.

view this post on Zulip GitHub (Feb 06 2020 at 15:28):

sstangl submitted PR Review.

view this post on Zulip GitHub (Feb 06 2020 at 15:28):

sstangl submitted PR Review.

view this post on Zulip GitHub (Feb 06 2020 at 15:28):

sstangl created PR Review Comment:

I think the IonMonkey-style approach works well: figure out how many registers to push, allocate stack by subtracting from %rsp, and vmovdqa into place in an arbitrary order of your choosing.

view this post on Zulip GitHub (Feb 06 2020 at 15:28):

sstangl created PR Review Comment:

I think unfortunately that this isn't quite right -- in order to store the xmm registers, the destination must be 16-byte aligned. Since the number of GPRs pushed is non-constant, you might need padding bytes.

view this post on Zulip GitHub (Feb 06 2020 at 15:28):

sstangl created PR Review Comment:

Is this comment out of date? It's using types::F64X2.bytes() as usize now which doesn't look like double-counting!

view this post on Zulip GitHub (Feb 06 2020 at 18:32):

iximeow submitted PR Review.

view this post on Zulip GitHub (Feb 06 2020 at 18:32):

iximeow created PR Review Comment:

Yes, I think that would work well here too. What I don't know is how to get the value of rsp as a value I can provide here! Is there a way to write "give me an SSA value that is whatever is in rsp at this point"? The other option I can imagine is another pair of x86_* special instructions that directly encode rsp, and take the offset as an argument.

Sidenote: I see IonMonkey uses vmovdqa in favor of movdqa - is that actually what gets emitted? Is there a reason IonMonkey prefers the AVX form?

view this post on Zulip GitHub (Feb 06 2020 at 18:40):

bjorn3 created PR Review Comment:

Would it be possible to use stack_store?

view this post on Zulip GitHub (Feb 06 2020 at 18:40):

bjorn3 submitted PR Review.

view this post on Zulip GitHub (Feb 06 2020 at 19:01):

iximeow submitted PR Review.

view this post on Zulip GitHub (Feb 06 2020 at 19:01):

iximeow created PR Review Comment:

I'd hoped to store XMM registers before worrying about GPRs, so they would be a constant offset rather than conditionally requiring padding. I don't think that actually pans out though, and it's still variable on architecture width, so I'll adjust this to include some padding space if csrs.iter(FPR).len() != 0.

view this post on Zulip GitHub (Feb 06 2020 at 19:02):

iximeow submitted PR Review.

view this post on Zulip GitHub (Feb 06 2020 at 19:02):

iximeow created PR Review Comment:

you are correct! I realized shortly after writing the comment that double-counting is wrong for 32-bit targets, but forgot to fix the comment too.

view this post on Zulip GitHub (Feb 06 2020 at 19:14):

iximeow updated PR #1378 from windows-fprs-preservation to master:

This is a draft PR as there are two points where I think I should just ask for help rather than spin my wheels figuring out the Right Thing, I'll add comments below.

@sstangl if you get a chance, I think you might know the right things for the questions to follow!

view this post on Zulip GitHub (Feb 06 2020 at 19:19):

iximeow created PR Review Comment:

ah! yes, I think so. If not, stack_addr is the piece I was missing before, and should work too.

view this post on Zulip GitHub (Feb 06 2020 at 19:19):

iximeow submitted PR Review.

view this post on Zulip GitHub (Feb 06 2020 at 19:22):

iximeow submitted PR Review.

view this post on Zulip GitHub (Feb 06 2020 at 19:22):

iximeow created PR Review Comment:

Unrelated, but because I noticed: do we know of anyone using StackLimit? total_stack_size here undercounts the real amount of space used in this function, as it doesn't include locals.

view this post on Zulip GitHub (Feb 06 2020 at 20:26):

bjorn3 submitted PR Review.

view this post on Zulip GitHub (Feb 06 2020 at 20:26):

bjorn3 created PR Review Comment:

All usages of ArgumentPurpose::StackLimit on github are either from Cranelift itself, or a gecko vendoring of Cranelift.

view this post on Zulip GitHub (Feb 06 2020 at 21:19):

fitzgen submitted PR Review.

view this post on Zulip GitHub (Feb 06 2020 at 21:19):

fitzgen submitted PR Review.

view this post on Zulip GitHub (Feb 06 2020 at 21:19):

fitzgen created PR Review Comment:

typo:

/// Get the set of callee-saved general-purpose registers.

view this post on Zulip GitHub (Feb 06 2020 at 21:20):

fitzgen submitted PR Review.

view this post on Zulip GitHub (Feb 06 2020 at 21:20):

fitzgen created PR Review Comment:

Apologies for the drive by nitpick :)

view this post on Zulip GitHub (Feb 06 2020 at 22:17):

iximeow updated PR #1378 from windows-fprs-preservation to master:

This is a draft PR as there are two points where I think I should just ask for help rather than spin my wheels figuring out the Right Thing, I'll add comments below.

@sstangl if you get a chance, I think you might know the right things for the questions to follow!

view this post on Zulip GitHub (Feb 06 2020 at 22:17):

iximeow submitted PR Review.

view this post on Zulip GitHub (Feb 06 2020 at 22:17):

iximeow created PR Review Comment:

good eye, this function went through some revisions too :)

view this post on Zulip GitHub (Feb 06 2020 at 22:22):

iximeow submitted PR Review.

view this post on Zulip GitHub (Feb 06 2020 at 22:22):

iximeow created PR Review Comment:

it turns out stack_load and stack_store won't do - there's no direct encoding for them in x86 (yet, I hope) and prologue_epilogue is well past legalize.

stack_addr was surprising too: we're also past regalloc so I have to assign a location myself. The error not doing this (that I violated constraints) was quite confusing, because .. what constraints would copying rsp have? :D

A third consideration I didn't realize: we're also well past postopt, so opportunity to merge a stack_addr/{load,store} has passed. To this end, I added only one stack_addr in the prologue, and one in the epilogue, and only if we actually preserve any FPRs in the first place. With stack_load/stack_store encodings this can be cleaned up some more, but this already took longer than I'd anticipated.

Anyway, it works now!

view this post on Zulip GitHub (Feb 06 2020 at 22:23):

iximeow submitted PR Review.

view this post on Zulip GitHub (Feb 06 2020 at 22:23):

iximeow created PR Review Comment:

It appears that filetests doesn't actually test bin expressions (anymore?). This continued expressing the wrong bytes even while the test passed.

I thought I saw an incorrect bin in a different filetest, but wrote it off at the time. I'll file a followup issue for this.

view this post on Zulip GitHub (Feb 06 2020 at 22:24):

iximeow updated PR #1378 from windows-fprs-preservation to master:

This is a draft PR as there are two points where I think I should just ask for help rather than spin my wheels figuring out the Right Thing, I'll add comments below.

@sstangl if you get a chance, I think you might know the right things for the questions to follow!

view this post on Zulip GitHub (Feb 06 2020 at 22:30):

iximeow submitted PR Review.

view this post on Zulip GitHub (Feb 06 2020 at 22:30):

iximeow created PR Review Comment:

mislead myself, bin directives are tested when the file is test binemit, but this is test compile.


Last updated: Dec 23 2024 at 14:03 UTC