iximeow opened PR #1378 from windows-fprs-preservation
to master
:
- [ ] This has been discussed in issue #1366
- [ ] A short description: Windows appears to have extended ABI constraints to require callees to save XMM6-XMM15 (see #1366 for links to reference material). This PR adds preserve/restore behavior for those registers.
- [ ] This PR contains test cases. Ish. There is an additional test that would test the right behavior if this was totally ready to go.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so and/or ping
bnjbvr
. The list of suggested reviewers on the right can help you.This is a draft PR as there are two points where I think I should just ask for help rather than spin my wheels figuring out the Right Thing, I'll add comments below.
@sstangl if you get a chance, I think you might know the right things for the questions to follow!
iximeow edited PR #1378 from windows-fprs-preservation
to master
:
- [ ] This has been discussed in issue #1366
- [x] A short description: Windows appears to have extended ABI constraints to require callees to save XMM6-XMM15 (see #1366 for links to reference material). This PR adds preserve/restore behavior for those registers.
- [ ] This PR contains test cases. Ish. There is an additional test that would test the right behavior if this was totally ready to go.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so and/or ping
bnjbvr
. The list of suggested reviewers on the right can help you.This is a draft PR as there are two points where I think I should just ask for help rather than spin my wheels figuring out the Right Thing, I'll add comments below.
@sstangl if you get a chance, I think you might know the right things for the questions to follow!
iximeow edited PR #1378 from windows-fprs-preservation
to master
:
- [ ] This has been discussed in issue #1366
- [x] A short description: Windows appears to have extended ABI constraints to require callees to save XMM6-XMM15 (see #1366 for links to reference material). This PR adds preserve/restore behavior for those registers.
- [x] This PR contains test cases. Ish. There is an additional test that would test the right behavior if this was totally ready to go.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so and/or ping
bnjbvr
. The list of suggested reviewers on the right can help you.This is a draft PR as there are two points where I think I should just ask for help rather than spin my wheels figuring out the Right Thing, I'll add comments below.
@sstangl if you get a chance, I think you might know the right things for the questions to follow!
iximeow edited PR #1378 from windows-fprs-preservation
to master
:
- [x] This has been discussed in issue #1366
- [x] A short description: Windows appears to have extended ABI constraints to require callees to save XMM6-XMM15 (see #1366 for links to reference material). This PR adds preserve/restore behavior for those registers.
- [x] This PR contains test cases. Ish. There is an additional test that would test the right behavior if this was totally ready to go.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so and/or ping
bnjbvr
. The list of suggested reviewers on the right can help you.This is a draft PR as there are two points where I think I should just ask for help rather than spin my wheels figuring out the Right Thing, I'll add comments below.
@sstangl if you get a chance, I think you might know the right things for the questions to follow!
iximeow created PR Review Comment:
assuming this was a typo,
RSB
doesn't make sense to me as an x86-ism and the other automatically-restored register would beRBP
.
iximeow submitted PR Review.
iximeow submitted PR Review.
iximeow created PR Review Comment:
not checking that
GPR
(andFPR
) contain the regunit in question could introduce errors if these RegClasses had differing widths. So far as I can see, x86RegClass
all have the same width of 1, so this happens to work out okay, but I think is strictly speaking a bug independent of SIMD register preservation.
iximeow submitted PR Review.
iximeow created PR Review Comment:
csrs.iter(FPR).len()
will be zero via the above assert, but I want the stack size to be the same expression in the three places we compute it here inabi.rs
.
iximeow submitted PR Review.
iximeow created PR Review Comment:
this is where I'd like input from someone who knows codegen APIs better than me (@sstangl?): SIMD registers can't be
x86_push
'd norx86_pop
'd (there is no push/pop encoding in the ISA for them, even), so my plan is to know there will be sufficient space reserved for them and simplyload
/store
them. To do that, I would need to usersp
orrbp
as a base, with an appropriate offset. Is there a nice way to getfunction call frame base
as a value?I don't want to load
rbp
here (even if it were possible) because future frame pointer elision would break this, so eitherrsp
orfunction call frame base
seem like better choices. Computing the offset from here ought to be straightforward.
iximeow created PR Review Comment:
The
bin
expression here is wrong, but I've not gotten sufficiently far in this test to need to fix that yet. An earlier (more cumbersome) approach to preserving SIMD registers confirmed that this does get far enough to try generating stores to preserve values though, so that's a start.
iximeow submitted PR Review.
sstangl submitted PR Review.
sstangl submitted PR Review.
sstangl created PR Review Comment:
I think the IonMonkey-style approach works well: figure out how many registers to push, allocate stack by subtracting from
%rsp
, andvmovdqa
into place in an arbitrary order of your choosing.
sstangl created PR Review Comment:
I think unfortunately that this isn't quite right -- in order to store the xmm registers, the destination must be 16-byte aligned. Since the number of GPRs pushed is non-constant, you might need padding bytes.
sstangl created PR Review Comment:
Is this comment out of date? It's using
types::F64X2.bytes() as usize
now which doesn't look like double-counting!
iximeow submitted PR Review.
iximeow created PR Review Comment:
Yes, I think that would work well here too. What I don't know is how to get the value of
rsp
as a value I can provide here! Is there a way to write "give me an SSA value that is whatever is inrsp
at this point"? The other option I can imagine is another pair ofx86_*
special instructions that directly encodersp
, and take the offset as an argument.Sidenote: I see IonMonkey uses
vmovdqa
in favor ofmovdqa
- is that actually what gets emitted? Is there a reason IonMonkey prefers the AVX form?
bjorn3 created PR Review Comment:
Would it be possible to use
stack_store
?
bjorn3 submitted PR Review.
iximeow submitted PR Review.
iximeow created PR Review Comment:
I'd hoped to store XMM registers before worrying about GPRs, so they would be a constant offset rather than conditionally requiring padding. I don't think that actually pans out though, and it's still variable on architecture width, so I'll adjust this to include some padding space if
csrs.iter(FPR).len() != 0
.
iximeow submitted PR Review.
iximeow created PR Review Comment:
you are correct! I realized shortly after writing the comment that double-counting is wrong for 32-bit targets, but forgot to fix the comment too.
iximeow updated PR #1378 from windows-fprs-preservation
to master
:
- [x] This has been discussed in issue #1366
- [x] A short description: Windows appears to have extended ABI constraints to require callees to save XMM6-XMM15 (see #1366 for links to reference material). This PR adds preserve/restore behavior for those registers.
- [x] This PR contains test cases. Ish. There is an additional test that would test the right behavior if this was totally ready to go.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so and/or ping
bnjbvr
. The list of suggested reviewers on the right can help you.This is a draft PR as there are two points where I think I should just ask for help rather than spin my wheels figuring out the Right Thing, I'll add comments below.
@sstangl if you get a chance, I think you might know the right things for the questions to follow!
iximeow created PR Review Comment:
ah! yes, I think so. If not,
stack_addr
is the piece I was missing before, and should work too.
iximeow submitted PR Review.
iximeow submitted PR Review.
iximeow created PR Review Comment:
Unrelated, but because I noticed: do we know of anyone using
StackLimit
?total_stack_size
here undercounts the real amount of space used in this function, as it doesn't include locals.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
All usages of
ArgumentPurpose::StackLimit
on github are either from Cranelift itself, or a gecko vendoring of Cranelift.
fitzgen submitted PR Review.
fitzgen submitted PR Review.
fitzgen created PR Review Comment:
typo:
/// Get the set of callee-saved general-purpose registers.
fitzgen submitted PR Review.
fitzgen created PR Review Comment:
Apologies for the drive by nitpick :)
iximeow updated PR #1378 from windows-fprs-preservation
to master
:
- [x] This has been discussed in issue #1366
- [x] A short description: Windows appears to have extended ABI constraints to require callees to save XMM6-XMM15 (see #1366 for links to reference material). This PR adds preserve/restore behavior for those registers.
- [x] This PR contains test cases. Ish. There is an additional test that would test the right behavior if this was totally ready to go.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so and/or ping
bnjbvr
. The list of suggested reviewers on the right can help you.This is a draft PR as there are two points where I think I should just ask for help rather than spin my wheels figuring out the Right Thing, I'll add comments below.
@sstangl if you get a chance, I think you might know the right things for the questions to follow!
iximeow submitted PR Review.
iximeow created PR Review Comment:
good eye, this function went through some revisions too :)
iximeow submitted PR Review.
iximeow created PR Review Comment:
it turns out
stack_load
andstack_store
won't do - there's no direct encoding for them in x86 (yet, I hope) andprologue_epilogue
is well pastlegalize
.
stack_addr
was surprising too: we're also pastregalloc
so I have to assign a location myself. The error not doing this (that I violated constraints) was quite confusing, because .. what constraints would copyingrsp
have? :DA third consideration I didn't realize: we're also well past
postopt
, so opportunity to merge astack_addr/{load,store}
has passed. To this end, I added only onestack_addr
in the prologue, and one in the epilogue, and only if we actually preserve any FPRs in the first place. Withstack_load
/stack_store
encodings this can be cleaned up some more, but this already took longer than I'd anticipated.Anyway, it works now!
iximeow submitted PR Review.
iximeow created PR Review Comment:
It appears that
filetests
doesn't actually testbin
expressions (anymore?). This continued expressing the wrong bytes even while the test passed.I thought I saw an incorrect
bin
in a different filetest, but wrote it off at the time. I'll file a followup issue for this.
iximeow updated PR #1378 from windows-fprs-preservation
to master
:
- [x] This has been discussed in issue #1366
- [x] A short description: Windows appears to have extended ABI constraints to require callees to save XMM6-XMM15 (see #1366 for links to reference material). This PR adds preserve/restore behavior for those registers.
- [x] This PR contains test cases. Ish. There is an additional test that would test the right behavior if this was totally ready to go.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so and/or ping
bnjbvr
. The list of suggested reviewers on the right can help you.This is a draft PR as there are two points where I think I should just ask for help rather than spin my wheels figuring out the Right Thing, I'll add comments below.
@sstangl if you get a chance, I think you might know the right things for the questions to follow!
iximeow submitted PR Review.
iximeow created PR Review Comment:
mislead myself,
bin
directives are tested when the file istest binemit
, but this istest compile
.
Last updated: Nov 22 2024 at 17:03 UTC