iximeow opened PR #1378 from windows-fprs-preservation to master:
- [ ] This has been discussed in issue #1366
- [ ] A short description: Windows appears to have extended ABI constraints to require callees to save XMM6-XMM15 (see #1366 for links to reference material). This PR adds preserve/restore behavior for those registers.
- [ ] This PR contains test cases. Ish. There is an additional test that would test the right behavior if this was totally ready to go.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so and/or ping
bnjbvr. The list of suggested reviewers on the right can help you.This is a draft PR as there are two points where I think I should just ask for help rather than spin my wheels figuring out the Right Thing, I'll add comments below.
@sstangl if you get a chance, I think you might know the right things for the questions to follow!
iximeow edited PR #1378 from windows-fprs-preservation to master:
- [ ] This has been discussed in issue #1366
- [x] A short description: Windows appears to have extended ABI constraints to require callees to save XMM6-XMM15 (see #1366 for links to reference material). This PR adds preserve/restore behavior for those registers.
- [ ] This PR contains test cases. Ish. There is an additional test that would test the right behavior if this was totally ready to go.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so and/or ping
bnjbvr. The list of suggested reviewers on the right can help you.This is a draft PR as there are two points where I think I should just ask for help rather than spin my wheels figuring out the Right Thing, I'll add comments below.
@sstangl if you get a chance, I think you might know the right things for the questions to follow!
iximeow edited PR #1378 from windows-fprs-preservation to master:
- [ ] This has been discussed in issue #1366
- [x] A short description: Windows appears to have extended ABI constraints to require callees to save XMM6-XMM15 (see #1366 for links to reference material). This PR adds preserve/restore behavior for those registers.
- [x] This PR contains test cases. Ish. There is an additional test that would test the right behavior if this was totally ready to go.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so and/or ping
bnjbvr. The list of suggested reviewers on the right can help you.This is a draft PR as there are two points where I think I should just ask for help rather than spin my wheels figuring out the Right Thing, I'll add comments below.
@sstangl if you get a chance, I think you might know the right things for the questions to follow!
iximeow edited PR #1378 from windows-fprs-preservation to master:
- [x] This has been discussed in issue #1366
- [x] A short description: Windows appears to have extended ABI constraints to require callees to save XMM6-XMM15 (see #1366 for links to reference material). This PR adds preserve/restore behavior for those registers.
- [x] This PR contains test cases. Ish. There is an additional test that would test the right behavior if this was totally ready to go.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so and/or ping
bnjbvr. The list of suggested reviewers on the right can help you.This is a draft PR as there are two points where I think I should just ask for help rather than spin my wheels figuring out the Right Thing, I'll add comments below.
@sstangl if you get a chance, I think you might know the right things for the questions to follow!
iximeow created PR Review Comment:
assuming this was a typo,
RSBdoesn't make sense to me as an x86-ism and the other automatically-restored register would beRBP.
iximeow submitted PR Review.
iximeow submitted PR Review.
iximeow created PR Review Comment:
not checking that
GPR(andFPR) contain the regunit in question could introduce errors if these RegClasses had differing widths. So far as I can see, x86RegClassall have the same width of 1, so this happens to work out okay, but I think is strictly speaking a bug independent of SIMD register preservation.
iximeow submitted PR Review.
iximeow created PR Review Comment:
csrs.iter(FPR).len()will be zero via the above assert, but I want the stack size to be the same expression in the three places we compute it here inabi.rs.
iximeow submitted PR Review.
iximeow created PR Review Comment:
this is where I'd like input from someone who knows codegen APIs better than me (@sstangl?): SIMD registers can't be
x86_push'd norx86_pop'd (there is no push/pop encoding in the ISA for them, even), so my plan is to know there will be sufficient space reserved for them and simplyload/storethem. To do that, I would need to usersporrbpas a base, with an appropriate offset. Is there a nice way to getfunction call frame baseas a value?I don't want to load
rbphere (even if it were possible) because future frame pointer elision would break this, so eitherrsporfunction call frame baseseem like better choices. Computing the offset from here ought to be straightforward.
iximeow created PR Review Comment:
The
binexpression here is wrong, but I've not gotten sufficiently far in this test to need to fix that yet. An earlier (more cumbersome) approach to preserving SIMD registers confirmed that this does get far enough to try generating stores to preserve values though, so that's a start.
iximeow submitted PR Review.
sstangl submitted PR Review.
sstangl submitted PR Review.
sstangl created PR Review Comment:
I think the IonMonkey-style approach works well: figure out how many registers to push, allocate stack by subtracting from
%rsp, andvmovdqainto place in an arbitrary order of your choosing.
sstangl created PR Review Comment:
I think unfortunately that this isn't quite right -- in order to store the xmm registers, the destination must be 16-byte aligned. Since the number of GPRs pushed is non-constant, you might need padding bytes.
sstangl created PR Review Comment:
Is this comment out of date? It's using
types::F64X2.bytes() as usizenow which doesn't look like double-counting!
iximeow submitted PR Review.
iximeow created PR Review Comment:
Yes, I think that would work well here too. What I don't know is how to get the value of
rspas a value I can provide here! Is there a way to write "give me an SSA value that is whatever is inrspat this point"? The other option I can imagine is another pair ofx86_*special instructions that directly encodersp, and take the offset as an argument.Sidenote: I see IonMonkey uses
vmovdqain favor ofmovdqa- is that actually what gets emitted? Is there a reason IonMonkey prefers the AVX form?
bjorn3 created PR Review Comment:
Would it be possible to use
stack_store?
bjorn3 submitted PR Review.
iximeow submitted PR Review.
iximeow created PR Review Comment:
I'd hoped to store XMM registers before worrying about GPRs, so they would be a constant offset rather than conditionally requiring padding. I don't think that actually pans out though, and it's still variable on architecture width, so I'll adjust this to include some padding space if
csrs.iter(FPR).len() != 0.
iximeow submitted PR Review.
iximeow created PR Review Comment:
you are correct! I realized shortly after writing the comment that double-counting is wrong for 32-bit targets, but forgot to fix the comment too.
iximeow updated PR #1378 from windows-fprs-preservation to master:
- [x] This has been discussed in issue #1366
- [x] A short description: Windows appears to have extended ABI constraints to require callees to save XMM6-XMM15 (see #1366 for links to reference material). This PR adds preserve/restore behavior for those registers.
- [x] This PR contains test cases. Ish. There is an additional test that would test the right behavior if this was totally ready to go.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so and/or ping
bnjbvr. The list of suggested reviewers on the right can help you.This is a draft PR as there are two points where I think I should just ask for help rather than spin my wheels figuring out the Right Thing, I'll add comments below.
@sstangl if you get a chance, I think you might know the right things for the questions to follow!
iximeow created PR Review Comment:
ah! yes, I think so. If not,
stack_addris the piece I was missing before, and should work too.
iximeow submitted PR Review.
iximeow submitted PR Review.
iximeow created PR Review Comment:
Unrelated, but because I noticed: do we know of anyone using
StackLimit?total_stack_sizehere undercounts the real amount of space used in this function, as it doesn't include locals.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
All usages of
ArgumentPurpose::StackLimiton github are either from Cranelift itself, or a gecko vendoring of Cranelift.
fitzgen submitted PR Review.
fitzgen submitted PR Review.
fitzgen created PR Review Comment:
typo:
/// Get the set of callee-saved general-purpose registers.
fitzgen submitted PR Review.
fitzgen created PR Review Comment:
Apologies for the drive by nitpick :)
iximeow updated PR #1378 from windows-fprs-preservation to master:
- [x] This has been discussed in issue #1366
- [x] A short description: Windows appears to have extended ABI constraints to require callees to save XMM6-XMM15 (see #1366 for links to reference material). This PR adds preserve/restore behavior for those registers.
- [x] This PR contains test cases. Ish. There is an additional test that would test the right behavior if this was totally ready to go.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so and/or ping
bnjbvr. The list of suggested reviewers on the right can help you.This is a draft PR as there are two points where I think I should just ask for help rather than spin my wheels figuring out the Right Thing, I'll add comments below.
@sstangl if you get a chance, I think you might know the right things for the questions to follow!
iximeow submitted PR Review.
iximeow created PR Review Comment:
good eye, this function went through some revisions too :)
iximeow submitted PR Review.
iximeow created PR Review Comment:
it turns out
stack_loadandstack_storewon't do - there's no direct encoding for them in x86 (yet, I hope) andprologue_epilogueis well pastlegalize.
stack_addrwas surprising too: we're also pastregallocso I have to assign a location myself. The error not doing this (that I violated constraints) was quite confusing, because .. what constraints would copyingrsphave? :DA third consideration I didn't realize: we're also well past
postopt, so opportunity to merge astack_addr/{load,store}has passed. To this end, I added only onestack_addrin the prologue, and one in the epilogue, and only if we actually preserve any FPRs in the first place. Withstack_load/stack_storeencodings this can be cleaned up some more, but this already took longer than I'd anticipated.Anyway, it works now!
iximeow submitted PR Review.
iximeow created PR Review Comment:
It appears that
filetestsdoesn't actually testbinexpressions (anymore?). This continued expressing the wrong bytes even while the test passed.I thought I saw an incorrect
binin a different filetest, but wrote it off at the time. I'll file a followup issue for this.
iximeow updated PR #1378 from windows-fprs-preservation to master:
- [x] This has been discussed in issue #1366
- [x] A short description: Windows appears to have extended ABI constraints to require callees to save XMM6-XMM15 (see #1366 for links to reference material). This PR adds preserve/restore behavior for those registers.
- [x] This PR contains test cases. Ish. There is an additional test that would test the right behavior if this was totally ready to go.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so and/or ping
bnjbvr. The list of suggested reviewers on the right can help you.This is a draft PR as there are two points where I think I should just ask for help rather than spin my wheels figuring out the Right Thing, I'll add comments below.
@sstangl if you get a chance, I think you might know the right things for the questions to follow!
iximeow submitted PR Review.
iximeow created PR Review Comment:
mislead myself,
bindirectives are tested when the file istest binemit, but this istest compile.
Last updated: Dec 06 2025 at 05:03 UTC