cfallin opened Issue #2254:
It arose in #2228 that our current implementation of the standard (SysV / AAPCS) calling convention on AArch64 is not quite right. In the real calling convention, the lower 64-bit halves of some vector registers are callee-save, while the upper 64-bit halves are caller-save. We currently treat them as fully callee-save. This is correct (if conservative) w.r.t. clobber-saves in prologue/epilogue code, but is incorrect w.r.t. callsites.
The most correct fix is to support overlapping register classes in regalloc.rs (see bytecodealliance/regalloc.rs#98). However, in #2228, a simpler stopgap fix was proposed. This fix also requires a small modification to regalloc.rs, but simply to omit call instructions' defs/mods from the clobbered-register-set computation. If the caller and callee have the same ABI, then this is safe.
If the full fix in regalloc.rs is not an easy change (it appears it will not be, as aliasing registers will interact in a fundamental way with the core allocation data structures) then IMHO we should implement the stopgap fix for now, so that when we call into library or other code, we don't have subtle correctness issues with half-clobbered vector values.
cc @julian-seward1, @bnjbvr, @akirilov-arm
cfallin labeled Issue #2254:
It arose in #2228 that our current implementation of the standard (SysV / AAPCS) calling convention on AArch64 is not quite right. In the real calling convention, the lower 64-bit halves of some vector registers are callee-save, while the upper 64-bit halves are caller-save. We currently treat them as fully callee-save. This is correct (if conservative) w.r.t. clobber-saves in prologue/epilogue code, but is incorrect w.r.t. callsites.
The most correct fix is to support overlapping register classes in regalloc.rs (see bytecodealliance/regalloc.rs#98). However, in #2228, a simpler stopgap fix was proposed. This fix also requires a small modification to regalloc.rs, but simply to omit call instructions' defs/mods from the clobbered-register-set computation. If the caller and callee have the same ABI, then this is safe.
If the full fix in regalloc.rs is not an easy change (it appears it will not be, as aliasing registers will interact in a fundamental way with the core allocation data structures) then IMHO we should implement the stopgap fix for now, so that when we call into library or other code, we don't have subtle correctness issues with half-clobbered vector values.
cc @julian-seward1, @bnjbvr, @akirilov-arm
cfallin closed Issue #2254:
It arose in #2228 that our current implementation of the standard (SysV / AAPCS) calling convention on AArch64 is not quite right. In the real calling convention, the lower 64-bit halves of some vector registers are callee-save, while the upper 64-bit halves are caller-save. We currently treat them as fully callee-save. This is correct (if conservative) w.r.t. clobber-saves in prologue/epilogue code, but is incorrect w.r.t. callsites.
The most correct fix is to support overlapping register classes in regalloc.rs (see bytecodealliance/regalloc.rs#98). However, in #2228, a simpler stopgap fix was proposed. This fix also requires a small modification to regalloc.rs, but simply to omit call instructions' defs/mods from the clobbered-register-set computation. If the caller and callee have the same ABI, then this is safe.
If the full fix in regalloc.rs is not an easy change (it appears it will not be, as aliasing registers will interact in a fundamental way with the core allocation data structures) then IMHO we should implement the stopgap fix for now, so that when we call into library or other code, we don't have subtle correctness issues with half-clobbered vector values.
cc @julian-seward1, @bnjbvr, @akirilov-arm
Last updated: Jan 24 2025 at 00:11 UTC