bjorn3 commented on issue #3360:
An important caveat is that GCC arranges to save and restore only those registers known to the specific architecture variant being compiled for. This can make __builtin_setjmp and __builtin_longjmp more efficient than their library counterparts in some cases, but it can also cause incorrect and mysterious behavior when mixing with code that uses the full register set.
This has me worried. What are the exact conditions in which this causes incorrect behavior?
sunfishcode commented on issue #3360:
This has me worried. What are the exact conditions in which this causes incorrect behavior?
Suppose one of the hardware architectures we support added a new register, and decides to have a convention that it's compiler-managed, and callee-saved. The host compiler doesn't use the new register, because it's targeting the generic architecture family, but Cranelift does use the new register, because cranelift-native.
Then we could get into a situation like this:
// compiled by cranelift void outer(void) { write_the_new_register(42); catcher(); assert(read_the_new_register() == 42); } // compiled by the host compiler void catcher(void) { if (__builtin_setjmp(buf) == 0) { inner(); } } // compiled by cranelift void inner(void) { auto save = read_the_new_register(); clobber_the_new_register(); if (some condition) { jumper(); } write_the_new_register(save); } // compiled by the host compiler void jumper(void) { __builtin_longjmp(buf, 1); }
In practice, this should be rare, because the major architectures don't add new registers very often, and when they do, it's usually not compiler-managed, and when it is, it's usually call-clobbered. New compiler-managed callee-saved state also breaks many existing
libc
-implementedsetjmp
functions in the same way, so it's not often done.
froydnj commented on issue #3360:
An important caveat is that GCC arranges to save and restore only those registers known to the specific architecture variant being compiled for. This can make __builtin_setjmp and __builtin_longjmp more efficient than their library counterparts in some cases, but it can also cause incorrect and mysterious behavior when mixing with code that uses the full register set.
This has me worried. What are the exact conditions in which this causes incorrect behavior?
It doesn't require mixing and matching with the full register set. You run into problems like https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98997 for instance (clang has the same problem).
This particular case is not a problem for wasmtime, because all access to
__builtin_{set,long}jmp
should be within wasmtime itself (and you shouldn't be able to mix-and-match wasmtime runtimes...)
bjorn3 commented on issue #3360:
In practice, this should be rare, because the major architectures don't add new registers very often
ymm/zmm for avx/avx512 are by default unknown to the compiled wasmtime as only SSE2 is enabled by default for most x86_64 targets.
and when it is, it's usually call-clobbered.
I see. The x86_64 SystemV does indeed list xmm, ymm and zmm as being clobbered.
sunfishcode commented on issue #3360:
Thanks for that link! Yes, in our case, we know that
__builtin_setjmp
and__builtin_longjmp
are compiled by the same compiler with the same flags—they're in the same source file, so we're safe from that particular hazard at least.We do have a hazard though, in the case of a mismatch between Cranelift's and the host compiler's callee-saved registers. I've added a comment about this to the PR; I'm open to ideas for what else we can do here.
sunfishcode commented on issue #3360:
For completeness, a form of this hazard is already present in Wasmtime; if someone statically links Wasmtime, the statically-linked-in
sigsetjmp
will hard-code the registers to save, and if Cranelift at runtime detects and uses more registers and relies on them being callee-saved, it'll also trigger UB.
alexcrichton commented on issue #3360:
We may need to disable this for AArch64 with
clang
, locally when I try to compile it spits out:foo.c:34:7: error: __builtin_setjmp is not supported for the current target if (platform_setjmp(buf) != 0) { ^~~~~~~~~~~~~~~~~~~~ foo.c:23:30: note: expanded from macro 'platform_setjmp' #define platform_setjmp(buf) __builtin_setjmp(buf) ^~~~~~~~~~~~~~~~~~~~~ foo.c:44:3: error: __builtin_longjmp is not supported for the current target platform_longjmp(*buf, 1); ^~~~~~~~~~~~~~~~~~~~~~~~~ foo.c:24:36: note: expanded from macro 'platform_longjmp' #define platform_longjmp(buf, arg) __builtin_longjmp(buf, arg) ^~~~~~~~~~~~~~~~~~~~~~~~~~~ 2 errors generated.
Poking around in clang it looks like this is gated on
hasSjLjLowering
which is true for x86 and arm but not for AArch64
sunfishcode commented on issue #3360:
I fixed the code to compile on clang on aarch64, where
__builtin_setmp
isn't supported. If there are other architectures/compilers that don't support it, we can add them as needed.Talking about this with @cfallin offline, adding new callee-saved registers to cranelift amounts to adding a new ABI, and there are lots of considerations, including calls to and from host code, the register save/restore code in wasmtime-fiber, the
__builtin_setjmp
issue here, trampolines, and possibly other things, so we'll need to carefully think through the design at that time. I've added a comment to the code added in this PR, which seems to be the main useful thing we can do for now.
Last updated: Nov 22 2024 at 16:03 UTC