github-actions[bot] commented on issue #6810:
Subscribe to Label Action
cc @cfallin, @fitzgen
<details>
This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:area:machinst", "cranelift:area:x64", "isle"Thus the following users have been cc'd because of the following labels:
- cfallin: isle
- fitzgen: isle
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
afonso360 commented on issue #6810:
We are going to have some issues with our test runner. We currently use
cranelift-native
to do feature detection before running a test invocation, so that we don't natively try to run a test with an extension that the host does not support. We currently only ever enablesign_return_address
on MacOS, otherwise that feature is never enabled bycranelift-native
, which means that our test runner will always skip these tests even if QEMU signals support for them.I'm also not sure we can enable them in cranelift native, since I think that would automatically enable return address signing for anyone who just uses all native features, and that might not be desirable.
afonso360 edited a comment on issue #6810:
We are going to have some issues with our test runner. We currently use
cranelift-native
to do feature detection before running a test invocation, so that we don't natively try to run a test with an extension that the host does not support.We currently only ever enable
sign_return_address
on MacOS, that feature is otherwise never enabled bycranelift-native
, which means that our test runner will always skip these tests even if QEMU signals support for them.I'm also not sure we can enable them in cranelift native (i.e. alongside
paca
), since I think that would automatically enable return address signing for anyone who just uses all native features, and that might not be desirable.
alexcrichton commented on issue #6810:
Heh I was just digging into this a bit more and reached the same conclusion! Apparently with QEMU 8.0.2 (what I happen to have installed locally) it will, by default, enable the necessary bits for authentication so the
retaa
instruction, for example, works with QEMU 8.0.2 out-of-the box. The same binary fails on one piece of native hardware I can run on which doesn't have support (presumably I suppose).I can confirm though that if I force
sign_return_address
then the runtests as-is will crash. With this PR, however, they pass. In that sense I don't think a new runtest is necessarily needed since the wasm suite and runtests should cover everything already.Otherwise though I don't feel like I know enough about aarch64 pointer authentication bits to know what to do here. Given that pointer authentication, as we're using it, is purely a function-local decision I don't know why we would ever want to disable
sign_return_address
. I'm also not sure why it matters which key (A or B) is used.I'm also not sure we can enable them in cranelift native (i.e. alongside paca), since I think that would automatically enable return address signing for anyone who just uses all native features, and that might not be desirable.
Can you expand on this @afonso360? Do you know why this might be bad and/or not desirable?
cfallin commented on issue #6810:
Given that pointer authentication, as we're using it, is purely a function-local decision I don't know why we would ever want to disable sign_return_address. I'm also not sure why it matters which key (A or B) is used.
I guess it matters for unwinding? This is why e.g. we're forced to use the feature, and the B key specifically, on macOS, at least if I understand correctly. I'd be very curious to know if aarch64 Linux lets us do signing locally though; that would be a nice defense-in-depth thing...
afonso360 commented on issue #6810:
I had the idea that there were some issues back when this was originally implemented, but it looks like its unwinding related as @cfallin mentioned.
Here's the original discussion: https://github.com/bytecodealliance/wasmtime/pull/3606#discussion_r936966019
afonso360 edited a comment on issue #6810:
I got the impressions that there were some issues back when this was originally implemented, but it looks like its unwinding related as @cfallin mentioned.
Here's the original discussion: https://github.com/bytecodealliance/wasmtime/pull/3606#discussion_r936966019
afonso360 edited a comment on issue #6810:
Can you expand on this @afonso360? Do you know why this might be bad and/or not desirable?
I got the impressions that there were some issues back when this was originally implemented, but it looks like its unwinding related as @cfallin mentioned.
Here's the original discussion: https://github.com/bytecodealliance/wasmtime/pull/3606#discussion_r936966019
afonso360 edited a comment on issue #6810:
Can you expand on this @afonso360? Do you know why this might be bad and/or not desirable?
I got the impression that there were some issues back when this was originally implemented, but it looks like its unwinding related as @cfallin mentioned.
Here's the original discussion: https://github.com/bytecodealliance/wasmtime/pull/3606#discussion_r936966019
afonso360 edited a comment on issue #6810:
Can you expand on this @afonso360? Do you know why this might be bad and/or not desirable?
I got the impression that there were some issues back when this was originally implemented, but it looks like its unwinding related as @cfallin mentioned (They might no longer be relevant, that was just the impression I got).
Here's the original discussion: https://github.com/bytecodealliance/wasmtime/pull/3606#discussion_r936966019
alexcrichton commented on issue #6810:
Ah ok yeah that all makes sense, and this is sort of coming back to me. So I think that functionally we could enable
sign_return_address
unconditionally for normal execution and wasm backtraces should work due to the use of our own custom frame-pointer unwinder which bakes inxpaclri
to handle encrypted return addresses. The remaining issues I think are (a) what doesstd::backtrace::Backtrace::new()
do when there's wasm on the stack and (b) is it possible to get a backtrace through wasm in a debugger. In theory both of these are functions of dwarf unwinding information we emit, which in theory is correct, but that may require newer debuggers/systems to handle that libgcc bug mentioned.Testing this locally looks like the B key can't be used on Linux with Wasmtime since we haven't implemented the dwarf stuff to indicate that. If I force-enable
sign_return_address
thenstd::backtrace::Backtrace::new()
segfaults with wasm on the stack. Additionally in gdb the stack trace (of that segfault) stops at the wasm on the stack. The segfault is inlibgcc_s.so
which may mean I'm running into this. I think I've then confirmed that where if I run in a recent container (ubuntu:23.10
) then the backtrace looks correct. Running in a slightly older contains (ubuntu:22.10
) I hit the same segfault.So that's a long way of saying:
- Ignoring native backtraces, it seems like we can safely enable
sign_return_address
at all times. This will then enable automatic testing in CI since QEMU implements bothpauth
and the non-hint
semantic versions of the instructions- We must use the A key since encoding the usage of the B key in unwind information is not yet supported.
- Only recent builds (of libunwind?) support the dwarf we're emitting meaning we have the options of:
- Emit no dwarf unwind information, meaning native backtraces terminate at wasm frames (safely? unsure)
- Emit current dwarf meaning native backtraces crash unless they have the gcc patch from #3183
- Some other blending like emit native unwind info by default except on AArch64 Linux for a few years.
We don't use docker for tests in CI, just for release builds, so my guess is that the patched toolchain is so new that it's not available on GitHub actions by default (they're using ubuntu 22.04). This means that
sign_return_address
, if enabled, I believe would fail on GitHub actions if we acquire a native backtrace in tests (which we may not do, I forget). Given all this I'm tempted to say we shouldn't do anything else here at this time. In a few years we should be able to flip thesign_return_address
config option by default and have everything "just work" though.
alexcrichton commented on issue #6810:
The rabbit hole continues to go deep on this one.
I've attempted to follow-through on suggestions from today's Cranelift meeting by enabling
sign_return_address
for a single test case (or a few) throughout our runtest test suite. Turns outtail-call-conv.clif
already does this! Turns out that not only passes in CI but additionally passes locally on my macbook. This appears to be due to:
- Locally on macOS this passes because it's signing with the A key not the B key (e.g.
sign_return_address_with_bkey
isn't specified). Apparently macOS is configured such that signing with the A key is a noop, so it's as-if these instructions don't exist. That means that all the tests pass if the pointer authentication is mixed up by accident.- On CI this is passing because cranelift-filetest is skipping the test that wants
sign_return_address
. Linux hosts (as @afonso360 pointed out) never infer this flag, and then the default logic is to reject the test if a feature is specified that the host doesn't support.On
main
if I specifysign_return_address_with_bkey
then tests segfault as-is on macOS. That means that the test suite already has coverage of the necessary bits that's being fixed here. To help catch this in the future on CI I've added special logic to cranelift-filetest to specifically ignore thesign_return_address
feature. This way we should now start runningsign_return_address
code, on Linux, in CI. This I can confirm segfaults qemu on main because QEMU supportshas_pauth
.All-in-all I believe that this should now be run on CI.
Last updated: Dec 23 2024 at 12:05 UTC