akirilov-arm commented on issue #17:
@cfallin Thanks for the feedback! Concerning your general questions:
* It might be good to add a little more detail here to describe proposed changes to the generated code. In particular more detail on the pointer-auth would be helpful, just to document our approach if nothing else.
Sure, I can use a simple CLIF function that just calls another subroutine and returns immediately after that as an example, and present the generated code before and after the CFI enhancements.
In the meantime, the article that is mentioned in the text actually provides that level of detail, to help with the discussion before I update the proposal.
* Do either of these features (especially pointer-auth) affect the ABI (I don't actually know)?
The scheme that I have described in the proposal should be mostly backward compatible, that is any part of the program should be able to remain oblivious of whether some code it interacts with uses PAuth and/or BTI or not. The major exceptions are unwinders, but at least the system ones should be prepared to deal with PAuth (assuming that the software environment the application runs in is relatively recent and that the suggested DWARF changes are implemented), as bytecodealliance/wasmtime#3183 has demonstrated in an unpleasant way. Of course, I can add a short paragraph that summarizes the backward compatibility aspect to the proposal.
However, it is possible to devise more sophisticated CFI schemes that require the introduction of specialized ABIs and that have a potentially higher performance overhead - for example, Apple's own variant. IMHO we should start with something simple that will allow us to work out the interactions with the rest of the system, while providing a reasonable improvement to CFI.
fitzgen commented on issue #17:
It seems like supporting pointer authentication and BTI is largely an implementation detail of the aarch64 backend, and doesn't require new clif instructions or larger compiler/runtime changes (other than support for configuring aarch64-specific codegen options). Is my understanding correct here?
The proposed implementation will add the PACIASP instruction to the beginning of every function compiled by Cranelift and would replace the final return with the RETAA instruction.
Two questions:
Should we have support for both the nop versions of these instructions _and_ the non-nop versions? (
RETA{A,B}
are non-nop IIUC.) And should we expose whether to use the non-nop versions as a cranelift settings option?Similarly, should we support both the
a
key andb
key? Does userspace always usea
by convention? If so, then we can probably get away without the option to generateb
.These steps can be skipped for simple leaf functions that do not construct frame records on the stack.
LLVM has an option to generate the pointer auth for leaf functions, do we want to as well?
akirilov-arm commented on issue #17:
I think @alexcrichton asked how the Rust compiler planned to use PAuth and BTI during one of the Cranelift meetings - details are in rust-lang/rust#88354; the changes seem to follow what GCC and LLVM are already doing.
As for @fitzgen's questions:
It seems like supporting pointer authentication and BTI is largely an implementation detail of the aarch64 backend, and doesn't require new clif instructions or larger compiler/runtime changes (other than support for configuring aarch64-specific codegen options). Is my understanding correct here?
Yes, it is, but the main reason I have posted a RFC is to discuss the higher-level points such as whether the proposed hardening makes sense and what the extent of the threat model should be, e.g. we may decide that a CFI mechanism that requires ABI changes, while being beyond the scope of the current proposal, is something that we should definitely consider in the future. The discussion should also inform a potential x86 implementation using CET.
Should we have support for both the nop versions of these instructions _and_ the non-nop versions?
Keeping in mind that PAuth can't be enabled in a granular way (e.g. with IFUNC or a similar mechanism), but it has to be used globally in all functions in order to be effective, the main utility of the
NOP
versions of the instructions is to ease deployment and in particular to support the Linux distribution use case, where the OS vendors would like to ship a single set of binaries that would still be hardened on compatible hardware. I believe that this is not really relevant to Wasmtime, so the current proposal minimizes the size of the generated code, but I might be wrong, hence I added the open question - please, chime in if that's indeed the case.However, if we consider the
cg_clif
use case, then we might in fact need to support both versions of the instructions - theNOP
variants that would be used by Cranelift as a Rust compiler backend and the rest for Wasmtime. @bjorn3, since you are driving the former use case, I would appreciate any comments.Similarly, should we support both the
a
key andb
key? Does userspace always usea
by convention?AFAIK there isn't a convention _per se_ and applications are free to use both keys, but most software is probably going to use the A key because it is the default for static compilers such as GCC. However, one option that has been discussed in the context of programs that use JIT compilation (e.g. dynamic language runtimes) is to use the B key for generated functions and the A key for statically compiled code.
Also, the same interoperability considerations with the Rust compiler apply here.
LLVM has an option to generate the pointer auth for leaf functions, do we want to as well?
I suppose you mean all leaf functions as opposed to those that are not affected by the optimization introduced by bytecodealliance/wasmtime#2960, which is what the text currently proposes (does LLVM make that distinction)? Again, that would mostly be about being able to accommodate whatever the Rust compiler decides to do.
tschneidereit commented on issue #17:
However, if we consider the
cg_clif
use case, then we might in fact need to support both versions of the instructions - theNOP
variants that would be used by Cranelift as a Rust compiler backend and the rest for Wasmtime.I wonder if this might also be relevant for scenarios where
wasmtime compile
is used, and the resulting.cwasm
file deployed to other machines. While it seems highly advisable to ensure those deployment scenarios involve pretty tight control over the target configuration, it still seems like there might be value in enabling the.cwasm
file to run on a broader range of hardware.
bnjbvr commented on issue #17:
Similarly, should we support both the a key and b key? Does userspace always use a by convention?
AFAIK there isn't a convention per se and applications are free to use both keys, but most software is probably going to use the A key because it is the default for static compilers such as GCC. However, one option that has been discussed in the context of programs that use JIT compilation (e.g. dynamic language runtimes) is to use the B key for generated functions and the A key for statically compiled code.
Some relevant documentation about the choice of using the A or B keys for signing pointers, according to LLVM's (maybe Apple's LLVM fork _in particular_) documentation of pointer authentication: https://github.com/apple/llvm-project/blob/next/clang/docs/PointerAuthentication.rst#key-assignments
In particular, I do confirm seeing B key be used for signing return addresses on Mac M1.
If I understand correctly, for the particular case of the return address at least, the selected key has to match what the host system has chosen too. When unwinding the stack on Mac M1 (when creating a backtrace for an error, for instance), the system's libunwind tried to authenticate all return addresses with the B key, including return addresses that had been stored in registers/stack by Cranelift-generated code. Does that sound accurate? If so, would that require having low-level settings that controlled which key is used in which particular context (e.g. "use the B key for signing return addresses"), or is it something that can be configured during unwinding through the use of CFI directives?
akirilov-arm commented on issue #17:
... according to LLVM's (maybe Apple's LLVM fork _in particular_) documentation of pointer authentication...
To be precise - this concerns
arm64e
, Apple's pointer authentication ABI variant, which is incompatible with its predecessor, while the proposal here discusses an approach that interoperates mostly transparently with vanilla AAPCS64.However, thanks for pointing that out because it means that we may need to do something different when targetting macOS. I have to admit that my previous answer (and, indeed, the proposal itself) has been written mostly with Linux in mind.
... for the particular case of the return address at least, the selected key has to match what the host system has chosen too. When unwinding the stack on Mac M1 [...], the system's libunwind tried to authenticate all return addresses with the B key, including return addresses that had been stored in registers/stack by Cranelift-generated code. Does that sound accurate?
Yes, it makes sense.
If so, would that require having low-level settings that controlled which key is used in which particular context (e.g. "use the B key for signing return addresses"), or is it something that can be configured during unwinding through the use of CFI directives?
Well, considering the Linux/DWARF case, the key could actually be specified in the CIE via a CIE augmentation string.
akirilov-arm commented on issue #17:
@tschneidereit
However, if we consider the
cg_clif
use case, then we might in fact need to support both versions of the instructions - theNOP
variants that would be used by Cranelift as a Rust compiler backend and the rest for Wasmtime.I wonder if this might also be relevant for scenarios where
wasmtime compile
is used, and the resulting.cwasm
file deployed to other machines. While it seems highly advisable to ensure those deployment scenarios involve pretty tight control over the target configuration, it still seems like there might be value in enabling the.cwasm
file to run on a broader range of hardware.While I was writing the proposal, my assumption was that even if Wasmtime compiled a Wasm module ahead-of-time, the machine on which the generated code executed would usually have the same (or better) capabilities than those enabled during compilation. How important is the opposite use case? @cfallin also expressed some reservations about the overhead of the additional instructions, so machines without PAuth support would probably rather avoid them.
akirilov-arm edited a comment on issue #17:
@tschneidereit
I wonder if this might also be relevant for scenarios where
wasmtime compile
is used, and the resulting.cwasm
file deployed to other machines. While it seems highly advisable to ensure those deployment scenarios involve pretty tight control over the target configuration, it still seems like there might be value in enabling the.cwasm
file to run on a broader range of hardware.While I was writing the proposal, my assumption was that even if Wasmtime compiled a Wasm module ahead-of-time, the machine on which the generated code executed would usually have the same (or better) capabilities than those enabled during compilation. How important is the opposite use case? @cfallin also expressed some reservations about the overhead of the additional instructions, so machines without PAuth support would probably rather avoid them.
Also, this discussion might be relevant in the context of forward-edge CFI using Intel CET, I believe.
akirilov-arm edited a comment on issue #17:
@tschneidereit
I wonder if this might also be relevant for scenarios where
wasmtime compile
is used, and the resulting.cwasm
file deployed to other machines. While it seems highly advisable to ensure those deployment scenarios involve pretty tight control over the target configuration, it still seems like there might be value in enabling the.cwasm
file to run on a broader range of hardware.While I was writing the proposal, my assumption was that even if Wasmtime compiled a Wasm module ahead-of-time, the machine on which the generated code executed would usually have the same (or better) capabilities than those enabled during compilation. How important is the opposite use case? @cfallin also expressed some reservations about the overhead of the additional instructions, so machines without PAuth support would probably rather avoid them.
Also, this discussion might be relevant in the context of forward-edge CFI using Intel CET, I believe.
P.S. What we have been talking about so far is essentially a forward compatibility issue with PAuth, but after experimenting with BTI I have realized that there is another, backward compatibility problem - imagine that a Wasm module is compiled ahead-of-time without any BTI support, so the generated code will not contain any BTI instructions, and then a machine that does support BTI tries to execute the code. The latter will mark the executable memory pages as containing BTI instructions, which will cause crashes because all indirect branches will fault due to the missing instructions. The solution to this issue for static compilers involves a couple of ELF extensions, so we could use something similar for the
.cwasm
files generated by Wasmtime. Alternatively, if we agree that we do not support the case in which the hardware capabilities in the compilation environment differ from those in the execution environment, then we would not need to do anything special.
akirilov-arm commented on issue #17:
During the last Cranelift project meeting @alexcrichton mentioned that Wasmtime would error out if it tried to load an AOT compiled module that was built with different settings than the ones used by the current environment, which means that supporting the
NOP
versions of the instructions is not that important, at least for the Wasmtime use case.
akirilov-arm commented on issue #17:
I have updated the proposal text to align it with the equivalent work on the Rust compiler; also, the changes should address the comments from the discussion above.
There is a suggestion that @abrown made elsewhere - since BTI and the
ENDBR
instruction introduced by Intel CET are quite similar, it makes sense to consider a generic setting to control their usage. Of course, the question is what would targets that don't have special hardware support for an equivalent mechanism (such as s390x, I believe) do - one option is to just return an error if the generic setting is enabled, while another is to implement a software-only protection such as the one I discuss briefly in the proposal text. Currently, I have chosen to keep the setting controlling BTI usage to be AArch64-specific.
akirilov-arm commented on issue #17:
Motion to finalize with a disposition to merge
As discussed during the last Cranelift biweekly meeting, I believe that all the comments on the text so far have been addressed, and there hasn't been any strong opposition, so I'm proposing that we merge this RFC.
Thanks everyone for the discussion and the feedback!
Stakeholders sign-off
Arm
- [x] @akirilov-arm
- [ ] @sparker-arm
DFINITY
- [ ] @granstrom
Embark Studios
- [ ] @bnjbvr
- [ ] @repi
Fastly
- [ ] @acfoltzer
- [ ] @alexcrichton
- [ ] @aturon
- [ ] @cfallin
- [ ] @cratelyn
- [ ] @fitzgen
- [ ] @iximeow
- [ ] @pchickey
- [ ] @peterhuene
- [ ] @sunfishcode
- [ ] @tschneidereit
Fermyon
- [ ] @bacongobbler
- [ ] @radu-matei
Google / Envoy
- [ ] @PiotrSikora
IBM
- [ ] @uweigand
Intel
- [ ] @abrown
- [ ] @jlb6740
- [ ] @mingqiusun
Microsoft
- [ ] @squillace
Mozilla
- [ ] @julian-seward1
- [ ] @yurydelendik
wasmCloud
- [ ] @autodidaddict
Unaffiliated
- [ ] @bjorn3
cfallin edited a comment on issue #17:
Motion to finalize with a disposition to merge
As discussed during the last Cranelift biweekly meeting, I believe that all the comments on the text so far have been addressed, and there hasn't been any strong opposition, so I'm proposing that we merge this RFC.
Thanks everyone for the discussion and the feedback!
Stakeholders sign-off
Arm
- [x] @akirilov-arm
- [ ] @sparker-arm
DFINITY
- [ ] @granstrom
Embark Studios
- [ ] @bnjbvr
- [ ] @repi
Fastly
- [ ] @acfoltzer
- [ ] @alexcrichton
- [ ] @aturon
- [x] @cfallin
- [ ] @cratelyn
- [ ] @fitzgen
- [ ] @iximeow
- [ ] @pchickey
- [ ] @peterhuene
- [ ] @sunfishcode
- [ ] @tschneidereit
Fermyon
- [ ] @bacongobbler
- [ ] @radu-matei
Google / Envoy
- [ ] @PiotrSikora
IBM
- [ ] @uweigand
Intel
- [ ] @abrown
- [ ] @jlb6740
- [ ] @mingqiusun
Microsoft
- [ ] @squillace
Mozilla
- [ ] @julian-seward1
- [ ] @yurydelendik
wasmCloud
- [ ] @autodidaddict
Unaffiliated
- [ ] @bjorn3
cfallin edited a comment on issue #17:
Motion to finalize with a disposition to merge
As discussed during the last Cranelift biweekly meeting, I believe that all the comments on the text so far have been addressed, and there hasn't been any strong opposition, so I'm proposing that we merge this RFC.
Thanks everyone for the discussion and the feedback!
Stakeholders sign-off
Arm
- [x] @akirilov-arm
- [ ] @sparker-arm
DFINITY
- [ ] @granstrom
Embark Studios
- [ ] @bnjbvr
- [ ] @repi
Fastly
- [ ] @acfoltzer
- [ ] @alexcrichton
- [ ] @aturon
- [x] @cfallin
- [ ] @cratelyn
- [ ] @fitzgen
- [ ] @iximeow
- [ ] @pchickey
- [ ] @peterhuene
- [ ] @sunfishcode
- [ ] @tschneidereit
Fermyon
- [ ] @bacongobbler
- [ ] @radu-matei
Google / Envoy
- [ ] @PiotrSikora
IBM
- [ ] @uweigand
Intel
- [ ] @abrown
- [ ] @jlb6740
- [ ] @mingqiusun
Microsoft
- [ ] @squillace
Mozilla
- [ ] @julian-seward1
- [ ] @yurydelendik
wasmCloud
- [ ] @autodidaddict
Unaffiliated
- [x] @bjorn3
cfallin edited a comment on issue #17:
Motion to finalize with a disposition to merge
As discussed during the last Cranelift biweekly meeting, I believe that all the comments on the text so far have been addressed, and there hasn't been any strong opposition, so I'm proposing that we merge this RFC.
Thanks everyone for the discussion and the feedback!
Stakeholders sign-off
Arm
- [x] @akirilov-arm
- [ ] @sparker-arm
DFINITY
- [ ] @granstrom
Embark Studios
- [ ] @bnjbvr
- [ ] @repi
Fastly
- [ ] @acfoltzer
- [ ] @alexcrichton
- [ ] @aturon
- [x] @cfallin
- [ ] @cratelyn
- [ ] @fitzgen
- [ ] @iximeow
- [ ] @pchickey
- [ ] @peterhuene
- [ ] @sunfishcode
- [ ] @tschneidereit
Fermyon
- [ ] @bacongobbler
- [ ] @radu-matei
Google / Envoy
- [x] @PiotrSikora
IBM
- [ ] @uweigand
Intel
- [ ] @abrown
- [ ] @jlb6740
- [ ] @mingqiusun
Microsoft
- [ ] @squillace
Mozilla
- [ ] @julian-seward1
- [ ] @yurydelendik
wasmCloud
- [ ] @autodidaddict
Unaffiliated
- [x] @bjorn3
akirilov-arm commented on issue #17:
Entering Final Call Period
Now that we have sign-offs from multiple separate groups, as per the process this RFC will move into a final comment period (FCP).
The FCP will end on Monday, April 18.
akirilov-arm edited a comment on issue #17:
Entering Final Comment Period
Now that we have sign-offs from multiple separate groups, as per the process this RFC will move into a final comment period (FCP).
The FCP will end on Monday, April 18.
fitzgen edited a comment on issue #17:
Motion to finalize with a disposition to merge
As discussed during the last Cranelift biweekly meeting, I believe that all the comments on the text so far have been addressed, and there hasn't been any strong opposition, so I'm proposing that we merge this RFC.
Thanks everyone for the discussion and the feedback!
Stakeholders sign-off
Arm
- [x] @akirilov-arm
- [ ] @sparker-arm
DFINITY
- [ ] @granstrom
Embark Studios
- [ ] @bnjbvr
- [ ] @repi
Fastly
- [ ] @acfoltzer
- [ ] @alexcrichton
- [ ] @aturon
- [x] @cfallin
- [ ] @cratelyn
- [x] @fitzgen
- [ ] @iximeow
- [ ] @pchickey
- [ ] @peterhuene
- [ ] @sunfishcode
- [ ] @tschneidereit
Fermyon
- [ ] @bacongobbler
- [ ] @radu-matei
Google / Envoy
- [x] @PiotrSikora
IBM
- [ ] @uweigand
Intel
- [ ] @abrown
- [ ] @jlb6740
- [ ] @mingqiusun
Microsoft
- [ ] @squillace
Mozilla
- [ ] @julian-seward1
- [ ] @yurydelendik
wasmCloud
- [ ] @autodidaddict
Unaffiliated
- [x] @bjorn3
akirilov-arm edited a comment on issue #17:
Motion to finalize with a disposition to merge
As discussed during the last Cranelift biweekly meeting, I believe that all the comments on the text so far have been addressed, and there hasn't been any strong opposition, so I'm proposing that we merge this RFC.
Thanks everyone for the discussion and the feedback!
Stakeholders sign-off
Arm
- [x] @akirilov-arm
- [ ] @sparker-arm
DFINITY
- [ ] @granstrom
Embark Studios
- [ ] @bnjbvr
- [ ] @repi
Fastly
- [ ] @acfoltzer
- [ ] @alexcrichton
- [ ] @aturon
- [x] @cfallin
- [ ] @cratelyn
- [x] @fitzgen
- [ ] @iximeow
- [ ] @pchickey
- [ ] @peterhuene
- [ ] @sunfishcode
- [ ] @tschneidereit
Fermyon
- [ ] @bacongobbler
- [x] @radu-matei
Google / Envoy
- [x] @PiotrSikora
IBM
- [ ] @uweigand
Intel
- [ ] @abrown
- [ ] @jlb6740
- [ ] @mingqiusun
Microsoft
- [ ] @squillace
Mozilla
- [ ] @julian-seward1
- [ ] @yurydelendik
wasmCloud
- [ ] @autodidaddict
Unaffiliated
- [x] @bjorn3
akirilov-arm commented on issue #17:
The FCP has ended without any objections raised and without further discussion, so it is time to merge this RFC proposal. Thanks everyone for the discussion!
Last updated: Jan 24 2025 at 00:11 UTC