alexcrichton commented on issue #3254:
I'm not overly happy with the number of tests I have for this, but I don't know how to otherwise exercise it more. One test I'd like to add is a sort of binary search that tries to stress the logic around precise sizes and when veneers are inserted, but I couldn't figure out a good way to test the binary search, e.g. somehow read out whether a veneer was inserted or not. @cfallin if you've got ideas of how to more thoroughly test this I'd love to implement them.
cfallin commented on issue #3254:
@alexcrichton Thanks -- this looks like a bit of a monster and I will dive in a bit later, but on reading the PR description only, one thing strikes me: it should be possible to test the veneer insertion without generating tons of padding by forcing a "always use veneers" mode, no? I'm not sure if the plumbing is there to take Cranelift options in
wasmtime-cranelift::obj
but if not, perhaps this would be an appropriate reason to carry some options into that code (I can imagine other linker options might appear in the future).
cfallin edited a comment on issue #3254:
@alexcrichton Thanks -- this looks like a bit of a monster and I will dive in a bit later, but on reading the PR description only, one thing strikes me: it should be possible to test the veneer insertion without generating tons of padding by forcing an "always use veneers" mode, no? I'm not sure if the plumbing is there to take Cranelift options in
wasmtime_cranelift::obj
but if not, perhaps this would be an appropriate reason to carry some options into that code (I can imagine other linker options might appear in the future).
github-actions[bot] commented on issue #3254:
Subscribe to Label Action
cc @fitzgen, @kubkon, @peterhuene
<details>
This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:area:machinst", "cranelift:area:x64", "fuzzing", "wasi", "wasmtime:api"Thus the following users have been cc'd because of the following labels:
- fitzgen: fuzzing
- kubkon: wasi
- peterhuene: wasmtime:api
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
alexcrichton commented on issue #3254:
we could extend the calls to an 8 GB range
Do you know if this is common in normal AArch64 compilers? (e.g. gcc or llvm). Do you know if this is better than our current strategy of load-a-literal-and-call-it? (which we could presumably change to being a relative literal if we wanted)
have we ever thought about using code memory pages that are executable, but not readable, as additional security hardening?
ooh interesting! I always thought the execute permission implied the read permission, but if we could prevent reading that'd be pretty nifty! I'm not sure how feasible it would be at the Cranelift-level but we could try to shove information like that into the VMContext, but that's pretty Wasmtime-specific.
(I don't know if others have pondered this myself)
cfallin commented on issue #3254:
have we ever thought about using code memory pages that are executable, but not readable, as additional security hardening?
ooh interesting! I always thought the execute permission implied the read permission, but if we could prevent reading that'd be pretty nifty! I'm not sure how feasible it would be at the Cranelift-level but we could try to shove information like that into the VMContext, but that's pretty Wasmtime-specific.
It's definitely possible, but it would take some additional work with regard to linking: specifically we would need relocations that refer to
.rodata
and would need to fix those up to point to all constants in a pool at the end of the object.There's the additional complication that on RISC-ish architectures like aarch64 there are limits on distance to literal-constant loads (
LDR
supports +/- 1MiB if I'm not mistaken?) which would mean we'd need constant islands between functions, and to switch between execute-only and read-only pages.That said other platforms do W^X pretty successfully (OpenBSD by default at least) and it'd be a cool mitigation to say that we can do too :-)
akirilov-arm commented on issue #3254:
Do you know if this is common in normal AArch64 compilers? (e.g. gcc or llvm).
I honestly don't know - I just wanted to mention the possibility.
Do you know if this is better than our current strategy of load-a-literal-and-call-it?
I believe so - while the veneers avoid the worst aspect of that approach (the need to have a short jump over the literal, as we do for SIMD & FP literals right now),
ADRP
andADD
are very fast arithmetic instructions, while a literal load is still a load. Also, unless several call sites manage to share the literal, the code size is worse - 12 bytes (literal load, i.e. 4 bytes + 8-byte literal) vs. 2 instructions, i.e. 8 bytes.I always thought the execute permission implied the read permission...
If I am not mistaken, not in the 64-bit Arm architecture.
There's the additional complication that on RISC-ish architectures like aarch64 there are limits on distance to literal-constant loads (
LDR
supports +/- 1MiB if I'm not mistaken?)...Yes, but in the AArch64 case we could apply a similar trick to get some extra breathing room, i.e. an
ADRP
+ plainLDR
combination - it is not an accident thatADRP
forms 12-bit-aligned addresses, whileLDR
supports unsigned 12-bit offsets :wink:. In fact, that combination is treated in a special way by Cortex-A55, for example, effectively fusing it.
alexcrichton commented on issue #3254:
Today the
X86PCRelRodata4
relocation type coming out of cranelift is simply ignored, but I don't know why it is ignored. I just added a commit to ignore it which fixes tests on the old backend, but if someone else knows of a better comment to add as to why it's ignored I'd be happy to write it in.
bjorn3 commented on issue #3254:
X86PCRelRodata4
is a relocation pointing to read-only data directly after a function. It only exists so that it is possible to move this read-only data away from the function. If you don't do this, the emitted code is pre-relocated for the expected location of the trailing read-only data.It by the way isn't emitted by any new style backends.
sunfishcode commented on issue #3254:
I don't know the current status of this, but at one point, the way Cranelift was embedded in SpiderMonkey, SpiderMonkey would prepend and append instructions to Cranelift's output, and we needed the ability to move the constant pool rodata around to make room.
alexcrichton commented on issue #3254:
Ok talked with @cfallin offline about merging all this into
MachBuffer
and after some fiddling I think it's going to work (got the test suite passing locally on arm64 at least). I'm gonna close this and I'll reopen with a cleaned up history, a rebase, and a new commit message.
Last updated: Jan 24 2025 at 00:11 UTC