wasmtime / issue #3254 Use relative `call` instructions b... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / issue #3254 Use relative `call` instructions b...

Wasmtime GitHub notifications bot (Aug 26 2021 at 22:13):

I'm not overly happy with the number of tests I have for this, but I don't know how to otherwise exercise it more. One test I'd like to add is a sort of binary search that tries to stress the logic around precise sizes and when veneers are inserted, but I couldn't figure out a good way to test the binary search, e.g. somehow read out whether a veneer was inserted or not. @cfallin if you've got ideas of how to more thoroughly test this I'd love to implement them.

Wasmtime GitHub notifications bot (Aug 26 2021 at 22:24):

cfallin commented on issue #3254:

@alexcrichton Thanks -- this looks like a bit of a monster and I will dive in a bit later, but on reading the PR description only, one thing strikes me: it should be possible to test the veneer insertion without generating tons of padding by forcing a "always use veneers" mode, no? I'm not sure if the plumbing is there to take Cranelift options in wasmtime-cranelift::obj but if not, perhaps this would be an appropriate reason to carry some options into that code (I can imagine other linker options might appear in the future).

Wasmtime GitHub notifications bot (Aug 26 2021 at 22:26):

cfallin edited a comment on issue #3254:

@alexcrichton Thanks -- this looks like a bit of a monster and I will dive in a bit later, but on reading the PR description only, one thing strikes me: it should be possible to test the veneer insertion without generating tons of padding by forcing an "always use veneers" mode, no? I'm not sure if the plumbing is there to take Cranelift options in wasmtime_cranelift::obj but if not, perhaps this would be an appropriate reason to carry some options into that code (I can imagine other linker options might appear in the future).

Wasmtime GitHub notifications bot (Aug 26 2021 at 23:17):

github-actions[bot] commented on issue #3254:

Subscribe to Label Action

cc @fitzgen, @kubkon, @peterhuene

<details>
This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:area:machinst", "cranelift:area:x64", "fuzzing", "wasi", "wasmtime:api"

Thus the following users have been cc'd because of the following labels:

fitzgen: fuzzing

kubkon: wasi

peterhuene: wasmtime:api

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

Learn more.
</details>

Wasmtime GitHub notifications bot (Aug 27 2021 at 00:03):

alexcrichton commented on issue #3254:

we could extend the calls to an 8 GB range

Do you know if this is common in normal AArch64 compilers? (e.g. gcc or llvm). Do you know if this is better than our current strategy of load-a-literal-and-call-it? (which we could presumably change to being a relative literal if we wanted)

have we ever thought about using code memory pages that are executable, but not readable, as additional security hardening?

ooh interesting! I always thought the execute permission implied the read permission, but if we could prevent reading that'd be pretty nifty! I'm not sure how feasible it would be at the Cranelift-level but we could try to shove information like that into the VMContext, but that's pretty Wasmtime-specific.

(I don't know if others have pondered this myself)

Wasmtime GitHub notifications bot (Aug 27 2021 at 00:29):

cfallin commented on issue #3254:

have we ever thought about using code memory pages that are executable, but not readable, as additional security hardening?

ooh interesting! I always thought the execute permission implied the read permission, but if we could prevent reading that'd be pretty nifty! I'm not sure how feasible it would be at the Cranelift-level but we could try to shove information like that into the VMContext, but that's pretty Wasmtime-specific.

It's definitely possible, but it would take some additional work with regard to linking: specifically we would need relocations that refer to .rodata and would need to fix those up to point to all constants in a pool at the end of the object.

There's the additional complication that on RISC-ish architectures like aarch64 there are limits on distance to literal-constant loads (LDR supports +/- 1MiB if I'm not mistaken?) which would mean we'd need constant islands between functions, and to switch between execute-only and read-only pages.

That said other platforms do W^X pretty successfully (OpenBSD by default at least) and it'd be a cool mitigation to say that we can do too :-)

Wasmtime GitHub notifications bot (Aug 27 2021 at 01:10):

akirilov-arm commented on issue #3254:

Do you know if this is common in normal AArch64 compilers? (e.g. gcc or llvm).

I honestly don't know - I just wanted to mention the possibility.

Do you know if this is better than our current strategy of load-a-literal-and-call-it?

I believe so - while the veneers avoid the worst aspect of that approach (the need to have a short jump over the literal, as we do for SIMD & FP literals right now), ADRP and ADD are very fast arithmetic instructions, while a literal load is still a load. Also, unless several call sites manage to share the literal, the code size is worse - 12 bytes (literal load, i.e. 4 bytes + 8-byte literal) vs. 2 instructions, i.e. 8 bytes.

I always thought the execute permission implied the read permission...

If I am not mistaken, not in the 64-bit Arm architecture.

There's the additional complication that on RISC-ish architectures like aarch64 there are limits on distance to literal-constant loads (LDR supports +/- 1MiB if I'm not mistaken?)...

Yes, but in the AArch64 case we could apply a similar trick to get some extra breathing room, i.e. an ADRP + plain LDR combination - it is not an accident that ADRP forms 12-bit-aligned addresses, while LDR supports unsigned 12-bit offsets :wink:. In fact, that combination is treated in a special way by Cortex-A55, for example, effectively fusing it.

Wasmtime GitHub notifications bot (Aug 27 2021 at 17:24):

alexcrichton commented on issue #3254:

Today the X86PCRelRodata4 relocation type coming out of cranelift is simply ignored, but I don't know why it is ignored. I just added a commit to ignore it which fixes tests on the old backend, but if someone else knows of a better comment to add as to why it's ignored I'd be happy to write it in.

Wasmtime GitHub notifications bot (Aug 27 2021 at 17:36):

bjorn3 commented on issue #3254:

X86PCRelRodata4 is a relocation pointing to read-only data directly after a function. It only exists so that it is possible to move this read-only data away from the function. If you don't do this, the emitted code is pre-relocated for the expected location of the trailing read-only data.

https://github.com/bytecodealliance/wasmtime/blob/e6f399419c60c0306d13e39c2f955090f7da943f/cranelift/jit/src/backend.rs#L916-L919

It by the way isn't emitted by any new style backends.

Wasmtime GitHub notifications bot (Aug 27 2021 at 18:14):

sunfishcode commented on issue #3254:

I don't know the current status of this, but at one point, the way Cranelift was embedded in SpiderMonkey, SpiderMonkey would prepend and append instructions to Cranelift's output, and we needed the ability to move the constant pool rodata around to make room.

Wasmtime GitHub notifications bot (Aug 31 2021 at 20:56):

alexcrichton commented on issue #3254:

Ok talked with @cfallin offline about merging all this into MachBuffer and after some fiddling I think it's going to work (got the test suite passing locally on arm64 at least). I'm gonna close this and I'll reopen with a cleaned up history, a rebase, and a new commit message.

Last updated: Apr 18 2025 at 08:04 UTC