fitzgen commented on issue #5190:
Trevor and I investigated the failing test on aarch64, and this ultimately led us to the conclusion that this patch shouldn't actually land.

Fundamentally, this PR changes the semantics of `heap_addr` from "calculate the base address of the given memory access or trap if it is out of bounds" to "calculate the base address of the given memory access *or return an address that will trap when accessed*". And this semantic change only happens when dynamic memories and spectre mitigations are enabled. In other configurations `heap_addr` still has the old semantics.

This leads us into a bit of a mess:

- If the result of an out-of-bounds `heap_addr` is never loaded from or stored to, then the trap never happens. Not even "eventually". This very much does not match `heap_addr`'s documented semantics.

- Piggybacking off the spectre mitigation, this PR uses `0` as the address that will trap when accessed. However, I'm not sure that this will always work in the face of large offsets? There is a bit of code emitted from `cranelift-wasm` that does a `heap_addr` to get a base address and then adds offsets to that base. Needs more investigation here.

- Also, according to `heap_addr`'s documented semantics today, it would be valid to mark a load as `notrap` when given the result of a `heap_addr`, because if we got to the load then `heap_addr` already verified that the address is in bounds for us:
```
v0 = heap_addr ...
v1 = load.i32 little heap notrap v0
```
The `notrap` flag would become incorrect under these semantic changes.

That said, I think `heap_addr` does have roughly the "or return an address that will trap when accessed" semantics when we are using static memories? So maybe we actually can make this semantic change to `heap_addr`, and then this would actually make it more consistent across dynamic and static memories? And then the `notrap` flag would be invalid here.

I'm not sure how to resolve these issues and still deduplicate the bounds checks (without going full value range analysis). Given that last point, it might actually be better to change `heap_addr`'s semantics to "or return an address that will trap when accessed". We would need to be careful to check that all libcalls that take an address do explicit bounds checks on that address, rather than assuming that the JIT code already did one via its `heap_addr`. This is a little bit scary since it means that valid assumptions code could have been written under no longer hold. (Although I don't think we generally make this assumption in the code base, and usually we take Wasm addresses rather than the results of `heap_addr`s for libcalls; need to take a census to be more sure about this.)

Interested in folks' thoughts on this.
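To make the two semantics concrete, here is a rough Rust model (hypothetical helper functions for illustration only; the real behavior is implemented in Cranelift's legalization of `heap_addr`, not as runtime functions like these):

```rust
// Documented semantics today: trap eagerly at the `heap_addr` itself,
// whether or not the resulting address is ever actually accessed.
// `panic!` stands in for Cranelift's heap_oob trap.
fn heap_addr_trap_eagerly(base: u64, len: u64, index: u64, size: u64) -> u64 {
    if index.saturating_add(size) > len {
        panic!("heap_oob");
    }
    base.wrapping_add(index)
}

// Semantics under this PR (only when dynamic memories + spectre
// mitigations are enabled): out-of-bounds accesses yield address 0, which
// traps only if a load/store actually dereferences it.
fn heap_addr_trap_on_access(base: u64, len: u64, index: u64, size: u64) -> u64 {
    if index.saturating_add(size) > len {
        0 // unmapped page; a later load/store will fault here
    } else {
        base.wrapping_add(index)
    }
}
```

In the second model, a `heap_addr` whose result is never dereferenced produces no trap at all, which is exactly the divergence from the documented semantics described above.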
Separately, Trevor and I discovered the following issues (or not-necessarily-issues, but maybe things we should tighten up):

- `validate_atomic_addr` in `libcalls.rs` only validates the address of a single byte, not the full size of the access. This means that a bounds check could "succeed" when the first byte being accessed is the last byte of memory and the other bytes of the access are out of bounds. In practice we don't have this bug, because all callers call `validate_atomic_addr` with `base + size_of_access`, but this is a bit of a foot gun waiting to bite us. It would be nicer to pass the `base` and `size_of_access` into `validate_atomic_addr`, similar to the `heap_addr` instruction (see the sketch at the end of this comment).

- The heap access spectre mitigation -- at least for dynamic memories, I haven't looked at static memories yet -- only checks the first byte of the access, not the last byte. This means that speculative execution could do an `i64` load beginning at the last byte of memory and bring (the cache line for) seven out-of-bounds bytes into cache.

- The `heap_addr` instruction should have `.can_trap(true)` in the meta crate, according to its documented semantics. It doesn't currently have this. However, this would also prevent `heap_addr`s from being GVN'd or LICM'd, which we definitely want to happen; we just want to make sure that any load of a `heap_addr`'d address is still `heap_addr`'d. Open to suggestions here. Although if we do change `heap_addr`'s semantics to "or returns an address that will trap when accessed", we don't need to mark it `can_trap(true)` and this problem goes away.

cc @cfallin @elliottt @jameysharp
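For the first point above, a minimal sketch of the suggested `validate_atomic_addr` shape (the parameter list and error type here are placeholders, not Wasmtime's actual API):

```rust
// Take the base address and the access size separately, so the
// end-of-access arithmetic (including overflow checking) lives in one
// place instead of in every caller.
fn validate_atomic_addr(
    memory_len: usize,
    addr: usize,
    size_of_access: usize,
) -> Result<(), Oob> {
    // `checked_add` rejects `addr + size_of_access` wrapping around.
    let end = addr.checked_add(size_of_access).ok_or(Oob)?;
    if end > memory_len {
        return Err(Oob);
    }
    Ok(())
}

// Placeholder error type standing in for Wasmtime's real trap machinery.
struct Oob;
```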
alexcrichton commented on issue #5190:
> Fundamentally, this PR changes the semantics of heap_addr from "calculate the base address of the given memory access or trap if it is out of bounds" to "calculate the base address of the given memory access or return an address that will trap when accessed".

I think it changes the semantics for dynamic heaps, but this doesn't change the semantics for static heaps I think? In that sense all our libcalls should already be guarded against "this address is in bounds and should not segfault", I think?
cfallin commented on issue #5190:
Thanks for digging into this; a few thoughts:
- I think that, as you suggest, we are already relying on the trap happening when the address is accessed, in the static-memory case. It makes the most sense to me to update the documentation for `heap_addr` in that light.

- I was somewhat concerned about this point:

  > There is a bit of code emitted from cranelift-wasm that does a heap_addr to get a base address and then adds offsets to that base. Need more investigation here.

  and looked into it a bit; are you referring to the code in `prepare_addr`? If so, I think that should generally be safe: its optimization (using `heap_addr` with the same offset of `1` for every load with an actual offset potentially larger) is triggered only when it knows that the heap is using a large-enough guard region (see the sketch at the end of this comment). Arguably such an optimization should be a rewrite rule in Cranelift instead -- translate all `heap_addr`s on a heap-with-guard-region to use `offset=0` so they can be GVN'd together -- but I don't think this will cause a correctness issue in the short term, unless I'm misunderstanding or missing some other code...

- Re: this:

  > We would need to be careful to check that all libcalls that take an address do explicit bounds checks on that address rather than assuming that the JIT code already did one via its heap_addr. This is a little bit scary since it means that valid assumptions code could have been written under no longer hold.

  If the address will trap when executed, then we're fine safety-wise (this won't be a backdoor around the heap sandbox), but I agree it will cause problems in other ways -- specifically, it will result in a SIGSEGV in the runtime rather than in guest code, which will (usually) kill the process. So I agree we should verify that libcalls take Wasm addresses. I think this is an issue today though, as (as you noted) the guard-region case already relies on "actual load/store traps, not `heap_addr`". (That increases the importance of doing the above verification, to be clear.)

Overall, I think I'm not as pessimistic on this: it basically changes `heap_addr` to work in the dynamic-memory case like it does in the static-memory case, and Wasmtime is already well-tested with its page-protection-based bounds checking. As long as the address-0 page is unmapped (as it is on all of our platforms), we are basically using it as a one-page-sized guard region.
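The `prepare_addr` optimization mentioned above can be modeled roughly like this (all names hypothetical; a sketch of the decision being described, not the real `cranelift-wasm` code):

```rust
// If a load's static offset plus its access size fits within the heap's
// guard region, it suffices to bounds-check a 1-byte access at the Wasm
// index: any address in heap_len..heap_len + guard_size is unmapped and
// will trap when dereferenced. Every such heap_addr then has the same
// shape, which is what lets GVN deduplicate the bounds checks.
fn bounds_check_access_size(offset: u64, access_size: u64, guard_size: u64) -> u64 {
    match offset.checked_add(access_size) {
        Some(end) if end <= guard_size => 1,
        // Otherwise the bounds check must cover the full offset + size.
        _ => offset.saturating_add(access_size),
    }
}
```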
fitzgen commented on issue #5190:
> are you referring to the code in `prepare_addr`? If so I think that should generally be safe: its optimization (using `heap_addr` with the same offset of `1` for every load with an actual offset potentially larger) is triggered only when it knows that the heap is using a large-enough guard region.

Yeah, and I agree it is safe; it just makes me a little nervous to produce CLIF that has `heap_addr`s that aren't immediately followed by the associated memory access. As you allude to, I would be more comfortable having this kind of thing done in the mid-end.
fitzgen commented on issue #5190:
I guess the biggest unknown in my mind is how large of a NULL guard page region we can rely on? Because this PR bakes in the assumption, which the spectre guard already makes, that `0 + offset` will always trap for any 32-bit Wasm offset immediate.

For example, given
```
(module
  (memory (export "memory") 1)
  (func (export "f") (param i32) (result i32)
    local.get 0
    i32.load offset=0xfffffff0))
```
We lower that Wasm to this clif:
```
v4 = heap_addr.i64 heap0, v2, 0x8000_0000
v5 = iadd_imm v4, 0xffff_fff0
v6 = load.i32 little heap v5
```
And then (with static memories, but with a smaller-than-4GiB guard region, plus spectre guards) we legalize that clif to this clif:
```
;; check if the pointer is larger than the static memory size (including both
;; wasm memory and guard pages, I think?)
v7 = uextend.i64 v2
v8 = icmp_imm ugt v7, 0x8000_0000
trapnz v8, heap_oob
v9 = iconst.i64 0x8000_0000
;; gv4 is the memory base
v10 = global_value.i64 gv4
v11 = iadd v10, v7
v12 = iconst.i64 0
v13 = icmp ugt v7, v9              ; v9 = 0x8000_0000
v4 = select_spectre_guard v13, v12, v11  ; v12 = 0
v5 = iadd_imm v4, 0xffff_fff0
v6 = load.i32 little heap v5
```
Do we guarantee a 4GiB-sized NULL guard region? Because doing the spectre guard on the base (`v11`) rather than on the base plus offset (`v5`) makes that assumption. This doesn't seem like something we can rely on, especially across platforms?

This becomes even more problematic when we look at the legalization of that lowered Wasm's initial clif, this time configured with dynamic memories and spectre guards, and using this patch, which removes the dynamic memory's bounds check in favor of the spectre guard's bounds check:
```
;; extend wasm i32 pointer to native i64 pointer
v7 = uextend.i64 v2
;; gv4 is the base of memory
v8 = global_value.i64 gv4
;; overflow check for pointer + offset
v9 = iconst.i64 0xffff_0000
v10 = uadd_overflow_trap v7, v9, heap_oob  ; v9 = 0xffff_0000
;; gv5 is the length of memory
v11 = global_value.i64 gv5
v12 = iadd v11, v7
v13 = iconst.i64 0
v14 = icmp ugt v10, v8
v4 = select_spectre_guard v14, v13, v12    ; v13 = 0
v5 = iadd_imm v4, 0xffff_fff0
v6 = load.i32 little heap v5
```
We're baking in that same assumption. Now we aren't just letting speculative execution access the hopefully-null-guard-page-protected region of memory in 0..4GiB, but regular Wasm execution as well.
I think we can fix all this ambiguity by effectively doing `select_spectre_guard(base + offset)` instead of `select_spectre_guard(base) + offset`?
fitzgen commented on issue #5190:
> I think we can fix all this ambiguity by effectively doing `select_spectre_guard(base + offset)` instead of `select_spectre_guard(base) + offset`?

This requires changes to `heap_addr`: it will need to be given not the accumulated access size (Wasm offset + `size_of(type)`) but those individual components, so that it can return the native address of the translated Wasm address, which we can then spectre-guard instead of the native base.
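The difference in ordering can be sketched in plain Rust (illustration only; the real code is generated CLIF, and `select_spectre_guard` lowers to a branchless conditional move rather than an `if`):

```rust
// What the legalized CLIF above effectively computes: the guard zeroes
// only `base + index`, and the static offset is added *after* the select,
// so an out-of-bounds access can still reach any address in 0..=offset.
fn guard_then_add_offset(base: u64, index: u64, offset: u64, oob: bool) -> u64 {
    let guarded = if oob { 0 } else { base.wrapping_add(index) };
    guarded.wrapping_add(offset)
}

// The proposed shape: fold the offset in *before* the select, so an
// out-of-bounds access resolves to address 0 itself and reliably traps.
fn add_offset_then_guard(base: u64, index: u64, offset: u64, oob: bool) -> u64 {
    let full = base.wrapping_add(index).wrapping_add(offset);
    if oob { 0 } else { full }
}
```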
bjorn3 commented on issue #5190:
I don't think we can guarantee a 4GB null guard. Memory may be mapped within this range, whether explicitly using mmap or because the executable has a fixed position within this range. Only 4096 bytes are guaranteed on Linux, as this is the default value for `vm.mmap_min_addr`.
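A back-of-the-envelope check of the size mismatch (assuming the 4096-byte default `vm.mmap_min_addr` mentioned above):

```rust
fn main() {
    // Bytes below this address are guaranteed unmapped on a default
    // Linux configuration (vm.mmap_min_addr).
    let guaranteed_null_guard: u64 = 4096;
    // The largest 32-bit Wasm offset immediate that `0 + offset` could
    // produce under the scheme being discussed.
    let max_wasm_offset: u64 = 0xffff_ffff;

    // Only offsets below the guard size are guaranteed to trap at
    // address 0; anything above it may hit mapped memory instead.
    assert!(max_wasm_offset > guaranteed_null_guard);
    println!(
        "offsets {}..={} are not covered by the null guard",
        guaranteed_null_guard, max_wasm_offset
    );
}
```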
cfallin commented on issue #5190:
If I understand the code correctly, I think we handle all of this already -- first, here we get `offset_guard_size` from the heap spec, and in the return-NULL-to-trap configuration we should have a guard size of 0, as with any dynamically-bounds-checked memory (or maybe it's a small nominal amount, I don't remember, but certainly not 4GiB). Then, here we include that offset on the `heap_addr`, and it will bounds-check appropriately, taking the offset into consideration.

Or I guess all of this is a long way of saying: I don't think we actually lower into that CLIF if dynamic memory is configured. This is subtly different from

> This becomes even more problematic when we look at the legalization of that lowered Wasm's initial clif, this time configured with dynamic memories and spectre guards

I think -- if I understand your methodology correctly, you took the CLIF generated by translation with static memories enabled, then compiled it with dynamic memories enabled? With the above-linked code, I think doing the *translation* with dynamic memories enabled should result in code that does not assume a 4GiB guard at address 0.
cfallin commented on issue #5190:
(And to answer the immediate question, no, we definitely can't assume a 4GiB guard region at address 0! That would be an un-portable mess, as you observe.)
fitzgen commented on issue #5190:
> I think -- if I understand your methodology correctly, you took the CLIF generated by translation with static memories enabled, then compiled it with dynamic memories enabled? With the above-linked code I think doing the _translation_ with dynamic memories enabled should result in code that does not assume a 4GiB guard at address 0.
No, this is generated from the original input Wasm.
fitzgen commented on issue #5190:
> Then, here we include that offset on the `heap_addr`, and it will bounds-check appropriately, taking the offset into consideration.

This still doesn't prevent speculated execution from reading 0..4GiB though, because the generated code ultimately does `spectre_guard(base) + offset` rather than `spectre_guard(base + offset)`.
fitzgen edited a comment on issue #5190:
> I think -- if I understand your methodology correctly, you took the CLIF generated by translation with static memories enabled, then compiled it with dynamic memories enabled? With the above-linked code I think doing the _translation_ with dynamic memories enabled should result in code that does not assume a 4GiB guard at address 0.
No, each different legalization example is generated from the original input Wasm with different flags passed to Wasmtime. I was not working with just the CLIF. Always using the original Wasm as input.
cfallin commented on issue #5190:
Ah, OK, I was misreading your concern, sorry. I had thought you were saying that the actual bounds check (the non-speculative behavior that should result in a trap) was wrong, and was very confused because the result of the `uadd_overflow_trap` was fed into the `icmp`, so all seemed to account for the offset.

But indeed, the select is picking the base prior to the offset addition. We should fix that by reordering the ops, as you suggest, and providing `heap_addr` with separate offset and size, as you've noted.

Once we do that, are there any other issues with this approach?
fitzgen commented on issue #5190:
> I had thought you were saying that the actual bounds-check (the non-speculative behavior that should result in a trap) was wrong;

Today, actual execution is not affected by this. After this PR, actual execution is affected by this when dynamic memories and spectre guards are both enabled.

> But indeed, the select is picking the base prior to the offset addition. We should fix that by reordering the ops, as you suggest, and providing `heap_addr` with separate offset and size as you've noted. Once we do that, are there any other issues with this approach?

Not that I am aware of.

Currently working on the `heap_addr` changes. Looks like this is maybe gonna touch more than I anticipated (just got all the errors from non-`cranelift-codegen` crates, such as the interpreter, out of `rustc`; we'll see how deep the rabbit hole goes).
fitzgen edited a comment on issue #5190:
> I had thought you were saying that the actual bounds-check (the non-speculative behavior that should result in a trap) was wrong;

Today, actual execution is not affected by this. After this PR as currently written, actual execution is affected by this when dynamic memories and spectre guards are both enabled.

> But indeed, the select is picking the base prior to the offset addition. We should fix that by reordering the ops, as you suggest, and providing `heap_addr` with separate offset and size as you've noted. Once we do that, are there any other issues with this approach?

Not that I am aware of.

Currently working on the `heap_addr` changes. Looks like this is maybe gonna touch more than I anticipated (just got all the errors from non-`cranelift-codegen` crates, such as the interpreter, out of `rustc`; we'll see how deep the rabbit hole goes).
cfallin commented on issue #5190:
> Today, actual execution is not affected by this.
>
> After this PR as currently written, actual execution is affected by this when dynamic memories and spectre guards are both enabled.

Right, sorry, makes sense now; not sure why I missed that!
fitzgen commented on issue #5190:
> not sure why I missed that!

No worries, this stuff is pretty subtle; it has basically taken me all day just to write these comments and read through the code enough to understand what is happening here :-p
fitzgen commented on issue #5190:
@cfallin I've rebased this on top of `main` and the recent bounds-checking-related PRs I've landed, all tests are passing now, and this should be good to land. Just wanted to flag you to see if you wanted to give it one last once-over.
fitzgen commented on issue #5190:
Just realized @cfallin is going to be OoO for a few days. @jameysharp, would you mind taking a look?
fitzgen commented on issue #5190:
> It's unfortunate that the extensive comments in `cranelift/filetests/filetests/isa/x64/heap.clif` haven't been maintained since @abrown wrote them in #4841. If you have a chance to fix them up before merging this I think that'd be great, but I wouldn't block merging on that.

I plan on fixing up a lot of `heap_addr`-related things; I'll add this to my list.
Last updated: Nov 22 2024 at 16:03 UTC