Stream: git-wasmtime

Topic: wasmtime / issue #5190 Cranelift: de-duplicate bounds che...


view this post on Zulip Wasmtime GitHub notifications bot (Nov 04 2022 at 17:08):

fitzgen commented on issue #5190:

Trevor and I investigated the failing test on aarch64 and this ultimately led us
to the conclusion that this patch shouldn't actually land.

Fundamentally, this PR changes the semantics of heap_addr from "calculate the
base address of the given memory access or trap if it is out of bounds" to
"calculate the base address of the given memory access *or return an address
that will trap when accessed*". And this semantic change only happens when
dynamic memories and spectre mitigations are enabled. In other configurations
heap_addr still has the old semantics.
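
Roughly, and as a sketch with placeholder value numbers rather than the exact CLIF this PR emits, the difference between the two semantics looks like this:

;; old semantics: explicit bounds check that traps when the index is out of bounds
;; (v0 = extended wasm index, v1 = bound, v2 = memory base)
v3 = icmp ugt v0, v1
trapnz v3, heap_oob
v4 = iadd v2, v0

;; new semantics (dynamic memories + spectre mitigations): no explicit trap;
;; out-of-bounds addresses are clamped to 0 and the eventual access faults
v6 = icmp ugt v0, v1
v7 = iconst.i64 0
v8 = iadd v2, v0
v9 = select_spectre_guard v6, v7, v8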

This leads us into a bit of a mess:

I'm not sure how to resolve these issues and still deduplicate the bounds checks
(without going full value range analysis). Given that last point, it might
actually be better to change heap_addr's semantics to "or return an address
that will trap when accessed". We would need to be careful to check that all
libcalls that take an address do explicit bounds checks on that address rather
than assuming that the JIT code already did one via its heap_addr. This is a
little bit scary since it means that assumptions that code could validly have been written under no longer hold. (Although I don't think we generally make this assumption in the code base, and we usually pass wasm addresses rather than the results of heap_addrs to libcalls; I need to take a census to be more sure about this.)

Interested in folks' thoughts on this.


Separately, Trevor and I discovered the following issues / not necessarily
issues but maybe things we should tighten up:

cc @cfallin @elliottt @jameysharp

view this post on Zulip Wasmtime GitHub notifications bot (Nov 04 2022 at 17:37):

alexcrichton commented on issue #5190:

Fundamentally, this PR changes the semantics of heap_addr from "calculate the
base address of the given memory access or trap if it is out of bounds" to
"calculate the base address of the given memory access or return an address
that will trap when accessed".

I think it changes the semantics for dynamic heaps, but not for static heaps? In that sense, all our libcalls should already be guarded against assuming "this address is in bounds and should not segfault"?

view this post on Zulip Wasmtime GitHub notifications bot (Nov 04 2022 at 17:38):

cfallin commented on issue #5190:

Thanks for digging into this; a few thoughts:


Overall, I think I'm not as pessimistic on this: it basically changes heap_addr to work in the dynamic-memory case like it does in the static-memory case, and Wasmtime is already well-tested with its page-protection-based bounds checking. As long as the address-0 page is unmapped (as it is on all of our platforms), we are basically using it as a one-page-sized guard region.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 04 2022 at 18:33):

fitzgen commented on issue #5190:

Yeah, and I agree it is safe; it just makes me a little nervous to produce CLIF that has heap_addrs that aren't immediately followed by the associated memory access. As you allude to, I would be more comfortable having this kind of thing done in the mid-end.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 04 2022 at 19:58):

fitzgen commented on issue #5190:

I guess the biggest unknown in my mind is: how large a NULL guard page region can we rely on? This PR bakes in the assumption, already made by the spectre guard, that 0 + offset will always trap for any 32-bit Wasm offset immediate.

For example, given

(module
  (memory (export "memory") 1)
  (func (export "f") (param i32) (result i32)
    local.get 0
    i32.load offset=0xfffffff0))

We lower that Wasm to this clif:

v4 = heap_addr.i64 heap0, v2, 0x8000_0000
v5 = iadd_imm v4, 0xffff_fff0
v6 = load.i32 little heap v5

And then (with static memories, but with a guard region smaller than 4GiB, plus spectre guards) we legalize that clif to this clif:

;; check if the pointer is larger than the static memory size (including both
;; wasm memory and guard pages, I think?)
v7 = uextend.i64 v2
v8 = icmp_imm ugt v7, 0x8000_0000
trapnz v8, heap_oob
v9 = iconst.i64 0x8000_0000
;; gv4 is the memory base
v10 = global_value.i64 gv4
v11 = iadd v10, v7
v12 = iconst.i64 0
v13 = icmp ugt v7, v9  ; v9 = 0x8000_0000
v4 = select_spectre_guard v13, v12, v11  ; v12 = 0
v5 = iadd_imm v4, 0xffff_fff0
v6 = load.i32 little heap v5

Do we guarantee a 4GiB-sized NULL guard region? Because doing the spectre guard on the base (v11) rather than base plus offset (v5) makes that assumption. This doesn't seem like something we can rely on, especially across platforms?
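
(To spell out the arithmetic: if the bounds check is speculatively bypassed, the guard clamps v4 to 0, but v5 is still computed as 0 + 0xffff_fff0, so the speculative load targets address 0xffff_fff0; anywhere in the low 4GiB can be reached this way, not just the NULL page.)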

This becomes even more problematic when we look at the legalization of that lowered Wasm's initial clif, this time configured with dynamic memories and spectre guards, and using this patch which removes the dynamic memory's bounds check in favor of the spectre guard's bounds check:

;; extend wasm i32 pointer to native i64 pointer
v7 = uextend.i64 v2
;; gv4 is the base of memory
v8 = global_value.i64 gv4
;; overflow check for pointer + offset
v9 = iconst.i64 0xffff_0000
v10 = uadd_overflow_trap v7, v9, heap_oob  ; v9 = 0xffff_0000
;; gv5 is the length of memory
v11 = global_value.i64 gv5
v12 = iadd v11, v7
v13 = iconst.i64 0
v14 = icmp ugt v10, v8
v4 = select_spectre_guard v14, v13, v12  ; v13 = 0
v5 = iadd_imm v4, 0xffff_fff0
v6 = load.i32 little heap v5

We're baking in that same assumption. Now we aren't just letting speculative execution access the hopefully-null-guard-page-protected region of memory in 0..4GiB, but regular Wasm execution as well.

I think we can fix all this ambiguity by effectively doing select_spectre_guard(base + offset) instead of select_spectre_guard(base) + offset?
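
A rough sketch of that reordering for the example above (placeholder value numbers; not necessarily the exact lowering the heap_addr changes will produce):

;; compute the full native address first: base + index + wasm offset immediate
v7 = uextend.i64 v2
v8 = global_value.i64 gv4                 ;; gv4 is the memory base
v9 = iadd v8, v7
v10 = iadd_imm v9, 0xffff_fff0
;; bounds check on index + offset + access size against the memory length
v11 = iconst.i64 0xffff_fff4              ;; wasm offset (0xffff_fff0) + 4-byte i32 access
v12 = uadd_overflow_trap v7, v11, heap_oob
v13 = global_value.i64 gv5                ;; gv5 is the memory length
v14 = icmp ugt v12, v13
;; spectre-guard the full address, so a bypassed check clamps the access to address 0 itself
v15 = iconst.i64 0
v16 = select_spectre_guard v14, v15, v10
v17 = load.i32 little heap v16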

view this post on Zulip Wasmtime GitHub notifications bot (Nov 04 2022 at 20:16):

fitzgen commented on issue #5190:

I think we can fix all this ambiguity by effectively doing select_spectre_guard(base + offset) instead of select_spectre_guard(base) + offset?

This requires changes to heap_addr: it will need to be given not the accumulated access size (wasm offset + size_of(type)) but those individual components, so that it can return the native address of the translated wasm address, and we can then spectre-guard that address rather than the native base.
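
For illustration only (hypothetical heap_addr syntax, not the final design), the idea is that instead of folding everything into one accumulated size immediate, e.g.

v4 = heap_addr.i64 heap0, v2, 0x8000_0000

heap_addr would be given the wasm offset and the access size separately, something like

v4 = heap_addr.i64 heap0, v2, 0xffff_fff0, 4

so that it can compute (and spectre-guard) the address of the access itself rather than just the base.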

view this post on Zulip Wasmtime GitHub notifications bot (Nov 04 2022 at 20:28):

bjorn3 commented on issue #5190:

I don't think we can guarantee a 4GB null guard. Memory may be mapped within this range, whether explicitly using mmap or because the executable has a fixed position within this range. Only 4096 bytes are guaranteed on Linux, as this is the default value for vm.mmap_min_addr.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 04 2022 at 20:54):

cfallin commented on issue #5190:

If I understand the code correctly, I think we handle all of this already -- first, here we get offset_guard_size from the heap spec, and in the return-NULL-to-trap configuration we should have a guard size of 0 as with any dynamically-bounds-checked memory (or maybe it's a small nominal amount, I don't remember, but certainly not 4GiB). Then, here we include that offset on the heap_addr, and it will bounds-check appropriately taking the offset into consideration.

Or I guess all of this is a long way of saying: I don't think we actually lower into that CLIF if dynamic memory is configured. This is subtly different from

This becomes even more problematic when we look at the legalization of that lowered Wasm's initial clif, this time configured with dynamic memories and spectre guards

I think -- if I understand your methodology correctly, you took the CLIF generated by translation with static memories enabled, then compiled it with dynamic memories enabled? With the above-linked code I think doing the translation with dynamic memories enabled should result in code that does not assume a 4GiB guard at address 0.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 04 2022 at 20:55):

cfallin commented on issue #5190:

(And to answer the immediate question, no, we definitely can't assume a 4GiB guard region at address 0! That would be an un-portable mess, as you observe.)

view this post on Zulip Wasmtime GitHub notifications bot (Nov 04 2022 at 20:58):

fitzgen commented on issue #5190:

I think -- if I understand your methodology correctly, you took the CLIF generated by translation with static memories enabled, then compiled it with dynamic memories enabled? With the above-linked code I think doing the _translation_ with dynamic memories enabled should result in code that does not assume a 4GiB guard at address 0.

No, this is generated from the original input Wasm.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 04 2022 at 21:03):

fitzgen commented on issue #5190:

Then, here we include that offset on the heap_addr, and it will bounds-check appropriately taking the offset into consideration.

This still doesn't prevent speculated execution from reading 0..4GiB though, because the generated code ultimately does

spectre_guard(base) + offset

rather than

spectre_guard(base + offset)

view this post on Zulip Wasmtime GitHub notifications bot (Nov 04 2022 at 21:04):

fitzgen edited a comment on issue #5190:

I think -- if I understand your methodology correctly, you took the CLIF generated by translation with static memories enabled, then compiled it with dynamic memories enabled? With the above-linked code I think doing the _translation_ with dynamic memories enabled should result in code that does not assume a 4GiB guard at address 0.

No, each legalization example is generated from the original input Wasm with different flags passed to Wasmtime; I was not working from the CLIF alone, but always using the original Wasm as input.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 04 2022 at 21:13):

cfallin commented on issue #5190:

Ah, OK, I was misreading your concern, sorry. I had thought you were saying that the actual bounds check (the non-speculative behavior that should result in a trap) was wrong, and was very confused because the result of the uadd_overflow_trap was fed into the icmp, so everything seemed to account for the offset.

But indeed, the select is picking the base prior to the offset addition. We should fix that by reordering the ops, as you suggest, and providing heap_addr with separate offset and size as you've noted.

Once we do that, are there any other issues with this approach?

view this post on Zulip Wasmtime GitHub notifications bot (Nov 04 2022 at 21:16):

fitzgen commented on issue #5190:

I had thought you were saying that the actual bounds-check (the non-speculative behavior that should result in a trap) was wrong;

Today, actual execution is not affected by this.

After this PR, actual execution is affected by this when dynamic memories and spectre guards are both enabled.

But indeed, the select is picking the base prior to the offset addition. We should fix that by reordering the ops, as you suggest, and providing heap_addr with separate offset and size as you've noted.

Once we do that, are there any other issues with this approach?

Not that I am aware of.

Currently working on the heap_addr changes. Looks like this is maybe gonna touch more than I anticipated (just got all the errors from non-cranelift-codegen crates, such as the interpreter, out of rustc; we'll see how deep the rabbit hole goes).

view this post on Zulip Wasmtime GitHub notifications bot (Nov 04 2022 at 21:17):

fitzgen edited a comment on issue #5190:

I had thought you were saying that the actual bounds-check (the non-speculative behavior that should result in a trap) was wrong;

Today, actual execution is not affected by this.

After this PR as currently written, actual execution is affected by this when dynamic memories and spectre guards are both enabled.

But indeed, the select is picking the base prior to the offset addition. We should fix that by reordering the ops, as you suggest, and providing heap_addr with separate offset and size as you've noted.

Once we do that, are there any other issues with this approach?

Not that I am aware of.

Currently working on the heap_addr changes. Looks like this is maybe gonna touch more than I anticipated (just got all the errors from non-cranelift-codegen crates, such as the interpreter, out of rustc; we'll see how deep the rabbit hole goes).

view this post on Zulip Wasmtime GitHub notifications bot (Nov 04 2022 at 21:20):

cfallin commented on issue #5190:

Today, actual execution is not affected by this.

After this PR as currently written, actual execution is affected by this when dynamic memories and spectre guards are both enabled.

Right, sorry, makes sense now; not sure why I missed that!

view this post on Zulip Wasmtime GitHub notifications bot (Nov 04 2022 at 21:22):

fitzgen commented on issue #5190:

not sure why I missed that!

No worries, this stuff is pretty subtle; it has basically taken me all day just to write these comments and read through the code enough to understand what is happening here :-p

view this post on Zulip Wasmtime GitHub notifications bot (Nov 10 2022 at 16:59):

fitzgen commented on issue #5190:

@cfallin I've rebased this on top of main and the recent bounds-checking-related PRs I've landed; all tests are passing now and this should be good to land. Just wanted to flag you to see if you wanted to give it one last once-over.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 11 2022 at 00:22):

fitzgen commented on issue #5190:

Just realized @cfallin is going to be OoO for a few days, @jameysharp would you mind taking a look?

view this post on Zulip Wasmtime GitHub notifications bot (Nov 15 2022 at 16:48):

fitzgen commented on issue #5190:

It's unfortunate that the extensive comments in cranelift/filetests/filetests/isa/x64/heap.clif haven't been maintained since @abrown wrote them in #4841. If you have a chance to fix them up before merging this I think that'd be great, but I wouldn't block merging on that.

I plan on fixing up a lot of heap_addr related things, I'll add this to my list.

