fitzgen commented on issue #5190:
Trevor and I investigated the failing test on aarch64 and this ultimately led us
to the conclusion that this patch shouldn't actually land.Fundamentally, this PR changes the semantics of
heap_addr
from "calculate the
base address of the given memory access or trap if it is out of bounds" to
"calculate the base address of the given memory access *or return an address
that will trap when accessed*". And this semantic change only happens when
dynamic memories and spectre mitigations are enabled. In other configurations
heap_addr
still has the old semantics.This leads us into a bit of a mess:
If the result of an out of bounds
heap_addr
is never loaded from or stored
to, then the trap never happens. Not even "eventually". This very much does
not matchheap_addr
's documented semantics.Piggybacking off the spectre mitigation, this PR use
0
as the address that
will trap when accessed. However, I'm not sure that this will always work in
the face of large offsets? There is a bit of code emitted from
cranelift-wasm
that does aheap_addr
to get a base address and then adds
offsets to that base. Need more investigation here.Also according to
heap_addr
's documented semantics today, it would be valid
to mark a load asnotrap
when given the result of aheap_addr
because if
we got to the load thenheap_addr
already verified that the address is in
bounds for us:
v0 = heap_addr ... v1 = load.i32 little heap notrap v0
The
notrap
flag would become incorrect under these semantic changes.That said, I think
heap_addr
does have roughly the "or return an address
that will trap when accessed" semantics when we are using static memories? So
maybe we actually can make this semantic change toheap_addr
and then this
would actually make it more consistent across dynamic and static memories? And
then thenotrap
flag would be invalid here.I'm not sure how to resolve these issues and still deduplicate the bounds checks
(without going full value range analysis). Given that last point, it might
actually be better to changeheap_addr
's semantics to "or return an address
that will trap when accessed". We would need to be careful to check that all
libcalls that take an address do explicit bounds checks on that address rather
than assuming that the JIT code already did one via itsheap_addr
. This is a
little bit scary since it means that valid assumptions code could have been
written under no longer hold. (Although I don't think we generally make this
assumption in the code base and usually we take wasm addresses rather than the
results ofheap_addr
s for libcalls; need to take a census to be more sure
about this).Interested in folks thoughts on this.
Separately, Trevor and I discovered the following issues / not necessarily
issues but maybe things we should tighten up:
validate_atomic_addr
inlibcalls.rs
only validates a single byte address,
not a full size of the access. This means that a bounds check could "succeed"
when the first byte being accessed is the last byte of memory and then the
other bytes of the access are out of bounds. In practice we don't have this
bug, because all callers callvalidate_atomic_addr
withbase + size_of_access
but this is a bit of a foot gun waiting to bite us. Would be
nicer to pass thebase
andsize_of_access
intovalidate_atomic_addr
,
similar to theheap_addr
instruction.The heap access spectre mitigation--at least for dynamic memories, I haven't
looked at static memories yet--only checks the first byte of the access, not
the last byte. This means that speculative execution could do ani64
load
beginning at the last byte of memory and bring (the cache line for) seven out
of bounds bytes into cache.The
heap_addr
instruction should have.can_trap(true)
in the meta crate,
according to its documented semantics. It doesn't currently have
this. However, this would also preventheap_addr
s from being GVN'd or
LICM'd, which we definitely want to happen, we just want to make sure that any
load of aheap_addr
'd address is stillheap_addr
'd. Open to suggestions
here.Although if we do change
heap_addr
's semantics to "or returns an address
that will trap when accessed" we don't need to mark itcan_trap(true)
and
this problem goes away.cc @cfallin @elliottt @jameysharp
alexcrichton commented on issue #5190:
Fundamentally, this PR changes the semantics of heap_addr from "calculate the
base address of the given memory access or trap if it is out of bounds" to
"calculate the base address of the given memory access or return an address
that will trap when accessed".I think it changes the semantics for dynamic heaps, but this doesn't change the semantics for static heaps I think? In that sense all our libcalls should already be guarded against "this address is in bounds and should not segfault" I think?
cfallin commented on issue #5190:
Thanks for digging into this; a few thoughts:
I think that, as you suggest, we are already relying on the trap happening when the address is accessed, in the static-memory case. It makes the most sense to me to update the documentation for
heap_addr
in that light.I was somewhat concerned about this point:
There is a bit of code emitted from
cranelift-wasm that does a heap_addr to get a base address and then adds
offsets to that base. Need more investigation here.and looked into it a bit; are you referring to the code in
prepare_addr
? If so I think that should generally be safe: its optimization (usingheap_addr
with the same offset of1
for every load with an actual offset potentially larger) is triggered only when it knows that the heap is using a large-enough guard region.Arguably such an optimization should be a rewrite rule in Cranelift instead -- translate all
heap_addr
s on a heap-with-guard-region to useoffset=0
so they can be GVN'd together -- but I don't think this will cause a correctness issue in the short term, unless I'm misunderstanding or missing some other code...Re: this
We would need to be careful to check that all
libcalls that take an address do explicit bounds checks on that address rather
than assuming that the JIT code already did one via its heap_addr. This is a
little bit scary since it means that valid assumptions code could have been
written under no longer hold.if the address will trap when executed, then we're fine safety-wise (this won't be a backdoor around the heap sandbox) but I agree it will cause problems in other ways -- specifically it will result in a SIGSEGV in the runtime rather than in guest code, which will (usually) kill the process. So I agree we should verify that libcalls take Wasm addresses.
I think this is an issue today though, as (as you noted) the guard-region case already relies on "actual load/store traps, not
heap_addr
". (That increases the importance of doing the above verification, to be clear.)
Overall, I think I'm not as pessimistic on this: it basically changes
heap_addr
to work in the dynamic-memory case like it does in the static-memory case, and Wasmtime is already well-tested with its page-protection-based bounds checking. As long as the address-0 page is unmapped (as it is on all of our platforms), we are basically using it as a one-page-sized guard region.
fitzgen commented on issue #5190:
- are you referring to the code in
prepare_addr
? If so I think that should generally be safe: its optimization (usingheap_addr
with the same offset of1
for every load with an actual offset potentially larger) is triggered only when it knows that the heap is using a large-enough guard region.Yeah, and I agree it is safe, it just makes me a little nervous to produce CLIF that has
heap_addr
s that aren't immediately followed by the associated memory access. As you allude to, I would be more comfortable having this kind of thing done in the mid-end.
fitzgen commented on issue #5190:
I guess the biggest unknown in my mind is how large of a NULL guard page region we can rely on? Because this PR bakes in assumptions that the spectre guard already makes about
0 + offset
will always trap for any 32-bit Wasm offset immediate.For example, given
(module (memory (export "memory") 1) (func (export "f") (param i32) (result i32) local.get 0 i32.load offset=0xfffffff0))
We lower that Wasm to this clif:
v4 = heap_addr.i64 heap0, v2, 0x8000_0000 v5 = iadd_imm v4, 0xffff_fff0 v6 = load.i32 little heap v5
And then (with static memories (but smaller than 4GiB guard region) + spectre guards) legalize that clif to this clif:
;; check if the pointer is larger than the static memory size (including both ;; wasm memory and guard pages, I think?) v7 = uextend.i64 v2 v8 = icmp_imm ugt v7, 0x8000_0000 trapnz v8, heap_oob v9 = iconst.i64 0x8000_0000 ;; gv4 is the memory base v10 = global_value.i64 gv4 v11 = iadd v10, v7 v12 = iconst.i64 0 v13 = icmp ugt v7, v9 ; v9 = 0x8000_0000 v4 = select_spectre_guard v13, v12, v11 ; v12 = 0 v5 = iadd_imm v4, 0xffff_fff0 v6 = load.i32 little heap v5
Do we guarantee a 4GiB-sized NULL guard region? Because doing the spectre guard on the base (
v11
) rather than base plus offset (v5
) makes that assumption. This doesn't seem like something we can rely on, especially across platforms?This becomes even more problematic when we look at the legalization of that lowered Wasm's initial clif, this time configured with dynamic memories and spectre guards, and using this patch which removes the dynamic memory's bounds check in favor of the spectre guard's bounds check:
;; extend wasm i32 pointer to native i64 pointer v7 = uextend.i64 v2 ;; gv4 is the base of memory v8 = global_value.i64 gv4 ;; overflow check for pointer + offset v9 = iconst.i64 0xffff_0000 v10 = uadd_overflow_trap v7, v9, heap_oob ; v9 = 0xffff_0000 ;; gv5 is the length of memory v11 = global_value.i64 gv5 v12 = iadd v11, v7 v13 = iconst.i64 0 v14 = icmp ugt v10, v8 v4 = select_spectre_guard v14, v13, v12 ; v13 = 0 v5 = iadd_imm v4, 0xffff_fff0 v6 = load.i32 little heap v5
We're baking in that same assumption. Now we aren't just letting speculative execution access the hopefully-null-guard-page-protected region of memory in 0..4GiB, but regular Wasm execution as well.
I think we can fix all this ambiguity by effectively doing
select_spectre_guard(base + offset)
instead ofselect_spectre_guard(base) + offset
?
fitzgen commented on issue #5190:
I think we can fix all this ambiguity by effectively doing
select_spectre_guard(base + offset)
instead ofselect_spectre_guard(base) + offset
?This requires changes to
heap_addr
where it will need to be given not the accumulated access size (wasm offset +size_of(type)
) but those individual components, so that it can return the native address of the translated wasm address, so that we can spectre guard that rather than the native base.
bjorn3 commented on issue #5190:
I don't think we can guarantee a 4GB null guard. Memory may be mapped within this range, whether explicitly using mmal or because the executable has a fixed position within this range. Only 4096 bytes is guaranteed on linux as this is the default value for
vm.mmap_min_addr
.
cfallin commented on issue #5190:
If I understand the code correctly, I think we handle all of this already -- first, here we get
offset_guard_size
from the heap spec, and in the return-NULL-to-trap configuration we should have a guard size of 0 as with any dynamically-bounds-checked memory (or maybe it's a small nominal amount, I don't remember, but certainly not 4GiB). Then, here we include that offset on theheap_addr
, and it will bounds-check appropriately taking the offset into consideration.Or I guess all of this is a long way of saying: I don't think we actually lower into that CLIF if dynamic memory is configured. This is subtly different from
This becomes even more problematic when we look at the legalization of that lowered Wasm's initial clif, this time configured with dynamic memories and spectre guards
I think -- if I understand your methodology correct, you took the CLIF generated by translation with static memories enabled, then compiled it with dynamic memories enabled? With the above-linked code I think doing the translation with dynamic memories enabled should result in code that does not assume a 4GiB guard at address 0.
cfallin commented on issue #5190:
(And to answer the immediate question, no, we definitely can't assume a 4GiB guard region at address 0! That would be an un-portable mess, as you observe.)
fitzgen commented on issue #5190:
I think -- if I understand your methodology correct, you took the CLIF generated by translation with static memories enabled, then compiled it with dynamic memories enabled? With the above-linked code I think doing the _translation_ with dynamic memories enabled should result in code that does not assume a 4GiB guard at address 0.
No, this is generated from the original input Wasm.
fitzgen commented on issue #5190:
Then, here we include that offset on the
heap_addr
, and it will bounds-check appropriately taking the offset into consideration.This still doesn't prevent speculated execution from reading 0..4GiB though, because the generated code ultimately does
spectre_guard(base) + offset
rather than
spectre_guard(base + offset)
fitzgen edited a comment on issue #5190:
I think -- if I understand your methodology correct, you took the CLIF generated by translation with static memories enabled, then compiled it with dynamic memories enabled? With the above-linked code I think doing the _translation_ with dynamic memories enabled should result in code that does not assume a 4GiB guard at address 0.
No, each different legalization example is generated from the original input Wasm with different flags passed to Wasmtime. I was not working with just the CLIF. Always using the original Wasm as input.
cfallin commented on issue #5190:
Ah, OK, I am misreading your concern, sorry. I had thought you were saying that the actual bounds-check (the non-speculative behavior that should result in a trap) was wrong; and was very confused because the result of the
uadd_overflow_trap
was fed into theicmp
, so all seemed to account for the offset.But indeed, the select is picking the base prior to the offset addition. We should fix that by reordering the ops, as you suggest, and providing
heap_addr
with separate offset and size as you've noted.Once we do that, are there any other issues with this approach?
fitzgen commented on issue #5190:
I had thought you were saying that the actual bounds-check (the non-speculative behavior that should result in a trap) was wrong;
Today, actual execution is not affected by this.
After this PR, actual execution is affected by this when dynamic memories and spectre guards are both enabled.
But indeed, the select is picking the base prior to the offset addition. We should fix that by reordering the ops, as you suggest, and providing
heap_addr
with separate offset and size as you've noted.Once we do that, are there any other issues with this approach?
Not that I am aware of.
Currently working on the
heap_addr
changes. Looks like this is maybe gonna touch more than I anticipated (just got all the errors from non-cranelift-codegen
crates, such as the interpreter, out ofrustc
; we'll see how deep the rabbit hole goes).
fitzgen edited a comment on issue #5190:
I had thought you were saying that the actual bounds-check (the non-speculative behavior that should result in a trap) was wrong;
Today, actual execution is not affected by this.
After this PR as currently written, actual execution is affected by this when dynamic memories and spectre guards are both enabled.
But indeed, the select is picking the base prior to the offset addition. We should fix that by reordering the ops, as you suggest, and providing
heap_addr
with separate offset and size as you've noted.Once we do that, are there any other issues with this approach?
Not that I am aware of.
Currently working on the
heap_addr
changes. Looks like this is maybe gonna touch more than I anticipated (just got all the errors from non-cranelift-codegen
crates, such as the interpreter, out ofrustc
; we'll see how deep the rabbit hole goes).
cfallin commented on issue #5190:
Today, actual execution is not affected by this.
After this PR as currently written, actual execution is affected by this when dynamic memories and spectre guards are both enabled.
Right, sorry, makes sense now; not sure why I missed that!
fitzgen commented on issue #5190:
not sure why I missed that!
No worries, this stuff is pretty subtle, it has basically taken me all day just to write these comments and read through the code enough to understand what is happening here :-p
fitzgen commented on issue #5190:
@cfallin I've rebased this on top of
main
and the recent bounds checking related PRs I've landed and all tests are passing now and this should be good to land. Just wanted to flag you to see if you wanted to give it one last once over.
fitzgen commented on issue #5190:
Just realized @cfallin is going to be OoO for a few days, @jameysharp would you mind taking a look?
fitzgen commented on issue #5190:
It's unfortunate that the extensive comments in
cranelift/filetests/filetests/isa/x64/heap.clif
haven't been maintained since @abrown wrote them in #4841. If you have a chance to fix them up before merging this I think that'd be great, but I wouldn't block merging on that.I plan on fixing up a lot of
heap_addr
related things, I'll add this to my list.
Last updated: Jan 24 2025 at 00:11 UTC