fitzgen opened issue #13112:
Random thought while reading this: shouldn't `notrap` and `aligned` be absent here? Given the sandboxing strategy I'd expect this to not be asserted to be aligned, and additionally it would be allowed to trap.

_Originally posted by @alexcrichton in https://github.com/bytecodealliance/wasmtime/pull/13107#discussion_r3088485306_
fitzgen added the wasm-proposal:gc label to Issue #13112.
fitzgen commented on issue #13112:
What I'm worried about is UB-of-sorts where we're telling Cranelift that this load is always aligned and never traps, and then at runtime, assuming there's a bug in either Cranelift or Wasmtime's GC, that's violated (in theory causing UB). I'm wary to bucket this under having a known set of possible outcomes because we're effectively violating a core assumption, and I'm not sure we can enumerate all the outcomes. By analogy, the vec OOB isn't UB to hit the #[cold] block naturally, but here I'm worried that it would be UB somehow to hit a trap here.
I could see this argument for `aligned`. I don't think it applies to `notrap`, though.
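For concreteness, the two flags act as independent assertion bits on a memory access. A minimal toy model (illustrative only; Cranelift's actual `MemFlags` type and methods differ) captures the asymmetry being discussed, since `aligned` has a documented, enumerable set of violation outcomes while `notrap` does not:

```rust
/// Toy model of per-access assertion flags, loosely inspired by
/// Cranelift's `MemFlags`. Names and shape are hypothetical.
#[derive(Clone, Copy, Default, Debug, PartialEq)]
struct Flags {
    /// Assert the effective address is aligned. If violated, the access
    /// may trap or (for a load) return a wrong result -- a bounded set of
    /// outcomes, so a violation stays within the sandbox.
    aligned: bool,
    /// Assert the access cannot trap. The docs currently don't say what
    /// happens if this assertion is violated, which is the concern here.
    notrap: bool,
}

/// Hypothetical helper: flags for a GC load/store, depending on whether we
/// trust the GC implementation's invariants or treat the access like an
/// ordinary Wasm guest load/store.
fn gc_access_flags(trust_gc_impl: bool) -> Flags {
    if trust_gc_impl {
        // What emitting `notrap aligned` on GC accesses amounts to today.
        Flags { aligned: true, notrap: true }
    } else {
        // No assertions: the access may trap and may be misaligned.
        Flags::default()
    }
}

fn main() {
    assert_eq!(gc_access_flags(true), Flags { aligned: true, notrap: true });
    assert_eq!(gc_access_flags(false), Flags::default());
    println!("{:?}", gc_access_flags(true));
}
```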
cfallin commented on issue #13112:
I guess there's a question of what we intend `notrap` and `aligned` to mean. We document them here as:

- `aligned`: "...If the aligned flag is set, the instruction is permitted to trap or return a wrong result if the effective address is misaligned."
- `notrap`: right now we say "If this returns true then the memory is accessible, which means that accesses will not trap. This makes it possible to delete an unused load or a dead store instruction."

So by the docs, treating Cranelift as opaque, I think we're actually safe to use `aligned` today (because at worst, we define that the load/store traps or returns a wrong result for a load; neither of those is UB or propagates beyond the intended sandbox). But we are not safe to use `notrap` today if we want to be robust to accidental sandbox violations, because we simply say that the IR asserts that a trap will not occur; we don't say what happens if it does.

We could define `notrap` more precisely to mean "definitely will not have trap metadata, and may or may not cause a SIGSEGV, and if it does, it may or may not occur at exactly the right point; if it does SEGV, it will do so at the given address". That permits all of our intended optimizations (code motion, dead load/store removal), and is still constrained enough that we can reason about that behavior interacting with the Wasmtime runtime: in particular, "may still SEGV, may not, but will not alter the address" would let us be reasonably sure about holding the sandbox boundary.

All that said, my take: I think we should reason about GC accesses the same way we do about Wasm guest loads/stores. I realize that's giving up optimization opportunity, but it is closer to the original intent of our GC sandboxing: we should "just" be using linear memories under the hood.
Separately, though, we should probably think about making post-trap state unobservable as an option; if we do that, we can then (still) do dead store/load removal and store reordering (between other sequence points like opaque hostcalls), not only for the GC heap but also for user linear memories.
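To make the connection between `notrap` and dead-store elimination concrete, here is a toy straight-line "IR" and a minimal DSE pass (names and structure are illustrative, not Cranelift's actual passes). The point: a store that might trap is itself observable behavior, so it can only be deleted when it is asserted not to trap, or, alternatively, if post-trap state were defined to be unobservable:

```rust
// Toy straight-line IR: stores plus opaque hostcalls that may observe memory.
#[derive(Clone, Debug, PartialEq)]
enum Inst {
    Store { addr: u32, notrap: bool },
    Call, // sequence point: an observer of all prior stores
}

/// Minimal dead-store elimination: drop a store whose address is immediately
/// overwritten by the next store, but only when the dead store is marked
/// `notrap`. A maybe-trapping store must be kept, because whether it traps
/// is observable (unless post-trap state is declared unobservable).
fn dse(insts: &[Inst]) -> Vec<Inst> {
    let mut out: Vec<Inst> = Vec::new();
    for inst in insts {
        if let Inst::Store { addr, .. } = inst {
            // Pop an immediately-preceding, provably-non-trapping dead store
            // to the same address.
            if let Some(Inst::Store { addr: prev, notrap: true }) = out.last() {
                if prev == addr {
                    out.pop();
                }
            }
        }
        out.push(inst.clone());
    }
    out
}

fn main() {
    // Both stores asserted notrap: the first (dead) store is deleted.
    let trusted = [
        Inst::Store { addr: 8, notrap: true },
        Inst::Store { addr: 8, notrap: true },
    ];
    assert_eq!(dse(&trusted).len(), 1);

    // Maybe-trapping stores must all be kept.
    let untrusted = [
        Inst::Store { addr: 8, notrap: false },
        Inst::Store { addr: 8, notrap: false },
    ];
    assert_eq!(dse(&untrusted).len(), 2);

    // An intervening hostcall may observe the first store: keep all three.
    let fenced = [
        Inst::Store { addr: 8, notrap: true },
        Inst::Call,
        Inst::Store { addr: 8, notrap: true },
    ];
    assert_eq!(dse(&fenced).len(), 3);
    println!("ok");
}
```

Note how the hostcall case mirrors the "between other sequence points like opaque hostcalls" caveat above: reordering or removing stores is only on the table between observers.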
alexcrichton commented on issue #13112:
Personally I agree with @cfallin's conclusion of treating GC loads/stores the same as wasm loads/stores; I feel that fits our sandboxing model best. W.r.t. the optimization concerns you have @fitzgen, my naive assumption is "that's what binaryen is for", or some sort of optimization pass. For example, we expect LLVM-optimized wasm to be suitable in the sense of "ok, we can't move these memory ops", so I feel like we should have a similar expectation for GC-using wasm: it should be optimal coming in, not rely on Cranelift to clean up the wasm itself. To me Cranelift is primarily responsible for cleaning up Wasmtime's runtime abstractions, e.g. hoisting out the base pointer of linear memory, but not for cleaning up the input wasm.
tschneidereit commented on issue #13112:
> For example we expect LLVM-optimized-wasm to be suitable for "ok we can't move these memory opts", so I feel like we should have a similar expectation for GC-using wasm where it should be optimal coming in
I'm not sure how well this will hold up for too much longer, fwiw, perhaps in particular for GC: there are lots of languages that already don't go through LLVM, and then there are some (including Rust!) that do go through LLVM, but not through `wasm-opt`, which LLVM itself pretty much assumes to be part of its optimization pipeline.

Long-term, my guess is that we'll have to add at least the most obvious optimizations that would happen in `wasm-opt` if used, and perhaps some of what's happening in LLVM itself, too.
Last updated: May 03 2026 at 22:13 UTC