wasmtime / issue #5283 Cranelift: reconsider the `heap_ad... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / issue #5283 Cranelift: reconsider the `heap_ad...

Wasmtime GitHub notifications bot (Nov 16 2022 at 19:14):

fitzgen labeled issue #5283:

On error condition, does this instruction itself trap or return an address that will trap when accessed? Currently that depends on arbitrary bits of the heap configuration and memory being accessed, and is different for index + offset overflow vs out of bounds. I've been improving the situation a little by making it more consistent, but it is worth backing up a bit and gaining context.

The actual problem we are solving is that of wasm loads and stores, which must trap when out of bounds. At some point long ago, we decomposed wasm memory accesses into smaller parts, one of which is heap_addr. Whether we trap in heap_addr itself or in another clif instruction after heap_addr is irrelevant to the fact that the wasm memory access our instruction sequence is implementing must trap when out of bounds. So I think trying to pin down and clean up heap_addr's semantics in the face of error conditions is accidental complexity; a problem of our own making.

That is not an argument against being precise with heap_addr semantics, however. If we have heap_addr, we should have precise semantics for it. The same way that we should aspire to giving precise (formal!) semantics to all CLIF instructions.

But... what if we didn't have a heap_addr instruction at all? I see a few possibilities:

CLIF already has everything needed to emit what heap_addr gets legalized to today, so cranelift-wasm could just emit those already-legalized instructions. And now there is no interface barrier between heap_addr and the other CLIF instructions surrounding it where we have to define semantics and decide which side of the interface is responsible for actually doing the trap when we encounter an error condition.

We introduce bounds_checked_load and bounds_checked_store instructions that match wasm semantics. They would take a static heap immediate, a dynamic index operand, a static offset immediate, and have a static type parameter that is being loaded/stored. We would carry these instructions all the way through Cranelift to the backends, where they would get lowered into precise machine instruction sequences. Again, there is no artificial interface barrier where we have to decide which side of the interface is responsible for trapping; the bounds_checked_* instructions implement the whole of the Wasm semantics and will trap on OOB/overflow.

We introduce bounds_checked_{load,store} instructions as above, but we legalize them early in Cranelift's pipeline to more primitive CLIF instructions (equivalent of what cranelift-wasm would otherwise produce directly in option 1).

The primary benefits of 1 and 3 are that by decomposing the memory accesses into smaller parts, we can GVN/LICM/etc those smaller parts.

The primary benefits of 2 are that we have fewer CLIF instructions to process and so compilation times should be a little better, and that we have very precise control over the machine instructions emitted. However, we can't easily GVN/LICM/etc subcomponents of a wasm load like loading the memory base and length out of the vmctx.

The trade off between 1 and 3 is whether we want complexity in the cranelift-wasm frontend or in cranelift-codegen legalization.

Personally, I would rank the options in the order of 3, 1, 2.

I think we can make up for 1/3's lack of precise machine instruction control with filetests.

The compilation time benefits of a macro instruction are tempting, but I think trading that away for code quality is the correct move in this scenario, since we stille have many other avenues for improving compilation times (as we have been doing a lot recently!) that don't sacrifice code quality at all.

As for why 3 over 1? Keeping cranelift-wasm relatively simple seems like the way to go for me, and legalization seems like the natural home for this kind of thing.

Wasmtime GitHub notifications bot (Nov 16 2022 at 19:14):

fitzgen opened issue #5283:

On error condition, does this instruction itself trap or return an address that will trap when accessed? Currently that depends on arbitrary bits of the heap configuration and memory being accessed, and is different for index + offset overflow vs out of bounds. I've been improving the situation a little by making it more consistent, but it is worth backing up a bit and gaining context.

The actual problem we are solving is that of wasm loads and stores, which must trap when out of bounds. At some point long ago, we decomposed wasm memory accesses into smaller parts, one of which is heap_addr. Whether we trap in heap_addr itself or in another clif instruction after heap_addr is irrelevant to the fact that the wasm memory access our instruction sequence is implementing must trap when out of bounds. So I think trying to pin down and clean up heap_addr's semantics in the face of error conditions is accidental complexity; a problem of our own making.

That is not an argument against being precise with heap_addr semantics, however. If we have heap_addr, we should have precise semantics for it. The same way that we should aspire to giving precise (formal!) semantics to all CLIF instructions.

But... what if we didn't have a heap_addr instruction at all? I see a few possibilities:

CLIF already has everything needed to emit what heap_addr gets legalized to today, so cranelift-wasm could just emit those already-legalized instructions. And now there is no interface barrier between heap_addr and the other CLIF instructions surrounding it where we have to define semantics and decide which side of the interface is responsible for actually doing the trap when we encounter an error condition.

We introduce bounds_checked_load and bounds_checked_store instructions that match wasm semantics. They would take a static heap immediate, a dynamic index operand, a static offset immediate, and have a static type parameter that is being loaded/stored. We would carry these instructions all the way through Cranelift to the backends, where they would get lowered into precise machine instruction sequences. Again, there is no artificial interface barrier where we have to decide which side of the interface is responsible for trapping; the bounds_checked_* instructions implement the whole of the Wasm semantics and will trap on OOB/overflow.

We introduce bounds_checked_{load,store} instructions as above, but we legalize them early in Cranelift's pipeline to more primitive CLIF instructions (equivalent of what cranelift-wasm would otherwise produce directly in option 1).

The primary benefits of 1 and 3 are that by decomposing the memory accesses into smaller parts, we can GVN/LICM/etc those smaller parts.

The primary benefits of 2 are that we have fewer CLIF instructions to process and so compilation times should be a little better, and that we have very precise control over the machine instructions emitted. However, we can't easily GVN/LICM/etc subcomponents of a wasm load like loading the memory base and length out of the vmctx.

The trade off between 1 and 3 is whether we want complexity in the cranelift-wasm frontend or in cranelift-codegen legalization.

Personally, I would rank the options in the order of 3, 1, 2.

I think we can make up for 1/3's lack of precise machine instruction control with filetests.

The compilation time benefits of a macro instruction are tempting, but I think trading that away for code quality is the correct move in this scenario, since we stille have many other avenues for improving compilation times (as we have been doing a lot recently!) that don't sacrifice code quality at all.

As for why 3 over 1? Keeping cranelift-wasm relatively simple seems like the way to go for me, and legalization seems like the natural home for this kind of thing.

Wasmtime GitHub notifications bot (Nov 16 2022 at 19:14):

fitzgen labeled issue #5283:

On error condition, does this instruction itself trap or return an address that will trap when accessed? Currently that depends on arbitrary bits of the heap configuration and memory being accessed, and is different for index + offset overflow vs out of bounds. I've been improving the situation a little by making it more consistent, but it is worth backing up a bit and gaining context.

The actual problem we are solving is that of wasm loads and stores, which must trap when out of bounds. At some point long ago, we decomposed wasm memory accesses into smaller parts, one of which is heap_addr. Whether we trap in heap_addr itself or in another clif instruction after heap_addr is irrelevant to the fact that the wasm memory access our instruction sequence is implementing must trap when out of bounds. So I think trying to pin down and clean up heap_addr's semantics in the face of error conditions is accidental complexity; a problem of our own making.

That is not an argument against being precise with heap_addr semantics, however. If we have heap_addr, we should have precise semantics for it. The same way that we should aspire to giving precise (formal!) semantics to all CLIF instructions.

But... what if we didn't have a heap_addr instruction at all? I see a few possibilities:

CLIF already has everything needed to emit what heap_addr gets legalized to today, so cranelift-wasm could just emit those already-legalized instructions. And now there is no interface barrier between heap_addr and the other CLIF instructions surrounding it where we have to define semantics and decide which side of the interface is responsible for actually doing the trap when we encounter an error condition.

We introduce bounds_checked_load and bounds_checked_store instructions that match wasm semantics. They would take a static heap immediate, a dynamic index operand, a static offset immediate, and have a static type parameter that is being loaded/stored. We would carry these instructions all the way through Cranelift to the backends, where they would get lowered into precise machine instruction sequences. Again, there is no artificial interface barrier where we have to decide which side of the interface is responsible for trapping; the bounds_checked_* instructions implement the whole of the Wasm semantics and will trap on OOB/overflow.

We introduce bounds_checked_{load,store} instructions as above, but we legalize them early in Cranelift's pipeline to more primitive CLIF instructions (equivalent of what cranelift-wasm would otherwise produce directly in option 1).

The primary benefits of 1 and 3 are that by decomposing the memory accesses into smaller parts, we can GVN/LICM/etc those smaller parts.

The primary benefits of 2 are that we have fewer CLIF instructions to process and so compilation times should be a little better, and that we have very precise control over the machine instructions emitted. However, we can't easily GVN/LICM/etc subcomponents of a wasm load like loading the memory base and length out of the vmctx.

The trade off between 1 and 3 is whether we want complexity in the cranelift-wasm frontend or in cranelift-codegen legalization.

Personally, I would rank the options in the order of 3, 1, 2.

I think we can make up for 1/3's lack of precise machine instruction control with filetests.

The compilation time benefits of a macro instruction are tempting, but I think trading that away for code quality is the correct move in this scenario, since we stille have many other avenues for improving compilation times (as we have been doing a lot recently!) that don't sacrifice code quality at all.

As for why 3 over 1? Keeping cranelift-wasm relatively simple seems like the way to go for me, and legalization seems like the natural home for this kind of thing.

Wasmtime GitHub notifications bot (Nov 16 2022 at 19:16):

fitzgen commented on issue #5283:

cc'ing folks who seemed interested in this topic in today's cranelift meeting: @cfallin @elliottt

Wasmtime GitHub notifications bot (Nov 16 2022 at 19:49):

elliottt commented on issue #5283:

Thanks for writing this up! I think you're right here, and I agree with your ordering. 3 seems like a good approach to me, though I would mildly prefer 1 as it's not adding additional instructions. As we discussed in a DM, adding a mechanism to mark instructions as expected to be legalized would address my concerns.

Wasmtime GitHub notifications bot (Nov 16 2022 at 23:19):

cfallin commented on issue #5283:

@fitzgen and I just talked about this at some length and I also agree that rewriting from the "wasm load" semantic unit-of-work directly to lower-level instructions is a good approach.

We also talked about the handling of the index+offset computation (what is currently handled by uadd_overflow_trap in the legalization from heap_addr) and came the conclusion that it probably makes sense to stick with uadd_overflow_trap, rather than iadd_cout with the carry-out feeding into a select. This is for a few reasons: (i) we'd need to feed the condition into the final select (after the addition of the heap base), not an intermediate select, and this implies reifying the carry out of the add and greater-than-or-equal condition out of the bounds check as bools and or'ing them together, which is worse codegen then we have today; and (ii) we don't actually need a Spectre-guard-style select on the result of the add-overflow check, because an overflow of index+offset (that nevertheless stays within the sandbox bounds) will not see another heap, even speculatively; it'll just see part of the current heap.

So basically, (i) keep what we ultimately legalize into today, but (ii) do it without heap_addr in the middle, by either legalizing from heap_load or de-facto legalizing in cranelift-wasm. I sort of prefer the former as long as we have a concept of heap entities, and removing those and moving the complexity to cranelift-wasm seems like a lot of unnecessary work (and we get useful checking of its implementation by having it at the CLIF level, because of the interpreter's implementation of it).

Wasmtime GitHub notifications bot (Dec 01 2022 at 00:41):

fitzgen commented on issue #5283:

After implementing (the easy) half of (3) and talking it over with @cfallin I am now leaning towards pivoting to (1), that is removing the concept of heaps from CLIF altogether and doing the legalization immediately inside cranelift-wasm.

The reasons are that (3) involves making heap versions of a bunch of instructions we already have in CLIF: the extending loads and stores like uload8 et al, the SIMD versions of those like uload8x8 et al, and various atomic instructions. uload8 et al could be replaced with backend rules for matching (uextend.i64 (load.i8 ...)) patterns but this is a bunch of churn for little gain and also requires answering questions like "how do we generalize SinkableLoad to all of our backends?". It is a little bit of a rabbit hole in its own right.

(1) would not require removing, replacing, or duplicating any CLIF instructions other than the heap_* instructions which are directly related to CLIF heaps. That's nice. Furthermore, we avoid having additional "temporary" CLIF instructions that exist only momentarily at the start of compilation before being legalized away, never to be seen by a backend. Also aesthetically pleasing.

Will dig into this a bit and report back.

Wasmtime GitHub notifications bot (Dec 15 2022 at 00:26):

fitzgen closed issue #5283:

On error condition, does this instruction itself trap or return an address that will trap when accessed? Currently that depends on arbitrary bits of the heap configuration and memory being accessed, and is different for index + offset overflow vs out of bounds. I've been improving the situation a little by making it more consistent, but it is worth backing up a bit and gaining context.

The actual problem we are solving is that of wasm loads and stores, which must trap when out of bounds. At some point long ago, we decomposed wasm memory accesses into smaller parts, one of which is heap_addr. Whether we trap in heap_addr itself or in another clif instruction after heap_addr is irrelevant to the fact that the wasm memory access our instruction sequence is implementing must trap when out of bounds. So I think trying to pin down and clean up heap_addr's semantics in the face of error conditions is accidental complexity; a problem of our own making.

That is not an argument against being precise with heap_addr semantics, however. If we have heap_addr, we should have precise semantics for it. The same way that we should aspire to giving precise (formal!) semantics to all CLIF instructions.

But... what if we didn't have a heap_addr instruction at all? I see a few possibilities:

CLIF already has everything needed to emit what heap_addr gets legalized to today, so cranelift-wasm could just emit those already-legalized instructions. And now there is no interface barrier between heap_addr and the other CLIF instructions surrounding it where we have to define semantics and decide which side of the interface is responsible for actually doing the trap when we encounter an error condition.

We introduce bounds_checked_load and bounds_checked_store instructions that match wasm semantics. They would take a static heap immediate, a dynamic index operand, a static offset immediate, and have a static type parameter that is being loaded/stored. We would carry these instructions all the way through Cranelift to the backends, where they would get lowered into precise machine instruction sequences. Again, there is no artificial interface barrier where we have to decide which side of the interface is responsible for trapping; the bounds_checked_* instructions implement the whole of the Wasm semantics and will trap on OOB/overflow.

We introduce bounds_checked_{load,store} instructions as above, but we legalize them early in Cranelift's pipeline to more primitive CLIF instructions (equivalent of what cranelift-wasm would otherwise produce directly in option 1).

The primary benefits of 1 and 3 are that by decomposing the memory accesses into smaller parts, we can GVN/LICM/etc those smaller parts.

The primary benefits of 2 are that we have fewer CLIF instructions to process and so compilation times should be a little better, and that we have very precise control over the machine instructions emitted. However, we can't easily GVN/LICM/etc subcomponents of a wasm load like loading the memory base and length out of the vmctx.

The trade off between 1 and 3 is whether we want complexity in the cranelift-wasm frontend or in cranelift-codegen legalization.

Personally, I would rank the options in the order of 3, 1, 2.

I think we can make up for 1/3's lack of precise machine instruction control with filetests.

The compilation time benefits of a macro instruction are tempting, but I think trading that away for code quality is the correct move in this scenario, since we stille have many other avenues for improving compilation times (as we have been doing a lot recently!) that don't sacrifice code quality at all.

As for why 3 over 1? Keeping cranelift-wasm relatively simple seems like the way to go for me, and legalization seems like the natural home for this kind of thing.

Last updated: Apr 17 2025 at 12:05 UTC