fitzgen labeled issue #5283:
On error condition, does this instruction itself trap or return an address that will trap when accessed? Currently that depends on arbitrary bits of the heap configuration and memory being accessed, and is different for
index + offset
overflow vs out of bounds. I've been improving the situation a little by making it more consistent, but it is worth backing up a bit and gaining context.The actual problem we are solving is that of wasm loads and stores, which must trap when out of bounds. At some point long ago, we decomposed wasm memory accesses into smaller parts, one of which is
heap_addr
. Whether we trap inheap_addr
itself or in another clif instruction afterheap_addr
is irrelevant to the fact that the wasm memory access our instruction sequence is implementing must trap when out of bounds. So I think trying to pin down and clean upheap_addr
's semantics in the face of error conditions is accidental complexity; a problem of our own making.That is not an argument against being precise with
heap_addr
semantics, however. If we haveheap_addr
, we should have precise semantics for it. The same way that we should aspire to giving precise (formal!) semantics to all CLIF instructions.But... what if we didn't have a
heap_addr
instruction at all? I see a few possibilities:
CLIF already has everything needed to emit what
heap_addr
gets legalized to today, socranelift-wasm
could just emit those already-legalized instructions. And now there is no interface barrier betweenheap_addr
and the other CLIF instructions surrounding it where we have to define semantics and decide which side of the interface is responsible for actually doing the trap when we encounter an error condition.We introduce
bounds_checked_load
andbounds_checked_store
instructions that match wasm semantics. They would take a static heap immediate, a dynamic index operand, a static offset immediate, and have a static type parameter that is being loaded/stored. We would carry these instructions all the way through Cranelift to the backends, where they would get lowered into precise machine instruction sequences. Again, there is no artificial interface barrier where we have to decide which side of the interface is responsible for trapping; thebounds_checked_*
instructions implement the whole of the Wasm semantics and will trap on OOB/overflow.We introduce
bounds_checked_{load,store}
instructions as above, but we legalize them early in Cranelift's pipeline to more primitive CLIF instructions (equivalent of whatcranelift-wasm
would otherwise produce directly in option 1).The primary benefits of 1 and 3 are that by decomposing the memory accesses into smaller parts, we can GVN/LICM/etc those smaller parts.
The primary benefits of 2 are that we have fewer CLIF instructions to process and so compilation times should be a little better, and that we have very precise control over the machine instructions emitted. However, we can't easily GVN/LICM/etc subcomponents of a wasm load like loading the memory base and length out of the vmctx.
The trade off between 1 and 3 is whether we want complexity in the
cranelift-wasm
frontend or incranelift-codegen
legalization.
Personally, I would rank the options in the order of 3, 1, 2.
I think we can make up for 1/3's lack of precise machine instruction control with filetests.
The compilation time benefits of a macro instruction are tempting, but I think trading that away for code quality is the correct move in this scenario, since we stille have many other avenues for improving compilation times (as we have been doing a lot recently!) that don't sacrifice code quality at all.
As for why 3 over 1? Keeping
cranelift-wasm
relatively simple seems like the way to go for me, and legalization seems like the natural home for this kind of thing.
fitzgen opened issue #5283:
On error condition, does this instruction itself trap or return an address that will trap when accessed? Currently that depends on arbitrary bits of the heap configuration and memory being accessed, and is different for
index + offset
overflow vs out of bounds. I've been improving the situation a little by making it more consistent, but it is worth backing up a bit and gaining context.The actual problem we are solving is that of wasm loads and stores, which must trap when out of bounds. At some point long ago, we decomposed wasm memory accesses into smaller parts, one of which is
heap_addr
. Whether we trap inheap_addr
itself or in another clif instruction afterheap_addr
is irrelevant to the fact that the wasm memory access our instruction sequence is implementing must trap when out of bounds. So I think trying to pin down and clean upheap_addr
's semantics in the face of error conditions is accidental complexity; a problem of our own making.That is not an argument against being precise with
heap_addr
semantics, however. If we haveheap_addr
, we should have precise semantics for it. The same way that we should aspire to giving precise (formal!) semantics to all CLIF instructions.But... what if we didn't have a
heap_addr
instruction at all? I see a few possibilities:
CLIF already has everything needed to emit what
heap_addr
gets legalized to today, socranelift-wasm
could just emit those already-legalized instructions. And now there is no interface barrier betweenheap_addr
and the other CLIF instructions surrounding it where we have to define semantics and decide which side of the interface is responsible for actually doing the trap when we encounter an error condition.We introduce
bounds_checked_load
andbounds_checked_store
instructions that match wasm semantics. They would take a static heap immediate, a dynamic index operand, a static offset immediate, and have a static type parameter that is being loaded/stored. We would carry these instructions all the way through Cranelift to the backends, where they would get lowered into precise machine instruction sequences. Again, there is no artificial interface barrier where we have to decide which side of the interface is responsible for trapping; thebounds_checked_*
instructions implement the whole of the Wasm semantics and will trap on OOB/overflow.We introduce
bounds_checked_{load,store}
instructions as above, but we legalize them early in Cranelift's pipeline to more primitive CLIF instructions (equivalent of whatcranelift-wasm
would otherwise produce directly in option 1).The primary benefits of 1 and 3 are that by decomposing the memory accesses into smaller parts, we can GVN/LICM/etc those smaller parts.
The primary benefits of 2 are that we have fewer CLIF instructions to process and so compilation times should be a little better, and that we have very precise control over the machine instructions emitted. However, we can't easily GVN/LICM/etc subcomponents of a wasm load like loading the memory base and length out of the vmctx.
The trade off between 1 and 3 is whether we want complexity in the
cranelift-wasm
frontend or incranelift-codegen
legalization.
Personally, I would rank the options in the order of 3, 1, 2.
I think we can make up for 1/3's lack of precise machine instruction control with filetests.
The compilation time benefits of a macro instruction are tempting, but I think trading that away for code quality is the correct move in this scenario, since we stille have many other avenues for improving compilation times (as we have been doing a lot recently!) that don't sacrifice code quality at all.
As for why 3 over 1? Keeping
cranelift-wasm
relatively simple seems like the way to go for me, and legalization seems like the natural home for this kind of thing.
fitzgen labeled issue #5283:
On error condition, does this instruction itself trap or return an address that will trap when accessed? Currently that depends on arbitrary bits of the heap configuration and memory being accessed, and is different for
index + offset
overflow vs out of bounds. I've been improving the situation a little by making it more consistent, but it is worth backing up a bit and gaining context.The actual problem we are solving is that of wasm loads and stores, which must trap when out of bounds. At some point long ago, we decomposed wasm memory accesses into smaller parts, one of which is
heap_addr
. Whether we trap inheap_addr
itself or in another clif instruction afterheap_addr
is irrelevant to the fact that the wasm memory access our instruction sequence is implementing must trap when out of bounds. So I think trying to pin down and clean upheap_addr
's semantics in the face of error conditions is accidental complexity; a problem of our own making.That is not an argument against being precise with
heap_addr
semantics, however. If we haveheap_addr
, we should have precise semantics for it. The same way that we should aspire to giving precise (formal!) semantics to all CLIF instructions.But... what if we didn't have a
heap_addr
instruction at all? I see a few possibilities:
CLIF already has everything needed to emit what
heap_addr
gets legalized to today, socranelift-wasm
could just emit those already-legalized instructions. And now there is no interface barrier betweenheap_addr
and the other CLIF instructions surrounding it where we have to define semantics and decide which side of the interface is responsible for actually doing the trap when we encounter an error condition.We introduce
bounds_checked_load
andbounds_checked_store
instructions that match wasm semantics. They would take a static heap immediate, a dynamic index operand, a static offset immediate, and have a static type parameter that is being loaded/stored. We would carry these instructions all the way through Cranelift to the backends, where they would get lowered into precise machine instruction sequences. Again, there is no artificial interface barrier where we have to decide which side of the interface is responsible for trapping; thebounds_checked_*
instructions implement the whole of the Wasm semantics and will trap on OOB/overflow.We introduce
bounds_checked_{load,store}
instructions as above, but we legalize them early in Cranelift's pipeline to more primitive CLIF instructions (equivalent of whatcranelift-wasm
would otherwise produce directly in option 1).The primary benefits of 1 and 3 are that by decomposing the memory accesses into smaller parts, we can GVN/LICM/etc those smaller parts.
The primary benefits of 2 are that we have fewer CLIF instructions to process and so compilation times should be a little better, and that we have very precise control over the machine instructions emitted. However, we can't easily GVN/LICM/etc subcomponents of a wasm load like loading the memory base and length out of the vmctx.
The trade off between 1 and 3 is whether we want complexity in the
cranelift-wasm
frontend or incranelift-codegen
legalization.
Personally, I would rank the options in the order of 3, 1, 2.
I think we can make up for 1/3's lack of precise machine instruction control with filetests.
The compilation time benefits of a macro instruction are tempting, but I think trading that away for code quality is the correct move in this scenario, since we stille have many other avenues for improving compilation times (as we have been doing a lot recently!) that don't sacrifice code quality at all.
As for why 3 over 1? Keeping
cranelift-wasm
relatively simple seems like the way to go for me, and legalization seems like the natural home for this kind of thing.
fitzgen commented on issue #5283:
cc'ing folks who seemed interested in this topic in today's cranelift meeting: @cfallin @elliottt
elliottt commented on issue #5283:
Thanks for writing this up! I think you're right here, and I agree with your ordering. 3 seems like a good approach to me, though I would mildly prefer 1 as it's not adding additional instructions. As we discussed in a DM, adding a mechanism to mark instructions as expected to be legalized would address my concerns.
cfallin commented on issue #5283:
@fitzgen and I just talked about this at some length and I also agree that rewriting from the "wasm load" semantic unit-of-work directly to lower-level instructions is a good approach.
We also talked about the handling of the index+offset computation (what is currently handled by
uadd_overflow_trap
in the legalization fromheap_addr
) and came the conclusion that it probably makes sense to stick withuadd_overflow_trap
, rather thaniadd_cout
with the carry-out feeding into a select. This is for a few reasons: (i) we'd need to feed the condition into the final select (after the addition of the heap base), not an intermediate select, and this implies reifying the carry out of the add and greater-than-or-equal condition out of the bounds check as bools and or'ing them together, which is worse codegen then we have today; and (ii) we don't actually need a Spectre-guard-style select on the result of the add-overflow check, because an overflow of index+offset (that nevertheless stays within the sandbox bounds) will not see another heap, even speculatively; it'll just see part of the current heap.So basically, (i) keep what we ultimately legalize into today, but (ii) do it without
heap_addr
in the middle, by either legalizing fromheap_load
or de-facto legalizing incranelift-wasm
. I sort of prefer the former as long as we have a concept ofheap
entities, and removing those and moving the complexity tocranelift-wasm
seems like a lot of unnecessary work (and we get useful checking of its implementation by having it at the CLIF level, because of the interpreter's implementation of it).
fitzgen commented on issue #5283:
After implementing (the easy) half of (3) and talking it over with @cfallin I am now leaning towards pivoting to (1), that is removing the concept of heaps from CLIF altogether and doing the legalization immediately inside
cranelift-wasm
.The reasons are that (3) involves making heap versions of a bunch of instructions we already have in CLIF: the extending loads and stores like
uload8
et al, the SIMD versions of those likeuload8x8
et al, and various atomic instructions.uload8
et al could be replaced with backend rules for matching(uextend.i64 (load.i8 ...))
patterns but this is a bunch of churn for little gain and also requires answering questions like "how do we generalizeSinkableLoad
to all of our backends?". It is a little bit of a rabbit hole in its own right.(1) would not require removing, replacing, or duplicating any CLIF instructions other than the
heap_*
instructions which are directly related to CLIF heaps. That's nice. Furthermore, we avoid having additional "temporary" CLIF instructions that exist only momentarily at the start of compilation before being legalized away, never to be seen by a backend. Also aesthetically pleasing.Will dig into this a bit and report back.
fitzgen closed issue #5283:
On error condition, does this instruction itself trap or return an address that will trap when accessed? Currently that depends on arbitrary bits of the heap configuration and memory being accessed, and is different for
index + offset
overflow vs out of bounds. I've been improving the situation a little by making it more consistent, but it is worth backing up a bit and gaining context.The actual problem we are solving is that of wasm loads and stores, which must trap when out of bounds. At some point long ago, we decomposed wasm memory accesses into smaller parts, one of which is
heap_addr
. Whether we trap inheap_addr
itself or in another clif instruction afterheap_addr
is irrelevant to the fact that the wasm memory access our instruction sequence is implementing must trap when out of bounds. So I think trying to pin down and clean upheap_addr
's semantics in the face of error conditions is accidental complexity; a problem of our own making.That is not an argument against being precise with
heap_addr
semantics, however. If we haveheap_addr
, we should have precise semantics for it. The same way that we should aspire to giving precise (formal!) semantics to all CLIF instructions.But... what if we didn't have a
heap_addr
instruction at all? I see a few possibilities:
CLIF already has everything needed to emit what
heap_addr
gets legalized to today, socranelift-wasm
could just emit those already-legalized instructions. And now there is no interface barrier betweenheap_addr
and the other CLIF instructions surrounding it where we have to define semantics and decide which side of the interface is responsible for actually doing the trap when we encounter an error condition.We introduce
bounds_checked_load
andbounds_checked_store
instructions that match wasm semantics. They would take a static heap immediate, a dynamic index operand, a static offset immediate, and have a static type parameter that is being loaded/stored. We would carry these instructions all the way through Cranelift to the backends, where they would get lowered into precise machine instruction sequences. Again, there is no artificial interface barrier where we have to decide which side of the interface is responsible for trapping; thebounds_checked_*
instructions implement the whole of the Wasm semantics and will trap on OOB/overflow.We introduce
bounds_checked_{load,store}
instructions as above, but we legalize them early in Cranelift's pipeline to more primitive CLIF instructions (equivalent of whatcranelift-wasm
would otherwise produce directly in option 1).The primary benefits of 1 and 3 are that by decomposing the memory accesses into smaller parts, we can GVN/LICM/etc those smaller parts.
The primary benefits of 2 are that we have fewer CLIF instructions to process and so compilation times should be a little better, and that we have very precise control over the machine instructions emitted. However, we can't easily GVN/LICM/etc subcomponents of a wasm load like loading the memory base and length out of the vmctx.
The trade off between 1 and 3 is whether we want complexity in the
cranelift-wasm
frontend or incranelift-codegen
legalization.
Personally, I would rank the options in the order of 3, 1, 2.
I think we can make up for 1/3's lack of precise machine instruction control with filetests.
The compilation time benefits of a macro instruction are tempting, but I think trading that away for code quality is the correct move in this scenario, since we stille have many other avenues for improving compilation times (as we have been doing a lot recently!) that don't sacrifice code quality at all.
As for why 3 over 1? Keeping
cranelift-wasm
relatively simple seems like the way to go for me, and legalization seems like the natural home for this kind of thing.
Last updated: Dec 23 2024 at 12:05 UTC