Stream: git-wasmtime

Topic: wasmtime / issue #5950 cranelift-interpreter: Add cross s...


view this post on Zulip Wasmtime GitHub notifications bot (Mar 08 2023 at 01:58):

jameysharp commented on issue #5950:

Hey @afonso360, does this PR work if a function uses stack_addr, then passes the address to another function, which tries to access through that pointer?

I would think is_address_within_stack_slot_range would need to search up the call stack to find which frame the address is in. But I haven't studied the interpreter enough to be sure.

I'd like to give this another review once we've answered that question. I think there are a few things we can simplify, but I want to be sure the overall plan is right before getting into the details.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 08 2023 at 06:32):

bjorn3 commented on issue #5950:

For stack_addr this check can't be done without introducing pointer provenance which would be introducing a new kind of UB in clif ir and require extra instructions to expose provenance. It is currently valid to "guess" the address of another stack slot for example by adding a specific offset to the address of one stack slot, comparing it to the address of another stack slot and then using the original offseted pointer if they compare equal. To make that UB you have to attach provenance to the pointer which indicates from which stack slot the pointer came.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 08 2023 at 10:37):

afonso360 commented on issue #5950:

Hey @afonso360, does this PR work if a function uses stack_addr, then passes the address to another function, which tries to access through that pointer?
I would think is_address_within_stack_slot_range would need to search up the call stack to find which frame the address is in. But I haven't studied the interpreter enough to be sure.

Oh! That's a good point, I think we need to do that!

For stack_addr this check can't be done without introducing pointer provenance which would be introducing a new kind of UB in clif ir and require extra instructions to expose provenance. It is currently valid to "guess" the address of another stack slot for example by adding a specific offset to the address of one stack slot, comparing it to the address of another stack slot and then using the original offseted pointer if they compare equal. To make that UB you have to attach provenance to the pointer which indicates from which stack slot the pointer came.

Right, I don't think we can catch more advanced cases like returning a stack_addr and calling a different function with a different stack layout and then using that address or something along those lines. So we are never 100% sure that this access is legal. But It should cover the majority of the cases I think.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 08 2023 at 14:02):

bjorn3 commented on issue #5950:

Even within the same function doing it on stack_addr requires pointer provenance as without pointer provenance guessing the address is allowed when you guess correctly (or checked that your guess is correct).

view this post on Zulip Wasmtime GitHub notifications bot (Mar 08 2023 at 14:03):

bjorn3 edited a comment on issue #5950:

Even within the same function doing it on stack_addr requires pointer provenance as without pointer provenance guessing the address is allowed when you guess correctly (or checked that your guess is correct). See also https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html

view this post on Zulip Wasmtime GitHub notifications bot (Mar 09 2023 at 12:07):

afonso360 commented on issue #5950:

Let me double check if I understood what you are saying.

If we have an address from ss0 we can use it to write to ss1 by just adding to it the right number of bytes. Even in the same function right?

It is currently valid to "guess" the address of another stack slot for example by adding a specific offset to the address of one stack slot, comparing it to the address of another stack slot and then using the original offseted pointer if they compare equal.

This check only operates when we do a load / store we don't actually care that the address was originally from this stack slot, just that whenever the user tries to use it it doesn't do any cross stack slot access. So they can still guess, and as long as they are right we let them do it.

I think the naming of this might also be making things confusing, we are just checking that a load/store doesn't access two (or more!) stack slots at once, not that they are correct w.r.t the original address provided by stack_addr.

The reasoning for preventing the above is that the compiler is free to reorder stack slots / add alignment, so doing that is already UB.

Did I understand you correctly, or am I still missing something?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 09 2023 at 12:16):

bjorn3 commented on issue #5950:

If we have an address from ss0 we can use it to write to ss1 by just adding to it the right number of bytes. Even in the same function right?

Without pointer provenance, yes. There is nothing distinguishing a pointer created using stack_addr ss1 from one created from stack_addr ss0 that is offseted to be point at ss1. You still have to guess the right number of bytes to offset, but you can check if you guessed right within the program or calculate it based on CompiledCodeBase::sized_stackslot_offsets.

This check only operates when we do a load / store we don't actually care that the address was originally from this stack slot, just that whenever the user tries to use it it doesn't do any cross stack slot access. So they can still guess, and as long as they are right we let them do it.

I think the naming of this might also be making things confusing, we are just checking that a load/store doesn't access two (or more!) stack slots at once, not that they are correct w.r.t the original address provided by stack_addr.

I see. I thought for stack_addr this was about taking a reference, offsetting such that it points to a different stackslot and then doing a memory access. But if this is only about a single memory access part of two stack slots at the same time, that is fine I guess. Technically it would still be possible to check that two stack slots are adjacent and only do the memory access if this is the case, but I don't really think anyone would want that. C and Rust both forbid it using pointer provenance and Webassembly doesn't have access to Cranelift stackslots.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 15 2023 at 09:49):

jan-justin commented on issue #5950:

Hey all,

Thanks for the input.

From what I can gather, what is missing here is potentially searching up the call stack for addresses that reside in another frame. However, @jameysharp you mentioned that you wanted to gather a few facts before giving this PR another review. Would you still like to have another look?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 15 2023 at 20:59):

jameysharp commented on issue #5950:

Hey @jan-justin, thanks for checking. The main thing I wanted to know was if this PR would work with stack pointers passed between functions, and I believe the answer is that it doesn't. Before merging it I'd like to make sure that it doesn't falsely report errors in programs which are actually valid. Ideally we'd add runtests in cranelift/filetests/filetests/runtests/ with some cross between call.clif and stack-addr-*.clif verifying that this works.

On the other hand, it's okay if this fails to report errors in some programs which have bugs. If it catches some bugs without any false positives then it's an improvement. In particular, that means we don't need to start tracking pointer provenance in the interpreter before merging this PR.

So I want to see this PR search the call stack to find the stack frame which contains the address, but I don't think it needs any other big changes.

Once that's done, then I want to take another look through the PR to see if we can simplify the implementation some. But there's no reason to dive into that level of detail when it could turn out that you make big changes anyway.

Now, @bjorn3 has pointed out a higher-level issue that I want to address. @afonso360 and I both thought of this cross-slot check as being obviously a good idea, but we didn't fully tie it to any defined semantics for CLIF. So when I said just now that this should "catch bugs without any false positives", we first need a definition of what makes this a bug.

I've chatted with a couple other Cranelift developers about this, and we agree that we want any program which doesn't follow pointer provenance rules on stack slots to be considered undefined behavior. We want to allow optimizations like deciding that two pointers don't alias if they came from different stack slots.

But I don't think we've said that anywhere before, and it would be our first instance of officially undefined behavior in CLIF, so it's kind of a big deal. The current language Afonso pointed to in the StackSlot docs does seem to permit the situations that @bjorn3 described as "You still have to guess the right number of bytes to offset, but you can check if you guessed right within the program…"

In short, let's continue with this PR on the basis that, as @bjorn3 said, "I don't really think anyone would want" to break pointer provenance rules. We should formalize the rules we're actually following at some point but we can still start checking on an _ad hoc_ basis now.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 15 2023 at 21:26):

bjorn3 commented on issue #5950:

I wouldn't want us to optimize based on ad-hoc rules (that is the root of many miscompilations), but checking based on them until we have formal rules is much less of an issue.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 15 2023 at 23:29):

cfallin commented on issue #5950:

@jameysharp and I chatted for a bit about this just now, and had some interesting discussion around UB in general and this behavior in particular.

First I think there is a useful distinction to make between two kinds of behavior that we see as undesirable from the CLIF:

The former is truly problematic: it is embedding a detail of our current ABI/stack-frame-layout implementation into CLIF. The latter, however, is less clear, to me at least.

In particular, substituting a pointer with an expression that evaluates to that pointer, but is not that pointer, violates pointer provenance but I worry it may arise as emergent behavior when optimizations compose. p0 + (p1 - p0) seems silly, but mid-end rules that rewrite integer expressions are quite reasonable, and in general we consider these correct as long as they result in code that computes the same integer. In a pointer-provenance world, we now need to have a "may be a pointer?" analysis, and exclude any possibly-pointer SSA values from any rewrites that may "mix in" another pointer. (Alternately, we could come up with a set metarules around how we write our rules, such that no legal pointer manipulation can be rewritten into an illegal one. But my brain hurts just thinking about that.)

It also precludes some rewrites that the producer might do. For example, in a pointer-provenance world, pointers can't be stored to memory. Or if they are, we need an escape analysis and a "has escaped, could alias anything" state in addition to our "provenance is strictly tracked" state for each storage location. One might argue that a producer should be careful about this but consider: this means that we can no longer write general code like "write all arguments to memory and call a trampoline". Or even write a producer that, e.g., directly compiles stack-based bytecode to use a value stack in memory: as soon as the opcode for "get scratch space" compiles to a stack_addr and a store of that addr, it has lost its provenance and is just bytes in memory. Again, escape analysis could catch this, but it's just more complexity.

Finally, there might be legal code that can actually perform loads/stores that cross immediately adjacent slots (or at least one might want to write such code). Consider the example of a stack-concatenation routine that is vectorized. If it has a fastpath to concatenate two 8-byte strings at adjacent addresses by doing a 16-byte vector load, that should be legal beyond a conditional that verifies that property. But if we have a notion of strict pointer provenance, we lose that. And again, the subtle and tricky part isn't that we can't write overly clever code (too bad!), it's that we have to think about this extra possibility every time we think about memory accesses.

So what I'm coming around to is that allowing weird behavior like p0 + (p1 - p0) isn't the main point; the main point is having reliable abstractions with no exceptions. If a pointer is an integer, and an integer is an integer no matter what, then we should be able to apply any optimization we want to the integer code. As long as the same integer pops out in the end, it is as legal to use as the original pointer was. That allows for compositionality and modular reasoning, which both lead to better-factored and more-likely-to-be-correct code. This is kind of similar to why "referential transparency" in functional programming is so powerful: if all one has to reason about is the value itself, absent any other hidden or implicit invariants or state, the world is simpler and one can confidently (a Rustacean may even say "fearlessly"!) refactor.

All of this is in addition to the problems that undefined behavior brings to verification (need for refinement semantics rather than strict equivalence checking) and fuzzing.

Coming back to the issue at hand: I think that we should allow the second kind of behavior (accessing a valid pointer no matter how it was computed); the real issue we're trying to address is code that implicitly relies on stackslot layout. I think we can test that a different way by making chaos mode vary stackslot layout randomly, with random gaps between the slots. That requires fuzzing to hit the bug, but then, so does this PR. (We could do a cheap form of this by, as a default, doing something other than the contiguous stackslot address allocation: we could place them further apart in the interpreter's emulated address space.)

Anyway, I'm happy to discuss this further; perhaps in the next Cranelift meeting if others would like; but given all the above I personally favor avoiding the checks we have in this PR. My apologies for not catching this discussion earlier, and thank you regardless @jan-justin for thinking about this issue and proposing and prototyping this check!

view this post on Zulip Wasmtime GitHub notifications bot (Mar 15 2023 at 23:32):

cfallin commented on issue #5950:

One more point I forgot to include above: pointer provenance IMHO might fit better into an IR that has pointers as first-class types. The issues named above are really, seem from a certain perspective, arising because we hide magical pointer values inside unassuming integers, such that handling those integers in the wrong way blows up. If CLIF were at an abstraction level that had pointers, and something like LLVM's getelementpointer instruction, pointer provenance would feel more appropriate. But one way to justify why no pointer provenance is that we're at a lower abstraction level: we've chosen to make pointers "just integers", and so we should carry through the logical conclusions of that decision.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 16 2023 at 00:59):

cfallin edited a comment on issue #5950:

One more point I forgot to include above: pointer provenance IMHO might fit better into an IR that has pointers as first-class types. The issues named above are really, seen from a certain perspective, arising because we hide magical pointer values inside unassuming integers, such that handling those integers in the wrong way blows up. If CLIF were at an abstraction level that had pointers, and something like LLVM's getelementpointer instruction, pointer provenance would feel more appropriate. But one way to justify why no pointer provenance is that we're at a lower abstraction level: we've chosen to make pointers "just integers", and so we should carry through the logical conclusions of that decision.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 16 2023 at 01:01):

cfallin edited a comment on issue #5950:

@jameysharp and I chatted for a bit about this just now, and had some interesting discussion around UB in general and this behavior in particular.

First I think there is a useful distinction to make between two kinds of behavior that we see as undesirable from the CLIF:

The former is truly problematic: it is embedding a detail of our current ABI/stack-frame-layout implementation into CLIF. The latter, however, is less clear, to me at least.

In particular, substituting a pointer with an expression that evaluates to that pointer, but is not that pointer, violates pointer provenance but I worry it may arise as emergent behavior when optimizations compose. p0 + (p1 - p0) seems silly, but mid-end rules that rewrite integer expressions are quite reasonable, and in general we consider these correct as long as they result in code that computes the same integer. In a pointer-provenance world, we now need to have a "may be a pointer?" analysis, and exclude any possibly-pointer SSA values from any rewrites that may "mix in" another pointer. (Alternately, we could come up with a set metarules around how we write our rules, such that no legal pointer manipulation can be rewritten into an illegal one. But my brain hurts just thinking about that.)

It also precludes some rewrites that the producer might do. For example, in a pointer-provenance world, pointers can't be stored to memory. Or if they are, we need an escape analysis and a "has escaped, could alias anything" state in addition to our "provenance is strictly tracked" state for each storage location. One might argue that a producer should be careful about this but consider: this means that we can no longer write general code like "write all arguments to memory and call a trampoline". Or even write a producer that, e.g., directly compiles stack-based bytecode to use a value stack in memory: as soon as the opcode for "get scratch space" compiles to a stack_addr and a store of that addr, it has lost its provenance and is just bytes in memory. Again, escape analysis could catch this, but it's just more complexity.

Finally, there might be legal code that can actually perform loads/stores that cross immediately adjacent slots (or at least one might want to write such code). Consider the example of a string-concatenation routine that is vectorized. If it has a fastpath to concatenate two 8-byte strings at adjacent addresses by doing a 16-byte vector load, that should be legal beyond a conditional that verifies that property. But if we have a notion of strict pointer provenance, we lose that. And again, the subtle and tricky part isn't that we can't write overly clever code (too bad!), it's that we have to think about this extra possibility every time we think about memory accesses.

So what I'm coming around to is that allowing weird behavior like p0 + (p1 - p0) isn't the main point; the main point is having reliable abstractions with no exceptions. If a pointer is an integer, and an integer is an integer no matter what, then we should be able to apply any optimization we want to the integer code. As long as the same integer pops out in the end, it is as legal to use as the original pointer was. That allows for compositionality and modular reasoning, which both lead to better-factored and more-likely-to-be-correct code. This is kind of similar to why "referential transparency" in functional programming is so powerful: if all one has to reason about is the value itself, absent any other hidden or implicit invariants or state, the world is simpler and one can confidently (a Rustacean may even say "fearlessly"!) refactor.

All of this is in addition to the problems that undefined behavior brings to verification (need for refinement semantics rather than strict equivalence checking) and fuzzing.

Coming back to the issue at hand: I think that we should allow the second kind of behavior (accessing a valid pointer no matter how it was computed); the real issue we're trying to address is code that implicitly relies on stackslot layout. I think we can test that a different way by making chaos mode vary stackslot layout randomly, with random gaps between the slots. That requires fuzzing to hit the bug, but then, so does this PR. (We could do a cheap form of this by, as a default, doing something other than the contiguous stackslot address allocation: we could place them further apart in the interpreter's emulated address space.)

Anyway, I'm happy to discuss this further; perhaps in the next Cranelift meeting if others would like; but given all the above I personally favor avoiding the checks we have in this PR. My apologies for not catching this discussion earlier, and thank you regardless @jan-justin for thinking about this issue and proposing and prototyping this check!

view this post on Zulip Wasmtime GitHub notifications bot (Mar 16 2023 at 15:19):

jan-justin commented on issue #5950:

No problem at all!

I am happy to contribute, even if only to help fuel a discussion.

Considering everything that has been mentioned, however, I am left uncertain as to how to proceed from here. Should I close this PR?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 16 2023 at 15:51):

cfallin commented on issue #5950:

I think it would be good to see if we can get a broader consensus on the above -- right now it is just my argument for the position. If others have opinions then we can discuss here, otherwise perhaps we can confirm in the next Cranelift weekly?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 29 2023 at 17:36):

jameysharp commented on issue #5950:

I see Chris' point, and I gather there aren't any dissenting opinions from anyone else. So I'll go ahead and close this PR.

@jan-justin, thanks so much for trying this out and for fueling a great discussion! We've discovered answers to some pretty fundamental questions about Cranelift's design, which we hadn't expressed such clear rationale for before.


Last updated: Jan 24 2025 at 00:11 UTC