wasmtime / issue #6039 Chaos mode MVP: Skip branch optimi... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / issue #6039 Chaos mode MVP: Skip branch optimi...

Wasmtime GitHub notifications bot (Mar 17 2023 at 10:36):

afonso360 commented on issue #6039:

Maybe someone has an intuition along the lines of: "Oh yes, of course that will fail when branch optimization is randomly skipped", or similar?

I suspect that might be the case. The icache fuzzer tests that our function cache is working properly. It does this by generating a random function, applying a random mutation, then recompiling and checking whether the cache key matches.

In this case, what I suspect might be happening is that it selected that no mutation should be applied, so the cache key should be valid. But if we skip branch opts in the recompile and not in the original compilation, or vice versa, then it should panic! (This is a guess; I'm not too familiar with how our caching mechanism works.)

For icache it would be nice to reset the chaos engine bytes before the second compilation so that we make the same decisions along the way.
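A minimal sketch of the suggested fix, using a hypothetical ControlPlane that hands out decisions from a Vec<bool> and a hypothetical compile function whose output depends on a single decision (skip branch optimization or not). Cloning the control plane before the first compilation and giving the untouched clone to the recompile replays the same decisions, so an unmutated function compiles identically; reusing the already-consumed control plane is what lets the two compilations diverge.

```rust
/// Hypothetical control plane: decisions are consumed from the back of a Vec.
#[derive(Clone, Debug)]
struct ControlPlane {
    decisions: Vec<bool>,
}

impl ControlPlane {
    fn get_decision(&mut self) -> bool {
        // Once the input is exhausted, fall back to the default decision.
        self.decisions.pop().unwrap_or(false)
    }
}

/// Hypothetical stand-in for compiling one function: the result depends on
/// whether branch optimization was skipped.
fn compile(func: &str, ctrl_plane: &mut ControlPlane) -> String {
    if ctrl_plane.get_decision() {
        format!("{func}: branch opts skipped")
    } else {
        format!("{func}: branch opts applied")
    }
}

fn main() {
    let original = ControlPlane { decisions: vec![false, true] };

    // Keep an untouched copy so the recompile sees the same decisions.
    let mut first = original.clone();
    let mut replay = original.clone();
    let a = compile("f0", &mut first);
    let b = compile("f0", &mut replay);
    assert_eq!(a, b);

    // Reusing the already-consumed control plane diverges: this is the
    // kind of mismatch the icache fuzzer would report.
    let c = compile("f0", &mut first);
    assert_ne!(a, c);
    println!("{a} / {c}");
}
```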

Wasmtime GitHub notifications bot (Mar 17 2023 at 18:20):

cfallin commented on issue #6039:

Thanks so much for making a start on this problem -- the infrastructure will surely pay off in a bunch of ways!

I've read over most of this prototype; but before a detailed line-by-line review, I wanted to offer some high-level design feedback. I think it might be related to (or rather, might address) the issue with the icache fuzzer above too.

I'd prefer if we actually generalize just a little bit more and call the struct that is threaded through the compilation pipeline the ControlPlane; this abstracts over randomized decisions (chaos mode) but also optimization fuel and maybe user-directed optimization instructions and the like later. So, would you mind renaming both the types and the crate itself (maybe cranelift-control, though bikeshed comments are welcome here)?

In general, we want to avoid unsafe code as much as possible. The "self-referential struct trick" wherein there's a Vec and then there's an Unstructured that holds a borrow to that Vec's contents, transmuted to erase the lifetime and get a &'static, is simply too unsafe for us to include in Cranelift code...

... fortunately, though, I suspect it's not really necessary. I suspect that much of the trouble comes from trying to hold ownership of the ChaosEngine in various places, rather than threading it through the stack as needed. I suspect it should be possible, in principle, to have a ControlPlane such that:

We have &mut self methods that ask it for decisions;

We pass a &mut ControlPlane into toplevel compilation entry points, and down the stack as necessary;

If we need to (e.g. when building the Lower context), we can store a &'a mut ControlPlane, but that 'a is internal to the compiler, and not exposed to the top-level entry point.

This ControlPlane should itself not have any lifetime parameters, ideally. I suspect that means we don't actually hold an Unstructured, since it expects to borrow data owned elsewhere; rather, for now, we can implement our own sort of equivalent, where we keep e.g. a Vec<bool> and take decisions off the back of it (e.g., self.chaos_decisions.pop().unwrap_or(false) or something similar).
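A minimal sketch of that ownership model, with hypothetical names (compile_function, lower, Lower, get_decision): the caller owns one ControlPlane and passes &mut down the stack, and the only struct that stores it is an internal lowering context whose lifetime never escapes the top-level entry point.

```rust
/// Plain data, no lifetime parameters.
#[derive(Clone, Debug, Default)]
pub struct ControlPlane {
    decisions: Vec<bool>,
}

impl ControlPlane {
    pub fn new(decisions: Vec<bool>) -> Self {
        Self { decisions }
    }

    /// Take the next decision off the back, defaulting to `false`.
    pub fn get_decision(&mut self) -> bool {
        self.decisions.pop().unwrap_or(false)
    }
}

/// Internal lowering context: it may borrow the control plane for `'a`,
/// but that lifetime is never exposed to callers of `compile_function`.
struct Lower<'a> {
    ctrl_plane: &'a mut ControlPlane,
    emitted: Vec<&'static str>,
}

impl Lower<'_> {
    fn lower_inst(&mut self) {
        let inst = if self.ctrl_plane.get_decision() { "alt" } else { "default" };
        self.emitted.push(inst);
    }
}

fn lower(num_insts: usize, ctrl_plane: &mut ControlPlane) -> Vec<&'static str> {
    let mut ctx = Lower { ctrl_plane, emitted: Vec::new() };
    for _ in 0..num_insts {
        ctx.lower_inst();
    }
    ctx.emitted
}

/// Top-level entry point: takes `&mut ControlPlane`, no extra lifetimes.
pub fn compile_function(num_insts: usize, ctrl_plane: &mut ControlPlane) -> Vec<&'static str> {
    // Other passes could query `ctrl_plane` here, before and after lowering.
    lower(num_insts, ctrl_plane)
}

fn main() {
    let mut ctrl_plane = ControlPlane::new(vec![true, false]);
    let out = compile_function(3, &mut ctrl_plane);
    println!("{out:?}"); // ["default", "alt", "default"]
}
```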

We can then impl Arbitrary for ControlPlane, perhaps just the automatic derivation, and use this when fuzzing: take an arbitrary Function and compile it with an arbitrary ControlPlane.

The sharing of the ChaosEngine via an Arc, and the Mutex and such, are another red flag and warning that we're going down a nondeterministic path. I suspect this is a contributing factor to the icache fuzzer issues above. The issue is that it allows different parts of the compiler to pull random choices in some potentially unordered way, potentially across threads, and that gives up reproducibility. Instead, we should have one ControlPlane per function, because that is the unit of compilation in Cranelift. At the fuzzer level, we can easily ask for a Vec<ControlPlane>; that automatically impls Arbitrary if ControlPlane does.
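A sketch of why one ControlPlane per function restores reproducibility, assuming hypothetical compile and ControlPlane types like those discussed above: each function consumes only its own decisions, so the result for a given function does not depend on the order in which functions are compiled. In the fuzzer, the Vec<ControlPlane> would come from an Arbitrary derive; here it is written out by hand.

```rust
#[derive(Clone, Debug)]
struct ControlPlane {
    decisions: Vec<bool>,
}

impl ControlPlane {
    fn get_decision(&mut self) -> bool {
        self.decisions.pop().unwrap_or(false)
    }
}

/// Hypothetical per-function compilation: the output depends only on the
/// function and its own control plane.
fn compile(func: &str, ctrl_plane: &mut ControlPlane) -> String {
    let skip_branch_opts = ctrl_plane.get_decision();
    format!("{func}:{skip_branch_opts}")
}

/// Compile every function with its own control plane, in the given order.
fn compile_all(funcs: &[&str], planes: &[ControlPlane], order: &[usize]) -> Vec<String> {
    let mut results = vec![String::new(); funcs.len()];
    for &i in order {
        // Each function gets a fresh copy of its own decisions.
        let mut plane = planes[i].clone();
        results[i] = compile(funcs[i], &mut plane);
    }
    results
}

fn main() {
    let funcs = ["f0", "f1", "f2"];
    let planes = vec![
        ControlPlane { decisions: vec![true] },
        ControlPlane { decisions: vec![] },
        ControlPlane { decisions: vec![false, true] },
    ];
    let forward = compile_all(&funcs, &planes, &[0, 1, 2]);
    let reverse = compile_all(&funcs, &planes, &[2, 1, 0]);
    assert_eq!(forward, reverse);
    println!("{forward:?}"); // ["f0:true", "f1:false", "f2:true"]
}
```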

Does that make some sense at least? Basically, I want to push this all toward a more idiomatic Rust ownership model, which I think will have the side-effect of removing nondeterminism and making compilation a true pure function of a "control" input for one function body.

Wasmtime GitHub notifications bot (Mar 17 2023 at 22:11):

remlse commented on issue #6039:

Thank you both for the great feedback!

I totally agree that a more idiomatic Rust ownership model would be much better. I remember starting out with mutable references and running into compiler errors about mutable aliasing. I was concerned there could be way too many of these issues while threading the control plane through every code path, and I didn't know how difficult it would be. I probably was scared away too easily. I'll have another go at implementing it that way.

I understand that the use of unsafe for storing an Unstructured is a non-starter. The nice thing about it would've been that we get all the functionality of arbitrary for free and the API of the control plane would be very flexible with little effort. I'm still feeling a bit unsure about taking the manual approach. Not primarily because it would be more work, but because I expect the work would in many cases just be reimplementing Unstructured. And probably even in a worse way: Casually reading the source code of arbitrary, I'm seeing comments like this:
// Take lengths from the end of the data, since the `libFuzzer` folks
// found that this lets fuzzers more efficiently explore the input
// space.
I'm sure we would miss such fuzzing-specific optimizations if we reimplemented it ourselves, degrading the efficiency of our fuzz testing. On the other hand, we could just liberally copy-paste from the source code of arbitrary whatever we need, only adapting it so it works on an owned vector of bytes instead of a slice with some lifetime?

I'm just thinking it's a little sad that a small thing like a lifetime would stand in the way of such a good opportunity of code reuse :thinking:

Wasmtime GitHub notifications bot (Mar 17 2023 at 23:08):

cfallin commented on issue #6039:

Yeah, that's fair -- I suppose it's not the worst thing in the world to take a ctrl_plane: &mut ControlPlane<'_> everywhere and then keep an Unstructured inside. Ideally then the way we cut off lifetime proliferation is to be strict about passing this down the stack only, and not storing it in structs (which would require an extra lifetime on that struct). A prototype of this option would probably tell us pretty quickly how tenable that is!

Wasmtime GitHub notifications bot (Mar 18 2023 at 10:42):

remlse commented on issue #6039:

We have &mut self methods that ask it for decisions

Another question that came up while we tried that at the beginning was about the size of this parameter. A simple test indicates that references to zero-sized types are not optimized away:
struct Foo;

fn main() {
    // `Foo` itself is zero-sized, but a reference to it is pointer-sized.
    println!("{}", std::mem::size_of::<&Foo>()); // prints 8 on a 64-bit target
}
Am I missing something with that? How can we be certain that the performance of release builds is not affected?

Wasmtime GitHub notifications bot (Mar 18 2023 at 13:10):

bjorn3 commented on issue #6039:

Instead of passing &mut ControlPlane<'_> you could pass ControlPlane<'_> and add an as_mut() method which takes &mut self and returns another ControlPlane<'_>, which can be used to pass ControlPlane<'_> to multiple functions. Then ControlPlane<'_> could be a mutable reference during fuzzing and a ZST when not fuzzing.
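A minimal sketch of that suggestion, assuming the Cargo feature is named chaos (as in the prototype) and a hypothetical ControlPlaneInner holding a Vec<bool>: with the feature enabled, ControlPlane<'a> wraps a &'a mut ControlPlaneInner; without it, the same type wraps only a PhantomData and is zero-sized, so passing it by value costs nothing. Both variants expose the same new, as_mut and get_decision methods.

```rust
use std::marker::PhantomData;

/// Hypothetical owned state, only consulted when fuzzing.
pub struct ControlPlaneInner {
    pub decisions: Vec<bool>,
}

#[cfg(feature = "chaos")]
pub struct ControlPlane<'a>(&'a mut ControlPlaneInner);

#[cfg(not(feature = "chaos"))]
pub struct ControlPlane<'a>(PhantomData<&'a mut ControlPlaneInner>);

#[cfg(feature = "chaos")]
impl<'a> ControlPlane<'a> {
    pub fn new(inner: &'a mut ControlPlaneInner) -> Self {
        ControlPlane(inner)
    }
    /// Reborrow so the control plane can be handed to several callees.
    pub fn as_mut(&mut self) -> ControlPlane<'_> {
        ControlPlane(&mut *self.0)
    }
    pub fn get_decision(&mut self) -> bool {
        self.0.decisions.pop().unwrap_or(false)
    }
}

#[cfg(not(feature = "chaos"))]
impl<'a> ControlPlane<'a> {
    pub fn new(_inner: &'a mut ControlPlaneInner) -> Self {
        ControlPlane(PhantomData)
    }
    pub fn as_mut(&mut self) -> ControlPlane<'_> {
        ControlPlane(PhantomData)
    }
    /// Without chaos mode, always take the default path.
    pub fn get_decision(&mut self) -> bool {
        false
    }
}

fn example(mut ctrl_plane: ControlPlane<'_>) -> [bool; 2] {
    let a = ctrl_plane.as_mut().get_decision();
    let b = ctrl_plane.get_decision();
    [a, b]
}

fn main() {
    let mut inner = ControlPlaneInner { decisions: vec![true, true] };
    println!("size: {}", std::mem::size_of::<ControlPlane<'static>>());
    println!("{:?}", example(ControlPlane::new(&mut inner)));
}
```

Built without the chaos feature, the wrapper is zero-sized and every decision is the constant false; with the feature, it is a single pointer.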

Wasmtime GitHub notifications bot (Mar 18 2023 at 16:01):

remlse commented on issue #6039:

I think I'm misunderstanding something. The function signature you're describing looks like this, right? (ignoring the lifetimes)
impl ControlPlane {
    fn as_mut(&mut self) -> Self {
        // ?
    }
}
How could this function be written without reference counting?

Wasmtime GitHub notifications bot (Mar 18 2023 at 17:59):

bjorn3 commented on issue #6039:

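You would do something like

struct ControlPlaneInner;

struct ControlPlane<'a>(&'a mut ControlPlaneInner);

impl ControlPlane<'_> {
    /// Reborrow `ControlPlane`.
    fn as_mut(&mut self) -> ControlPlane<'_> {
        ControlPlane(/*this does an implicit reborrow*/self.0)
        // Equivalent to:
        // ControlPlane(&mut *self.0)
    }
}

And then use it like

// Placeholder callees that each take the control plane by value.
fn foo(_: ControlPlane<'_>) {}
fn bar(_: ControlPlane<'_>) {}
fn baz(_: ControlPlane<'_>) {}

// `mut` is required because `as_mut` takes `&mut self`.
fn example(mut control_plane: ControlPlane<'_>) {
    foo(control_plane.as_mut());
    bar(control_plane.as_mut());
    baz(control_plane);
}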

Playground link: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=5cde2151f4f2b1c6a9ad5ea959b2f5f5

Wasmtime GitHub notifications bot (Mar 18 2023 at 18:53):

cfallin commented on issue #6039:

I'm sure we would miss such fuzzing-specific optimizations if we reimplemented it ourselves, degrading the efficiency of our fuzz testing. On the other hand, we could just liberally copy-paste from the source code of arbitrary whatever we need, only adapting it so it works on an owned vector of bytes instead of a slice with some lifetime?

I'm just thinking it's a little sad that a small thing like a lifetime would stand in the way of such a good opportunity of code reuse :thinking:

Yeah, that's fair -- I suppose it's not the worst thing in the world to take a ctrl_plane: &mut ControlPlane<'_> everywhere and then keep an Unstructured inside. Ideally then the way we cut off lifetime proliferation is to be strict about passing this down the stack only, and not storing it in structs (which would require an extra lifetime on that struct). A prototype of this option would probably tell us pretty quickly how tenable that is!

I let this bounce around in my head a bit more and I think I'm coming back to my original position: it's probably better not to carry the fuzzing-specific Unstructured everywhere, and instead build an abstraction around it. I got to this position by starting with a separation-of-concerns mindset but I think it has other nice properties.

I think I want the design to follow these principles:

Compilation output of some CLIF is a function of that CLIF, compiler settings, and a ControlPlane composed of Rust primitives (bools, later a fuel counter, etc.);

The fuzzing infrastructure can create these ControlPlanes with an Arbitrary impl.

This has a few nice properties:

It avoids lifetime proliferation.

It allows us to more naturally handle the multiple-function case: it's much easier to ask a top-level Unstructured for a Vec<bool> per function than it is to somehow try to split the random input into chunks.

It allows us to more naturally integrate into the cache infrastructure: compilation is a property of this ControlPlane, which is a piece of data that can impl Hash and Eq like anything else.

It means that we don't have to propagate "not enough data" errors upward from Unstructured.

It means that we could later drive the mechanism with some other source of decisions; we're not tied to arbitrary.
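To illustrate the cache-integration point, a minimal sketch assuming a hypothetical CacheKey made of the function's CLIF text, a settings string, and the ControlPlane (the CLIF and settings strings are placeholders): because the control plane is plain data, it can derive Hash and Eq and simply become part of the key. Two compilations of the same function with different decisions then get different keys.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Clone, Debug, Default, PartialEq, Eq, Hash)]
pub struct ControlPlane {
    decisions: Vec<bool>,
}

/// Hypothetical cache key: compilation output is a function of these inputs.
#[derive(Debug, PartialEq, Eq, Hash)]
pub struct CacheKey {
    clif: String,
    settings: String,
    ctrl_plane: ControlPlane,
}

fn make_key(decisions: Vec<bool>) -> CacheKey {
    CacheKey {
        // Placeholder CLIF and settings, for illustration only.
        clif: "function %f() {\nblock0:\n    return\n}".to_string(),
        settings: "opt_level=speed".to_string(),
        ctrl_plane: ControlPlane { decisions },
    }
}

fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    assert_eq!(make_key(vec![true]), make_key(vec![true]));
    assert_ne!(make_key(vec![true]), make_key(vec![false]));
    println!("{:x}", hash_of(&make_key(vec![true])));
}
```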

Basically, the compiler's core is too late for construction of structured data from random bytes; we should build an input for the compiler that is just plain (structured) data. That leads to less friction with Rust's ownership model as well as more determinism and control.

IMHO the "reimplement Arbitrary" concern is somewhat smaller in comparison: we shouldn't have to implement any of the detailed logic, but rather the "pop a bool off a Vec" approach, and likewise for other data types we need. It's true that the fuzzer may give us only N bools in our Vec<bool> while a particular compilation path asks for M > N bools (the remaining M - N of which simply default to false); but libFuzzer is generally good about learning the structure of data that leads to better coverage.
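A minimal sketch of the "pop a bool off a Vec" approach and its default behavior, with N = 2 decisions supplied and M = 4 requested: the first two requests are answered from the back of the Vec, and the remaining M - N fall back to false. The from_bytes constructor is hypothetical (one bool per byte, taken from the low bit); in the real fuzzer an Arbitrary derive would build the Vec<bool> instead.

```rust
#[derive(Clone, Debug, Default, PartialEq, Eq, Hash)]
pub struct ControlPlane {
    decisions: Vec<bool>,
}

impl ControlPlane {
    /// Hypothetical stand-in for an `Arbitrary` impl: one bool per input byte.
    pub fn from_bytes(bytes: &[u8]) -> Self {
        Self { decisions: bytes.iter().map(|b| (b & 1) == 1).collect() }
    }

    /// Pop the next decision; once the fuzzer's input is used up, default to `false`.
    pub fn get_decision(&mut self) -> bool {
        self.decisions.pop().unwrap_or(false)
    }
}

fn main() {
    // N = 2 decisions supplied (from bytes 0x01 and 0x00)...
    let mut ctrl_plane = ControlPlane::from_bytes(&[0x01, 0x00]);
    // ...but M = 4 requested.
    let taken: Vec<bool> = (0..4).map(|_| ctrl_plane.get_decision()).collect();
    println!("{taken:?}"); // [false, true, false, false]
}
```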

Finally, a thought on zero-sized types: I think it will be fine to take the cost of an extra parameter &mut ControlPlane; as long as its methods in a chaos-disabled build return constants, and the ControlPlane itself is zero-sized, my intuition is that this will be minimal. (Basically, we're not injecting new paths into any hot loops; we're just adding one live usize-sized value passed into a few functions per compilation.) If it turns out to be measurable then we can definitely revisit, but let's not worry much until we see that impact, IMHO.
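A sketch of that zero-sized case, assuming the prototype's chaos Cargo feature gates the only field and using a hypothetical optimize_branches pass: in a chaos-disabled build ControlPlane has no fields, get_decision returns a constant, and the only runtime cost is the pointer-sized &mut ControlPlane argument itself.

```rust
#[derive(Debug, Default)]
pub struct ControlPlane {
    /// Only present in chaos-enabled builds.
    #[cfg(feature = "chaos")]
    decisions: Vec<bool>,
}

impl ControlPlane {
    #[cfg(feature = "chaos")]
    #[inline]
    pub fn get_decision(&mut self) -> bool {
        self.decisions.pop().unwrap_or(false)
    }

    /// Chaos disabled: a constant the optimizer can fold away.
    #[cfg(not(feature = "chaos"))]
    #[inline]
    pub fn get_decision(&mut self) -> bool {
        false
    }
}

/// Hypothetical pass that branches on a decision.
fn optimize_branches(ctrl_plane: &mut ControlPlane) -> &'static str {
    if ctrl_plane.get_decision() { "skipped" } else { "optimized" }
}

fn main() {
    let mut ctrl_plane = ControlPlane::default();
    println!("ControlPlane: {} bytes", std::mem::size_of::<ControlPlane>());
    println!("&mut ControlPlane: {} bytes", std::mem::size_of::<&mut ControlPlane>());
    println!("{}", optimize_branches(&mut ctrl_plane));
}
```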

Wasmtime GitHub notifications bot (Mar 19 2023 at 13:08):

remlse commented on issue #6039:

The rename seems uncontroversial, so I did that right on this branch.

crate cranelift-chaos renamed -> cranelift-control

ChaosEngine renamed -> ControlPlane

[ ] Should the feature also be renamed from chaos to control? I left it as is, assuming that potential future features of the control plane outside the scope of chaos mode might be implemented as different compilation features.

From my understanding, there are two major, orthogonal problems with the architecture. I think it's best to branch off from here so we can evaluate the possible approaches separately. I will create a draft PR for each approach so we have a basis for comparison.

Usage of unsafe vs. lifetime proliferation to make use of Unstructured

I agree with the feedback that unsafe must be avoided and that spreading &'a mut ControlPlane everywhere is also a bad idea.

approaches:

[x] https://github.com/remlse/wasmtime/pull/2

[x] https://github.com/remlse/wasmtime/pull/3 (don't make use of unsafe; implement our own, similar functionality according to our needs)

Non-deterministic order of perturbations because of multi-threading combined with Arc<Mutex<_>>:

approaches:

[ ] Replace with Rc<RefCell<_>>, the compiler will then prevent control planes from being shared across threads.

[ ] Propagate &mut ControlPlane through the call stack instead of owned ControlPlanes. (use bjorn3's pattern to make the reference zero-sized)

Maybe this question can be discussed already, before I'm done with the POCs: Is there a downside to using internal mutability when multi-threading is prevented with Rc<RefCell<_>>? What is the benefit of regular mutable references? Off the top of my head, the downsides of internal mutability are in general:

It is error-prone, because users of &Foo may wrongly assume that the state of Foo does not change.

It is less performant, because borrowing rules are checked at runtime.

In our case, I would say these two do not apply: users of ControlPlane shouldn't be concerned with its internal state anyway, and the small performance hit during fuzz testing shouldn't matter IMO. If single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever got in the way of regular Cranelift development.
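To make the runtime-check point concrete, a small sketch with a hypothetical ControlPlane holding a Vec<bool>: two Rc handles to the same control plane compile fine, and an overlapping mutable borrow is only caught when it happens at runtime, detected here with try_borrow_mut instead of the panic that borrow_mut would raise. With &mut ControlPlane, the equivalent overlap is rejected at compile time.

```rust
use std::cell::RefCell;
use std::rc::Rc;

#[derive(Debug, Default)]
struct ControlPlane {
    decisions: Vec<bool>,
}

impl ControlPlane {
    fn get_decision(&mut self) -> bool {
        self.decisions.pop().unwrap_or(false)
    }
}

/// Returns whether a second, overlapping mutable borrow was refused.
fn overlapping_borrow_refused(shared: &Rc<RefCell<ControlPlane>>) -> bool {
    let other_handle = Rc::clone(shared);
    // Every query goes through a dynamically checked borrow.
    let mut guard = shared.borrow_mut();
    let _ = guard.get_decision();
    // While `guard` is alive, a second mutable borrow fails at runtime.
    let refused = other_handle.try_borrow_mut().is_err();
    refused
}

fn main() {
    let shared = Rc::new(RefCell::new(ControlPlane { decisions: vec![true, false] }));
    println!("refused: {}", overlapping_borrow_refused(&shared)); // refused: true
    // Once the guard is dropped, borrowing succeeds again.
    println!("next decision: {}", shared.borrow_mut().get_decision()); // next decision: true
}
```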

It allows us to more naturally integrate into the cache infrastructure: compilation is a property of this ControlPlane, which is a piece of data that can impl Hash and Eq like anything else.

This may be an issue, though: RefCell does not implement Hash (it does implement PartialEq and Eq when its contents do), so Hash cannot simply be derived for a type holding an Rc<RefCell<_>>. I am somewhat confident that we can write a straightforward manual implementation without much trouble, but I'll have to show that in the POC.
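A sketch of such a manual implementation, assuming a hypothetical SharedControlPlane newtype around Rc<RefCell<ControlPlane>>: PartialEq and Eq can still be derived (they compare contents through Rc and RefCell), and Hash is written by hand to hash the same contents through a short-lived borrow(). Note that this hashes whatever state the control plane is in at that moment, and borrow() panics if a mutable borrow is outstanding.

```rust
use std::cell::RefCell;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::rc::Rc;

#[derive(Debug, Default, PartialEq, Eq, Hash)]
struct ControlPlane {
    decisions: Vec<bool>,
}

/// Hypothetical shared handle used by the `Rc<RefCell<_>>` approach.
/// `PartialEq`/`Eq` derive fine: they compare the contents.
#[derive(Clone, Debug, PartialEq, Eq)]
struct SharedControlPlane(Rc<RefCell<ControlPlane>>);

impl Hash for SharedControlPlane {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Must agree with `PartialEq`: hash the same contents we compare.
        self.0.borrow().hash(state);
    }
}

fn shared(decisions: Vec<bool>) -> SharedControlPlane {
    SharedControlPlane(Rc::new(RefCell::new(ControlPlane { decisions })))
}

fn main() {
    let mut set = HashSet::new();
    set.insert(shared(vec![true, false]));
    set.insert(shared(vec![true, false])); // equal contents: not inserted again
    set.insert(shared(vec![false]));
    println!("{}", set.len()); // 2
}
```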

Wasmtime GitHub notifications bot (Mar 19 2023 at 13:09):

remlse edited a comment on issue #6039:

The rename seems uncontroversial, so I did that right on this branch.

crate cranelift-chaos renamed -> cranelift-control

ChaosEngine renamed -> ControlPlane

[ ] should the feature also be renamed from chaos to control? I left it as is, assuming that future potential features of the control plane outside the scope of chaos mode might be implemented as different compilation features.

From my understanding, there are two major, orthogonal problems with the architecture. I think it's best to branch off from here so we can evaluate the possible approaches separately. I will create a draft PR for each approach so we have a basis for comparison.

Usage of unsafe vs. lifetime proliferation to make use of Unstructured

I agree with the feedback that unsafe must be avoided and that spreading &'a mut ControlPlane everywhere is also bad idea.

approaches:

[x] https://github.com/remlse/wasmtime/pull/2

[ ] Don't make use of unsafe and implement our own, similar functionality according to our needs.

Non-deterministic order of perturbations because of multi-threading combined with Arc<Mutex<_>>:

approaches:

[ ] Replace with Rc<RefCell<_>>, the compiler will then prevent control planes from being shared across threads.

[ ] Propagate &mut ControlPlane through the call stack instead of owned ControlPlanes. (use bjorn3's pattern to make the reference zero-sized)

Maybe this question can be discussed already, before I'm done with the POCs: Is there a downside to using internal mutability when multi-threading is prevented with Rc<RefCell<_>>? What is the benefit of regular mutable references? Off the top of my head, the downsides of internal mutability are in general:

It is error prone because users of &Foo may erroneously assume the state of Foo not to change.

It is less performant, because ownership rules are checked at runtime.

In our case, I would say these two do not apply: users of ControlPlane shouldn't be concerned with its internal state anyway and the small performance hit during fuzz testing shouldn't matter IMO. If single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.

It allows us to more naturally integrate into the cache infrastructure: compilation is a property of this ControlPlane, which is a piece of data that can impl Hash and Eq like anything else.

This may be an issue though, as Hash and Eq cannot be derived on Rc and RefCell. I am somewhat confident that we can write a straight forward manual implementation for those without much trouble, but I'll have to show that in the POC.

Wasmtime GitHub notifications bot (Mar 19 2023 at 14:49):

remlse edited a comment on issue #6039:

The rename seems uncontroversial, so I did that right on this branch.

crate cranelift-chaos renamed -> cranelift-control

ChaosEngine renamed -> ControlPlane

[ ] should the feature also be renamed from chaos to control? I left it as is, assuming that future potential features of the control plane outside the scope of chaos mode might be implemented as different compilation features.

From my understanding, there are two major, orthogonal problems with the architecture. I think it's best to branch off from here so we can evaluate the possible approaches separately. I will create a draft PR for each approach so we have a basis for comparison.

Usage of unsafe vs. lifetime proliferation to make use of Unstructured

I agree with the feedback that unsafe must be avoided and that spreading &'a mut ControlPlane everywhere is also bad idea.

approaches:

[x] https://github.com/remlse/wasmtime/pull/2

[ ] Don't make use of unsafe and implement our own, similar functionality according to our needs.

Non-deterministic order of perturbations because of multi-threading combined with Arc<Mutex<_>>

approaches:

[ ] Replace with Rc<RefCell<_>>, the compiler will then prevent control planes from being shared across threads.

[ ] Propagate &mut ControlPlane through the call stack instead of owned ControlPlanes. (use bjorn3's pattern to make the reference zero-sized)

Maybe this question can be discussed already, before I'm done with the POCs: Is there a downside to using internal mutability when multi-threading is prevented with Rc<RefCell<_>>? What is the benefit of regular mutable references? Off the top of my head, the downsides of internal mutability are in general:

It is error prone because users of &Foo may erroneously assume the state of Foo not to change.

It is less performant, because ownership rules are checked at runtime.

In our case, I would say these two do not apply: users of ControlPlane shouldn't be concerned with its internal state anyway and the small performance hit during fuzz testing shouldn't matter IMO. If single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.

It allows us to more naturally integrate into the cache infrastructure: compilation is a property of this ControlPlane, which is a piece of data that can impl Hash and Eq like anything else.

This may be an issue though, as Hash and Eq cannot be derived on Rc and RefCell. I am somewhat confident that we can write a straight forward manual implementation for those without much trouble, but I'll have to show that in the POC.

Wasmtime GitHub notifications bot (Mar 19 2023 at 14:50):

remlse edited a comment on issue #6039:

The rename seems uncontroversial, so I did that right on this branch.

crate cranelift-chaos renamed -> cranelift-control

ChaosEngine renamed -> ControlPlane

[ ] should the feature also be renamed from chaos to control? I left it as is, assuming that future potential features of the control plane outside the scope of chaos mode might be implemented as different compilation features.

From my understanding, there are two major, orthogonal problems with the architecture. I think it's best to branch off from here so we can evaluate the possible approaches separately. I will create a draft PR for each approach so we have a basis for comparison.

Usage of unsafe vs. lifetime proliferation to make use of Unstructured

I agree with the feedback that unsafe must be avoided and that spreading &'a mut ControlPlane everywhere is also a bad idea.

approaches:

[x] https://github.com/remlse/wasmtime/pull/2

[ ] Don't make use of unsafe and implement our own, similar functionality according to our needs.

Non-deterministic order of perturbations because of multi-threading combined with Arc<Mutex<_>>

approaches:

[ ] Replace with Rc<RefCell<_>>, the compiler will then prevent control planes from being shared across threads.

[ ] Propagate &mut ControlPlane through the call stack instead of owned ControlPlanes. (use bjorn3's pattern to make the reference zero-sized)

Maybe this question can be discussed already, before I'm done with the POCs: Is there a downside to using internal mutability when multi-threading is prevented with Rc<RefCell<_>>? What is the benefit of regular mutable references? Off the top of my head, the downsides of internal mutability are in general:

It is error prone because users of &Foo may erroneously assume the state of Foo not to change.

It is less performant, because ownership rules are checked at runtime.

In our case, I would say these two do not apply: users of ControlPlane shouldn't be concerned with its internal state anyway and the small performance hit during fuzz testing shouldn't matter IMO. If single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.

It allows us to more naturally integrate into the cache infrastructure: compilation is a property of this ControlPlane, which is a piece of data that can impl Hash and Eq like anything else.

This may be an issue though, as Hash and Eq cannot be derived on Rc and RefCell. I am somewhat confident that we can write a straight forward manual implementation for those without much trouble, but I'll have to show that in the POC.

Wasmtime GitHub notifications bot (Mar 19 2023 at 15:29):

remlse edited a comment on issue #6039:

The rename seems uncontroversial, so I did that right on this branch.

crate cranelift-chaos renamed -> cranelift-control

ChaosEngine renamed -> ControlPlane

[ ] should the feature also be renamed from chaos to control? I left it as is, assuming that future potential features of the control plane outside the scope of chaos mode might be implemented as different compilation features.

From my understanding, there are two major, orthogonal problems with the architecture. I think it's best to branch off from here so we can evaluate the possible approaches separately. I will create a draft PR for each approach so we have a basis for comparison.

Usage of unsafe vs. lifetime proliferation to make use of Unstructured

I agree with the feedback that unsafe must be avoided and that spreading &'a mut ControlPlane everywhere is also a bad idea.

approaches:

[x] https://github.com/remlse/wasmtime/pull/2

[x] https://github.com/remlse/wasmtime/pull/3

Non-deterministic order of perturbations because of multi-threading combined with Arc<Mutex<_>>

approaches:

[ ] Replace with Rc<RefCell<_>>, the compiler will then prevent control planes from being shared across threads.

[ ] Propagate &mut ControlPlane through the call stack instead of owned ControlPlanes. (use bjorn3's pattern to make the reference zero-sized)

Maybe this question can be discussed already, before I'm done with the POCs: Is there a downside to using internal mutability when multi-threading is prevented with Rc<RefCell<_>>? What is the benefit of regular mutable references? Off the top of my head, the downsides of internal mutability are in general:

It is error-prone because users of &Foo may wrongly assume that the state of Foo does not change.

It is less performant, because ownership rules are checked at runtime.

In our case, I would say these two do not apply: users of ControlPlane shouldn't be concerned with its internal state anyway and the small performance hit during fuzz testing shouldn't matter IMO. If single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.
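To make the two general downsides concrete, a small sketch with a hypothetical `ControlPlane` behind `Rc<RefCell<_>>`: a function that only takes a shared `&` handle can still change the state, and overlapping borrows are only caught at runtime:

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical control-plane state.
#[derive(Debug, Default)]
pub struct ControlPlane {
    data: Vec<bool>,
}

/// Takes a shared `&` handle, yet consumes a decision through the `RefCell`:
/// nothing in the signature tells the caller that state changes.
pub fn decide(shared: &Rc<RefCell<ControlPlane>>) -> bool {
    shared.borrow_mut().data.pop().unwrap_or(false)
}

/// Returns `true` if a mutable borrow is refused while a shared borrow is
/// alive. The check happens at runtime; `borrow_mut()` here would panic.
pub fn overlapping_borrow_is_rejected(shared: &Rc<RefCell<ControlPlane>>) -> bool {
    let _reader = shared.borrow();
    shared.try_borrow_mut().is_err()
}
```

With a plain `&mut ControlPlane` parameter, both situations are visible in the signature and the overlapping borrow is rejected by the compiler instead.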

It allows us to more naturally integrate into the cache infrastructure: compilation is a property of this ControlPlane, which is a piece of data that can impl Hash and Eq like anything else.

This may be an issue though, as Hash and Eq cannot be derived on Rc and RefCell. I am somewhat confident that we can write a straightforward manual implementation for those without much trouble, but I'll have to show that in the POC.

Wasmtime GitHub notifications bot (Mar 19 2023 at 18:50):

cfallin commented on issue #6039:

I would prefer if we went with a &mut ControlPlane as well, rather than subverting the ownership system with a Rc<RefCell<ControlPlane>>. Given the uncertainty here, I want to understand a bit more: are you proposing the alternative because you'd rather not have to pass an extra function parameter? Or for some other reason?

In addition to the disadvantages you named, using a RefCell will actually have a measurable impact on compile time, I suspect: every query of the control plane in an inner loop will require a dynamically-checked borrow. This alone is reason not to do it IMHO. There's also the fact that if stored anywhere, it would make data structures !Send which is problematic for our internal compilation parallelism.
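A sketch of the `Send` point, assuming hypothetical `ControlPlane` and `FunctionCtx` types and using `std::thread::scope` (Rust 1.63+) as a stand-in for Cranelift's internal compilation parallelism. A context that stores a `&mut ControlPlane` is `Send` whenever `ControlPlane` is, so each worker can own one function's control plane; the commented-out `Rc<RefCell<_>>` check would be rejected at compile time because `Rc` is not `Send`:

```rust
use std::thread;

/// Hypothetical control plane with pre-recorded decisions.
#[derive(Debug, Clone, Default)]
pub struct ControlPlane {
    data: Vec<bool>,
}

/// Hypothetical per-function context that stores the borrowed control plane.
pub struct FunctionCtx<'a> {
    ctrl_plane: &'a mut ControlPlane,
}

/// Compile-time check: only compiles if `T: Send`.
fn assert_send<T: Send>() {}

pub fn static_checks() {
    assert_send::<FunctionCtx<'static>>();
    // Would not compile: `Rc` is not `Send`.
    // assert_send::<std::rc::Rc<std::cell::RefCell<ControlPlane>>>();
}

/// Hypothetical compilation of one function: two passes, each consuming at
/// most one decision. Returns how many decisions were consumed.
fn compile_one(ctx: FunctionCtx<'_>) -> usize {
    let before = ctx.ctrl_plane.data.len();
    ctx.ctrl_plane.data.pop();
    ctx.ctrl_plane.data.pop();
    before - ctx.ctrl_plane.data.len()
}

/// Compiles each function on its own thread with its own control plane.
/// Results do not depend on thread scheduling.
pub fn compile_in_parallel(planes: &mut [ControlPlane]) -> Vec<usize> {
    thread::scope(|s| {
        let handles: Vec<_> = planes
            .iter_mut()
            .map(|cp| {
                let ctx = FunctionCtx { ctrl_plane: cp };
                s.spawn(move || compile_one(ctx))
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```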

But the other argument I would make is that subverting the Rust ownership model should not be a "why not" sort of design choice, but should be a "why is this the only possibility" sort of discussion. The alternative proposed here is "just pass a &mut around" and I don't really see the downsides. In particular, this statement

If single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.

is somewhat perplexing and is exactly the opposite of my experience with building large systems with Rust. Internal mutability is a "cheat code" that arises because of unavoidable pressure from the outside. The Rc<RefCell<T>> pattern is a bit of a crutch, in that it emulates how one can write code in other languages with less restrictive sharing of aliases; as a result, it is sort of a way to give up on thinking through the ownership model, and can result in unexpected logic bugs, as we've seen here already with nondeterminism. I would expect that, contrary to such a context parameter "getting in the way", this internal-mutability trick would get in the way by violating expectations and leading to all sorts of issues.

The "better developer experience" bit I would question specifically: what downsides are we avoiding by not passing a &mut context parameter around? Just that (the need for the parameter)? Or a perceived difficulty with &mut references in general? Or something else?

Anyway, given all that, I would really strongly prefer the suggestions I gave above: a &mut ControlPlane, passed as a normal function parameter; no internal mutability or other trickery, only idiomatic Rust; and for now let's not try to make the borrow itself zero-sized, because that adds complexity and I suspect won't matter much.

Wasmtime GitHub notifications bot (Mar 19 2023 at 18:54):

cfallin edited a comment on issue #6039:

I would prefer if we went with a &mut ControlPlane as well, rather than subverting the ownership system with a Rc<RefCell<ControlPlane>>. Given the uncertainty here, I want to understand a bit more: are you proposing the alternative because you'd rather not have to pass an extra function parameter? Or for some other reason?

In addition to the disadvantages you named, using a RefCell will actually have a measurable impact on compile time, I suspect: every query of the control plane in an inner loop will require a dynamically-checked borrow. This alone is reason not to do it IMHO. There's also the fact that if stored anywhere, it would make data structures !Send which is problematic for our internal compilation parallelism.

But the other argument I would make is that subverting the Rust ownership model should not be a "why not" sort of discussion, but should be a "why is this the only possibility" sort of discussion. The alternative proposed here is "just pass a &mut around" and I don't really see the downsides. In particular, this statement

If single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.

is somewhat perplexing and is exactly the opposite of my experience with building large systems with Rust. Internal mutability is a "cheat code" that arises because of unavoidable pressure from the outside. The Rc<RefCell<T>> pattern is a bit of a crutch, in that it emulates how one can write code in other languages with less restrictive sharing of aliases; as a result, it is sort of a way to give up on thinking through the ownership model, and can result in unexpected logic bugs, as we've seen here already with nondeterminism. I would expect that, contrary to such a context parameter "getting in the way", this internal-mutability trick would get in the way by violating expectations and leading to all sorts of issues.

The "better developer experience" bit I would question specifically: what downsides are we avoiding by not passing a &mut context parameter around? Just that (the need for the parameter)? Or a perceived difficulty with &mut references in general? Or something else?

Anyway, given all that, I would really strongly prefer the suggestions I gave above: a &mut ControlPlane, passed as a normal function parameter; no internal mutability or other trickery, only idiomatic Rust; and for now let's not try to make the borrow itself zero-sized, because that adds complexity and I suspect won't matter much.

Wasmtime GitHub notifications bot (Mar 19 2023 at 19:30):

iximeow commented on issue #6039:

just in the interest of being explicit: i think part of the feedback here is, _if there is an overhead for passing &mut ControlPlane around, it is overhead Cranelift would need to deal with at some point in the future anyway, for other reasons [like user-directed opt hints or optimization fuel], so that definitely won't be held against your MVP here_. doing the simple thing with &mut ControlPlane is Probably Fine, and the relative benefit of having Chaos mode in the compiler almost certainly outweighs a small compile time regression if one is measurable.

(also, i think the trick about holding a &'a mut ControlPlaneInner actually ensures that ControlPlane is _always_ a usize - a reference to a zero-sized type is still a reference and still usize-wide. this example might help show which changes are actually resulting in optimizations: rustc deletes the ControlPlane argument when it's not needed, even if it is non-zero-sized, but it turns out that if foo/bar/baz are left non-side-effectful they aren't useful to demonstrate the calling convention changes anyway!

this happens to make me think that passing around a &mut ControlPlane promises to have no cost - rather than "not enough to care" cost - in the case where chaos mode is disabled, after all :D)
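The size observation can be checked directly. A sketch, assuming the control plane collapses to a hypothetical zero-sized `ControlPlane` when the `chaos` feature is off (this cfg layout is illustrative, not the actual crate): the struct itself has size 0, but `&mut ControlPlane` is still a pointer and therefore `usize`-wide. Whether passing it costs anything after optimization is up to rustc/LLVM, as noted above.

```rust
use std::mem::size_of;

/// Hypothetical control plane with the `chaos` feature enabled: real state.
#[cfg(feature = "chaos")]
#[derive(Debug, Default)]
pub struct ControlPlane {
    data: Vec<bool>,
}

/// Hypothetical control plane with the feature disabled: a zero-sized type.
#[cfg(not(feature = "chaos"))]
#[derive(Debug, Default)]
pub struct ControlPlane;

/// Returns (size of the type, size of a `&mut` to it).
pub fn control_plane_sizes() -> (usize, usize) {
    (size_of::<ControlPlane>(), size_of::<&mut ControlPlane>())
}
```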

Wasmtime GitHub notifications bot (Mar 19 2023 at 19:46):

cfallin edited a comment on issue #6039:

I would prefer if we went with a &mut ControlPlane as well, rather than subverting the ownership system with a Rc<RefCell<ControlPlane>>. Given the uncertainty here, I want to understand a bit more: are you proposing the alternative because you'd rather not have to pass an extra function parameter? Or for some other reason?

In addition to the disadvantages you named, using a RefCell will actually have a measurable impact on compile time, I suspect: every query of the control plane in an inner loop will require a dynamically-checked borrow. This alone is reason not to do it IMHO. There's also the fact that if stored anywhere, it would make data structures !Send which is problematic for our internal compilation parallelism.

But the other argument I would make is that subverting the Rust ownership model should not be a "why not" sort of discussion, but should be a "why is this the only possibility" sort of discussion. The alternative proposed here is "just pass a &mut around" and I don't really see the downsides. In particular, this statement

If single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.

is somewhat perplexing and is exactly the opposite of my experience with building large systems with Rust. Internal mutability is a "cheat code" that arises because of unavoidable pressure from the outside. The Rc<RefCell<T>> pattern is a bit of a crutch, in that it emulates how one can write code in other languages with less restrictive sharing of aliases; as a result, it is sort of a way to give up on thinking through the ownership model, and can result in unexpected logic bugs, as we've seen here already with nondeterminism. I would expect that, contrary to such a context parameter "getting in the way", this internal-mutability trick would get in the way by violating expectations and leading to all sorts of issues.

The "better developer experience" bit I would question specifically: what downsides are we avoiding by not passing a &mut context parameter around? Just that (the need for the parameter)? Or a perceived difficulty with &mut references in general? Or something else?

Anyway, given all that, I would really strongly prefer the suggestions I gave above: a &mut ControlPlane, passed as a normal function parameter; no internal mutability or other trickery, only idiomatic Rust; and for now let's not try to make the borrow itself zero-sized, because that adds complexity and I suspect won't matter much.

Wasmtime GitHub notifications bot (Mar 23 2023 at 17:18):

remlse edited a comment on issue #6039:

The rename seems uncontroversial, so I did that right on this branch.

crate cranelift-chaos renamed -> cranelift-control

ChaosEngine renamed -> ControlPlane

[ ] Should the feature also be renamed from chaos to control? I left it as is, assuming that future potential features of the control plane outside the scope of chaos mode might be implemented as different compilation features.

From my understanding, there are two major, orthogonal problems with the architecture. I think it's best to branch off from here so we can evaluate the possible approaches separately. I will create a draft PR for each approach so we have a basis for comparison.

Usage of unsafe vs. lifetime proliferation to make use of Unstructured

I agree with the feedback that unsafe must be avoided and that spreading &'a mut ControlPlane everywhere is also a bad idea.

approaches:

[x] https://github.com/remlse/wasmtime/pull/2

[x] https://github.com/remlse/wasmtime/pull/3

Non-deterministic order of perturbations because of multi-threading combined with Arc<Mutex<_>>

approaches:

~~Replace with Rc<RefCell<_>>, the compiler will then prevent control planes from being shared across threads.~~

[x] Propagate &mut ControlPlane through the call stack instead of owned ControlPlanes. (use bjorn3's pattern to make the reference zero-sized)

Maybe this question can be discussed already, before I'm done with the POCs: Is there a downside to using internal mutability when multi-threading is prevented with Rc<RefCell<_>>? What is the benefit of regular mutable references? Off the top of my head, the downsides of internal mutability are in general:

It is error-prone because users of &Foo may wrongly assume that the state of Foo does not change.

It is less performant, because ownership rules are checked at runtime.

In our case, I would say these two do not apply: users of ControlPlane shouldn't be concerned with its internal state anyway and the small performance hit during fuzz testing shouldn't matter IMO. If single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.

It allows us to more naturally integrate into the cache infrastructure: compilation is a property of this ControlPlane, which is a piece of data that can impl Hash and Eq like anything else.

This may be an issue though, as Hash and Eq cannot be derived on Rc and RefCell. I am somewhat confident that we can write a straightforward manual implementation for those without much trouble, but I'll have to show that in the POC.

Wasmtime GitHub notifications bot (Mar 23 2023 at 19:10):

remlse commented on issue #6039:

The approach with the mutable references is now working, thanks mostly to @MzrW. For now we made sure the fuzz target cranelift-fuzzgen compiles and works to validate the architecture.

Wasmtime GitHub notifications bot (Mar 23 2023 at 19:27):

remlse edited a comment on issue #6039:

The approach with the mutable references is now working, thanks mostly to @MzrW. For now we made sure the fuzz target cranelift-fuzzgen compiles (with feature chaos) and works to validate the architecture.

Wasmtime GitHub notifications bot (Apr 03 2023 at 17:15):

remlse commented on issue #6039:

(Sorry about the rebase, habits are stubborn. I'll solve the conflicts with a merge commit next time.)

Wasmtime GitHub notifications bot (Apr 03 2023 at 18:17):

remlse edited a comment on issue #6039:

The rename seems uncontroversial, so I did that right on this branch.

crate cranelift-chaos renamed -> cranelift-control

ChaosEngine renamed -> ControlPlane

[ ] Should the feature also be renamed from chaos to control? I left it as is, assuming that future potential features of the control plane outside the scope of chaos mode might be implemented as different compilation features.

From my understanding, there are two major, orthogonal problems with the architecture. I think it's best to branch off from here so we can evaluate the possible approaches separately. I will create a draft PR for each approach so we have a basis for comparison.

Usage of unsafe vs. lifetime proliferation to make use of Unstructured

I agree with the feedback that unsafe must be avoided and that spreading &'a mut ControlPlane everywhere is also a bad idea.

approaches:

[x] https://github.com/remlse/wasmtime/pull/2

[x] https://github.com/remlse/wasmtime/pull/3

Edit: Currently, the internal representation is a Vec<bool> without using Unstructured. When it is useful to do so, we can use it in the future, internally, and without leaking any lifetimes. As an implementation detail, it's easy to change.
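A sketch of what a plain `Vec<bool>` representation buys, assuming a hypothetical `ControlPlane` with a hypothetical `get_decision` method and a toy `compile` function (not the actual cranelift-control API). Because the state is ordinary owned data, it can derive `Clone`, `Hash` and `Eq` without any manual impls, and cloning it before compiling lets a second compilation of the same function replay exactly the same decisions:

```rust
/// Hypothetical control plane backed by a plain `Vec<bool>`.
#[derive(Debug, Clone, Default, PartialEq, Eq, Hash)]
pub struct ControlPlane {
    data: Vec<bool>,
}

impl ControlPlane {
    pub fn new(data: Vec<bool>) -> Self {
        ControlPlane { data }
    }

    /// Next decision; `false` ("don't perturb") once the data runs out.
    pub fn get_decision(&mut self) -> bool {
        self.data.pop().unwrap_or(false)
    }
}

/// Toy compilation: for each of `n_branches` branches, the control plane
/// decides whether the branch optimization is skipped. Returns that pattern.
pub fn compile(n_branches: usize, ctrl_plane: &mut ControlPlane) -> Vec<bool> {
    (0..n_branches).map(|_| ctrl_plane.get_decision()).collect()
}
```

Reusing an already-consumed control plane does not replay the decisions, which is why the copy has to be taken before the first compilation.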

Non-deterministic order of perturbations because of multi-threading combined with Arc<Mutex<_>>

approaches:

~~Replace with Rc<RefCell<_>>, the compiler will then prevent control planes from being shared across threads.~~

[x] Propagate &mut ControlPlane through the call stack instead of owned ControlPlanes. (use bjorn3's pattern to make the reference zero-sized)

Maybe this question can be discussed already, before I'm done with the POCs: Is there a downside to using internal mutability when multi-threading is prevented with Rc<RefCell<_>>? What is the benefit of regular mutable references? Off the top of my head, the downsides of internal mutability are in general:

It is error-prone because users of &Foo may wrongly assume that the state of Foo does not change.

It is less performant, because ownership rules are checked at runtime.

In our case, I would say these two do not apply: users of ControlPlane shouldn't be concerned with its internal state anyway and the small performance hit during fuzz testing shouldn't matter IMO. If single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.

It allows us to more naturally integrate into the cache infrastructure: compilation is a property of this ControlPlane, which is a piece of data that can impl Hash and Eq like anything else.

This may be an issue though, as Hash and Eq cannot be derived on Rc and RefCell. I am somewhat confident that we can write a straightforward manual implementation for those without much trouble, but I'll have to show that in the POC.

Wasmtime GitHub notifications bot (Apr 04 2023 at 21:29):

cfallin commented on issue #6039:

As a timing note, I'm going to wait to merge this until tomorrow, after our next release's beta branch is cut; I want it to bake on main a little more than the two-week minimum with the beta-to-release delay.

Wasmtime GitHub notifications bot (Apr 04 2023 at 21:30):

cfallin edited a comment on issue #6039:

As a timing note, I'm going to wait to merge this until tomorrow, after our next release's beta branch is cut; I want it to bake on main a little more than the two-week minimum with the beta-to-release delay (and want to give a chance to notice if we have any unexpected regressions, etc).

Wasmtime GitHub notifications bot (Apr 04 2023 at 21:30):

cfallin edited a comment on issue #6039:

As a timing note, I'm going to wait to merge this until tomorrow, after our next release's beta branch is cut; I want it to bake on main a little more than the two-week minimum with the beta-to-release delay (and want to give a chance to notice if we have any unexpected regressions, etc).

Wasmtime GitHub notifications bot (Apr 05 2023 at 16:13):

cfallin commented on issue #6039:

@remlse it looks like we'll need to add the new crate to a list in scripts/publish.rs; this was only caught in the full CI run in the merge queue. (If you want to test in your commit to fix this, you can add prtest:full to the commit message.)

Wasmtime GitHub notifications bot (Apr 05 2023 at 19:11):

remlse commented on issue #6039:

@cfallin the full tests are passing now, maybe we can try to merge again?

Last updated: Apr 18 2025 at 05:03 UTC
