afonso360 commented on issue #6039:
Maybe someone has an intuition along the lines of: "Oh yes, of course that will fail when branch optimization is randomly skipped", or similar?
I suspect that might be the case. The
icachefuzzer tests that our function cache is working properly. It does this by generating a random function, applying a random mutation and the recompiling and checking if the cache key matches or not.In this case what I suspect might be happening is that It selected that no mutation should be applied and the cache key should be valid. But if we skip branch opts in the recompile and not in the original compilation or vice-versa, then it should panic! (This is a guess, I'm not too familiar with how our caching mechanism works)
For icache it would be nice to reset the bytes on the second recompile so that we make the same decisions along the way.
afonso360 edited a comment on issue #6039:
Maybe someone has an intuition along the lines of: "Oh yes, of course that will fail when branch optimization is randomly skipped", or similar?
I suspect that might be the case. The
icachefuzzer tests that our function cache is working properly. It does this by generating a random function, applying a random mutation and the recompiling and checking if the cache key matches or not.In this case what I suspect might be happening is that It selected that no mutation should be applied and the cache key should be valid. But if we skip branch opts in the recompile and not in the original compilation or vice-versa, then it should panic! (This is a guess, I'm not too familiar with how our caching mechanism works)
For icache it would be nice to reset the chaos engine bytes on the second recompile so that we make the same decisions along the way.
afonso360 edited a comment on issue #6039:
Maybe someone has an intuition along the lines of: "Oh yes, of course that will fail when branch optimization is randomly skipped", or similar?
I suspect that might be the case. The
icachefuzzer tests that our function cache is working properly. It does this by generating a random function, applying a random mutation and then recompiling and checking if the cache key matches or not.In this case what I suspect might be happening is that It selected that no mutation should be applied and the cache key should be valid. But if we skip branch opts in the recompile and not in the original compilation or vice-versa, then it should panic! (This is a guess, I'm not too familiar with how our caching mechanism works)
For icache it would be nice to reset the chaos engine bytes on the second recompile so that we make the same decisions along the way.
cfallin commented on issue #6039:
Thanks so much for making a start on this problem -- the infrastructure will surely pay off in a bunch of ways!
I've read over most of this prototype; but before a detailed line-by-line review, I wanted to offer some high-level design feedback. I think it might be related to (or rather, might address) the issue with the
icachefuzzer above too.
- I'd prefer if we actually generalize just a little bit more and call the struct that is threaded through the compilation pipeline the
ControlPlane; this abstracts over randomized decisions (chaos mode) but also optimization fuel and maybe user-directed optimization instructions and the like later. So, would you mind renaming both the types and the crate itself (maybecranelift-control, though bikeshed comments welcome here)?- In general, we want to avoid unsafe code as much as possible. The "self-referential struct trick" wherein there's a
Vecand then there's anUnstructuredthat holds a borrow to thatVec's contents, transmuted to erase the lifetime and get a&'static, is simply too unsafe for us to include in Cranelift code...- ... fortunately, though, I suspect it's not really necessary. I suspect that much of the trouble comes from trying to hold ownership of the
ChaosEnginein various places, rather than threading it through the stack as needed. I suspect it should be possible, in principle to have aControlPlanesuch that:
- We have
&mut selfmethods that ask it for decisions;- We pass a
&mut ControlPlaneinto toplevel compilation entry points, and down the stack as necessary;- If we need to (e.g. when building the
Lowercontext), we can store a&'a mut ControlPlane, but that'ais internal to the compiler, and not exposed to the top-level entry point.- This
ControlPlaneshould itself not have any lifetime parameters, ideally. I suspect that means we don't actually hold anUnstructured, since it expects to borrow data owned elsewhere; rather for now we can implement our own sort of equivalent, where we keep e.g. aVec<bool>and take decisions off the back of it. (e.g.,self.chaos_decisions.pop().unwrap_or(false)or something similar.)- We can then impl
ArbitraryforControlPlane, perhaps just the automatic derivation, and use this when fuzzing: take an arbitraryFunctionand compile it with an arbitraryControlPlane.- The sharing of the
ChaosEnginevia anArc, and theMutexand such, are another red flag and warning that we're going down a nondeterministic path. I suspect this is a contributing factor to theicachefuzzer issues above. The issue is that it allows different parts of the compiler to pull random choices in some potentially unordered way, potentially across threads, and that gives up reproducibility. Instead, we should have oneControlPlaneper function, because that is the unit of compilation in Cranelift. At the fuzzer level, we can easily ask for aVec<ControlPlane>; that automatically implsArbitraryifControlPlanedoes.Does that make some sense at least? Basically, I want to push this all toward a more idiomatic Rust ownership model, which I think will have the side-effect of removing nondeterminism and making compilation a true pure function of a "control" input for one function body.
remlse commented on issue #6039:
Thank you both for the great feedback!
I totally agree that a more idiomatic Rust ownership model would be much better. I remember starting out with mutable references and running into compiler errors about mutable aliasing. I was concerned there could be way too many of these issues while threading the control plane through every code path, and I didn't know how difficult it would be. I probably was scared away too easily. I'll have another go at implementing it that way.
I understand that the use of
unsafefor storing anUnstructuredis a non-starter. The nice thing about it would've been that we get all the functionality ofarbitraryfor free and the API of the control plane would be very flexible with little effort. I'm still feeling a bit unsure about taking the manual approach. Not primarily because it would be more work, but because I expect the work would in many cases just be reimplementingUnstructured. And probably even in a worse way: Casually reading the source code ofarbitrary, I'm seeing comments like this:// Take lengths from the end of the data, since the `libFuzzer` folks // found that this lets fuzzers more efficiently explore the input // space.I'm sure we would miss such fuzzing specific optimizations if we reimplemented it ourselves, degrading the efficiency of our fuzz testing. On the other hand, we could just liberally copy-paste from the source code of
arbitrarywhatever we need, only adapting it so it works on an owned vector of bytes instead of a slice with some lifetime?I'm just thinking it's a little sad that a small thing like a lifetime would stand in the way of such a good opportunity of code reuse :thinking:
cfallin commented on issue #6039:
Yeah, that's fair -- I suppose it's not the worst thing in the world to take a
ctrl_plane: &mut ControlPlane<'_>everywhere and then keep anUnstructuredinside. Ideally then the way we cut off lifetime proliferation is to be strict about passing this down the stack only, and not storing it in structs (requiring an extra lifetime to that struct). A prototype of this option would probably tell us pretty quickly how tenable that is!
remlse commented on issue #6039:
We have
&mut selfmethods that ask it for decisionsAnother question that came up while we tried that at the beginning was about the size of this. A simple test indicates that references to zero-sized types are not optimized away:
struct Foo; println!("{}", std::mem::size_of::<&Foo>()); // prints 8...Am I missing something with that? How can we be certain that the performance of release builds is not affected?
bjorn3 commented on issue #6039:
Instead of passing
&mut ControlPlane<'_>you could passControlPlane<'_>and add aas_mut()method which takes&mut selfand returns anotherControlPlane<'_>which can be used to passControlPlane<'_>to multiple functions. ThenControlPlane<'_>could be a mutable reference during fuzzing and a ZST when not fuzzing.
remlse commented on issue #6039:
I think I'm misunderstanding something, the function signature you're describing looks like this, right? (ignoring the lifetimes)
impl ControlPlane { fn as_mut(&mut self) -> Self { // ? } }I don't know how this function can be written without reference counting?
bjorn3 commented on issue #6039:
You did do something like
struct ControlPlane<'a>(&'a mut ControlPlaneInner); impl ControlPlane<'_> { fn as_mut(&mut self) -> ControlPlane<'_> { ControlPlane(/*this does an implicit reborrow*/self.0) } }And then use it like
fn example(control_plane: ControlPlane<'_>) { foo(control_plane.as_mut()); bar(control_plane.as_mut()); baz(control_plane); }Playground link: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=5cde2151f4f2b1c6a9ad5ea959b2f5f5
bjorn3 edited a comment on issue #6039:
You did do something like
struct ControlPlane<'a>(&'a mut ControlPlaneInner); impl ControlPlane<'_> { /// Reborrow `ControlPlane`. fn as_mut(&mut self) -> ControlPlane<'_> { ControlPlane(/*this does an implicit reborrow*/self.0) } }And then use it like
fn example(control_plane: ControlPlane<'_>) { foo(control_plane.as_mut()); bar(control_plane.as_mut()); baz(control_plane); }Playground link: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=5cde2151f4f2b1c6a9ad5ea959b2f5f5
bjorn3 edited a comment on issue #6039:
You did do something like
struct ControlPlane<'a>(&'a mut ControlPlaneInner); impl ControlPlane<'_> { /// Reborrow `ControlPlane`. fn as_mut(&mut self) -> ControlPlane<'_> { ControlPlane(/*this does an implicit reborrow*/self.0) // Equivalent to: // ControlPlane(&mut *self.0) } }And then use it like
fn example(control_plane: ControlPlane<'_>) { foo(control_plane.as_mut()); bar(control_plane.as_mut()); baz(control_plane); }Playground link: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=5cde2151f4f2b1c6a9ad5ea959b2f5f5
cfallin commented on issue #6039:
I'm sure we would miss such fuzzing specific optimizations if we reimplemented it ourselves, degrading the efficiency of our fuzz testing. On the other hand, we could just liberally copy-paste from the source code of arbitrary whatever we need, only adapting it so it works on an owned vector of bytes instead of a slice with some lifetime?
I'm just thinking it's a little sad that a small thing like a lifetime would stand in the way of such a good opportunity of code reuse :thinking:
Yeah, that's fair -- I suppose it's not the worst thing in the world to take a
ctrl_plane: &mut ControlPlane<'_>everywhere and then keep anUnstructuredinside. Ideally then the way we cut off lifetime proliferation is to be strict about passing this down the stack only, and not storing it in structs (requiring an extra lifetime to that struct). A prototype of this option would probably tell us pretty quickly how tenable that is!I let this bounce around in my head a bit more and I think I'm coming back to my original position: it's probably better not to carry the fuzzing-specific
Unstructuredeverywhere, and instead build an abstraction around it. I got to this position by starting with a separation-of-concerns mindset but I think it has other nice properties.I think I want the design to follow these principles:
- Compilation output of some CLIF is a function of that CLIF, compiler settings, and a
ControlPlanecomposed of Rust primitives (bools, later a fuel counter, etc);- The fuzzing infrastructure can create these
ControlPlanes with anArbitraryimpl.This has a few nice properties:
- It avoids lifetime proliferation.
- It allows us to more naturally handle the multiple-function case: it's much easier to ask a top-level
Unstructuredfor aVec<bool>per function than it is to somehow try to split the random input into chunks.- It allows us to more naturally integrate into the cache infrastructure: compilation is a property of this
ControlPlane, which is a piece of data that can implHashandEqlike anything else.- It means that we don't have to propagate "not enough data" errors upward from
Unstructured.- It means that we could later drive the mechanism with some other source of decisions; we're not tied to
arbitrary.Basically, the compiler's core is too late for construction of structured data from random bytes; we should build an input for the compiler that is just plain data. That leads to less friction with Rust's ownership model as well as more determinism and control.
IMHO the "reimplement
Arbitrary" concern is somewhat smaller in comparison: we shouldn't have to implement any of the detailed logic but rather the "pop abooloff aVec" approach, and likewise for other data types we need. It's true that the fuzzer may give us onlyNbools in ourVec<bool>, and a particular compilation path asks forM > Nbools (the lastM - Nof which are just defaulted tofalse); but libFuzzer is generally good about learning the structure of data that leads to better coverage.Finally, a thought on zero-sized types: I think it will be fine to take the cost of an extra parameter
&mut ControlPlane; as long as its methods in a chaos-disabled build return constants, and theControlPlaneitself is zero-sized, my intuition is that this will be minimal. (Basically, we're not injecting new paths into any hot loops, we're just adding one liveusize-sized value passed into a few functions per compilation.) If it turns out to be measurable then we can definitely revisit but let's not worry much until we see that impact, IMHO.
cfallin edited a comment on issue #6039:
I'm sure we would miss such fuzzing specific optimizations if we reimplemented it ourselves, degrading the efficiency of our fuzz testing. On the other hand, we could just liberally copy-paste from the source code of arbitrary whatever we need, only adapting it so it works on an owned vector of bytes instead of a slice with some lifetime?
I'm just thinking it's a little sad that a small thing like a lifetime would stand in the way of such a good opportunity of code reuse :thinking:
Yeah, that's fair -- I suppose it's not the worst thing in the world to take a
ctrl_plane: &mut ControlPlane<'_>everywhere and then keep anUnstructuredinside. Ideally then the way we cut off lifetime proliferation is to be strict about passing this down the stack only, and not storing it in structs (requiring an extra lifetime to that struct). A prototype of this option would probably tell us pretty quickly how tenable that is!I let this bounce around in my head a bit more and I think I'm coming back to my original position: it's probably better not to carry the fuzzing-specific
Unstructuredeverywhere, and instead build an abstraction around it. I got to this position by starting with a separation-of-concerns mindset but I think it has other nice properties.I think I want the design to follow these principles:
- Compilation output of some CLIF is a function of that CLIF, compiler settings, and a
ControlPlanecomposed of Rust primitives (bools, later a fuel counter, etc);- The fuzzing infrastructure can create these
ControlPlanes with anArbitraryimpl.This has a few nice properties:
- It avoids lifetime proliferation.
- It allows us to more naturally handle the multiple-function case: it's much easier to ask a top-level
Unstructuredfor aVec<bool>per function than it is to somehow try to split the random input into chunks.- It allows us to more naturally integrate into the cache infrastructure: compilation is a property of this
ControlPlane, which is a piece of data that can implHashandEqlike anything else.- It means that we don't have to propagate "not enough data" errors upward from
Unstructured.- It means that we could later drive the mechanism with some other source of decisions; we're not tied to
arbitrary.Basically, the compiler's core is too late for construction of structured data from random bytes; we should build an input for the compiler that is just plain (structured) data. That leads to less friction with Rust's ownership model as well as more determinism and control.
IMHO the "reimplement
Arbitrary" concern is somewhat smaller in comparison: we shouldn't have to implement any of the detailed logic but rather the "pop abooloff aVec" approach, and likewise for other data types we need. It's true that the fuzzer may give us onlyNbools in ourVec<bool>, and a particular compilation path asks forM > Nbools (the lastM - Nof which are just defaulted tofalse); but libFuzzer is generally good about learning the structure of data that leads to better coverage.Finally, a thought on zero-sized types: I think it will be fine to take the cost of an extra parameter
&mut ControlPlane; as long as its methods in a chaos-disabled build return constants, and theControlPlaneitself is zero-sized, my intuition is that this will be minimal. (Basically, we're not injecting new paths into any hot loops, we're just adding one liveusize-sized value passed into a few functions per compilation.) If it turns out to be measurable then we can definitely revisit but let's not worry much until we see that impact, IMHO.
remlse commented on issue #6039:
The rename seems uncontroversial, so I did that right on this branch.
- crate
cranelift-chaosrenamed ->cranelift-controlChaosEnginerenamed ->ControlPlane- [ ] should the feature also be renamed from
chaostocontrol? I left it as is, assuming that future potential features of the control plane outside the scope of chaos mode might be implemented as different compilation features.From my understanding, there are two major, orthogonal problems with the architecture. I think it's best to branch off from here so we can evaluate the possible approaches separately. I will create a draft PR for each approach so we have a basis for comparison.
Usage of
unsafevs. lifetime proliferation to make use ofUnstructuredI agree with the feedback that
unsafemust be avoided and that spreading&'a mut ControlPlaneeverywhere is also bad idea.approaches:
- [x] https://github.com/remlse/wasmtime/pull/2
- [ ] Don't make use of unsafe and implement our own, similar functionality according to our needs.
non-deterministic order of perturbations because of multi-threading combined with
Arc<Mutex<_>>:approaches:
- [ ] Replace with
Rc<RefCell<_>>, the compiler will then prevent control planes from being shared across threads.- [ ] Propagate
&mut ControlPlanethrough the call stack instead of ownedControlPlanes. (use bjorn3's pattern to make the reference zero-sized)Maybe this question can be discussed already, before I'm done with the POCs: Is there a downside to using internal mutability when multi-threading is prevented with
Rc<RefCell<_>>? What is the benefit of regular mutable references? Off the top of my head, the downsides of internal mutability are in general:
- It is error prone because users of
&Foomay erroneously assume the state ofFoonot to change.- It is less performant, because ownership rules are checked at runtime.
In our case, I would say these two do not apply: users of
ControlPlaneshouldn't be concerned with its internal state anyway and the small performance hit during fuzz testing shouldn't matter IMO. If single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.It allows us to more naturally integrate into the cache infrastructure: compilation is a property of this
ControlPlane, which is a piece of data that can implHashandEqlike anything else.This may be an issue though, as
HashandEqcannot be derived onRcandRefCell. I am somewhat confident that we can write a straight forward manual implementation for those without much trouble, but I'll have to show that in the POC.
remlse edited a comment on issue #6039:
The rename seems uncontroversial, so I did that right on this branch.
- crate
cranelift-chaosrenamed ->cranelift-controlChaosEnginerenamed ->ControlPlane- [ ] should the feature also be renamed from
chaostocontrol? I left it as is, assuming that future potential features of the control plane outside the scope of chaos mode might be implemented as different compilation features.From my understanding, there are two major, orthogonal problems with the architecture. I think it's best to branch off from here so we can evaluate the possible approaches separately. I will create a draft PR for each approach so we have a basis for comparison.
Usage of
unsafevs. lifetime proliferation to make use ofUnstructuredI agree with the feedback that
unsafemust be avoided and that spreading&'a mut ControlPlaneeverywhere is also bad idea.approaches:
- [x] https://github.com/remlse/wasmtime/pull/2
- [ ] Don't make use of unsafe and implement our own, similar functionality according to our needs.
Non-deterministic order of perturbations because of multi-threading combined with
Arc<Mutex<_>>:approaches:
- [ ] Replace with
Rc<RefCell<_>>, the compiler will then prevent control planes from being shared across threads.- [ ] Propagate
&mut ControlPlanethrough the call stack instead of ownedControlPlanes. (use bjorn3's pattern to make the reference zero-sized)Maybe this question can be discussed already, before I'm done with the POCs: Is there a downside to using internal mutability when multi-threading is prevented with
Rc<RefCell<_>>? What is the benefit of regular mutable references? Off the top of my head, the downsides of internal mutability are in general:
- It is error prone because users of
&Foomay erroneously assume the state ofFoonot to change.- It is less performant, because ownership rules are checked at runtime.
In our case, I would say these two do not apply: users of
ControlPlaneshouldn't be concerned with its internal state anyway and the small performance hit during fuzz testing shouldn't matter IMO. If single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.It allows us to more naturally integrate into the cache infrastructure: compilation is a property of this
ControlPlane, which is a piece of data that can implHashandEqlike anything else.This may be an issue though, as
HashandEqcannot be derived onRcandRefCell. I am somewhat confident that we can write a straight forward manual implementation for those without much trouble, but I'll have to show that in the POC.
remlse edited a comment on issue #6039:
The rename seems uncontroversial, so I did that right on this branch.
- crate
cranelift-chaosrenamed ->cranelift-controlChaosEnginerenamed ->ControlPlane- [ ] should the feature also be renamed from
chaostocontrol? I left it as is, assuming that future potential features of the control plane outside the scope of chaos mode might be implemented as different compilation features.From my understanding, there are two major, orthogonal problems with the architecture. I think it's best to branch off from here so we can evaluate the possible approaches separately. I will create a draft PR for each approach so we have a basis for comparison.
Usage of
unsafevs. lifetime proliferation to make use ofUnstructuredI agree with the feedback that
unsafemust be avoided and that spreading&'a mut ControlPlaneeverywhere is also bad idea.approaches:
- [x] https://github.com/remlse/wasmtime/pull/2
- [ ] Don't make use of unsafe and implement our own, similar functionality according to our needs.
Non-deterministic order of perturbations because of multi-threading combined with
Arc<Mutex<_>>approaches:
- [ ] Replace with
Rc<RefCell<_>>, the compiler will then prevent control planes from being shared across threads.- [ ] Propagate
&mut ControlPlanethrough the call stack instead of ownedControlPlanes. (use bjorn3's pattern to make the reference zero-sized)Maybe this question can be discussed already, before I'm done with the POCs: Is there a downside to using internal mutability when multi-threading is prevented with
Rc<RefCell<_>>? What is the benefit of regular mutable references? Off the top of my head, the downsides of internal mutability are in general:
- It is error prone because users of
&Foomay erroneously assume the state ofFoonot to change.- It is less performant, because ownership rules are checked at runtime.
In our case, I would say these two do not apply: users of
ControlPlaneshouldn't be concerned with its internal state anyway and the small performance hit during fuzz testing shouldn't matter IMO. If single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.It allows us to more naturally integrate into the cache infrastructure: compilation is a property of this
ControlPlane, which is a piece of data that can implHashandEqlike anything else.This may be an issue though, as
HashandEqcannot be derived onRcandRefCell. I am somewhat confident that we can write a straight forward manual implementation for those without much trouble, but I'll have to show that in the POC.
remlse edited a comment on issue #6039:
The rename seems uncontroversial, so I did that right on this branch.
- crate
cranelift-chaosrenamed ->cranelift-controlChaosEnginerenamed ->ControlPlane- [ ] should the feature also be renamed from
chaostocontrol? I left it as is, assuming that future potential features of the control plane outside the scope of chaos mode might be implemented as different compilation features.From my understanding, there are two major, orthogonal problems with the architecture. I think it's best to branch off from here so we can evaluate the possible approaches separately. I will create a draft PR for each approach so we have a basis for comparison.
Usage of
unsafevs. lifetime proliferation to make use ofUnstructuredI agree with the feedback that
unsafemust be avoided and that spreading&'a mut ControlPlaneeverywhere is also a bad idea.approaches:
- [x] https://github.com/remlse/wasmtime/pull/2
- [ ] Don't make use of unsafe and implement our own, similar functionality according to our needs.
Non-deterministic order of perturbations because of multi-threading combined with
Arc<Mutex<_>>approaches:
- [ ] Replace with
Rc<RefCell<_>>, the compiler will then prevent control planes from being shared across threads.- [ ] Propagate
&mut ControlPlanethrough the call stack instead of ownedControlPlanes. (use bjorn3's pattern to make the reference zero-sized)Maybe this question can be discussed already, before I'm done with the POCs: Is there a downside to using internal mutability when multi-threading is prevented with
Rc<RefCell<_>>? What is the benefit of regular mutable references? Off the top of my head, the downsides of internal mutability are in general:
- It is error prone because users of
&Foomay erroneously assume the state ofFoonot to change.- It is less performant, because ownership rules are checked at runtime.
In our case, I would say these two do not apply: users of
ControlPlaneshouldn't be concerned with its internal state anyway and the small performance hit during fuzz testing shouldn't matter IMO. If single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.It allows us to more naturally integrate into the cache infrastructure: compilation is a property of this
ControlPlane, which is a piece of data that can implHashandEqlike anything else.This may be an issue though, as
HashandEqcannot be derived onRcandRefCell. I am somewhat confident that we can write a straight forward manual implementation for those without much trouble, but I'll have to show that in the POC.
remlse edited a comment on issue #6039:
The rename seems uncontroversial, so I did that right on this branch.
- crate
cranelift-chaosrenamed ->cranelift-controlChaosEnginerenamed ->ControlPlane- [ ] should the feature also be renamed from
chaostocontrol? I left it as is, assuming that future potential features of the control plane outside the scope of chaos mode might be implemented as different compilation features.From my understanding, there are two major, orthogonal problems with the architecture. I think it's best to branch off from here so we can evaluate the possible approaches separately. I will create a draft PR for each approach so we have a basis for comparison.
Usage of
unsafevs. lifetime proliferation to make use ofUnstructuredI agree with the feedback that
unsafemust be avoided and that spreading&'a mut ControlPlaneeverywhere is also a bad idea.approaches:
Non-deterministic order of perturbations because of multi-threading combined with
Arc<Mutex<_>>approaches:
- [ ] Replace with
Rc<RefCell<_>>, the compiler will then prevent control planes from being shared across threads.- [ ] Propagate
&mut ControlPlanethrough the call stack instead of ownedControlPlanes. (use bjorn3's pattern to make the reference zero-sized)Maybe this question can be discussed already, before I'm done with the POCs: Is there a downside to using internal mutability when multi-threading is prevented with
Rc<RefCell<_>>? What is the benefit of regular mutable references? Off the top of my head, the downsides of internal mutability are in general:
- It is error prone because users of
&Foomay erroneously assume the state ofFoonot to change.- It is less performant, because ownership rules are checked at runtime.
In our case, I would say these two do not apply: users of
ControlPlaneshouldn't be concerned with its internal state anyway and the small performance hit during fuzz testing shouldn't matter IMO. If single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.It allows us to more naturally integrate into the cache infrastructure: compilation is a property of this
ControlPlane, which is a piece of data that can implHashandEqlike anything else.This may be an issue though, as
HashandEqcannot be derived onRcandRefCell. I am somewhat confident that we can write a straight forward manual implementation for those without much trouble, but I'll have to show that in the POC.
cfallin commented on issue #6039:
I would prefer if we went with a
&mut ControlPlaneas well, rather than subverting the ownership system with aRc<RefCell<ControlPlane>>. Given the uncertainty here, I want to understand a bit more: are you proposing the alternative because you'd rather not have to pass an extra function parameter? Or for some other reason?In addition to the disadvantages you named, using a
RefCellwill actually have a measurable impact on compile time, I suspect: every query of the control plane in an inner loop will require a dynamically-checked borrow. This alone is reason not to do it IMHO. There's also the fact that if stored anywhere, it would make data structures!Sendwhich is problematic for our internal compilation parallelism.But the other argument I would make is that subverting the Rust ownership model should not be a "why not" sort of design choice, but should be a "why is this the only possibility" sort of discussion. The alternative proposed here is "just pass a
&mutaround" and I don't really see the downsides. In particular, this statementIf single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.
is somewhat perplexing and is exactly the opposite of my experience with building large systems with Rust. Internal mutability is a "cheat code" that arises because of unavoidable pressure from the outside. The
Rc<RefCell<T>>pattern is a bit of a crutch, in that it emulates how one can write code in other languages with less restrictive sharing of aliases; as a result, it is sort of a way to give up on thinking through the ownership model, and can result in unexpected logic bugs, as we've seen here already with nondetermism. I would expect that, contrary to such a context parameter "getting in the way", this internal-mutability trick would get in the way by violating expectations and leading to all sorts of issues.The "better developer experience" bit I would question specifically: what downsides are we avoiding by not passing a
&mutcontext parameter around? Just that (the need for the parameter)? Or a perceived difficulty with&mutreferences in general? Or something else?Anyway, given all that, I would really strongly prefer the suggestions I gave above: a
&mut ControlPlane, passed as a normal function parameter; no internal mutability or other trickery, only idiomatic Rust; and for now let's not try to make the borrow itself zero-sized, because that adds complexity and I suspect won't matter much.
cfallin edited a comment on issue #6039:
I would prefer if we went with a
&mut ControlPlaneas well, rather than subverting the ownership system with aRc<RefCell<ControlPlane>>. Given the uncertainty here, I want to understand a bit more: are you proposing the alternative because you'd rather not have to pass an extra function parameter? Or for some other reason?In addition to the disadvantages you named, using a
RefCellwill actually have a measurable impact on compile time, I suspect: every query of the control plane in an inner loop will require a dynamically-checked borrow. This alone is reason not to do it IMHO. There's also the fact that if stored anywhere, it would make data structures!Sendwhich is problematic for our internal compilation parallelism.But the other argument I would make is that subverting the Rust ownership model should not be a "why not" sort of discussion, but should be a "why is this the only possibility" sort of discussion. The alternative proposed here is "just pass a
&mutaround" and I don't really see the downsides. In particular, this statementIf single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.
is somewhat perplexing and is exactly the opposite of my experience with building large systems with Rust. Internal mutability is a "cheat code" that arises because of unavoidable pressure from the outside. The
Rc<RefCell<T>>pattern is a bit of a crutch, in that it emulates how one can write code in other languages with less restrictive sharing of aliases; as a result, it is sort of a way to give up on thinking through the ownership model, and can result in unexpected logic bugs, as we've seen here already with nondetermism. I would expect that, contrary to such a context parameter "getting in the way", this internal-mutability trick would get in the way by violating expectations and leading to all sorts of issues.The "better developer experience" bit I would question specifically: what downsides are we avoiding by not passing a
&mutcontext parameter around? Just that (the need for the parameter)? Or a perceived difficulty with&mutreferences in general? Or something else?Anyway, given all that, I would really strongly prefer the suggestions I gave above: a
&mut ControlPlane, passed as a normal function parameter; no internal mutability or other trickery, only idiomatic Rust; and for now let's not try to make the borrow itself zero-sized, because that adds complexity and I suspect won't matter much.
iximeow commented on issue #6039:
just in the interest of being explicit: i think part of the feedback here is, _if there is an overhead for passing
&mut ControlPlanearound, it is overhead Cranelift would need to deal with at some point in the future anyway, for other reasons [like user-directed opt hints or optimization fuel], so that definitely won't be held against your MVP here_. doing the simple thing with&mut ControlPlaneis Probably Fine, and the relative benefit of havingChaos modein the compiler almost certainly outweighs a small compile time regression if one is measurable.(also, i think the trick about holding a
&'a mut ControlPlaneInneractually ensures thatControlPlaneis _always_ ausize- a ref of a zero-size type is still a reference and still ausizewide. this example might help show which changes are actually resulting in optimizations: rustc deletes theControlPlaneargument when it's not needed, even if it is non-zero-size, but it turns out that iffoo/bar/bazare left non-side-effectful they aren't useful to demonstrate the calling convention changes anyway!this happens to make me think that passing around a
&mut ControlPlanepromises to have no cost - rather than "not enough to care" cost - in thechaos mode is disabledcase after all :D)
cfallin edited a comment on issue #6039:
I would prefer if we went with a
&mut ControlPlaneas well, rather than subverting the ownership system with aRc<RefCell<ControlPlane>>. Given the uncertainty here, I want to understand a bit more: are you proposing the alternative because you'd rather not have to pass an extra function parameter? Or for some other reason?In addition to the disadvantages you named, using a
RefCellwill actually have a measurable impact on compile time, I suspect: every query of the control plane in an inner loop will require a dynamically-checked borrow. This alone is reason not to do it IMHO. There's also the fact that if stored anywhere, it would make data structures!Sendwhich is problematic for our internal compilation parallelism.But the other argument I would make is that subverting the Rust ownership model should not be a "why not" sort of discussion, but should be a "why is this the only possibility" sort of discussion. The alternative proposed here is "just pass a
&mutaround" and I don't really see the downsides. In particular, this statementIf single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.
is somewhat perplexing and is exactly the opposite of my experience with building large systems with Rust. Internal mutability is a "cheat code" that arises because of unavoidable pressure from the outside. The
Rc<RefCell<T>>pattern is a bit of a crutch, in that it emulates how one can write code in other languages with less restrictive sharing of aliases; as a result, it is sort of a way to give up on thinking through the ownership model, and can result in unexpected logic bugs, as we've seen here already with nondeterminism. I would expect that, contrary to such a context parameter "getting in the way", this internal-mutability trick would get in the way by violating expectations and leading to all sorts of issues.The "better developer experience" bit I would question specifically: what downsides are we avoiding by not passing a
&mutcontext parameter around? Just that (the need for the parameter)? Or a perceived difficulty with&mutreferences in general? Or something else?Anyway, given all that, I would really strongly prefer the suggestions I gave above: a
&mut ControlPlane, passed as a normal function parameter; no internal mutability or other trickery, only idiomatic Rust; and for now let's not try to make the borrow itself zero-sized, because that adds complexity and I suspect won't matter much.
remlse edited a comment on issue #6039:
The rename seems uncontroversial, so I did that right on this branch.
- crate
cranelift-chaosrenamed ->cranelift-controlChaosEnginerenamed ->ControlPlane- [ ] should the feature also be renamed from
chaostocontrol? I left it as is, assuming that future potential features of the control plane outside the scope of chaos mode might be implemented as different compilation features.From my understanding, there are two major, orthogonal problems with the architecture. I think it's best to branch off from here so we can evaluate the possible approaches separately. I will create a draft PR for each approach so we have a basis for comparison.
Usage of
unsafevs. lifetime proliferation to make use ofUnstructuredI agree with the feedback that
unsafemust be avoided and that spreading&'a mut ControlPlaneeverywhere is also a bad idea.approaches:
Non-deterministic order of perturbations because of multi-threading combined with
Arc<Mutex<_>>approaches:
Replace withRc<RefCell<_>>, the compiler will then prevent control planes from being shared across threads.- [x] Propagate
&mut ControlPlanethrough the call stack instead of ownedControlPlanes. (use bjorn3's pattern to make the reference zero-sized)Maybe this question can be discussed already, before I'm done with the POCs: Is there a downside to using internal mutability when multi-threading is prevented with
Rc<RefCell<_>>? What is the benefit of regular mutable references? Off the top of my head, the downsides of internal mutability are in general:
- It is error prone because users of
&Foomay erroneously assume the state ofFoonot to change.- It is less performant, because ownership rules are checked at runtime.
In our case, I would say these two do not apply: users of
ControlPlaneshouldn't be concerned with its internal state anyway and the small performance hit during fuzz testing shouldn't matter IMO. If single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.It allows us to more naturally integrate into the cache infrastructure: compilation is a property of this
ControlPlane, which is a piece of data that can implHashandEqlike anything else.This may be an issue though, as
HashandEqcannot be derived onRcandRefCell. I am somewhat confident that we can write a straight forward manual implementation for those without much trouble, but I'll have to show that in the POC.
remlse commented on issue #6039:
The approach with the mutable references is now working, thanks mostly to @MzrW. For now we made sure the fuzz target
cranelift-fuzzgencompiles and works to validate the architecture.
remlse edited a comment on issue #6039:
The approach with the mutable references is now working, thanks mostly to @MzrW. For now we made sure the fuzz target
cranelift-fuzzgencompiles (with feature chaos) and works to validate the architecture.
remlse commented on issue #6039:
(Sorry about the rebase, habits are stubborn. I'll solve the conflicts with a merge commit next time.)
remlse edited a comment on issue #6039:
The rename seems uncontroversial, so I did that right on this branch.
- crate
cranelift-chaosrenamed ->cranelift-controlChaosEnginerenamed ->ControlPlane- [ ] should the feature also be renamed from
chaostocontrol? I left it as is, assuming that future potential features of the control plane outside the scope of chaos mode might be implemented as different compilation features.From my understanding, there are two major, orthogonal problems with the architecture. I think it's best to branch off from here so we can evaluate the possible approaches separately. I will create a draft PR for each approach so we have a basis for comparison.
Usage of
unsafevs. lifetime proliferation to make use ofUnstructuredI agree with the feedback that
unsafemust be avoided and that spreading&'a mut ControlPlaneeverywhere is also a bad idea.approaches:
Edit: Currently, the internal representation is a
Vec<bool>without usingUnstructured. When it is useful to do so, we can use it in the future, internally, and without leaking any lifetimes. As an implementation detail, it's easy to change.
Non-deterministic order of perturbations because of multi-threading combined with
Arc<Mutex<_>>approaches:
Replace withRc<RefCell<_>>, the compiler will then prevent control planes from being shared across threads.- [x] Propagate
&mut ControlPlanethrough the call stack instead of ownedControlPlanes. (use bjorn3's pattern to make the reference zero-sized)Maybe this question can be discussed already, before I'm done with the POCs: Is there a downside to using internal mutability when multi-threading is prevented with
Rc<RefCell<_>>? What is the benefit of regular mutable references? Off the top of my head, the downsides of internal mutability are in general:
- It is error prone because users of
&Foomay erroneously assume the state ofFoonot to change.- It is less performant, because ownership rules are checked at runtime.
In our case, I would say these two do not apply: users of
ControlPlaneshouldn't be concerned with its internal state anyway and the small performance hit during fuzz testing shouldn't matter IMO. If single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.It allows us to more naturally integrate into the cache infrastructure: compilation is a property of this
ControlPlane, which is a piece of data that can implHashandEqlike anything else.This may be an issue though, as
HashandEqcannot be derived onRcandRefCell. I am somewhat confident that we can write a straight forward manual implementation for those without much trouble, but I'll have to show that in the POC.
cfallin commented on issue #6039:
As a timing note, I'm going to wait to merge this until tomorrow, after our next release's beta branch is cut; I want it to bake on
maina little more than the two-week minimum with the beta-to-release delay.
cfallin edited a comment on issue #6039:
As a timing note, I'm going to wait to merge this until tomorrow, after our next release's beta branch is cut; I want it to bake on
maina little more than the two-week minimum with the beta-to-release delay (and want to give a chance to notice if we have any unexpected regressions,e tc).
cfallin edited a comment on issue #6039:
As a timing note, I'm going to wait to merge this until tomorrow, after our next release's beta branch is cut; I want it to bake on
maina little more than the two-week minimum with the beta-to-release delay (and want to give a chance to notice if we have any unexpected regressions, etc).
cfallin commented on issue #6039:
@remlse it looks like we'll need to add the new crate to a list in
scripts/publish.rs; this was only caught in the full CI run in the merge queue. (If you want to test in your commit to fix this, you can addprtest:fullto the commit message.)
remlse commented on issue #6039:
@cfallin the full tests are passing now, maybe we can try to merge again?
Last updated: Dec 13 2025 at 19:03 UTC