afonso360 commented on issue #6039:
Maybe someone has an intuition along the lines of: "Oh yes, of course that will fail when branch optimization is randomly skipped", or similar?
I suspect that might be the case. The
icache
fuzzer tests that our function cache is working properly. It does this by generating a random function, applying a random mutation and the recompiling and checking if the cache key matches or not.In this case what I suspect might be happening is that It selected that no mutation should be applied and the cache key should be valid. But if we skip branch opts in the recompile and not in the original compilation or vice-versa, then it should panic! (This is a guess, I'm not too familiar with how our caching mechanism works)
For icache it would be nice to reset the bytes on the second recompile so that we make the same decisions along the way.
afonso360 edited a comment on issue #6039:
Maybe someone has an intuition along the lines of: "Oh yes, of course that will fail when branch optimization is randomly skipped", or similar?
I suspect that might be the case. The
icache
fuzzer tests that our function cache is working properly. It does this by generating a random function, applying a random mutation and the recompiling and checking if the cache key matches or not.In this case what I suspect might be happening is that It selected that no mutation should be applied and the cache key should be valid. But if we skip branch opts in the recompile and not in the original compilation or vice-versa, then it should panic! (This is a guess, I'm not too familiar with how our caching mechanism works)
For icache it would be nice to reset the chaos engine bytes on the second recompile so that we make the same decisions along the way.
afonso360 edited a comment on issue #6039:
Maybe someone has an intuition along the lines of: "Oh yes, of course that will fail when branch optimization is randomly skipped", or similar?
I suspect that might be the case. The
icache
fuzzer tests that our function cache is working properly. It does this by generating a random function, applying a random mutation and then recompiling and checking if the cache key matches or not.In this case what I suspect might be happening is that It selected that no mutation should be applied and the cache key should be valid. But if we skip branch opts in the recompile and not in the original compilation or vice-versa, then it should panic! (This is a guess, I'm not too familiar with how our caching mechanism works)
For icache it would be nice to reset the chaos engine bytes on the second recompile so that we make the same decisions along the way.
cfallin commented on issue #6039:
Thanks so much for making a start on this problem -- the infrastructure will surely pay off in a bunch of ways!
I've read over most of this prototype; but before a detailed line-by-line review, I wanted to offer some high-level design feedback. I think it might be related to (or rather, might address) the issue with the
icache
fuzzer above too.
- I'd prefer if we actually generalize just a little bit more and call the struct that is threaded through the compilation pipeline the
ControlPlane
; this abstracts over randomized decisions (chaos mode) but also optimization fuel and maybe user-directed optimization instructions and the like later. So, would you mind renaming both the types and the crate itself (maybecranelift-control
, though bikeshed comments welcome here)?- In general, we want to avoid unsafe code as much as possible. The "self-referential struct trick" wherein there's a
Vec
and then there's anUnstructured
that holds a borrow to thatVec
's contents, transmuted to erase the lifetime and get a&'static
, is simply too unsafe for us to include in Cranelift code...- ... fortunately, though, I suspect it's not really necessary. I suspect that much of the trouble comes from trying to hold ownership of the
ChaosEngine
in various places, rather than threading it through the stack as needed. I suspect it should be possible, in principle to have aControlPlane
such that:
- We have
&mut self
methods that ask it for decisions;- We pass a
&mut ControlPlane
into toplevel compilation entry points, and down the stack as necessary;- If we need to (e.g. when building the
Lower
context), we can store a&'a mut ControlPlane
, but that'a
is internal to the compiler, and not exposed to the top-level entry point.- This
ControlPlane
should itself not have any lifetime parameters, ideally. I suspect that means we don't actually hold anUnstructured
, since it expects to borrow data owned elsewhere; rather for now we can implement our own sort of equivalent, where we keep e.g. aVec<bool>
and take decisions off the back of it. (e.g.,self.chaos_decisions.pop().unwrap_or(false)
or something similar.)- We can then impl
Arbitrary
forControlPlane
, perhaps just the automatic derivation, and use this when fuzzing: take an arbitraryFunction
and compile it with an arbitraryControlPlane
.- The sharing of the
ChaosEngine
via anArc
, and theMutex
and such, are another red flag and warning that we're going down a nondeterministic path. I suspect this is a contributing factor to theicache
fuzzer issues above. The issue is that it allows different parts of the compiler to pull random choices in some potentially unordered way, potentially across threads, and that gives up reproducibility. Instead, we should have oneControlPlane
per function, because that is the unit of compilation in Cranelift. At the fuzzer level, we can easily ask for aVec<ControlPlane>
; that automatically implsArbitrary
ifControlPlane
does.Does that make some sense at least? Basically, I want to push this all toward a more idiomatic Rust ownership model, which I think will have the side-effect of removing nondeterminism and making compilation a true pure function of a "control" input for one function body.
remlse commented on issue #6039:
Thank you both for the great feedback!
I totally agree that a more idiomatic Rust ownership model would be much better. I remember starting out with mutable references and running into compiler errors about mutable aliasing. I was concerned there could be way too many of these issues while threading the control plane through every code path, and I didn't know how difficult it would be. I probably was scared away too easily. I'll have another go at implementing it that way.
I understand that the use of
unsafe
for storing anUnstructured
is a non-starter. The nice thing about it would've been that we get all the functionality ofarbitrary
for free and the API of the control plane would be very flexible with little effort. I'm still feeling a bit unsure about taking the manual approach. Not primarily because it would be more work, but because I expect the work would in many cases just be reimplementingUnstructured
. And probably even in a worse way: Casually reading the source code ofarbitrary
, I'm seeing comments like this:// Take lengths from the end of the data, since the `libFuzzer` folks // found that this lets fuzzers more efficiently explore the input // space.
I'm sure we would miss such fuzzing specific optimizations if we reimplemented it ourselves, degrading the efficiency of our fuzz testing. On the other hand, we could just liberally copy-paste from the source code of
arbitrary
whatever we need, only adapting it so it works on an owned vector of bytes instead of a slice with some lifetime?I'm just thinking it's a little sad that a small thing like a lifetime would stand in the way of such a good opportunity of code reuse :thinking:
cfallin commented on issue #6039:
Yeah, that's fair -- I suppose it's not the worst thing in the world to take a
ctrl_plane: &mut ControlPlane<'_>
everywhere and then keep anUnstructured
inside. Ideally then the way we cut off lifetime proliferation is to be strict about passing this down the stack only, and not storing it in structs (requiring an extra lifetime to that struct). A prototype of this option would probably tell us pretty quickly how tenable that is!
remlse commented on issue #6039:
We have
&mut self
methods that ask it for decisionsAnother question that came up while we tried that at the beginning was about the size of this. A simple test indicates that references to zero-sized types are not optimized away:
struct Foo; println!("{}", std::mem::size_of::<&Foo>()); // prints 8...
Am I missing something with that? How can we be certain that the performance of release builds is not affected?
bjorn3 commented on issue #6039:
Instead of passing
&mut ControlPlane<'_>
you could passControlPlane<'_>
and add aas_mut()
method which takes&mut self
and returns anotherControlPlane<'_>
which can be used to passControlPlane<'_>
to multiple functions. ThenControlPlane<'_>
could be a mutable reference during fuzzing and a ZST when not fuzzing.
remlse commented on issue #6039:
I think I'm misunderstanding something, the function signature you're describing looks like this, right? (ignoring the lifetimes)
impl ControlPlane { fn as_mut(&mut self) -> Self { // ? } }
I don't know how this function can be written without reference counting?
bjorn3 commented on issue #6039:
You did do something like
struct ControlPlane<'a>(&'a mut ControlPlaneInner); impl ControlPlane<'_> { fn as_mut(&mut self) -> ControlPlane<'_> { ControlPlane(/*this does an implicit reborrow*/self.0) } }
And then use it like
fn example(control_plane: ControlPlane<'_>) { foo(control_plane.as_mut()); bar(control_plane.as_mut()); baz(control_plane); }
Playground link: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=5cde2151f4f2b1c6a9ad5ea959b2f5f5
bjorn3 edited a comment on issue #6039:
You did do something like
struct ControlPlane<'a>(&'a mut ControlPlaneInner); impl ControlPlane<'_> { /// Reborrow `ControlPlane`. fn as_mut(&mut self) -> ControlPlane<'_> { ControlPlane(/*this does an implicit reborrow*/self.0) } }
And then use it like
fn example(control_plane: ControlPlane<'_>) { foo(control_plane.as_mut()); bar(control_plane.as_mut()); baz(control_plane); }
Playground link: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=5cde2151f4f2b1c6a9ad5ea959b2f5f5
bjorn3 edited a comment on issue #6039:
You did do something like
struct ControlPlane<'a>(&'a mut ControlPlaneInner); impl ControlPlane<'_> { /// Reborrow `ControlPlane`. fn as_mut(&mut self) -> ControlPlane<'_> { ControlPlane(/*this does an implicit reborrow*/self.0) // Equivalent to: // ControlPlane(&mut *self.0) } }
And then use it like
fn example(control_plane: ControlPlane<'_>) { foo(control_plane.as_mut()); bar(control_plane.as_mut()); baz(control_plane); }
Playground link: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=5cde2151f4f2b1c6a9ad5ea959b2f5f5
cfallin commented on issue #6039:
I'm sure we would miss such fuzzing specific optimizations if we reimplemented it ourselves, degrading the efficiency of our fuzz testing. On the other hand, we could just liberally copy-paste from the source code of arbitrary whatever we need, only adapting it so it works on an owned vector of bytes instead of a slice with some lifetime?
I'm just thinking it's a little sad that a small thing like a lifetime would stand in the way of such a good opportunity of code reuse :thinking:
Yeah, that's fair -- I suppose it's not the worst thing in the world to take a
ctrl_plane: &mut ControlPlane<'_>
everywhere and then keep anUnstructured
inside. Ideally then the way we cut off lifetime proliferation is to be strict about passing this down the stack only, and not storing it in structs (requiring an extra lifetime to that struct). A prototype of this option would probably tell us pretty quickly how tenable that is!I let this bounce around in my head a bit more and I think I'm coming back to my original position: it's probably better not to carry the fuzzing-specific
Unstructured
everywhere, and instead build an abstraction around it. I got to this position by starting with a separation-of-concerns mindset but I think it has other nice properties.I think I want the design to follow these principles:
- Compilation output of some CLIF is a function of that CLIF, compiler settings, and a
ControlPlane
composed of Rust primitives (bool
s, later a fuel counter, etc);- The fuzzing infrastructure can create these
ControlPlane
s with anArbitrary
impl.This has a few nice properties:
- It avoids lifetime proliferation.
- It allows us to more naturally handle the multiple-function case: it's much easier to ask a top-level
Unstructured
for aVec<bool>
per function than it is to somehow try to split the random input into chunks.- It allows us to more naturally integrate into the cache infrastructure: compilation is a property of this
ControlPlane
, which is a piece of data that can implHash
andEq
like anything else.- It means that we don't have to propagate "not enough data" errors upward from
Unstructured
.- It means that we could later drive the mechanism with some other source of decisions; we're not tied to
arbitrary
.Basically, the compiler's core is too late for construction of structured data from random bytes; we should build an input for the compiler that is just plain data. That leads to less friction with Rust's ownership model as well as more determinism and control.
IMHO the "reimplement
Arbitrary
" concern is somewhat smaller in comparison: we shouldn't have to implement any of the detailed logic but rather the "pop abool
off aVec
" approach, and likewise for other data types we need. It's true that the fuzzer may give us onlyN
bools in ourVec<bool>
, and a particular compilation path asks forM > N
bools (the lastM - N
of which are just defaulted tofalse
); but libFuzzer is generally good about learning the structure of data that leads to better coverage.Finally, a thought on zero-sized types: I think it will be fine to take the cost of an extra parameter
&mut ControlPlane
; as long as its methods in a chaos-disabled build return constants, and theControlPlane
itself is zero-sized, my intuition is that this will be minimal. (Basically, we're not injecting new paths into any hot loops, we're just adding one liveusize
-sized value passed into a few functions per compilation.) If it turns out to be measurable then we can definitely revisit but let's not worry much until we see that impact, IMHO.
cfallin edited a comment on issue #6039:
I'm sure we would miss such fuzzing specific optimizations if we reimplemented it ourselves, degrading the efficiency of our fuzz testing. On the other hand, we could just liberally copy-paste from the source code of arbitrary whatever we need, only adapting it so it works on an owned vector of bytes instead of a slice with some lifetime?
I'm just thinking it's a little sad that a small thing like a lifetime would stand in the way of such a good opportunity of code reuse :thinking:
Yeah, that's fair -- I suppose it's not the worst thing in the world to take a
ctrl_plane: &mut ControlPlane<'_>
everywhere and then keep anUnstructured
inside. Ideally then the way we cut off lifetime proliferation is to be strict about passing this down the stack only, and not storing it in structs (requiring an extra lifetime to that struct). A prototype of this option would probably tell us pretty quickly how tenable that is!I let this bounce around in my head a bit more and I think I'm coming back to my original position: it's probably better not to carry the fuzzing-specific
Unstructured
everywhere, and instead build an abstraction around it. I got to this position by starting with a separation-of-concerns mindset but I think it has other nice properties.I think I want the design to follow these principles:
- Compilation output of some CLIF is a function of that CLIF, compiler settings, and a
ControlPlane
composed of Rust primitives (bool
s, later a fuel counter, etc);- The fuzzing infrastructure can create these
ControlPlane
s with anArbitrary
impl.This has a few nice properties:
- It avoids lifetime proliferation.
- It allows us to more naturally handle the multiple-function case: it's much easier to ask a top-level
Unstructured
for aVec<bool>
per function than it is to somehow try to split the random input into chunks.- It allows us to more naturally integrate into the cache infrastructure: compilation is a property of this
ControlPlane
, which is a piece of data that can implHash
andEq
like anything else.- It means that we don't have to propagate "not enough data" errors upward from
Unstructured
.- It means that we could later drive the mechanism with some other source of decisions; we're not tied to
arbitrary
.Basically, the compiler's core is too late for construction of structured data from random bytes; we should build an input for the compiler that is just plain (structured) data. That leads to less friction with Rust's ownership model as well as more determinism and control.
IMHO the "reimplement
Arbitrary
" concern is somewhat smaller in comparison: we shouldn't have to implement any of the detailed logic but rather the "pop abool
off aVec
" approach, and likewise for other data types we need. It's true that the fuzzer may give us onlyN
bools in ourVec<bool>
, and a particular compilation path asks forM > N
bools (the lastM - N
of which are just defaulted tofalse
); but libFuzzer is generally good about learning the structure of data that leads to better coverage.Finally, a thought on zero-sized types: I think it will be fine to take the cost of an extra parameter
&mut ControlPlane
; as long as its methods in a chaos-disabled build return constants, and theControlPlane
itself is zero-sized, my intuition is that this will be minimal. (Basically, we're not injecting new paths into any hot loops, we're just adding one liveusize
-sized value passed into a few functions per compilation.) If it turns out to be measurable then we can definitely revisit but let's not worry much until we see that impact, IMHO.
remlse commented on issue #6039:
The rename seems uncontroversial, so I did that right on this branch.
- crate
cranelift-chaos
renamed ->cranelift-control
ChaosEngine
renamed ->ControlPlane
- [ ] should the feature also be renamed from
chaos
tocontrol
? I left it as is, assuming that future potential features of the control plane outside the scope of chaos mode might be implemented as different compilation features.From my understanding, there are two major, orthogonal problems with the architecture. I think it's best to branch off from here so we can evaluate the possible approaches separately. I will create a draft PR for each approach so we have a basis for comparison.
Usage of
unsafe
vs. lifetime proliferation to make use ofUnstructured
I agree with the feedback that
unsafe
must be avoided and that spreading&'a mut ControlPlane
everywhere is also bad idea.approaches:
- [x] https://github.com/remlse/wasmtime/pull/2
- [ ] Don't make use of unsafe and implement our own, similar functionality according to our needs.
non-deterministic order of perturbations because of multi-threading combined with
Arc<Mutex<_>>
:approaches:
- [ ] Replace with
Rc<RefCell<_>>
, the compiler will then prevent control planes from being shared across threads.- [ ] Propagate
&mut ControlPlane
through the call stack instead of ownedControlPlane
s. (use bjorn3's pattern to make the reference zero-sized)Maybe this question can be discussed already, before I'm done with the POCs: Is there a downside to using internal mutability when multi-threading is prevented with
Rc<RefCell<_>>
? What is the benefit of regular mutable references? Off the top of my head, the downsides of internal mutability are in general:
- It is error prone because users of
&Foo
may erroneously assume the state ofFoo
not to change.- It is less performant, because ownership rules are checked at runtime.
In our case, I would say these two do not apply: users of
ControlPlane
shouldn't be concerned with its internal state anyway and the small performance hit during fuzz testing shouldn't matter IMO. If single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.It allows us to more naturally integrate into the cache infrastructure: compilation is a property of this
ControlPlane
, which is a piece of data that can implHash
andEq
like anything else.This may be an issue though, as
Hash
andEq
cannot be derived onRc
andRefCell
. I am somewhat confident that we can write a straight forward manual implementation for those without much trouble, but I'll have to show that in the POC.
remlse edited a comment on issue #6039:
The rename seems uncontroversial, so I did that right on this branch.
- crate
cranelift-chaos
renamed ->cranelift-control
ChaosEngine
renamed ->ControlPlane
- [ ] should the feature also be renamed from
chaos
tocontrol
? I left it as is, assuming that future potential features of the control plane outside the scope of chaos mode might be implemented as different compilation features.From my understanding, there are two major, orthogonal problems with the architecture. I think it's best to branch off from here so we can evaluate the possible approaches separately. I will create a draft PR for each approach so we have a basis for comparison.
Usage of
unsafe
vs. lifetime proliferation to make use ofUnstructured
I agree with the feedback that
unsafe
must be avoided and that spreading&'a mut ControlPlane
everywhere is also bad idea.approaches:
- [x] https://github.com/remlse/wasmtime/pull/2
- [ ] Don't make use of unsafe and implement our own, similar functionality according to our needs.
Non-deterministic order of perturbations because of multi-threading combined with
Arc<Mutex<_>>
:approaches:
- [ ] Replace with
Rc<RefCell<_>>
, the compiler will then prevent control planes from being shared across threads.- [ ] Propagate
&mut ControlPlane
through the call stack instead of ownedControlPlane
s. (use bjorn3's pattern to make the reference zero-sized)Maybe this question can be discussed already, before I'm done with the POCs: Is there a downside to using internal mutability when multi-threading is prevented with
Rc<RefCell<_>>
? What is the benefit of regular mutable references? Off the top of my head, the downsides of internal mutability are in general:
- It is error prone because users of
&Foo
may erroneously assume the state ofFoo
not to change.- It is less performant, because ownership rules are checked at runtime.
In our case, I would say these two do not apply: users of
ControlPlane
shouldn't be concerned with its internal state anyway and the small performance hit during fuzz testing shouldn't matter IMO. If single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.It allows us to more naturally integrate into the cache infrastructure: compilation is a property of this
ControlPlane
, which is a piece of data that can implHash
andEq
like anything else.This may be an issue though, as
Hash
andEq
cannot be derived onRc
andRefCell
. I am somewhat confident that we can write a straight forward manual implementation for those without much trouble, but I'll have to show that in the POC.
remlse edited a comment on issue #6039:
The rename seems uncontroversial, so I did that right on this branch.
- crate
cranelift-chaos
renamed ->cranelift-control
ChaosEngine
renamed ->ControlPlane
- [ ] should the feature also be renamed from
chaos
tocontrol
? I left it as is, assuming that future potential features of the control plane outside the scope of chaos mode might be implemented as different compilation features.From my understanding, there are two major, orthogonal problems with the architecture. I think it's best to branch off from here so we can evaluate the possible approaches separately. I will create a draft PR for each approach so we have a basis for comparison.
Usage of
unsafe
vs. lifetime proliferation to make use ofUnstructured
I agree with the feedback that
unsafe
must be avoided and that spreading&'a mut ControlPlane
everywhere is also bad idea.approaches:
- [x] https://github.com/remlse/wasmtime/pull/2
- [ ] Don't make use of unsafe and implement our own, similar functionality according to our needs.
Non-deterministic order of perturbations because of multi-threading combined with
Arc<Mutex<_>>
approaches:
- [ ] Replace with
Rc<RefCell<_>>
, the compiler will then prevent control planes from being shared across threads.- [ ] Propagate
&mut ControlPlane
through the call stack instead of ownedControlPlane
s. (use bjorn3's pattern to make the reference zero-sized)Maybe this question can be discussed already, before I'm done with the POCs: Is there a downside to using internal mutability when multi-threading is prevented with
Rc<RefCell<_>>
? What is the benefit of regular mutable references? Off the top of my head, the downsides of internal mutability are in general:
- It is error prone because users of
&Foo
may erroneously assume the state ofFoo
not to change.- It is less performant, because ownership rules are checked at runtime.
In our case, I would say these two do not apply: users of
ControlPlane
shouldn't be concerned with its internal state anyway and the small performance hit during fuzz testing shouldn't matter IMO. If single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.It allows us to more naturally integrate into the cache infrastructure: compilation is a property of this
ControlPlane
, which is a piece of data that can implHash
andEq
like anything else.This may be an issue though, as
Hash
andEq
cannot be derived onRc
andRefCell
. I am somewhat confident that we can write a straight forward manual implementation for those without much trouble, but I'll have to show that in the POC.
remlse edited a comment on issue #6039:
The rename seems uncontroversial, so I did that right on this branch.
- crate
cranelift-chaos
renamed ->cranelift-control
ChaosEngine
renamed ->ControlPlane
- [ ] should the feature also be renamed from
chaos
tocontrol
? I left it as is, assuming that future potential features of the control plane outside the scope of chaos mode might be implemented as different compilation features.From my understanding, there are two major, orthogonal problems with the architecture. I think it's best to branch off from here so we can evaluate the possible approaches separately. I will create a draft PR for each approach so we have a basis for comparison.
Usage of
unsafe
vs. lifetime proliferation to make use ofUnstructured
I agree with the feedback that
unsafe
must be avoided and that spreading&'a mut ControlPlane
everywhere is also a bad idea.approaches:
- [x] https://github.com/remlse/wasmtime/pull/2
- [ ] Don't make use of unsafe and implement our own, similar functionality according to our needs.
Non-deterministic order of perturbations because of multi-threading combined with
Arc<Mutex<_>>
approaches:
- [ ] Replace with
Rc<RefCell<_>>
, the compiler will then prevent control planes from being shared across threads.- [ ] Propagate
&mut ControlPlane
through the call stack instead of ownedControlPlane
s. (use bjorn3's pattern to make the reference zero-sized)Maybe this question can be discussed already, before I'm done with the POCs: Is there a downside to using internal mutability when multi-threading is prevented with
Rc<RefCell<_>>
? What is the benefit of regular mutable references? Off the top of my head, the downsides of internal mutability are in general:
- It is error prone because users of
&Foo
may erroneously assume the state ofFoo
not to change.- It is less performant, because ownership rules are checked at runtime.
In our case, I would say these two do not apply: users of
ControlPlane
shouldn't be concerned with its internal state anyway and the small performance hit during fuzz testing shouldn't matter IMO. If single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.It allows us to more naturally integrate into the cache infrastructure: compilation is a property of this
ControlPlane
, which is a piece of data that can implHash
andEq
like anything else.This may be an issue though, as
Hash
andEq
cannot be derived onRc
andRefCell
. I am somewhat confident that we can write a straight forward manual implementation for those without much trouble, but I'll have to show that in the POC.
remlse edited a comment on issue #6039:
The rename seems uncontroversial, so I did that right on this branch.
- crate
cranelift-chaos
renamed ->cranelift-control
ChaosEngine
renamed ->ControlPlane
- [ ] should the feature also be renamed from
chaos
tocontrol
? I left it as is, assuming that future potential features of the control plane outside the scope of chaos mode might be implemented as different compilation features.From my understanding, there are two major, orthogonal problems with the architecture. I think it's best to branch off from here so we can evaluate the possible approaches separately. I will create a draft PR for each approach so we have a basis for comparison.
Usage of
unsafe
vs. lifetime proliferation to make use ofUnstructured
I agree with the feedback that
unsafe
must be avoided and that spreading&'a mut ControlPlane
everywhere is also a bad idea.approaches:
Non-deterministic order of perturbations because of multi-threading combined with
Arc<Mutex<_>>
approaches:
- [ ] Replace with
Rc<RefCell<_>>
, the compiler will then prevent control planes from being shared across threads.- [ ] Propagate
&mut ControlPlane
through the call stack instead of ownedControlPlane
s. (use bjorn3's pattern to make the reference zero-sized)Maybe this question can be discussed already, before I'm done with the POCs: Is there a downside to using internal mutability when multi-threading is prevented with
Rc<RefCell<_>>
? What is the benefit of regular mutable references? Off the top of my head, the downsides of internal mutability are in general:
- It is error prone because users of
&Foo
may erroneously assume the state ofFoo
not to change.- It is less performant, because ownership rules are checked at runtime.
In our case, I would say these two do not apply: users of
ControlPlane
shouldn't be concerned with its internal state anyway and the small performance hit during fuzz testing shouldn't matter IMO. If single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.It allows us to more naturally integrate into the cache infrastructure: compilation is a property of this
ControlPlane
, which is a piece of data that can implHash
andEq
like anything else.This may be an issue though, as
Hash
andEq
cannot be derived onRc
andRefCell
. I am somewhat confident that we can write a straight forward manual implementation for those without much trouble, but I'll have to show that in the POC.
cfallin commented on issue #6039:
I would prefer if we went with a
&mut ControlPlane
as well, rather than subverting the ownership system with aRc<RefCell<ControlPlane>>
. Given the uncertainty here, I want to understand a bit more: are you proposing the alternative because you'd rather not have to pass an extra function parameter? Or for some other reason?In addition to the disadvantages you named, using a
RefCell
will actually have a measurable impact on compile time, I suspect: every query of the control plane in an inner loop will require a dynamically-checked borrow. This alone is reason not to do it IMHO. There's also the fact that if stored anywhere, it would make data structures!Send
which is problematic for our internal compilation parallelism.But the other argument I would make is that subverting the Rust ownership model should not be a "why not" sort of design choice, but should be a "why is this the only possibility" sort of discussion. The alternative proposed here is "just pass a
&mut
around" and I don't really see the downsides. In particular, this statementIf single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.
is somewhat perplexing and is exactly the opposite of my experience with building large systems with Rust. Internal mutability is a "cheat code" that arises because of unavoidable pressure from the outside. The
Rc<RefCell<T>>
pattern is a bit of a crutch, in that it emulates how one can write code in other languages with less restrictive sharing of aliases; as a result, it is sort of a way to give up on thinking through the ownership model, and can result in unexpected logic bugs, as we've seen here already with nondetermism. I would expect that, contrary to such a context parameter "getting in the way", this internal-mutability trick would get in the way by violating expectations and leading to all sorts of issues.The "better developer experience" bit I would question specifically: what downsides are we avoiding by not passing a
&mut
context parameter around? Just that (the need for the parameter)? Or a perceived difficulty with&mut
references in general? Or something else?Anyway, given all that, I would really strongly prefer the suggestions I gave above: a
&mut ControlPlane
, passed as a normal function parameter; no internal mutability or other trickery, only idiomatic Rust; and for now let's not try to make the borrow itself zero-sized, because that adds complexity and I suspect won't matter much.
cfallin edited a comment on issue #6039:
I would prefer if we went with a
&mut ControlPlane
as well, rather than subverting the ownership system with aRc<RefCell<ControlPlane>>
. Given the uncertainty here, I want to understand a bit more: are you proposing the alternative because you'd rather not have to pass an extra function parameter? Or for some other reason?In addition to the disadvantages you named, using a
RefCell
will actually have a measurable impact on compile time, I suspect: every query of the control plane in an inner loop will require a dynamically-checked borrow. This alone is reason not to do it IMHO. There's also the fact that if stored anywhere, it would make data structures!Send
which is problematic for our internal compilation parallelism.But the other argument I would make is that subverting the Rust ownership model should not be a "why not" sort of discussion, but should be a "why is this the only possibility" sort of discussion. The alternative proposed here is "just pass a
&mut
around" and I don't really see the downsides. In particular, this statementIf single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.
is somewhat perplexing and is exactly the opposite of my experience with building large systems with Rust. Internal mutability is a "cheat code" that arises because of unavoidable pressure from the outside. The
Rc<RefCell<T>>
pattern is a bit of a crutch, in that it emulates how one can write code in other languages with less restrictive sharing of aliases; as a result, it is sort of a way to give up on thinking through the ownership model, and can result in unexpected logic bugs, as we've seen here already with nondetermism. I would expect that, contrary to such a context parameter "getting in the way", this internal-mutability trick would get in the way by violating expectations and leading to all sorts of issues.The "better developer experience" bit I would question specifically: what downsides are we avoiding by not passing a
&mut
context parameter around? Just that (the need for the parameter)? Or a perceived difficulty with&mut
references in general? Or something else?Anyway, given all that, I would really strongly prefer the suggestions I gave above: a
&mut ControlPlane
, passed as a normal function parameter; no internal mutability or other trickery, only idiomatic Rust; and for now let's not try to make the borrow itself zero-sized, because that adds complexity and I suspect won't matter much.
iximeow commented on issue #6039:
just in the interest of being explicit: i think part of the feedback here is, _if there is an overhead for passing
&mut ControlPlane
around, it is overhead Cranelift would need to deal with at some point in the future anyway, for other reasons [like user-directed opt hints or optimization fuel], so that definitely won't be held against your MVP here_. doing the simple thing with&mut ControlPlane
is Probably Fine, and the relative benefit of havingChaos mode
in the compiler almost certainly outweighs a small compile time regression if one is measurable.(also, i think the trick about holding a
&'a mut ControlPlaneInner
actually ensures thatControlPlane
is _always_ ausize
- a ref of a zero-size type is still a reference and still ausize
wide. this example might help show which changes are actually resulting in optimizations: rustc deletes theControlPlane
argument when it's not needed, even if it is non-zero-size, but it turns out that iffoo
/bar
/baz
are left non-side-effectful they aren't useful to demonstrate the calling convention changes anyway!this happens to make me think that passing around a
&mut ControlPlane
promises to have no cost - rather than "not enough to care" cost - in thechaos mode is disabled
case after all :D)
cfallin edited a comment on issue #6039:
I would prefer if we went with a
&mut ControlPlane
as well, rather than subverting the ownership system with aRc<RefCell<ControlPlane>>
. Given the uncertainty here, I want to understand a bit more: are you proposing the alternative because you'd rather not have to pass an extra function parameter? Or for some other reason?In addition to the disadvantages you named, using a
RefCell
will actually have a measurable impact on compile time, I suspect: every query of the control plane in an inner loop will require a dynamically-checked borrow. This alone is reason not to do it IMHO. There's also the fact that if stored anywhere, it would make data structures!Send
which is problematic for our internal compilation parallelism.But the other argument I would make is that subverting the Rust ownership model should not be a "why not" sort of discussion, but should be a "why is this the only possibility" sort of discussion. The alternative proposed here is "just pass a
&mut
around" and I don't really see the downsides. In particular, this statementIf single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.
is somewhat perplexing and is exactly the opposite of my experience with building large systems with Rust. Internal mutability is a "cheat code" that arises because of unavoidable pressure from the outside. The
Rc<RefCell<T>>
pattern is a bit of a crutch, in that it emulates how one can write code in other languages with less restrictive sharing of aliases; as a result, it is sort of a way to give up on thinking through the ownership model, and can result in unexpected logic bugs, as we've seen here already with nondeterminism. I would expect that, contrary to such a context parameter "getting in the way", this internal-mutability trick would get in the way by violating expectations and leading to all sorts of issues.The "better developer experience" bit I would question specifically: what downsides are we avoiding by not passing a
&mut
context parameter around? Just that (the need for the parameter)? Or a perceived difficulty with&mut
references in general? Or something else?Anyway, given all that, I would really strongly prefer the suggestions I gave above: a
&mut ControlPlane
, passed as a normal function parameter; no internal mutability or other trickery, only idiomatic Rust; and for now let's not try to make the borrow itself zero-sized, because that adds complexity and I suspect won't matter much.
remlse edited a comment on issue #6039:
The rename seems uncontroversial, so I did that right on this branch.
- crate
cranelift-chaos
renamed ->cranelift-control
ChaosEngine
renamed ->ControlPlane
- [ ] should the feature also be renamed from
chaos
tocontrol
? I left it as is, assuming that future potential features of the control plane outside the scope of chaos mode might be implemented as different compilation features.From my understanding, there are two major, orthogonal problems with the architecture. I think it's best to branch off from here so we can evaluate the possible approaches separately. I will create a draft PR for each approach so we have a basis for comparison.
Usage of
unsafe
vs. lifetime proliferation to make use ofUnstructured
I agree with the feedback that
unsafe
must be avoided and that spreading&'a mut ControlPlane
everywhere is also a bad idea.approaches:
Non-deterministic order of perturbations because of multi-threading combined with
Arc<Mutex<_>>
approaches:
Replace withRc<RefCell<_>>
, the compiler will then prevent control planes from being shared across threads.- [x] Propagate
&mut ControlPlane
through the call stack instead of ownedControlPlane
s. (use bjorn3's pattern to make the reference zero-sized)Maybe this question can be discussed already, before I'm done with the POCs: Is there a downside to using internal mutability when multi-threading is prevented with
Rc<RefCell<_>>
? What is the benefit of regular mutable references? Off the top of my head, the downsides of internal mutability are in general:
- It is error prone because users of
&Foo
may erroneously assume the state ofFoo
not to change.- It is less performant, because ownership rules are checked at runtime.
In our case, I would say these two do not apply: users of
ControlPlane
shouldn't be concerned with its internal state anyway and the small performance hit during fuzz testing shouldn't matter IMO. If single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.It allows us to more naturally integrate into the cache infrastructure: compilation is a property of this
ControlPlane
, which is a piece of data that can implHash
andEq
like anything else.This may be an issue though, as
Hash
andEq
cannot be derived onRc
andRefCell
. I am somewhat confident that we can write a straight forward manual implementation for those without much trouble, but I'll have to show that in the POC.
remlse commented on issue #6039:
The approach with the mutable references is now working, thanks mostly to @MzrW. For now we made sure the fuzz target
cranelift-fuzzgen
compiles and works to validate the architecture.
remlse edited a comment on issue #6039:
The approach with the mutable references is now working, thanks mostly to @MzrW. For now we made sure the fuzz target
cranelift-fuzzgen
compiles (with feature chaos) and works to validate the architecture.
remlse commented on issue #6039:
(Sorry about the rebase, habits are stubborn. I'll solve the conflicts with a merge commit next time.)
remlse edited a comment on issue #6039:
The rename seems uncontroversial, so I did that right on this branch.
- crate
cranelift-chaos
renamed ->cranelift-control
ChaosEngine
renamed ->ControlPlane
- [ ] should the feature also be renamed from
chaos
tocontrol
? I left it as is, assuming that future potential features of the control plane outside the scope of chaos mode might be implemented as different compilation features.From my understanding, there are two major, orthogonal problems with the architecture. I think it's best to branch off from here so we can evaluate the possible approaches separately. I will create a draft PR for each approach so we have a basis for comparison.
Usage of
unsafe
vs. lifetime proliferation to make use ofUnstructured
I agree with the feedback that
unsafe
must be avoided and that spreading&'a mut ControlPlane
everywhere is also a bad idea.approaches:
Edit: Currently, the internal representation is a
Vec<bool>
without usingUnstructured
. When it is useful to do so, we can use it in the future, internally, and without leaking any lifetimes. As an implementation detail, it's easy to change.
Non-deterministic order of perturbations because of multi-threading combined with
Arc<Mutex<_>>
approaches:
Replace withRc<RefCell<_>>
, the compiler will then prevent control planes from being shared across threads.- [x] Propagate
&mut ControlPlane
through the call stack instead of ownedControlPlane
s. (use bjorn3's pattern to make the reference zero-sized)Maybe this question can be discussed already, before I'm done with the POCs: Is there a downside to using internal mutability when multi-threading is prevented with
Rc<RefCell<_>>
? What is the benefit of regular mutable references? Off the top of my head, the downsides of internal mutability are in general:
- It is error prone because users of
&Foo
may erroneously assume the state ofFoo
not to change.- It is less performant, because ownership rules are checked at runtime.
In our case, I would say these two do not apply: users of
ControlPlane
shouldn't be concerned with its internal state anyway and the small performance hit during fuzz testing shouldn't matter IMO. If single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.It allows us to more naturally integrate into the cache infrastructure: compilation is a property of this
ControlPlane
, which is a piece of data that can implHash
andEq
like anything else.This may be an issue though, as
Hash
andEq
cannot be derived onRc
andRefCell
. I am somewhat confident that we can write a straight forward manual implementation for those without much trouble, but I'll have to show that in the POC.
cfallin commented on issue #6039:
As a timing note, I'm going to wait to merge this until tomorrow, after our next release's beta branch is cut; I want it to bake on
main
a little more than the two-week minimum with the beta-to-release delay.
cfallin edited a comment on issue #6039:
As a timing note, I'm going to wait to merge this until tomorrow, after our next release's beta branch is cut; I want it to bake on
main
a little more than the two-week minimum with the beta-to-release delay (and want to give a chance to notice if we have any unexpected regressions,e tc).
cfallin edited a comment on issue #6039:
As a timing note, I'm going to wait to merge this until tomorrow, after our next release's beta branch is cut; I want it to bake on
main
a little more than the two-week minimum with the beta-to-release delay (and want to give a chance to notice if we have any unexpected regressions, etc).
cfallin commented on issue #6039:
@remlse it looks like we'll need to add the new crate to a list in
scripts/publish.rs
; this was only caught in the full CI run in the merge queue. (If you want to test in your commit to fix this, you can addprtest:full
to the commit message.)
remlse commented on issue #6039:
@cfallin the full tests are passing now, maybe we can try to merge again?
Last updated: Jan 24 2025 at 00:11 UTC