remlse opened PR #6039 from fuzz-skip-branch-opt to main:
This is a draft of the MVP for chaos mode (#4134).
It extends the fuzz target `cranelift-icache` for now, by allowing it to run with the feature `chaos` enabled. This will pseudo-randomly toggle branch optimization in `MachBuffer` via the new chaos mode control plane in the crate `cranelift-chaos`.
Quick command for the documentation:
cargo doc -p cranelift-chaos --document-private-items --open
Running the fuzz target with chaos mode enabled:
cargo fuzz run --no-default-features --features chaos cranelift-icache
Passing a reference-counted chaos engine around is not that bad; the diff is less noisy than I would've expected. I'm still planning to make an equivalent POC with private, global, mutable state in the `cranelift-chaos` crate to get a better idea of the trade-offs.
Note that because of this zulip topic, I didn't bump the version of `arbitrary` in this PR to keep those issues isolated. Once that's resolved, we think it's probably a good idea to update `arbitrary` while we're working with it.
I've added a couple print statements during development, and it seems the branch optimization is more often carried out than skipped. I guess this is consistent with libfuzzer's goal of generating data in a way that code coverage is maximized.
I also ran into a crash while running this fuzz target. The crash happens at `cranelift-icache.rs:220`:
    if expect_cache_hit {
        let after_mutation_result_from_cache = icache::try_finish_recompile(&func, &serialized)
            .expect("recompilation should always work for identity");
        assert_eq!(*after_mutation_result, after_mutation_result_from_cache); // <-- this assert fails
    }
Maybe someone has an intuition along the lines of: "Oh yes, of course that will fail when branch optimization is randomly skipped", or similar? In any case, I'll investigate to see if the panic is caused by my changes or something different.
Questions
- [ ] I needed to use `Arc` and `Mutex` instead of `Rc` and `RefCell` in the control plane, because the compiler was complaining about the `Send` trait not being implemented. So if Cranelift runs in parallel, won't that interfere with our plans for the fuel parameter? If fuel from the chaos engine is requested in a different order every time, we won't be able to deterministically reproduce bugs and pinpoint their origin.
- [ ] Did I get the "paperwork" right?
  - version 0.95.0 like other cranelift crates
  - license
  - Cargo.toml
  - ... etc.?
- [ ] There are several `ChaosEngine::todo()`s in the wild. Is it ok to merge these in principle or should we find a different solution for adding the chaos engine everywhere incrementally?
Todos
- [ ] Add appropriate explanations to the commit messages
- [ ] Investigate crash (Base64: `Av////////8AAAIAAAAAAAD5jIyMjAAKAAAAAPHx8fERDgcAAAAAAJkBAAAAAAAAKwBp/5r//wAAAAAAAAAHbS45azEAAAAACF0=`)
- [ ] Extend existing fuzzing documentation with an overview of chaos mode as well as how to run a target with chaos mode enabled (`cargo fuzz run --features chaos $TARGET`)
iximeow submitted PR review.
iximeow created PR review comment:
/// # Panics
(hi, i was interested in the Unstructured conversation and noticed this)
iximeow submitted PR review.
iximeow created PR review comment:
from a safety perspective, the other extremely important detail of this trick is that `ChaosEngineData` must also never move once references to `engine.data` are taken. so the "this must not move"-ness of `data` kind of percolates through to any enclosing type until it's somewhere that won't move (which works out here because `ChaosEngineData` ends up owned by an `Arc` where it oughtn't be moved out of).
as an example that certainly won't come up here but would be Technically Possible, `Arc::new(some_chaos_engine.data.try_unwrap())` would yield a `ChaosEngineData` whose `unstructured` points to somewhere else, and would (hopefully! :D) fault on use.
iximeow edited PR review comment.
remlse submitted PR review.
remlse created PR review comment:
Thanks for pointing that out! Does it also apply if some type is heap allocated? I think the article I got this from used a `Box` to create a level of indirection. The idea being that if the `Box` itself is moved, the values on the heap won't. So any existing references into that heap allocation would still be valid. In this case, the `Vec` is supposed to serve the same purpose as the `Box` in the article.
That being said, I just noticed that I got the order of the fields wrong, which the article warns against. `data` will be dropped before `unstructured`, which creates a dangling pointer and UB.(?) oops :smile: I definitely prefer a safe solution as well.
remlse submitted PR review.
remlse created PR review comment:
Aaand reading a bit further, I also forgot the thing about `AliasableBox` so you're definitely right, moving `data` would also be UB.
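The drop-order point above rests on a rule of the language: Rust drops struct fields in declaration order. The following is a minimal, entirely safe sketch of that rule. The `Noisy` and `Engine` types are hypothetical stand-ins, not the actual `ChaosEngineData` layout; only the field names `data` and `unstructured` are borrowed from the discussion.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical helper: records its name in a shared log when dropped.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Hypothetical stand-in with `data` declared before `unstructured`.
/// Nothing here is self-referential; it only makes the drop order visible.
struct Engine {
    data: Noisy,
    unstructured: Noisy,
}

/// Builds an `Engine`, drops it, and returns the order in which fields were dropped.
fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let engine = Engine {
        data: Noisy { name: "data", log: Rc::clone(&log) },
        unstructured: Noisy { name: "unstructured", log: Rc::clone(&log) },
    };
    drop(engine);
    let order = log.borrow().clone();
    order
}
```

With this field order, `data` is dropped first. In a self-referential layout where `unstructured` borrows from `data`, that would leave `unstructured` pointing at freed memory for the remainder of the drop.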
remlse updated PR #6039 from fuzz-skip-branch-opt to main.
remlse updated PR #6039 from fuzz-skip-branch-opt to main.
remlse submitted PR review.
remlse created PR review comment:
Thanks! fixed it.
iximeow submitted PR review.
iximeow created PR review comment:
yeah that's the part that makes the blog post's solution a little more robust to `move`s - a `&AliasableBox<ZipArchive<File>>` could be made to dangle, but with private internals you can ensure that wouldn't happen. anyway, hopefully threading a `&mut ControlPlane` through the compiler as appropriate lets you avoid the whole construction, and double-hopefully the extra arguments don't affect compile time all that much :)
remlse updated PR #6039 from fuzz-skip-branch-opt to main.
remlse edited PR #6039 from fuzz-skip-branch-opt to main:
This is a draft of the MVP for chaos mode (#4134).
It extends the fuzz target `cranelift-icache` for now, by allowing it to run with the feature `chaos` enabled. This will pseudo-randomly toggle branch optimization in `MachBuffer` via the new chaos mode control plane in the crate `cranelift-chaos`.
Quick command for the documentation:
cargo doc -p cranelift-chaos --document-private-items --open
Running the fuzz target with chaos mode enabled:
cargo fuzz run --no-default-features --features chaos cranelift-icache
Passing a reference-counted chaos engine around is not that bad; the diff is less noisy than I would've expected. I'm still planning to make an equivalent POC with private, global, mutable state in the `cranelift-chaos` crate to get a better idea of the trade-offs.
Note that because of this zulip topic, I didn't bump the version of `arbitrary` in this PR to keep those issues isolated. Once that's resolved, we think it's probably a good idea to update `arbitrary` while we're working with it.
I've added a couple print statements during development, and it seems the branch optimization is more often carried out than skipped. I guess this is consistent with libfuzzer's goal of generating data in a way that code coverage is maximized.
I also ran into a crash while running this fuzz target. The crash happens at `cranelift-icache.rs:220`:
    if expect_cache_hit {
        let after_mutation_result_from_cache = icache::try_finish_recompile(&func, &serialized)
            .expect("recompilation should always work for identity");
        assert_eq!(*after_mutation_result, after_mutation_result_from_cache); // <-- this assert fails
    }
Maybe someone has an intuition along the lines of: "Oh yes, of course that will fail when branch optimization is randomly skipped", or similar? In any case, I'll investigate to see if the panic is caused by my changes or something different.
Questions
- [x] I needed to use `Arc` and `Mutex` instead of `Rc` and `RefCell` in the control plane, because the compiler was complaining about the `Send` trait not being implemented. So if Cranelift runs in parallel, won't that interfere with our plans for the fuel parameter? If fuel from the chaos engine is requested in a different order every time, we won't be able to deterministically reproduce bugs and pinpoint their origin.
  -> answer: `Arc` and `Mutex` must not be used.
- [ ] Did I get the "paperwork" right?
  - version 0.95.0 like other cranelift crates
  - license
  - Cargo.toml
  - ... etc.?
- [ ] There are several `ChaosEngine::todo()`s in the wild. Is it ok to merge these in principle or should we find a different solution for adding the chaos engine everywhere incrementally?
Todos
- [ ] Add appropriate explanations to the commit messages
- [x] Investigate crash (Base64: `Av////////8AAAIAAAAAAAD5jIyMjAAKAAAAAPHx8fERDgcAAAAAAJkBAAAAAAAAKwBp/5r//wAAAAAAAAAHbS45azEAAAAACF0=`)
  -> most likely due to usage of `Arc` and `Mutex`
- [ ] Extend existing fuzzing documentation with an overview of chaos mode as well as how to run a target with chaos mode enabled (`cargo fuzz run --features chaos $TARGET`)
remlse updated PR #6039 from fuzz-skip-branch-opt to main.
remlse updated PR #6039 from fuzz-skip-branch-opt to main.
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
`ctrl_plane` can probably go inside the `state` (`EmitState`)?
The issue with lifetimes that this would otherwise create (`&mut ControlPlane` inside of the struct) can be resolved I think by `std::mem::move` to take ownership of the control plane temporarily in places where we emit.
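A minimal sketch of the "take ownership temporarily" idea follows. The standard library has no `std::mem::move`; this sketch assumes the intent is `std::mem::take`, which swaps in the `Default` value and returns the old one. The `ControlPlane`, `EmitState` and `Compiler` types here are hypothetical simplifications, not Cranelift's real definitions.

```rust
/// Hypothetical stand-in for the control plane; `Default` is what `mem::take` relies on.
#[derive(Debug, Default, PartialEq)]
struct ControlPlane {
    bools: Vec<bool>,
}

/// Hypothetical emit state that owns the control plane while emitting,
/// so no `&mut ControlPlane` (and no lifetime parameter) is stored in it.
#[derive(Debug, Default)]
struct EmitState {
    ctrl_plane: ControlPlane,
}

/// Hypothetical owner of the control plane between emission passes.
struct Compiler {
    ctrl_plane: ControlPlane,
}

impl Compiler {
    /// Runs one emission pass and reports the single decision it consumed.
    fn emit(&mut self) -> bool {
        // Move the control plane into the emit state; `self` keeps a default one meanwhile.
        let mut state = EmitState {
            ctrl_plane: std::mem::take(&mut self.ctrl_plane),
        };
        let decision = state.ctrl_plane.bools.pop().unwrap_or(false);
        // Move it back out; forgetting this step would silently leave the default in place.
        self.ctrl_plane = state.ctrl_plane;
        decision
    }
}
```

The cost of this shape is that every emission site has to remember the move back; the remaining decisions survive across passes only because of that last assignment.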
cfallin created PR review comment:
(remove debugging printlns before merging)
cfallin created PR review comment:
outdated comment?
cfallin created PR review comment:
I think it's probably better to pass in the control-plane state with each call to `compile`; the `CompilerContext` is otherwise not that semantically meaningful (meant to enable reuse).
cfallin created PR review comment:
Let's remove this `is_noop` mechanism before merging.
cfallin created PR review comment:
I think this body makes sense as the `Default` impl (an empty `ControlPlane` should have no effect on Cranelift's behavior as it is today -- this also implies how to use bools, i.e. `false` should make no change).
cfallin created PR review comment:
I would return just `bool` (use `.unwrap_or(false)` on the `pop`).
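The two suggestions above (an empty `Default` control plane that changes nothing, and a plain `bool` return via `.unwrap_or(false)`) can be sketched together as follows. This is a simplified illustration, assuming the control plane is backed by a `Vec<bool>`; the field name `bools` and the constructor `new` are hypothetical.

```rust
/// Simplified, hypothetical control plane backed by a list of pre-generated decisions.
#[derive(Debug, Default)]
pub struct ControlPlane {
    bools: Vec<bool>,
}

impl ControlPlane {
    /// Hypothetical constructor; a fuzz target would typically derive the
    /// decisions from fuzzer input instead.
    pub fn new(bools: Vec<bool>) -> Self {
        Self { bools }
    }

    /// Returns the next decision, or `false` once the decisions run out.
    /// `false` is read as "behave exactly like Cranelift does today".
    pub fn get_decision(&mut self) -> bool {
        self.bools.pop().unwrap_or(false)
    }
}
```

Because an exhausted or default control plane always answers `false`, callers never have to handle an `Option`, and passing `ControlPlane::default()` leaves compilation unchanged.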
remlse updated PR #6039.
remlse updated PR #6039.
remlse updated PR #6039.
remlse submitted PR review.
remlse created PR review comment:
Not sure if it's OK to export `MachInstEmitState`. I needed it here.
remlse submitted PR review.
remlse created PR review comment:
I didn't combine these in a `Vec<(Function, ControlPlane)>`, because it makes the diff a little cleaner and in the manual arbitrary implementation it can still be controlled that the two vectors have the same size.
remlse submitted PR review.
remlse created PR review comment:
If this is forgotten, the control plane would just silently be the default one for the rest of the compilation. I guess it should be fine, (for now) it seems like this is the only place where one has to remember to move the control plane back out of the emit state.
remlse updated PR #6039.
remlse requested cfallin for a review on PR #6039.
remlse edited PR #6039:
This is a draft of the MVP for chaos mode (#4134).
Edit: The implemented fuzz target changed to `cranelift-fuzzgen`.
It extends the fuzz target `cranelift-icache` for now, by allowing it to run with the feature `chaos` enabled. This will pseudo-randomly toggle branch optimization in `MachBuffer` via the new chaos mode control plane in the crate `cranelift-chaos`.
Quick command for the documentation:
cargo doc -p cranelift-chaos --document-private-items --open
Running the fuzz target with chaos mode enabled:
cargo fuzz run --no-default-features --features chaos cranelift-icache
Passing a reference-counted chaos engine around is not that bad; the diff is less noisy than I would've expected. I'm still planning to make an equivalent POC with private, global, mutable state in the `cranelift-chaos` crate to get a better idea of the trade-offs.
Note that because of this zulip topic, I didn't bump the version of `arbitrary` in this PR to keep those issues isolated. Once that's resolved, we think it's probably a good idea to update `arbitrary` while we're working with it.
I've added a couple print statements during development, and it seems the branch optimization is more often carried out than skipped. I guess this is consistent with libfuzzer's goal of generating data in a way that code coverage is maximized.
I also ran into a crash while running this fuzz target. The crash happens at `cranelift-icache.rs:220`:
    if expect_cache_hit {
        let after_mutation_result_from_cache = icache::try_finish_recompile(&func, &serialized)
            .expect("recompilation should always work for identity");
        assert_eq!(*after_mutation_result, after_mutation_result_from_cache); // <-- this assert fails
    }
Maybe someone has an intuition along the lines of: "Oh yes, of course that will fail when branch optimization is randomly skipped", or similar? In any case, I'll investigate to see if the panic is caused by my changes or something different.
Questions
- [x] I needed to use `Arc` and `Mutex` instead of `Rc` and `RefCell` in the control plane, because the compiler was complaining about the `Send` trait not being implemented. So if Cranelift runs in parallel, won't that interfere with our plans for the fuel parameter? If fuel from the chaos engine is requested in a different order every time, we won't be able to deterministically reproduce bugs and pinpoint their origin.
  -> answer: `Arc` and `Mutex` must not be used.
- [ ] Did I get the "paperwork" right?
  - version 0.95.0 like other cranelift crates
  - license
  - Cargo.toml
  - ... etc.?
- [x] There are several `ChaosEngine::todo()`s in the wild. Is it ok to merge these in principle or should we find a different solution for adding the chaos engine everywhere incrementally?
  -> these have been removed
Todos
- [ ] Add appropriate explanations to the commit messages
- [x] Investigate crash (Base64: `Av////////8AAAIAAAAAAAD5jIyMjAAKAAAAAPHx8fERDgcAAAAAAJkBAAAAAAAAKwBp/5r//wAAAAAAAAAHbS45azEAAAAACF0=`)
  -> most likely due to usage of `Arc` and `Mutex`
- [ ] Extend existing fuzzing documentation with an overview of chaos mode as well as how to run a target with chaos mode enabled (`cargo fuzz run --features chaos $TARGET`)
remlse has marked PR #6039 as ready for review.
remlse requested elliottt for a review on PR #6039.
remlse requested wasmtime-fuzz-reviewers for a review on PR #6039.
remlse requested wasmtime-compiler-reviewers for a review on PR #6039.
remlse requested pchickey for a review on PR #6039.
remlse requested wasmtime-core-reviewers for a review on PR #6039.
remlse requested wasmtime-default-reviewers for a review on PR #6039.
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
One small tweak to the conditional-compilation strategy: I had been thinking that we could have the methods that produce decisions, like `get_decision` here, return a default value (`false` here) as a constant in the non-`chaos`-feature case; then the sites where we use these decisions, like in `MachBuffer`, don't require annotation with conditional compilation. What do you think?
cfallin created PR review comment:
A few things:
- Usually a mut-accessor will be named like `fn ctrl_plane_mut(&mut self) -> ...`
- Let's have a different one too, `fn take_ctrl_plane(self)`, that consumes the emit-state and gives us back the control-plane state
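The two accessors named above could look roughly like this. It is a sketch under assumptions: `EmitState` and `ControlPlane` are hypothetical simplified types, and the return type of `ctrl_plane_mut` (elided as `...` in the comment) is assumed here to be `&mut ControlPlane`.

```rust
/// Hypothetical stand-in for the control plane.
#[derive(Debug, Default, PartialEq)]
pub struct ControlPlane {
    bools: Vec<bool>,
}

/// Hypothetical emit state that owns a control plane during emission.
#[derive(Debug, Default)]
pub struct EmitState {
    ctrl_plane: ControlPlane,
}

impl EmitState {
    /// Hypothetical constructor that moves the control plane into the state.
    pub fn new(ctrl_plane: ControlPlane) -> Self {
        Self { ctrl_plane }
    }

    /// Conventional `_mut` accessor: borrows the control plane mutably.
    pub fn ctrl_plane_mut(&mut self) -> &mut ControlPlane {
        &mut self.ctrl_plane
    }

    /// Consumes the emit state and hands the control plane back to the caller.
    pub fn take_ctrl_plane(self) -> ControlPlane {
        self.ctrl_plane
    }
}
```

Since `take_ctrl_plane` takes `self` by value, any later use of the same `EmitState` is rejected at compile time as a use after move, so an emptied state cannot be reused by accident.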
cfallin created PR review comment:
Yeah, we may be able to find a better way here eventually, but I think this strikes a good balance for now.
remlse submitted PR review.
remlse created PR review comment:
I'm thinking about the potential performance impact in release builds, but I guess it's safe to assume the compiler inlines a constant `false` and removes the resulting `if false {}`.
In my view, it would be a nice aspect of the control plane that there is no way to (mis-)use it in regular builds. But I guess that every control plane API needs to have some default output value anyway... and that can probably always be inlined as well? And we can annotate these default-returning functions with `#[inline]`.
What is the downside of conditional compilation at the call sites? It seemed like an easy way to be really, really sure nothing bad happens in release builds, but on second thought, it doesn't seem necessary.
cfallin submitted PR review.
cfallin created PR review comment:
Indeed, we should be able to trust the branch-folding here.
> What is the downside of conditional compilation at the call sites?
The main downside is that it spreads the implementation of a conditional decision across distributed points -- the alternative, where everything is wired to a single module where all conditional-compilation logic lies, makes it easier to make changes in the future. (Another example of this principle in action is the `memfd` pooling-allocator mechanism in Wasmtime: when I implemented this in #3697 last year I originally had feature-conditional code in many places, but Alex convinced me to centralize everything into two versions of one module and remove conditionals everywhere else. The result is far cleaner!)
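The centralization described here can be sketched as two versions of one module selected by the `chaos` feature, with call sites free of `#[cfg]`. This is only an illustration: the module name `imp`, the function `emit_branches` and its return strings are hypothetical, and reading `true` as "skip the optimization" is an assumption made for this example.

```rust
// Real implementation, compiled only when the `chaos` feature is enabled.
#[cfg(feature = "chaos")]
mod imp {
    #[derive(Debug, Default)]
    pub struct ControlPlane {
        bools: Vec<bool>,
    }

    impl ControlPlane {
        pub fn get_decision(&mut self) -> bool {
            self.bools.pop().unwrap_or(false)
        }
    }
}

// Zero-sized shim for regular builds: every decision is the constant `false`.
#[cfg(not(feature = "chaos"))]
mod imp {
    #[derive(Debug, Default)]
    pub struct ControlPlane;

    impl ControlPlane {
        #[inline]
        pub fn get_decision(&mut self) -> bool {
            false
        }
    }
}

// The only place that knows which variant is active.
pub use imp::ControlPlane;

/// Hypothetical call site: no conditional compilation needed here.
pub fn emit_branches(ctrl_plane: &mut ControlPlane) -> &'static str {
    if ctrl_plane.get_decision() {
        "branch optimization skipped"
    } else {
        "branch optimization applied"
    }
}
```

In a build without the feature, `get_decision` is a constant `false`, so the optimizer can be expected to fold the branch away, which is the trade-off the thread settles on.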
remlse updated PR #6039.
remlse submitted PR review.
remlse created PR review comment:
fn take_ctrl_plane(self)
I think that actually caught a mistake. I was taking the control plane out of the emission state inside a loop, where later loop iterations would use the state with a now-empty control plane.
remlse updated PR #6039.
remlse submitted PR review.
remlse created PR review comment:
I'm assuming this goes for the `Arbitrary` implementation as well, so I removed the conditional compilation here too.
The shim control plane's `Arbitrary` implementation now returns the default without consuming any bytes.
cfallin submitted PR review.
remlse updated PR #6039.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
I don't think `define_function` should get this argument. If you need this fine control you should probably use `define_function_bytes` instead. This doesn't work with a module that serializes functions rather than immediately compiling them, and it is confusing for most users of cranelift.
remlse submitted PR review.
remlse created PR review comment:
I think this is the place we actually needed that argument. So that would have to be rewritten with `define_function_bytes`. The module there is a `JITModule` and its `define_function` and `define_function_bytes` methods are not trivial, so it's not obvious to me how to do that.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
cranelift-object implements `define_function` as `ctx.compile_and_emit(self.isa(), &mut code, ctrl_plane)` followed by `define_function_bytes`. You could do the same in `TestFileCompiler`.
cfallin submitted PR review.
cfallin created PR review comment:
@bjorn3 in general the approach we've been taking is to thread through the control-plane everywhere compilation can be invoked; conceptually it's now another input along with the CLIF. (It does have a `Default` implementation.) If there's a way to rename this variant to a third option, and then have a variant that uses a default control plane, we can perhaps do that. Would you be willing to do that in a followup PR?
remlse updated PR #6039.
remlse updated PR #6039.
bjorn3 created PR review comment:
From a usability perspective having another method would work. But when serializing rather than compiling, a `ControlPlane` argument doesn't really make any sense as you can't serialize `ControlPlane`. (I have local changes to make a serializing `Module` which I want to upstream. I'm using it to allow using cranelift-interpreter in cg_clif with minimal changes to cg_clif.)
bjorn3 submitted PR review.
remlse updated PR #6039.
cfallin submitted PR review.
cfallin created PR review comment:
I'm not sure I understand why serialization of modules implies the need to serialize a `ControlPlane` -- it is given as an argument, it isn't stored -- but please do create an issue or PR with a fix if you have one in mind. In the meantime I'll go ahead and merge this PR (which has been under review for a while and we have general consensus on).
bjorn3 submitted PR review.
bjorn3 created PR review comment:
If the passed in `ControlPlane` should affect the eventual compilation of the function, it would need to be serialized. If not, it doesn't really make much sense to pass in `ControlPlane`.
> In the meantime I'll go ahead and merge this PR (which has been under review for a while and we have general consensus on).
:+1:
cfallin merged PR #6039.
Last updated: Jan 24 2025 at 00:11 UTC