Hello!
Some time ago I discovered Cranelift; it's a pretty cool project! Yesterday I read some of the code (fixed a good-first-issue first, maybe I can push it in a few days) and it seems you lack some kind of after-transformation verification/equivalence checker for your rules (like LLVM has); I've also seen it's one of your roadmap entries. I thought about it for a bit and came up with a simple approach (which was probably already mentioned..): give the rules symbolic representations (these are used everywhere) and feed them to a solver for the equivalence check, or just run tests on those symbolic inputs.
What do you think? It doesn't sound too bad to implement (... probably?). Has this already been mentioned or worked on? Is it too simple to meet some of the needs?
New to the community, hello!
This is being worked on: https://github.com/avanhatt/wasmtime
This uses the Z3 SMT solver I believe.
Interesting. It doesn't seem like I can contribute to that -- it looks pretty disorganized (with a lot of unfamiliar Z3 notation, I guess), unless you have some document giving an overview of it? It seems very close to what I imagined, anyway
That is very much work-in-progress, but has already discovered previously-unknown bugs. So yes, we agree with you: checking equivalence of ISLE rules is a good idea!
I'd certainly be interested to see your approach to the idea if this is something you want to work on. The level of complexity in that team's implementation (and I think they'd agree that it is quite complex) is, I think, necessary to handle the full suite of ISLE rules. But you might find there's a subset of rules where you can do something simpler or faster, and that could be valuable. You could also explore something like property-based testing or fuzzing, which might be a nice complement to the complete search that an SMT solver like Z3 does. Or you might just have fun experimenting, which I think is worthwhile too. In any case I'd encourage you to explore your idea if you want to!
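For a taste of the property-based option: checking a single candidate rewrite on random inputs can be as small as this (a toy sketch using the `proptest` crate; the absorption identity `(x | y) & x == x` is just a stand-in, not an actual ISLE rule):

```rust
use proptest::prelude::*;

proptest! {
    // Random-input check of one rewrite: incomplete, unlike an SMT
    // solver's exhaustive search, but cheap and easy to scale to many rules.
    #[test]
    fn bor_band_absorption(x: u32, y: u32) {
        prop_assert_eq!((x | y) & x, x);
    }
}
```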
What is the scope of their implementation and why is it complex (the former might answer the latter)?
@ghostway it's complex because it's quite a hard problem! I think you may be missing some context here -- this is an ongoing research collaboration and most of the context is in weekly meetings that we have with @Alexa VanHattum and others. I don't think the repo is intended for public consumption yet necessarily, which is why it doesn't have polished documentation. The intent of the collaboration is to eventually publish something, after which time it'll be upstreamed and much better documented
I see I'm missing a lot of context then! Depending on whether this is something the future publishers would like to share, I'd be happy to come to one of the meetings.
I'm not well versed on these things, just had an idea when I've seen it on your roadmap, so I probably don't have a full understanding of the problem.
Thanks for the warm welcome, anyway!
it's probably up to @Alexa VanHattum how we share the current effort; I suspect when the work is merged upstream and fully publicly documented that might be a good time to jump in though
Sure. Is there a rough date for when that's going to happen? In the meantime I'll see what I can make of this problem
Hi @ghostway! As others have mentioned, our current implementation is very much a research prototype at this stage; we’ll add comprehensive documentation before merging upstream.
As for scope/complexity: our current focus is ISLE rules on integer types (i8, i16, i32, and i64) for the ARM aarch64 backend (though we’ve also been able to apply our prototype to rules in the midend optimizer and x86 backends). The implementation is complex because ISLE is much more expressive than most compilers’ term rewriting DSLs: lowering rules in particular require combining the semantics of ISLE matches themselves, clif (Cranelift IR) terms, ARM aarch64 instructions, and many external Rust functions from external constructors and extractors. Many of these terms do not have semantics that map directly onto a symbolic representation like SMT, so we have custom encodings. In addition, ISLE rules are polymorphic over clif types, so we need custom logic for type inference and type enumeration (SMT requires static bitvector widths).
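To give a flavor of the static-width point: here's a completely toy sketch with the Rust `z3` crate (this is not our actual encoding, and `x * 2 => x << 1` is just an illustrative rewrite) of checking one rule at one fixed width:

```rust
use z3::ast::{Ast, BV};
use z3::{Config, Context, SatResult, Solver};

fn main() {
    let cfg = Config::new();
    let ctx = Context::new(&cfg);
    let solver = Solver::new(&ctx);

    // SMT bitvectors have a fixed width, so a rule that is polymorphic
    // over clif integer types must be checked once per width (8/16/32/64).
    let x = BV::new_const(&ctx, "x", 32);
    let lhs = x.bvmul(&BV::from_u64(&ctx, 2, 32)); // x * 2
    let rhs = x.bvshl(&BV::from_u64(&ctx, 1, 32)); // x << 1

    // Ask the solver for any x where the two sides disagree.
    solver.assert(&lhs._eq(&rhs).not());
    match solver.check() {
        SatResult::Unsat => println!("equivalent for all 32-bit x"),
        SatResult::Sat => println!("counterexample: {}", solver.get_model().unwrap()),
        SatResult::Unknown => println!("solver gave up"),
    }
}
```

The hard part in practice is everything around this: giving symbolic meaning to the ISLE match itself, the external Rust constructors and extractors, and the instructions on each side.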
In terms of contributing: our meetings are currently focused on the final push toward publication, so while you’re welcome to attend now (send me a DM and I can forward info), the meeting topics will likely be fairly inaccessible without context. By late April/early May we should be in a much better place to onboard new contributors. I also typically give weekly updates at the general Cranelift meetings, which have information on attending here. And as Chris and Jamey have said, there are a lot of cool complementary approaches someone could take to checking transformation equivalence, which I would also be excited to see happen and would be happy to give input on if that's helpful!
You can find some information on bugs analyzed by our tool thus far: a `cls` bug and an `iconst` values bug.
Regarding complementary approaches, I got thinking yesterday about one that should be fairly easy to try: https://github.com/bytecodealliance/wasmtime/issues/5967
@Alexa VanHattum Hello!
Yes, I've seen some Rust stuff in the ISLE code, which was interesting. Why is it confined to particular backends? Is clif itself too expressive, or is it a limitation of the optimizations you use?
I'll see what I can do about both meetings. Thanks for the details!
Yeah, fuzzing the interpreter does sound good as well (but then, why are you testing on particular backends?)
Aha, I see. ISLE is also for emitting, not only for transforming the graph. That's actually very interesting -- like LLVM's TableGen but for transformations as well (kind of, not exactly)...
About your idea @Jamey Sharp, it wouldn't be enough to add an `optimize` function to `TestFileCompiler` and just do `self.ctx.optimize(isa)`, right? So how would that be done?
This recent blog post might help answer some of your questions about why ISLE works the way it does, and what it does and doesn't do: https://cfallin.org/blog/2023/01/20/cranelift-isle/
And to clarify, it isn't that I want to fuzz the interpreter; we're using the interpreter as a reference implementation of CLIF semantics, so to some extent it's "trusted". Instead I want to check that this reference implementation says that a CLIF program after mid-end optimization still means the same thing that it did before optimization.
@ghostway, I've updated issue 5967 with more detail about how I think that would work. Feel free to ask more questions either here or in issue comments! I'd prefer to discuss that idea in issue comments so there's a record of any questions and answers there.
Jamey Sharp said:
And to clarify, it isn't that I want to fuzz the interpreter; we're using the interpreter as a reference implementation of CLIF semantics, so to some extent it's "trusted". Instead I want to check that this reference implementation says that a CLIF program after mid-end optimization still means the same thing that it did before optimization.
Yes, I understood. Thanks for the blog post!
Thanks for the clarifications on the issue. I'll try to work on it tomorrow maybe.
About the clarifications, yes, that's what I did. I just need to add that boolean (if I understood exactly what you wanted; we'll see in the eventual PR) :)
@Jamey Sharp I finally got to this, and as a sanity check I switched it to always use the optimized one (with the interpreter). I added a `to_optimized` method to `TestCase` (it creates a new instance of `TestCase`), and I am getting `Trap(User(HeapMisaligned))` just after starting. Do you have an idea why that would happen?
Hey @ghostway, the fuzzgen library sometimes creates CLIF functions that purposefully trap (this should be less than 1% of the functions it generates).
What we do in the fuzzer is: if it traps in the interpreter, we don't actually run it on the host. In this case it is probably a good idea to ignore the traps.
Or, since here we are comparing the interpreter against the interpreter, we can record the original test case's results (including the exact trap) and compare them against the optimized version; that way we can ensure that optimizations don't change traps.
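In toy form, that comparison could look like this (`RunResult` and everything around it are hypothetical stand-ins, not the real cranelift-fuzzgen types):

```rust
/// Hypothetical stand-in for an interpreter run's outcome.
#[derive(Debug, PartialEq)]
enum RunResult {
    Values(Vec<i64>),
    Trap(String), // record *which* trap occurred, not just that one did
}

/// Compare the unoptimized and optimized runs of the same test case.
/// Trapping is an acceptable outcome, but optimization must not turn a
/// trap into a normal return or into a different trap (or vice versa).
fn check_same_behavior(before: &RunResult, after: &RunResult) {
    assert_eq!(before, after, "optimization changed observable behavior");
}

fn main() {
    // Both runs trapped the same way, so the check passes.
    let before = RunResult::Trap("HeapMisaligned".into());
    let after = RunResult::Trap("HeapMisaligned".into());
    check_same_behavior(&before, &after);
}
```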
@Afonso Bordado Hello,
I see, but you already `continue` if the result of the first run is a trap; so it wouldn't even get to the second run.
Also, is compiling and optimizing for the interpreter too expensive (even for small compiles like testcases)? I don't see a pretty way to interchangeably optimize or compile code without inflating the code considerably
@ghostway I think you replied in the wrong topic. If you click the button to edit your message you can edit the topic name of your message to the right topic.
Well, the first step is always running the interpreter, so after that we can branch and do two completely different things.
It is a bit more code, but it should be fairly clean to split after the first interpreter run.
Also, is compiling and optimizing for the interpreter too expensive (even for small compiles like testcases)?
IIRC running the compiler is the most expensive step; I think I measured it once and it was around 50% of the fuzzer time (this was a long time ago, so I might be misremembering some stuff). I haven't measured the optimizations recently, and we've had a bunch of changes in the optimizations that we do.
We shouldn't need to run the compile step for the interpreter, just the optimizations.
Yes, but having two or more possibly uninitialized variables (because of lifetimes) is not what I'd like to see in Rust code. I didn't want to use closures, but they might just be the prettiest way to do this (I don't want to duplicate that for loop). It still crashes for me, anyway. Let's talk over code rather than hypotheticals; maybe this afternoon/evening I'll commit this (it's morning for me right now)
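To illustrate the shape I mean (toy types, not the actual fuzzer code): letting the `if` be an expression avoids both the uninitialized bindings and the duplicated loop.

```rust
// `TestCase`/`to_optimized` here are stand-ins for the real fuzzer types.
#[derive(Clone, Debug)]
struct TestCase {
    optimized: bool,
}

impl TestCase {
    fn to_optimized(&self) -> TestCase {
        TestCase { optimized: true }
    }
}

fn main() {
    let base = TestCase { optimized: false };
    let run_optimized = true;

    // The `if` expression produces the value directly: no uninitialized
    // variable, and the loop over inputs below needs no duplication.
    let case = if run_optimized { base.to_optimized() } else { base.clone() };
    println!("{case:?}");
}
```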
I've created the PR @Jamey Sharp :)