So ever since I finished writing a backend I've been slowly working on documenting the process.
But I've come to the realization that I could create much better documentation if I based it on an actually existing backend that is in the cranelift source tree. Concrete examples make learning something new like this much easier (at least in my experience), and having a fully implemented backend available for readers to study or even "copy and modify" will certainly lower the barrier of entry.
The problem with all the existing official backends is that they are very complicated, because the architectures they target are complicated. And while I could use my own backend, it's targeting a made-up toy ISA that has no relevance.
So I'm considering writing yet another backend specifically for this purpose. It should target an architecture that is as simple as possible (both to understand and to implement in Cranelift) as well as has at least some relevance in the real world. And since 32 bit is not _fully_ supported in Cranelift (I did get it to work but the amount of additional stuff that needs to be done would needlessly blow up the guide) it needs to be a 64 bit architecture.
By making this backend part of the official repo rather than just a fork it would also be guaranteed that it is always in a working state, even if some breaking change invalidates parts of the guide (until it can be updated, I know documentation in open source projects like this tends to lag behind). But I assume there are additional standards that need to be met for a backend to make it into the official repo. So before I start to do anything I'd like to gather some input first, particularily about what architecture to target.
Documentation for writing backends would be great! It is one of the points on the roadmap for 2023: https://github.com/bytecodealliance/rfcs/blob/main/accepted/cranelift-roadmap-2023.md#cranelift-porting-how-to-write-a-backend I don't know if anyone is already planning to do this in the short term.
Why 32 bits is not fully supported in cranelift?
As i understand it this is because ValueRegs only has space for 2 regs right now, while representing 128bit ints on 32bit systems needs 4 regs. Changing this is easy I believe, but it comes at a perf cost for 64bit archs.
Indeed it is because of ValueRegs. In my fork I did get 32 bits to work correctly, but I specifically asked if I should PR this and the answer was no.
The change itself is easy enough and can even be feature gated as to not incur a performance penalty as long as only 64 bit backends are enabled. But the reason was that supposedly other ways are being explored to support this differently. This is the answer I got:
https://github.com/bytecodealliance/wasmtime/issues/5572#issuecomment-1414006204
You can also read about it in that thread, but essentially you can write 32bit backends without any modification to cranelift.
However you won't be able to support 128 bit integers, and without those it is impossible to compile the Rust core library, so essentially the compiler is useless.
so essentially the compiler is useless.
It still works as backend for wasmtime, which doesn't need i128.
I specifically asked if I should PR this and the answer was no.
You can also read about it in that thread, but essentially you can write 32bit backends without any modification to cranelift.
However you won't be able to support 128 bit integers, and without those it is impossible to compile the Rust core library, so essentially the compiler is useless.
This misrepresents things slightly, I think: the answer was "I think it's probably best to wait until we add a new target to bring this back" meaning yes, we can absolutely make the change (which is a fairly small tweak) as soon as it's needed, but not before because it does carry a perf penalty. Calling the compiler "useless" on 32-bit targets is inaccurate: we are happy to take a 32-bit backend, including with 128-bit support, as soon as someone is willing to do the work.
Well, I am willing to write any backend for the purpose of this guide as long as it is simple enough.
The whole point of why I started this thread is to find a target architecture that is most suited for writing an example backend around.
Hmm, reading the intro message again... I find myself actually wondering what would result in a simpler backend, and I'm actually not sure that ISA complexity is the top-order factor. The reason I say that is that RISC-V 64 is nominally quite simple, but we still twist things a bit to generate better code, or just to fit an abstraction onto the ISA
in other words, a too-simple ISA may result in the necessity for hacks -- witness riscv64's pervasive use of pseudoinstructions with internal control flow, for example, because it doesn't have cmoves
ARM is a nice middle ground; maybe ARM32 would be a good target? The other advantage of this is that if it's not a tier 1, widely-used architecture (as much, anymore), we can afford to not optimize it to the gills, and leave it a bit simpler
the other other advantage of that approach is that if you dig deeply enough in our git history, you'll find that we had half of an ARM32 backend once; it should still be somewhat useful at least for the instruction-encoding pieces
Good point. My first choice would have been RV32I but that has exactly the same problems obviously.
ARM would certainly be the most relevant architecture to implement.
For the sake of keeping things as easy to understand as possible I wouldn't be performing much of any optimization at all though, and ARM may still be relevant enough to warrant optimization.
well, the good thing about the ISLE lowering rules is that we can separate out any "advanced optimization" rules to a separate file, such that the backend still works without them
but if this is our "teaching backend" I'm also comfortable with a slightly different approach to it
honestly we're so resource-strapped that I'd be surprised if anyone in the project explicitly carves out time to optimize ARM32, anyway, once it is minimally working
ARM it is then. I've had a look at all the different versions and I believe v6 is the one to use.
The backend is probably going to end up emitting only a subset of all possible instructions for simplicity sake.
Sounds good! Please do let me know if I can help with any questions
I'm not up-to-speed on ARM ISA level minutiae but the only input I would offer is that we should probably assume we have FP; I know some earlier variants did not
Yes v6 has FP. It even has SIMD but I won't be implementing that.
Implementing architectures without FP is quite complicated atm anyway.
I had to create a large number of new libcalls when I did it with my own ISA.
Eventually Cranelift should probably support softfloat out of the box but I assume that is very far at the bottom of the priority list.
Honestly I would recommend RV32I over ARM. It is a much simpler ISA. For conditional moves, many RISC-V cores will fuse a branch that skips over an arithmetic instruction into a conditional move, so you could treat that as a single unit in the code generator.
That's true in the abstract, but the context here is how well it fits as a Cranelift backend specifically
those internal branches in a single pseudoinstruction create extra complexity
empirically, the aarch64 backend feels cleaner and simpler than RV64, even though the latter is a simpler ISA from first principles
I don't think ARM is much more complicated than RiscV if I omit Thumb.
I do know RiscV very well and ARM not at all though so there is that.
Regarding standards for merging new backends: in short, we need confidence that the new backend won't interfere with our ability to maintain the rest of the project. But that isn't intended to be a difficult milestone to reach. For the long version, see https://docs.wasmtime.dev/stability-tiers.html
Also I would actually need to do RV32IMF when thinking about it. Multiplication and hardware floats are necessary.
Last updated: Dec 23 2024 at 12:05 UTC