saulecabrera opened issue #5178:
As part of https://github.com/bytecodealliance/rfcs/pull/28 I'm working on factoring out ISLE definitions from
cranelift-codegen
and moving them tocranelift-asm
; concretely, the enum representing each ISA's machine instruction.Currently each
inst.isle
, contains other helpers that use the enum definition, which semantically belong incranelift-codegen
and that ideally shouldn't be moved intocranelift-asm
. This poses an interesting question regarding having ISLE dependencies between crates and what's the best way to go about it.I had a chat with @cfallin, in which we discussed a couple of possibilities:
Declaring each
MInst
asextern
This would require (i) changing the
MInst
definition ininst.isle
incranelift-codegen
to beextern
(ii) moving eachMInst
definition tocranelift-asm
and (iii) making the Rust types from the generated code incranelift-asm
available to the environment of the isle-generated code incranelift-codegen
. Similar to what exists for all the the otherextern
enums. In terms of disadvantages/concerns: whenever there's a change to theMInst
enum incranelift-asm
the extern definition incranelift-codegen
needs to be updated too, which is kind of expected when seeing this from the angle of a dependency upgrade (i.e. sometimes changes are required in the consumer side), but in the end, we are keeping a verbatim copy of the definition and I wonder is this is going to introduce too much complexity when working with these definitions (there might be other disadvantages that I'm not clearly seeing too!).Having a single definition, serving both (or multiple) crates
This idea is half-baked. But in general it would require moving the common ISLE pieces to a central location and copying them at build time to perform code generation. This approach has the potential to solve the definition duplication issue, but the question of how to concretely express dependencies between ISLE definitions remains open: for example, the structure of this new central location could be something like:
- inst.isle - inst_enum.isle // MInst definition - lower.isle
Ideally in this case, we'd be able to express dependency between
inst.isle
andinst_enum.isle
in the language itself; but if there's a fundamental reason why we can't do it, this approach will require at the very least a "merge" of some sort between ISLE files, this makes this process less transparent. As a side note: is there a reason why ISLE couldn't be augmented to support a simple form ofimport
directives?
There are a bunch of pros/cons to both items above, and my objective with this issue is not to decide on which one is better, but instead, make this conversation public to see what others think about this and what could be a sensible path forward.
In terms of how this affects the development of Winch; I don't think it affects it much. I would've preferred to start with
cranelift-asm
, but I see value in reducing complexity up-front, and delaying any refactoring until there's a clear picture of how all the pieces interact together. If there isn't a clear path forward, I'd prefer to keep thewinch
/cranelift-codegen
dependency temporarily and avoid thecranelift-asm
refactoring until we have an agreement on what an acceptable solution is here.
fitzgen commented on issue #5178:
https://github.com/bytecodealliance/wasmtime/issues/4125 was the issue about generating machine instruction definitions that we talked about in the meeting today.
For the purposes specifically of unblocking a separate
cranelift-asm
crate, I think we could skip all the bits about collecting operands and basically everything other than what it takes to define theenum
itself. Everything else can be added later.I also think that yaml or toml would be nice for defining these. I specifically want to keep them declarative, and not programattically generated from "recipes" (like CLIF is in
cranelift-meta
) because it gets really hard to find the one place where an instruction is defined and where all instructions that exist are.Straw person:
instructions: nop: len: u8 alu_rmi_r: size: OperandSize op: AluRmiROpcode src1: Gpr src2: GprMemImm dst: WritableGpr # etc...
We compile this to ISLE with and without an
extern
inbuild.rs
depending which crate we are building.We can make the data format richer as we add features (collecting operands automatically, etc)
cfallin commented on issue #5178:
I second the table-driven approach above; we want to do it eventually anyway and it solves this problem, so it seems to be the cheapest and simplest way forward.
One constraint not mentioned above that's worth naming explicitly here is that, AFAIK, a crate cannot have a dependence on another crate's source tree. So directly using either a yaml/toml or an ISLE file from
cranelift-codegen
incranelift-asm
, or vice-versa, is not possible. I think probably the cleanest way forward is to put the description incranelift-asm
along with the generator, and give it a mode to produce an "extern variant" of the enum definition (this is kind of like a header file for a library). Then we can have ascripts/update-inst-defs.sh
or whatever that generates checked-into-git ISLE in bothcranelift-asm
andcranelift-codegen
. Thoughts on that?
saulecabrera commented on issue #5178:
One of the open questions that came out of exploring this approach is: what's the cleanest way to handle the dependencies needed for a successful generation of ISLE types (concretely in this case the
MachInst
enum). For example it's generally common forMachInst
definitions to depend on types defined through ISLE's prelude, likebool
,u8
; or to depend on other types that are guaranteed to be available in their environment likeGpr
.According to how the ISLE compiler works today, which requires an inline declaration of some sort of these external types, achieving a fully independent generation process, will require making the generation process aware of such dependencies. I can think of at least two ways to achieve this:
Requiring the configuration definitions to provide entries for all of their dependencies.
Deriving the dependencies at generation time.
Although, both avenues are feasible, they introduce questionable level of indirection, which is probably not desirable long term.
Option 1: Introduces more duplication by requiring the definition of the universe of extern types that a particular generation depends on.
Option 2: This is only partially possible, and mostly applicable when generating enums, which don't require the definition extern enum arms. Down the road this becomes problematic when synchronizing the extern generation of the
MachInst
enum incranelift-codegen
.After chatting with @cfallin, we concluded that the cleanest approach would be to have a crate-level solution for artifact dependencies; similar to what is proposed here https://github.com/rust-lang/cargo/issues/9096. Since this proposal is not ready yet, we concluded that the next cleanest approach might be to go with a feature driven assembler in
cranelift-codegen
, in which particular features control which compiler capabilities are exposed depending on the consumer: the optimizing pieces forcranelift-wasm
and the baseline pieces forwinch
.
saulecabrera edited a comment on issue #5178:
One of the interesting questions that came out of exploring this approach is: what's the cleanest way to handle the dependencies needed for a successful generation of ISLE types (concretely in this case the
MachInst
enum). For example it's generally common forMachInst
definitions to depend on types defined through ISLE's prelude, likebool
,u8
; or to depend on other types that are guaranteed to be available in their environment likeGpr
.According to how the ISLE compiler works today, which requires an inline declaration of some sort of these external types, achieving a fully independent generation process, will require making the generation process aware of such dependencies. I can think of at least two ways to achieve this:
Requiring the configuration definitions to provide entries for all of their dependencies.
Deriving the dependencies at generation time.
Although, both avenues are feasible, they introduce questionable level of indirection, which is probably not desirable long term.
Option 1: Introduces more duplication by requiring the definition of the universe of extern types that a particular generation depends on.
Option 2: This is only partially possible, and mostly applicable when generating enums, which don't require the definition extern enum arms. Down the road this becomes problematic when synchronizing the extern generation of the
MachInst
enum incranelift-codegen
.After chatting with @cfallin, we concluded that the cleanest approach would be to have a crate-level solution for artifact dependencies; similar to what is proposed here https://github.com/rust-lang/cargo/issues/9096. Since this proposal is not ready yet, we concluded that the next cleanest approach might be to go with a feature driven assembler in
cranelift-codegen
, in which particular features control which compiler capabilities are exposed depending on the consumer: the optimizing pieces forcranelift-wasm
and the baseline pieces forwinch
.
Last updated: Nov 22 2024 at 17:03 UTC