cranelift / Issue #1234 Improving Cranelift's IR generation · git-cranelift

Motivation

Cranelift's IR code generation went through a huge overhaul, from the original Python generator to one written in Rust. This has made it easier to work on, as we now use the same language to generate the code that is used in the rest of the project. However, it is still not ideal for contributing new instructions or changing existing ones, as it is not clear what steps are required and which parts of the meta code generator need to be modified to correctly add an instruction.

This could be partially solved by providing better documentation. However, documentation doesn't improve the experience of contributing to the codebase itself, and it is likely to fall out of date over time, much like the current situation, because there's no strict link between an instruction and its documentation besides what's available in InstBuilder.

Instead, I would like to propose rewriting at least some of the meta code generator with a greater focus on data locality, so that someone who wants to contribute to the IR has fewer steps to follow overall. This should also help with documenting the IR format, as the documentation can be coupled more closely to each instruction's definition.

Proposed Solution

I would like to propose something along the lines of using a data format to encode information about an instruction and how it is encoded. I've used YAML to illustrate what this could look like, though I'm not advocating for YAML specifically.

---
name: jump
doc: >
    Jump.

    Unconditionally jump to an extended basic block, passing the specified
    EBB arguments. The number and types of arguments must match the
    destination EBB.
attributes:
    - terminator
    - branch
operands_in:
    - Ebb # This would also imply `args`
encodings:
    # Encode from a recipe
    x86:
        recipe: x86_JMP
    # If this were YAML specifically, you could use anchors for recipes
    # instead (with X86_JMP defined elsewhere in the file), e.g.
    #   x86:
    #       <<: *X86_JMP
    # Or encode directly (a literal block keeps the code's line breaks)
    riscv:
        emit: |
            let dest = i64::from(func.offsets[destination]);
            let disp = dest - i64::from(sink.offset());
            put_uj(bits, disp, 0, sink);
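
To make the data-oriented idea a little more concrete, here is a minimal sketch, in plain Rust with no dependencies, of types the YAML above could map onto, plus a generator function that turns the `doc` field directly into the doc comment of a builder method, so the documentation lives next to the definition. The names `InstructionDef`, `Encoding`, and `builder_method`, and the generated signature, are hypothetical and are not cranelift-codegen's real meta types or InstBuilder API; a real implementation would deserialise the data (for example with serde) rather than build it by hand.

```rust
// A minimal sketch: hypothetical types mirroring the YAML above. They are
// not part of cranelift-codegen's real meta crate.

/// How an instruction is encoded for a given ISA.
#[allow(dead_code)]
#[derive(Debug)]
enum Encoding {
    /// Reuse a named encoding recipe, e.g. "x86_JMP".
    Recipe(String),
    /// Emit the instruction directly from an inline Rust snippet.
    Emit(String),
}

/// One instruction, described entirely by data.
#[allow(dead_code)]
#[derive(Debug)]
struct InstructionDef {
    name: String,
    doc: String,
    attributes: Vec<String>,
    operands_in: Vec<String>,
    /// (ISA name, encoding) pairs.
    encodings: Vec<(String, Encoding)>,
}

impl InstructionDef {
    fn has_attribute(&self, attr: &str) -> bool {
        self.attributes.iter().any(|a| a == attr)
    }
}

/// Generates a hypothetical builder method declaration whose doc comment
/// is taken directly from the definition's `doc` field.
fn builder_method(def: &InstructionDef) -> String {
    let mut out = String::new();
    for line in def.doc.lines() {
        if line.is_empty() {
            out.push_str("///\n");
        } else {
            out.push_str(&format!("/// {}\n", line));
        }
    }
    let mut params = vec!["self".to_string()];
    for op in &def.operands_in {
        // Operand kinds double as parameter types; names are lower-cased.
        params.push(format!("{}: {}", op.to_lowercase(), op));
        // As the YAML comment notes, an `Ebb` operand also implies `args`.
        if op == "Ebb" {
            params.push("args: &[Value]".to_string());
        }
    }
    out.push_str(&format!("fn {}({}) -> Inst;\n", def.name, params.join(", ")));
    out
}

fn main() {
    let jump = InstructionDef {
        name: "jump".to_string(),
        doc: "Jump.\n\nUnconditionally jump to an extended basic block.".to_string(),
        attributes: vec!["terminator".to_string(), "branch".to_string()],
        operands_in: vec!["Ebb".to_string()],
        encodings: vec![
            ("x86".to_string(), Encoding::Recipe("x86_JMP".to_string())),
            ("riscv".to_string(), Encoding::Emit("put_uj(bits, disp, 0, sink);".to_string())),
        ],
    };
    assert!(jump.has_attribute("terminator"));
    print!("{}", builder_method(&jump));
}
```

Because the doc comment is produced from the same record as the operands and encodings, changing an instruction's description and its definition would happen in one place.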

The main advantage of this approach is that it moves cranelift-codegen from its current imperative style to something more declarative and data-oriented. To me this provides more clarity around how an instruction is defined and used: if you wanted to create a new instruction, you could simply copy and paste from an existing, working instruction.
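
Because a definition would be plain data, the copy-and-paste workflow described above amounts to cloning an existing definition and overriding only the fields that differ. The sketch below assumes a pared-down, hypothetical `InstructionDef` type (not cranelift's real meta types) and uses Rust's struct update syntax; the instruction `my_jump`, the recipe `x86_MY_JMP`, and the `KNOWN_ATTRIBUTES` list are made up for illustration. It also shows the kind of up-front check a generator could run over declarative data, here flagging attributes it does not recognise.

```rust
// Pared-down, hypothetical instruction definition; not cranelift's real API.
#[allow(dead_code)]
#[derive(Clone, Debug)]
struct InstructionDef {
    name: &'static str,
    doc: &'static str,
    attributes: Vec<&'static str>,
    operands_in: Vec<&'static str>,
    // (ISA name, recipe name) pairs.
    recipes: Vec<(&'static str, &'static str)>,
}

// Hypothetical set of attributes the generator understands.
const KNOWN_ATTRIBUTES: &[&str] = &["terminator", "branch", "call", "return"];

/// Returns the attributes of `def` that the generator does not recognise.
fn unknown_attributes(def: &InstructionDef) -> Vec<&'static str> {
    def.attributes
        .iter()
        .copied()
        .filter(|a| !KNOWN_ATTRIBUTES.contains(a))
        .collect()
}

fn main() {
    let jump = InstructionDef {
        name: "jump",
        doc: "Jump.",
        attributes: vec!["terminator", "branch"],
        operands_in: vec!["Ebb"],
        recipes: vec![("x86", "x86_JMP")],
    };

    // "Copy and paste" an existing instruction, overriding only what differs;
    // the remaining fields come from a clone of `jump`.
    let my_jump = InstructionDef {
        name: "my_jump",
        doc: "Hypothetical variant of jump.",
        recipes: vec![("x86", "x86_MY_JMP")],
        ..jump.clone()
    };

    assert_eq!(my_jump.attributes, jump.attributes);
    assert!(unknown_attributes(&my_jump).is_empty());
    println!("{:?}", my_jump);
}
```

A typo such as `teminator` in a copied definition would then be reported by the generator rather than silently ignored.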

Drawbacks

While the goal of this proposal is to simplify working on the IR language itself, it could make the underlying meta code more complex, and it could potentially increase the compile time of cranelift-codegen, since there would now be a deserialisation step that wasn't there before.

Alternatives

Instead of changing the system, there could be a greater focus on building resources for contributing to Cranelift that explain how to use the current system.

GitHub (Feb 28 2020 at 23:27):
