Hello, I have been writing a blog post about compiling brainfuck with Cranelift (here is a draft), and some questions arose as I was writing it. Most of them are issues with the documentation, I think.
What exactly is an Extended Basic Block
This appears to be a Cranelift-specific extension of the concept of basic blocks. The extensions appear to be that blocks have parameters (instead of phi functions, as I have seen in LLVM IR when reading bits of the Kaleidoscope tutorial), and that, instead of using branches with two targets, an EBB allows branches to fall through to the next instruction.
Also, from what I tested, only a single branch is allowed before a terminator instruction (I could be wrong), but the documentation (docs/ir.md at least) doesn't make this clear.
Why are FunctionBuilder and FunctionBuilderContext split?
I presume that is because FunctionBuilder can only be used for a single function, as its struct docs say. But the docs for FunctionBuilder::finalize say that it is reset for the next function compilation.
Also, func in FunctionBuilder is public. What happens if I replace the function mid-building? Can this be used for switching to the next function after calling finalize?
The docs for this impl FunctionBuilder (I almost missed them because they are on an impl) say that the first switch_to_block defines the entry block, but there is also a separate function, append_block_params_for_function_params, to add the function's parameters to a block. Are there cases where the entry block doesn't receive the function's parameters? Or cases where multiple blocks have the function parameters?
There is also append_block_params_for_function_returns. In what cases is this used?
Why are there a bunch of instructions in PascalCase in InstBuilder?
They appear to receive an Opcode as a parameter, but not every opcode appears to make sense for each of them. Looking around, the snake_case methods appear to be nice wrappers around them, but why are the PascalCase ones exposed on the same trait?
Why exactly must we seal the blocks as early as possible?
FunctionBuilder::seal_all_blocks says that it is more efficient to call seal_block as soon as possible. But how soon does this need to be to make a difference? Before switching to another block, before creating another block, before using a Variable, or before emitting the next instructions?
And why does this make a difference?
Also, if you only read the docs for FunctionBuilder::seal_block, they suggest that the block must be sealed after the last branch instruction to the block, or there will be inconsistencies.
I am using CompiledCodeBase::code_buffer. It is a public method, but it is inside a non-public module, so it is not displayed in the docs, and I don't know if I should be using it. Its type alias, CompiledCode, is publicly visible, so maybe this is only an issue with how the docs display the methods?
Hi @Rodrigo Batista de Moraes -- many questions here; I'll try to answer a few:
What exactly is an Extended Basic Block
Your answer is not quite right, and there is also some updated information here (old docs). First off, we don't actually use EBBs any more; what Cranelift is doing now is properly called just a "basic block". A classical EBB is a single-entrance, multiple-exit block; in other words, it allows branches in the middle of the block out to someplace else. We used to use this abstraction, but we don't any more. The fact that blocks end with up to two branches (a conditional branch and an unconditional one) is an artifact of this historical approach, and we're working on removing it (by having a two-target conditional branch instead).
Block parameters are an orthogonal design decision, and in fact there are other compilers (e.g. MLIR in LLVM) that use them in basic blocks too.
Why exactly must we seal the blocks as early as possible?
This has to do with the algorithm that we use to build SSA (i.e., add block parameters for locals that are defined more than once). If a block is not yet sealed, then more predecessors may be added in the future, and so we can't do anything to resolve all of a local's definitions except insert a placeholder blockparam. We may remove it later if it turns out all preds had the same definition for the local, but this takes work. Instead if we know for sure that no more preds will be added, we can do this optimization (only a single def exists, use it directly) eagerly.
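The effect of sealing on variable lookups can be sketched with a toy model. This is not Cranelift's implementation: ToySsa, Block, Value and params_created are hypothetical names, and the model leaves out the later cleanup of redundant parameters and the per-predecessor lookup for blocks with several predecessors. It only shows that a read in a sealed block with a single predecessor can forward to the existing definition, while the same read in an unsealed block has to create a placeholder block parameter.

```rust
use std::collections::HashMap;

/// Toy SSA value: a concrete definition or a block parameter (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Value {
    Def(u32),
    Param { block: usize, id: u32 },
}

#[derive(Default)]
struct Block {
    preds: Vec<usize>,
    sealed: bool,
    defs: HashMap<&'static str, Value>,
}

#[derive(Default)]
struct ToySsa {
    blocks: Vec<Block>,
    next_id: u32,
    /// Number of block parameters created so far, placeholders included.
    params_created: usize,
}

impl ToySsa {
    /// Create a block with a fixed predecessor list (a simplification of this model).
    fn create_block(&mut self, preds: &[usize]) -> usize {
        self.blocks.push(Block { preds: preds.to_vec(), ..Default::default() });
        self.blocks.len() - 1
    }

    /// Promise that no more predecessors will be added to `block`.
    fn seal(&mut self, block: usize) {
        self.blocks[block].sealed = true;
    }

    /// Record a new definition of `var` in `block`.
    fn def_var(&mut self, block: usize, var: &'static str) -> Value {
        self.next_id += 1;
        let value = Value::Def(self.next_id);
        self.blocks[block].defs.insert(var, value);
        value
    }

    /// Look up the current value of `var` as seen from `block`.
    fn use_var(&mut self, block: usize, var: &'static str) -> Value {
        if let Some(&value) = self.blocks[block].defs.get(var) {
            return value;
        }
        let sealed = self.blocks[block].sealed;
        let preds = self.blocks[block].preds.clone();
        let value = if sealed && preds.len() == 1 {
            // Sealed with one predecessor: the definition is known for sure,
            // so forward to it without creating a parameter.
            self.use_var(preds[0], var)
        } else {
            // Unsealed (or several predecessors): more definitions may still
            // arrive, so insert a block parameter as a placeholder.
            self.next_id += 1;
            self.params_created += 1;
            Value::Param { block, id: self.next_id }
        };
        self.blocks[block].defs.insert(var, value);
        value
    }
}
```

In this model, reading x from a sealed successor of the defining block returns the original definition and leaves params_created at 0; reading it from an unsealed successor returns a Param and bumps the counter, which is the extra work described above.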
Why are FunctionBuilder and FunctionBuilderContext split?
This is a great question. Probably either FunctionBuilder::finalize should consume self, so the caller has to construct a new FunctionBuilder to build another function, or else the two structs should be merged. Either way allows reusing memory across functions.
It's important that FunctionBuilder::func is public, because callers may need to access or manipulate it in the middle of using a FunctionBuilder. But I suspect only bad things will happen if you overwrite that field mid-way through a function, so maybe it should be an accessor function instead of a public field.
If you'd like to try fixing these issues, I think they should be good ones to work on even if you don't know much about the rest of Cranelift.
The docs for this impl FunctionBuilder (I almost missed them because they are on an impl) say that the first switch_to_block defines the entry block, but there is also a separate function, append_block_params_for_function_params, to add the function's parameters to a block. Are there cases where the entry block doesn't receive the function's parameters? Or cases where multiple blocks have the function parameters? There is also append_block_params_for_function_returns. In what cases is this used?
I think the best example for how these two functions are currently used is cranelift/wasm/src/func_translator.rs. But the best answer to your question is that I don't think there's a good reason they're designed that way.
Why are there a bunch of instructions in PascalCase in InstBuilder?
Those methods are for what we refer to as instruction "formats", rather than individual instructions. As you've noticed, each opcode is only used with one instruction format, but some formats are used for many different opcodes. In general I think you should probably ignore the instruction formats unless you're working on Cranelift itself. Maybe that means we should not expose them publicly, but I haven't thought that hard about that.
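The relationship between formats and opcodes can be illustrated with a small toy model. None of this is Cranelift's actual code: the enums below only borrow a few names (Binary, UnaryImm, Iadd, Isub, Iconst), and the explicit format check in binary is an assumption made for the illustration, not a claim about what the real PascalCase methods do. The point is that one format-level constructor serves several opcodes, while the snake_case helpers fix the opcode for you.

```rust
/// Shape of an instruction's operands (toy version of an instruction format).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Format {
    Binary,
    UnaryImm,
}

/// A few opcodes; each one belongs to exactly one format.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Opcode {
    Iadd,
    Isub,
    Iconst,
}

impl Opcode {
    /// Several opcodes can share a format, but an opcode has only one.
    fn format(self) -> Format {
        match self {
            Opcode::Iadd | Opcode::Isub => Format::Binary,
            Opcode::Iconst => Format::UnaryImm,
        }
    }
}

/// Instruction data, stored per format rather than per opcode.
#[derive(Debug, PartialEq)]
enum Inst {
    Binary { opcode: Opcode, args: [u32; 2] },
    UnaryImm { opcode: Opcode, imm: i64 },
}

/// Format-level constructor: accepts any opcode, but only opcodes of the
/// Binary format make sense here (checked explicitly in this toy model).
fn binary(opcode: Opcode, a: u32, b: u32) -> Result<Inst, String> {
    if opcode.format() != Format::Binary {
        return Err(format!("{:?} is not a Binary-format opcode", opcode));
    }
    Ok(Inst::Binary { opcode, args: [a, b] })
}

/// Per-instruction wrapper: picks the opcode so callers cannot mix it up.
fn iadd(a: u32, b: u32) -> Inst {
    binary(Opcode::Iadd, a, b).expect("Iadd uses the Binary format")
}

/// Per-instruction wrapper for an immediate constant.
fn iconst(imm: i64) -> Inst {
    Inst::UnaryImm { opcode: Opcode::Iconst, imm }
}
```

Calling iadd(1, 2) always yields a well-formed Binary instruction, whereas binary(Opcode::Iconst, 1, 2) is rejected in this model; that mismatch is the "not all opcodes make sense" situation from the question.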
I am using CompiledCodeBase::code_buffer. It is a public method, but it is inside a non-public module, so it is not displayed in the docs, and I don't know if I should be using it. Its type alias, CompiledCode, is publicly visible, so maybe this is only an issue with how the docs display the methods?
Yes, that has bothered me too. I'm not sure what the best way to fix it is but those ought to be in the documentation.
Thanks for answering my questions!
@Jamey Sharp I will look into making a PR for fixing the FunctionBuilder API. Making finalize receive self instead of &mut self was easy enough to implement. I will also work on putting func behind an accessor.
Rodrigo Batista de Moraes has marked this topic as resolved.
Hi! I’m trying to translate an invented assembly IR to Cranelift IR to actually execute it (without writing a VM for my IR) and I’m having trouble with instructions that refer to registers (for example “add r1 r2 r3”). Is there a way to refer to registers when generating the CLIR code?
You can use cranelift_frontend::FunctionBuilder and then use def_var and use_var.
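A rough sketch of the bookkeeping this suggests, under several assumptions: each register rN of the source IR maps to one frontend variable with index N, and in "add r1 r2 r3" the first operand is the destination (the question does not say which one is). SrcInst, Step, parse and lower are hypothetical names. The sketch deliberately does not call Cranelift; it only records the order in which use_var, an add instruction and def_var would be issued, since the exact signatures depend on the cranelift_frontend version.

```rust
/// One instruction of the hypothetical register-based source IR.
#[derive(Debug, PartialEq)]
enum SrcInst {
    Add { dst: u32, lhs: u32, rhs: u32 },
}

/// Frontend operations that would be issued, in order (stand-ins for
/// use_var, an integer add, and def_var on a FunctionBuilder).
#[derive(Debug, PartialEq)]
enum Step {
    UseVar(u32),
    Iadd,
    DefVar(u32),
}

/// Parse a register token such as "r3" into its index.
fn parse_reg(token: &str) -> Option<u32> {
    token.strip_prefix('r')?.parse().ok()
}

/// Parse one line such as "add r1 r2 r3" (assumed order: dst, lhs, rhs).
fn parse(line: &str) -> Option<SrcInst> {
    let mut tokens = line.split_whitespace();
    match tokens.next()? {
        "add" => {
            let dst = parse_reg(tokens.next()?)?;
            let lhs = parse_reg(tokens.next()?)?;
            let rhs = parse_reg(tokens.next()?)?;
            Some(SrcInst::Add { dst, lhs, rhs })
        }
        _ => None,
    }
}

/// Lower one source instruction: read both source registers as variables,
/// compute the result, then redefine the destination variable.
fn lower(inst: &SrcInst) -> Vec<Step> {
    match inst {
        SrcInst::Add { dst, lhs, rhs } => vec![
            Step::UseVar(*lhs),
            Step::UseVar(*rhs),
            Step::Iadd,
            Step::DefVar(*dst),
        ],
    }
}
```

Because a variable may be redefined any number of times, a register that is written repeatedly needs no special handling on the caller's side; the frontend turns the reads and writes into SSA values.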
Thanks @bjorn3 ! I'll take a look
Hi! What is the meaning of the arrow between 2 values in the CLIR?
For example:
block(v0: i64):
v4 -> v0
….
It's a value alias, introducing a new name for an existing value.
Ohhh ok thanks. Where can I find more information about things like syntax, what a Cranelift Variable is, and how to store data in memory and read it later?
This document sounds like a good fit: https://github.com/bytecodealliance/wasmtime/blob/main/cranelift/docs/ir.md
Thanks Dan, that definitely helps :)
Hi :wave: is there a way to create global numeric constants? Right now I’m creating them using a FunctionBuilder via the builder.ins().iconst method, so these constants are tied to that specific function, right?
Cranelift doesn't have any global structures shared between multiple functions. Each function is completely independent from one another, so if you have a global constant you'd like to use in multiple functions, you'll need to copy its definition once per function that uses it.
Ohhh ok, thanks Fitzgen!
Is there any equivalent function to LLVMDumpModule in Cranelift?
You can do func.to_string() to get the IR of a single function in text format. A cranelift_module::Module implementation generally doesn't contain any Cranelift IR, but already compiled code + metadata. As such you can't get a text format from a Module implementation, but have to dump each function individually before defining it and record the metadata to map from FuncId or DataId back to a function/data object name.
Last updated: Dec 23 2024 at 12:05 UTC