Stream: cranelift

Topic: ✔ Questions about Cranelift


view this post on Zulip Rodrigo Batista de Moraes (Nov 21 2022 at 13:25):

Hello, I have been writing a blog post about compiling brainfuck with Cranelift (here is a draft), and some questions arose as I was writing it. Most of them are issues with the documentation, I think.

What exactly is an Extended Basic Block?

This appears to be a Cranelift-specific extension of the concept of basic blocks. The extensions appear to be that blocks have parameters (instead of phi functions, which I have seen in LLVM IR while reading bits of the Kaleidoscope tutorial), and that, instead of using branches with two targets, an EBB allows branches to fall through to the next instruction.

Also, from what I tested, only a single branch is allowed before a terminator instruction (I could be wrong), but the documentation (docs/ir.md at least) doesn't make this clear.

Why are FunctionBuilder and FunctionBuilderContext split?

I presume that is because a FunctionBuilder can only be used for a single function, as its struct docs say. But the docs for FunctionBuilder::finalize say that it is reset for the next function compilation.

Also, func in FunctionBuilder is public. What happens if I replace the function mid-building? Can this be used to switch to the next function after calling finalize?

Must the entry block of a function inherit the parameters of the function?

The docs for this impl FunctionBuilder (I almost missed them because they are on an impl) say that the first switch_to_block defines the entry block, but there is also a separate function, append_block_params_for_function_params, to add the function's parameters to a block. Are there cases where the entry block doesn't receive the function's parameters? Or cases where multiple blocks have the function's parameters?

There is also append_block_params_for_function_returns. In what cases is this used?

Why are there a bunch of instructions in PascalCase in InstBuilder?

They appear to receive an Opcode as a parameter, but not every opcode appears to make sense for each instruction. Looking around, the snake_case methods appear to be nice wrappers around them, but why are the PascalCase ones exposed alongside them?

Why exactly must we seal blocks as early as possible?

The docs for FunctionBuilder::seal_all_blocks say that it is more efficient to call seal_block as soon as possible. But how soon does this need to be to make a difference? Before switching to another block, before creating another block, before using a Variable, or before emitting the next instruction?

And why does this make a difference?

Also, if you only read the docs for FunctionBuilder::seal_block, they suggest that the block must be sealed after the last branch instruction to the block, or there will be inconsistencies.

How am I supposed to get the bytes of the compiled code?

I am using CompiledCodeBase::code_buffer. It is a public method, but it is inside a non-public module, so it is not displayed in the docs, and I don't know if I should be using it. Its type alias, CompiledCode, is publicly visible, so maybe this is only an issue with how the docs display the methods?


view this post on Zulip Chris Fallin (Nov 21 2022 at 18:10):

Hi @Rodrigo Batista de Moraes -- many questions here; I'll try to answer a few:

What exactly is an Extended Basic Block?

Your answer is not quite right, and there is also some updated information here (old docs). First off, we don't actually use EBBs any more; what Cranelift uses now is properly called just a "basic block". A classical EBB is a single-entrance, multiple-exit block; in other words, it allows branches in the middle of the block out to someplace else. We used to use this abstraction, but we don't any more. The fact that blocks end with up to two branches (a conditional branch and an unconditional one) is an artifact of this historical approach, and we're working on removing it (by having a two-target conditional branch instead).

Block parameters are an orthogonal design decision, and in fact there are other compilers (e.g. MLIR in LLVM) that use them in basic blocks too.
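The distinction drawn above can be made concrete with a small, std-only toy model. This is only a sketch: `Inst`, `BlockKind`, and `classify` are hypothetical names invented for illustration and are not Cranelift types. It treats a block as "basic" when control flow can leave it only at the very end (allowing the conditional-plus-unconditional tail described above) and as "extended" when a conditional branch appears in the middle.

```rust
/// Toy instruction kinds (hypothetical; not Cranelift's instruction data).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Inst {
    /// Any non-branching instruction.
    Plain,
    /// Conditional branch: either leaves the block or falls through.
    CondBranch,
    /// Unconditional jump or return: always leaves the block.
    Terminator,
}

#[derive(Debug, PartialEq)]
pub enum BlockKind {
    /// Control flow leaves only at the end of the block.
    Basic,
    /// Single entry, but with a side exit in the middle of the block.
    Extended,
    /// Empty, not ending in a terminator, or with code after a terminator.
    Malformed,
}

pub fn classify(block: &[Inst]) -> BlockKind {
    // The last instruction must be the only terminator in the block.
    let (last, body) = match block.split_last() {
        Some(parts) => parts,
        None => return BlockKind::Malformed,
    };
    if *last != Inst::Terminator || body.contains(&Inst::Terminator) {
        return BlockKind::Malformed;
    }
    // One conditional branch directly before the terminator is still
    // considered part of the block's tail, so drop it before scanning.
    let body = match body.split_last() {
        Some((Inst::CondBranch, rest)) => rest,
        _ => body,
    };
    // Any remaining conditional branch is a mid-block side exit.
    if body.contains(&Inst::CondBranch) {
        BlockKind::Extended
    } else {
        BlockKind::Basic
    }
}
```

For example, `[Plain, CondBranch, Terminator]` classifies as `Basic`, while `[CondBranch, Plain, Terminator]` classifies as `Extended`.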

view this post on Zulip Chris Fallin (Nov 21 2022 at 18:12):

Why exactly must we seal blocks as early as possible?

This has to do with the algorithm that we use to build SSA (i.e., add block parameters for locals that are defined more than once). If a block is not yet sealed, then more predecessors may be added in the future, and so we can't do anything to resolve all of a local's definitions except insert a placeholder blockparam. We may remove it later if it turns out all preds had the same definition for the local, but this takes work. Instead, if we know for sure that no more preds will be added, we can do this optimization (only a single def exists, so use it directly) eagerly.
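A heavily simplified, std-only model of that decision is sketched below. It is not Cranelift's SSA builder: `ToySsa`, `Block`, and `Resolved` are hypothetical names, values are plain numbers, and the model assumes an acyclic control-flow graph (the real algorithm also has to handle loops). It only shows why a read in an unsealed block must fall back to a placeholder, while the same read in a sealed block can resolve directly to the single reaching definition.

```rust
use std::collections::HashMap;

/// What reading a variable resolves to in this toy model.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Resolved {
    /// A concrete value, identified by number.
    Value(u32),
    /// Block is unsealed: predecessors are unknown, so a placeholder
    /// block parameter is needed (it may turn out to be redundant).
    Placeholder { block: usize },
    /// Block is sealed and its predecessors disagree: a real block
    /// parameter is required to merge the definitions.
    MergeParam { block: usize },
}

#[derive(Default)]
pub struct Block {
    pub preds: Vec<usize>,
    pub sealed: bool,
    /// Variable name -> value defined for it inside this block.
    pub defs: HashMap<&'static str, u32>,
}

pub struct ToySsa {
    pub blocks: Vec<Block>,
}

impl ToySsa {
    /// Resolve a read of `var` at the end of `block`.
    /// Assumes the predecessor graph has no cycles.
    pub fn use_var(&self, block: usize, var: &str) -> Resolved {
        let b = &self.blocks[block];
        // A definition in the block itself always wins.
        if let Some(&v) = b.defs.get(var) {
            return Resolved::Value(v);
        }
        // More predecessors may still appear: nothing can be decided yet.
        if !b.sealed {
            return Resolved::Placeholder { block };
        }
        // The predecessor list is final, so look at what each one provides.
        let incoming: Vec<Resolved> =
            b.preds.iter().map(|&p| self.use_var(p, var)).collect();
        match incoming.split_first() {
            // Every predecessor agrees: use that definition directly.
            Some((first, rest)) if rest.iter().all(|r| r == first) => *first,
            // Disagreement (or no predecessor at all): a parameter is needed.
            _ => Resolved::MergeParam { block },
        }
    }
}
```

In this model, a block whose only predecessor defines `x` resolves a read of `x` to `Placeholder` while unsealed and to the predecessor's `Value` once sealed, which is the eager shortcut described above.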

view this post on Zulip Jamey Sharp (Nov 21 2022 at 21:29):

Why are FunctionBuilder and FunctionBuilderContext split?

This is a great question. Probably either FunctionBuilder::finalize should consume self, so the caller has to construct a new FunctionBuilder to build another function, or else the two structs should be merged. Either way allows reusing memory across functions.

It's important that FunctionBuilder::func is public, because callers may need to access or manipulate it in the middle of using a FunctionBuilder. But I suspect only bad things will happen if you overwrite that field mid-way through a function, so maybe it should be an accessor function instead of a public field.

If you'd like to try fixing these issues, I think they should be good ones to work on even if you don't know much about the rest of Cranelift.
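The first option (having finalize consume self) can be illustrated with a std-only sketch. `ToyContext` and `ToyBuilder` are hypothetical stand-ins for FunctionBuilderContext and FunctionBuilder, not the real types; the point is only the ownership pattern: the context keeps its allocation across functions, and the compiler rejects any use of a builder after `finalize`.

```rust
/// Reusable scratch memory (stand-in for a builder context).
#[derive(Default)]
pub struct ToyContext {
    scratch: Vec<u32>,
}

impl ToyContext {
    /// Exposed so callers can observe that the allocation is kept.
    pub fn capacity(&self) -> usize {
        self.scratch.capacity()
    }
}

/// Per-function builder that borrows the context while it is alive.
pub struct ToyBuilder<'a> {
    ctx: &'a mut ToyContext,
    insts: Vec<String>,
}

impl<'a> ToyBuilder<'a> {
    pub fn new(ctx: &'a mut ToyContext) -> Self {
        // `clear` drops the old contents but keeps the allocation.
        ctx.scratch.clear();
        ToyBuilder { ctx, insts: Vec::new() }
    }

    pub fn emit(&mut self, inst: &str) {
        // Record some per-instruction scratch data in the shared context.
        self.ctx.scratch.push(self.insts.len() as u32);
        self.insts.push(inst.to_string());
    }

    /// Takes `self` by value: after this call the builder is gone, so
    /// building another function requires constructing a new builder.
    pub fn finalize(self) -> Vec<String> {
        self.insts
    }
}
```

With this shape, calling `emit` on a builder after `finalize` is a compile-time error, and a second `ToyBuilder::new(&mut ctx)` reuses the memory already held by the context.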

The docs for this impl FunctionBuilder (I almost missed them because they are on an impl) say that the first switch_to_block defines the entry block, but there is also a separate function, append_block_params_for_function_params, to add the function's parameters to a block. Are there cases where the entry block doesn't receive the function's parameters? Or cases where multiple blocks have the function's parameters?

There is also append_block_params_for_function_returns. In what cases is this used?

I think the best example for how these two functions are currently used is cranelift/wasm/src/func_translator.rs. But the best answer to your question is that I don't think there's a good reason they're designed that way.

Why are there a bunch of instructions in PascalCase in InstBuilder?

Those methods are for what we refer to as instruction "formats", rather than individual instructions. As you've noticed, each opcode is only used with one instruction format, but some formats are used for many different opcodes. In general I think you should probably ignore the instruction formats unless you're working on Cranelift itself. Maybe that means we should not expose them publicly, but I haven't thought that hard about that.

I am using CompiledCodeBase::code_buffer. It is a public method, but it is inside a non-public module, so it is not displayed in the docs, and I don't know if I should be using it. Its type alias, CompiledCode, is publicly visible, so maybe this is only an issue with how the docs display the methods?

Yes, that has bothered me too. I'm not sure what the best way to fix it is but those ought to be in the documentation.

view this post on Zulip Rodrigo Batista de Moraes (Nov 22 2022 at 14:19):

Thanks for answering my questions!

@Jamey Sharp I will look into making a PR to fix the FunctionBuilder API. Making finalize receive self instead of &mut self was easy enough to implement. I will also work on putting func behind an accessor.

view this post on Zulip Notification Bot (Nov 22 2022 at 14:19):

Rodrigo Batista de Moraes has marked this topic as resolved.

view this post on Zulip Juan Bono (Jan 06 2023 at 10:00):

Hi! I’m trying to translate an invented assembly IR to Cranelift IR to actually execute it (without writing a VM for my IR), and I’m having trouble with instructions that refer to registers (for example, “add r1 r2 r3”). Is there a way to refer to registers when generating the CLIR code?

view this post on Zulip bjorn3 (Jan 06 2023 at 10:08):

You can use cranelift_frontend::FunctionBuilder and then use def_var and use_var.
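One way to set this up is to give every register a fixed index and use that index to identify a frontend variable. The std-only sketch below covers only that first step, parsing a line such as "add r1 r2 r3" into register indices. `VarIndex`, `Add`, and `parse_add` are hypothetical names, and treating the first operand as the destination is an assumption about the invented IR. The comments describe where the def_var and use_var calls would go; the Cranelift calls themselves are not shown.

```rust
/// Index identifying one register; each index would correspond to one
/// frontend variable (hypothetical wrapper, not Cranelift's own type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct VarIndex(pub u32);

/// Parsed form of `add dst lhs rhs` (operand order is an assumption).
#[derive(Debug, PartialEq, Eq)]
pub struct Add {
    pub dst: VarIndex,
    pub lhs: VarIndex,
    pub rhs: VarIndex,
}

/// Parse a register token such as `r12` into its index.
fn parse_reg(tok: &str) -> Option<VarIndex> {
    tok.strip_prefix('r')?.parse::<u32>().ok().map(VarIndex)
}

/// Parse one `add` line; returns `None` for anything else.
pub fn parse_add(line: &str) -> Option<Add> {
    let mut toks = line.split_whitespace();
    if toks.next()? != "add" {
        return None;
    }
    let add = Add {
        dst: parse_reg(toks.next()?)?,
        lhs: parse_reg(toks.next()?)?,
        rhs: parse_reg(toks.next()?)?,
    };
    // Reject trailing tokens such as a fourth operand.
    if toks.next().is_some() {
        return None;
    }
    // When lowering, `lhs` and `rhs` would be read with `use_var`, the two
    // values added, and the result written back to `dst` with `def_var`.
    Some(add)
}
```

Because registers are read and written through variables, the frontend takes care of turning repeated writes to the same register into proper SSA values.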

view this post on Zulip Juan Bono (Jan 06 2023 at 10:46):

Thanks @bjorn3 ! I'll take a look

view this post on Zulip Juan Bono (Jan 14 2023 at 21:16):

Hi! What is the meaning of the arrow between two values in the CLIR?

For example:

block(v0: i64):
v4 -> v0
….

view this post on Zulip Dan Gohman (Jan 14 2023 at 21:27):

It's a value alias, introducing a new name for an existing value.
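As a rough mental model (a std-only sketch, not Cranelift's data-flow graph), aliases can be pictured as a map from the alias to its target, so that `v4 -> v0` is the entry `4 -> 0`. The hypothetical `resolve_alias` helper below follows such links until it reaches a value that is not itself an alias.

```rust
use std::collections::HashMap;

/// Follow alias links (`alias -> target`) until a value that is not an
/// alias is reached. Values are identified by number, so `v4` is `4`.
pub fn resolve_alias(aliases: &HashMap<u32, u32>, mut value: u32) -> u32 {
    // A chain without cycles has at most `aliases.len()` hops, so the
    // bounded loop cannot spin forever on a malformed map.
    for _ in 0..=aliases.len() {
        match aliases.get(&value) {
            Some(&target) => value = target,
            None => return value,
        }
    }
    panic!("alias cycle detected");
}
```

With the single entry `4 -> 0`, resolving `4` yields `0`, and resolving `0` yields `0` unchanged.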

view this post on Zulip Juan Bono (Jan 15 2023 at 19:00):

Ohhh ok thanks. Where can I find more information about things like syntax, what a Cranelift Variable is, and how to store data in memory and read it later?

view this post on Zulip Dan Gohman (Jan 15 2023 at 19:02):

This document sounds like a good fit: https://github.com/bytecodealliance/wasmtime/blob/main/cranelift/docs/ir.md


view this post on Zulip Juan Bono (Jan 15 2023 at 19:28):

Thanks Dan, that definitely helps :)

view this post on Zulip Juan Bono (Jan 19 2023 at 17:15):

Hi :wave: is there a way to create global numeric constants? Right now I’m creating them using a FunctionBuilder via the builder.ins().iconst method, so these constants are tied to that specific function, right?

view this post on Zulip fitzgen (he/him) (Jan 19 2023 at 17:47):

Cranelift doesn't have any global structures shared between multiple functions. Each function is completely independent of the others, so if you have a global constant you'd like to use in multiple functions, you'll need to copy its definition once per function that uses it.

view this post on Zulip Juan Bono (Jan 19 2023 at 18:18):

Ohhh ok, thanks Fitzgen!

view this post on Zulip harry fukyu (Feb 26 2023 at 01:57):

Is there any equivalent function to LLVMDumpModule in Cranelift?

view this post on Zulip bjorn3 (Feb 26 2023 at 09:20):

You can do func.to_string() to get the IR of a single function in text format. A cranelift_module::Module implementation generally doesn't contain any Cranelift IR, only already-compiled code plus metadata. As such, you can't get a text format from a Module implementation; you have to dump each function individually before defining it, and record the metadata needed to map from a FuncId or DataId back to a function or data object name.
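The bookkeeping described here could look roughly like the std-only sketch below. `IrDump` is a hypothetical helper, not part of Cranelift: it accepts anything that implements `Display` (so the text from func.to_string() fits), and it uses a plain `u32` where real code would use the numeric part of a FuncId.

```rust
use std::collections::BTreeMap;
use std::fmt::Display;

/// Collects the textual IR of each function before it is defined,
/// keyed by a numeric id (standing in for a function id).
#[derive(Default)]
pub struct IrDump {
    /// id -> (symbol name, IR text)
    entries: BTreeMap<u32, (String, String)>,
}

impl IrDump {
    /// Record one function. Call this before handing the function to the
    /// module, since the module is not expected to keep the IR around.
    pub fn record(&mut self, id: u32, name: &str, func: &impl Display) {
        self.entries
            .insert(id, (name.to_string(), func.to_string()));
    }

    /// Render every recorded function, ordered by id, each under a
    /// comment line that carries its id and name.
    pub fn dump(&self) -> String {
        let mut out = String::new();
        for (id, (name, ir)) in &self.entries {
            out.push_str(&format!("; func {id}: {name}\n{ir}\n"));
        }
        out
    }
}
```

Calling `dump` after all functions have been recorded then gives a single text listing, which is the closest equivalent to a whole-module dump in this setup.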


Last updated: Nov 22 2024 at 16:03 UTC