✔ regalloc2 cfg question · cranelift

Stream: cranelift

Topic: ✔ regalloc2 cfg question

CompilerSmith (Apr 04 2024 at 19:59):

Hi, I have one more question/dilemma that i would like to present, its more of an issue on my end so dont worry if suggestions or thoughts arent perfect.

We are building a binary transformation framework and one of the things our framework does to ensure we dont break code when we move it is that for each basic block of a function, if there exists bytes above the basic block that we havent disassembled then we assume its code that could fallthrough into the basic block we are going to move.

To solve this we insert a thunk to jmp to the basic block. However this is a big problem because it creates multiple potential entrypoints into the function. This is going to cause issues in general with compiler theory since most algorithms are based upon the fact there exists only 1 entry into a function.

Regalloc2 states the following:

The CFG must have no critical edges. A critical edge is an edge from block A to block B such that A has more than one successor and B has more than one predecessor. For this definition, the entry block has an implicit predecessor, and any block that ends in a return has an implicit successor.

Will our framework break regalloc2/prevent register allocation? if so how do you suggest we handle such situations? Its an interesting problem but without these thunks its not safe to move code.

CompilerSmith (Apr 04 2024 at 20:01):

here is a visual example of the problem just for reference purposes:

image.png
image.png

As you can see there is a jump into the middle of another functions basic block which our framework is unable to safetly uncover. We will insert a thunk at 0x1826F2D20 since the bytes above it arent disassembled by our framework.

Sidebar: ida knows its call but ida can be wrong sometimes, we cannot be wrong when we disassemble things to be lifted and moved.

Notification Bot (Apr 04 2024 at 20:13):

CompilerSmith has marked this topic as unresolved.

Chris Fallin (Apr 04 2024 at 20:28):

regalloc2 definitely needs a single entry-point; you're correct that this is an important invariant, and many many compiler-algorithm details break without it.

however I think you can recover this property in a function with multiple entry points by having a "virtual entry block" that has N successors, each of the "real" entry blocks? You can even split those edges and put pseudoinsts in the edge blocks with various constraints if you need inputs to come in in different registers/locations

CompilerSmith (Apr 04 2024 at 20:38):

Chris Fallin said:

regalloc2 definitely needs a single entry-point; you're correct that this is an important invariant, and many many compiler-algorithm details break without it.

however I think you can recover this property in a function with multiple entry points by having a "virtual entry block" that has N successors, each of the "real" entry blocks? You can even split those edges and put pseudoinsts in the edge blocks with various constraints if you need inputs to come in in different registers/locations

Thank you for the response :)

Its rare that functions ever share basic blocks. The only time ive ever seen this happen is within the deep end of chrome.dll. We added code to check to see if the bytes above the basic block were padding (int3 or ud2) and that helped us reduce the number of thunks required. Right now we are able to rewrite over 95.6% of chrome.dll in about 40 seconds. We recreate all exception information as well.

However we are restricted by the frameworks thunk requirements. We might have to lose coverage to completely remove all functions which require multiple entrypoints...

I am in favor of the virtual entry block where we preserve the entire processors context and then dispatch to the basic block that its actually meaning to go into. We plan to do obfuscation and this might be the trick. Our obfuscated function will therefore have 1 entry point.

Amanieu (Apr 04 2024 at 21:34):

I've actually implement this virtual entry point mechanism in a compiler that uses regalloc2. Basically you have an entry block that only contains 2 instructions:

Entry which defines all of the incoming arguments (must be the same for all entry points).
EntrySwitch which is a terminator with multiple successor blocks, one for each actual entry point.

Then, after register allocation, you instantiate a copy of the entry block (including any moves emitted by the register allocator) as a prefix to each real entry point.

CompilerSmith (Apr 04 2024 at 22:12):

Amanieu said:

I've actually implement this virtual entry point mechanism in a compiler that uses regalloc2. Basically you have an entry block that only contains 2 instructions:

Entry which defines all of the incoming arguments (must be the same for all entry points).

EntrySwitch which is a terminator with multiple successor blocks, one for each actual entry point.

Then, after register allocation, you instantiate a copy of the entry block (including any moves emitted by the register allocator) as a prefix to each real entry point.

Thank you for the response and interest in this topic. I agree with your solution to this problem. From a lower level x86 lifting perspective i think we will use the return address to determine which basic block the dispatcher (vm enter block) will dispatch to. Essentially we will make a call into some code which will save all GPR registers and the value of RSP at the point before the vm enter.

This should make all compiler theory algorithms happy!

CompilerSmith (Apr 04 2024 at 22:27):

Another interesting thing to note is that on windows exception handling can produce functions which have multiple entry points.

If you make this code:

__try {
  // something
} __except(1) {
  // code here
}
// more code here

the except block will look like another entry into the function. No blocks directly reference it. The except block will execute then continue execution into the "more code here".

Chris Fallin (Apr 05 2024 at 01:24):

Right, the usual approach to exceptions wrt CFGs is to model the edges out of excepting instructions (e.g. invoke in LLVM terminology) to the catch blocks (landingpads in LLVM terminology); at the regalloc2 level this is mostly an ordinary edge I think (possibly with some fiddly details re: fixed-reg defs on the invoke to get state set by unwinder, which are technically path-dependent defs; folks are thinking through this, we don't have it in Cranelift yet)

Chris Fallin (Apr 05 2024 at 01:25):

notably this would not be a case of a separate function entry; just like the "virtual entry block" we have a virtual edge to keep the IR-level semantics reasonable

CompilerSmith (Apr 05 2024 at 01:48):

Chris Fallin said:

Right, the usual approach to exceptions wrt CFGs is to model the edges out of excepting instructions (e.g. invoke in LLVM terminology) to the catch blocks (landingpads in LLVM terminology); at the regalloc2 level this is mostly an ordinary edge I think (possibly with some fiddly details re: fixed-reg defs on the invoke to get state set by unwinder, which are technically path-dependent defs; folks are thinking through this, we don't have it in Cranelift yet)

Thank you for the response, I spoke with my team and they came to the same conclusion which is we can implicitly mark except blocks as a successor to the try blocks. It might get a little messy in the CFG by marking every block which is described by the "try" range (C_SCOPE_TABLE) but we will experiment and see what works best.

One more question i have about regalloc2 thats been brewing:

So block terminators in regalloc2 can only be "ret" and "branch". The terminators of a basic block in the framework we are building has things like fastfail and int3. When I hook up our framework to regalloc2 should i just set these terminators (fastfail/int3/etc) to "ret"?

https://docs.rs/regalloc2/latest/regalloc2/trait.Function.html#tymethod.is_ret

CompilerSmith (Apr 05 2024 at 01:55):

image.png

Something like this is what we are thinking, each block which is described by the C_SCOPE_TABLE (or any other language specific exception information) will have an implicit successor of the first basic block in the except basic blocks.

Ignoring __finally because finally blocks are actually functions all to themselves. (crazy)

Chris Fallin (Apr 05 2024 at 15:14):

yes, ret in this context means "terminator that leaves the function" (as opposed to a transfer to some other block); it's reasonable to use that for e.g. int3 if you don't resume after (i.e. a terminating/unwinding trap rather than a resumable breakpoint)

Notification Bot (Apr 07 2024 at 19:00):

CompilerSmith has marked this topic as resolved.

Notification Bot (Apr 07 2024 at 19:00):

CompilerSmith has marked this topic as unresolved.

Notification Bot (Apr 07 2024 at 19:00):

CompilerSmith has marked this topic as resolved.

Last updated: Apr 09 2025 at 17:03 UTC