Hi, I have one more question/dilemma that i would like to present, its more of an issue on my end so dont worry if suggestions or thoughts arent perfect.
We are building a binary transformation framework and one of the things our framework does to ensure we dont break code when we move it is that for each basic block of a function, if there exists bytes above the basic block that we havent disassembled then we assume its code that could fallthrough into the basic block we are going to move.
To solve this we insert a thunk to jmp to the basic block. However this is a big problem because it creates multiple potential entrypoints into the function. This is going to cause issues in general with compiler theory since most algorithms are based upon the fact there exists only 1 entry into a function.
Regalloc2 states the following:
The CFG must have no critical edges. A critical edge is an edge from block A to block B such that A has more than one successor and B has more than one predecessor. For this definition, the entry block has an implicit predecessor, and any block that ends in a return has an implicit successor.
Will our framework break regalloc2/prevent register allocation? if so how do you suggest we handle such situations? Its an interesting problem but without these thunks its not safe to move code.
here is a visual example of the problem just for reference purposes:
As you can see there is a jump into the middle of another functions basic block which our framework is unable to safetly uncover. We will insert a thunk at 0x1826F2D20 since the bytes above it arent disassembled by our framework.
Sidebar: ida knows its call but ida can be wrong sometimes, we cannot be wrong when we disassemble things to be lifted and moved.
CompilerSmith has marked this topic as unresolved.
regalloc2 definitely needs a single entry-point; you're correct that this is an important invariant, and many many compiler-algorithm details break without it.
however I think you can recover this property in a function with multiple entry points by having a "virtual entry block" that has N successors, each of the "real" entry blocks? You can even split those edges and put pseudoinsts in the edge blocks with various constraints if you need inputs to come in in different registers/locations
Chris Fallin said:
regalloc2 definitely needs a single entry-point; you're correct that this is an important invariant, and many many compiler-algorithm details break without it.
however I think you can recover this property in a function with multiple entry points by having a "virtual entry block" that has N successors, each of the "real" entry blocks? You can even split those edges and put pseudoinsts in the edge blocks with various constraints if you need inputs to come in in different registers/locations
Thank you for the response :)
Its rare that functions ever share basic blocks. The only time ive ever seen this happen is within the deep end of chrome.dll. We added code to check to see if the bytes above the basic block were padding (int3 or ud2) and that helped us reduce the number of thunks required. Right now we are able to rewrite over 95.6% of chrome.dll in about 40 seconds. We recreate all exception information as well.
However we are restricted by the frameworks thunk requirements. We might have to lose coverage to completely remove all functions which require multiple entrypoints...
I am in favor of the virtual entry block where we preserve the entire processors context and then dispatch to the basic block that its actually meaning to go into. We plan to do obfuscation and this might be the trick. Our obfuscated function will therefore have 1 entry point.
I've actually implement this virtual entry point mechanism in a compiler that uses regalloc2. Basically you have an entry block that only contains 2 instructions:
Entry
which defines all of the incoming arguments (must be the same for all entry points).EntrySwitch
which is a terminator with multiple successor blocks, one for each actual entry point.Then, after register allocation, you instantiate a copy of the entry block (including any moves emitted by the register allocator) as a prefix to each real entry point.
Amanieu said:
I've actually implement this virtual entry point mechanism in a compiler that uses regalloc2. Basically you have an entry block that only contains 2 instructions:
Entry
which defines all of the incoming arguments (must be the same for all entry points).EntrySwitch
which is a terminator with multiple successor blocks, one for each actual entry point.Then, after register allocation, you instantiate a copy of the entry block (including any moves emitted by the register allocator) as a prefix to each real entry point.
Thank you for the response and interest in this topic. I agree with your solution to this problem. From a lower level x86 lifting perspective i think we will use the return address to determine which basic block the dispatcher (vm enter block) will dispatch to. Essentially we will make a call into some code which will save all GPR registers and the value of RSP at the point before the vm enter.
This should make all compiler theory algorithms happy!
Another interesting thing to note is that on windows exception handling can produce functions which have multiple entry points.
If you make this code:
__try {
// something
} __except(1) {
// code here
}
// more code here
the except block will look like another entry into the function. No blocks directly reference it. The except block will execute then continue execution into the "more code here".
Right, the usual approach to exceptions wrt CFGs is to model the edges out of excepting instructions (e.g. invoke
in LLVM terminology) to the catch blocks (landingpad
s in LLVM terminology); at the regalloc2 level this is mostly an ordinary edge I think (possibly with some fiddly details re: fixed-reg defs on the invoke to get state set by unwinder, which are technically path-dependent defs; folks are thinking through this, we don't have it in Cranelift yet)
notably this would not be a case of a separate function entry; just like the "virtual entry block" we have a virtual edge to keep the IR-level semantics reasonable
Chris Fallin said:
Right, the usual approach to exceptions wrt CFGs is to model the edges out of excepting instructions (e.g.
invoke
in LLVM terminology) to the catch blocks (landingpad
s in LLVM terminology); at the regalloc2 level this is mostly an ordinary edge I think (possibly with some fiddly details re: fixed-reg defs on the invoke to get state set by unwinder, which are technically path-dependent defs; folks are thinking through this, we don't have it in Cranelift yet)
Thank you for the response, I spoke with my team and they came to the same conclusion which is we can implicitly mark except blocks as a successor to the try blocks. It might get a little messy in the CFG by marking every block which is described by the "try" range (C_SCOPE_TABLE) but we will experiment and see what works best.
One more question i have about regalloc2 thats been brewing:
So block terminators in regalloc2 can only be "ret" and "branch". The terminators of a basic block in the framework we are building has things like fastfail and int3. When I hook up our framework to regalloc2 should i just set these terminators (fastfail/int3/etc) to "ret"?
https://docs.rs/regalloc2/latest/regalloc2/trait.Function.html#tymethod.is_ret
Something like this is what we are thinking, each block which is described by the C_SCOPE_TABLE (or any other language specific exception information) will have an implicit successor of the first basic block in the except basic blocks.
Ignoring __finally
because finally blocks are actually functions all to themselves. (crazy)
yes, ret
in this context means "terminator that leaves the function" (as opposed to a transfer to some other block); it's reasonable to use that for e.g. int3
if you don't resume after (i.e. a terminating/unwinding trap rather than a resumable breakpoint)
CompilerSmith has marked this topic as resolved.
CompilerSmith has marked this topic as unresolved.
CompilerSmith has marked this topic as resolved.
Last updated: Jan 24 2025 at 00:11 UTC