MODeflattener - Miasm's OLLVM Deflattener
So recently a challenge(Layers) from 3kCTF featured control flow flattening using OLLVM. Although I did know about control flow flattening I hadn’t encountered it personally. And as I’ve been experimenting with miasm for the past few days I thought of developing a tool to deal with it.
Control Flow Flattenning
Control flow flattening is an interesting and clever technique to make a reverse engineer’s day difficult. It basically puts all the basic blocks in a function at the same level and destroys the control flow. It then uses a dispatcher to reconstruct the control flow along with a control/state variable which is updated at the end of each block.
The following illustration depicts various parts of a flattened function.
Various obfuscators employ their own version of control flow flattening transformations :
CAUTION : Only Static Analysis used !
As we are primarly focusing on OLLVM for now we can just use some static analysis techniques to deobfuscate it, but for developing a universal tool to deal with this kind of obfuscation we need to look more into dynamic approaches.
For example, this looks interesting.
Getting Flattening Information
To start off, we need to identify the relevant blocks and the dispatcher. To find the dispatcher we need to first find the pre-dispatcher. This is because we rely on the fact that the pre-dispatcher has the maximum number of predecessors, it is easy to identify. Later we can just get the first successor of the pre-dispatcher to get the dispatcher, easy!
1 | def get_cff_info(asmcfg): |
Also we now already have the relevant blocks as they are just the predecessors to the pre-dispatcher we just found!
State Variable
The state variable is responsible for maintaining the control flow in the flattened function.
The state variable is always initialized before the dispatcher and is used in the first line of the dispatcher. We can use this information to get the state variable automatically.
Relevant Blocks
The blocks that are aligned at the same level in the disassembly graph and include the useful code are known as relevant blocks. These blocks update the value of the state variable at the very end.
Note: Currently we also add the tail of the backbone to our relevant blocks as we are just depending on the predecessors of the pre-dispatcher. Basically the tail is used if the state variable value doesn’t satisfy any condition in the backbone. It doesn’t update the state variable and only has a jump to pre-dispatcher. So if we don’t find any code related to modification of the state variable in a relevant block we mark this as tail.
Having most of the information we can now proceed with deflattening the flow.
Basically we have two types of relevant blocks:
Simple
Block without any conditions, so the state variable is always updated with the same value. Only one instruction is used to modify the state variable.Conditional
Blocks with conditional statements and loops. Here the state variable could have two possible values depending on whether the condition results in a true or false. These often end with acmov
instruction. Several instructions are used to modify the state variable.
Use of SSA Expressions
We further simplify the IR to SSA to deal with the conditional relevant blocks.
We only make use of do_propagate_expressions
ssa simplification pass.
1 | head = loc_db.get_offset_location(addr) |
In the SSA form we observe a Phi operation which basically means that one of the variables arriving from different predeccesors is chosen depending on which path the control flow took.
In the above example we observe that if the condition is true the state variable is assigned the value = 0x7A3F9928 (RCX.4) and if false value = 0x30110039 (RCX.2).
We get the block where the phi variables are assigned using the following code and then get their values from those blocks.
1 | if irblock_has_phi(irblock): |
We get the following information from a relevant block:
1 | 0x401a9d: {'cond': 'CMOVB', |
Now we need to iterate over the backbone blocks and get their destinations if it includes condition regarding any of the possible state variable values. We can map these to the original state variable values and make use of it to correct the flow.
1 | if isinstance(arg, ExprInt): |
After resolving these values from the backbone, the final result looks like this.
1 | 0x401a9d: {'cond': 'CMOVB', |
Removing Useless Instructions
Modeflattener finds the addresses to nop out using the def-use graph for the state variable. This is one of the data flow analysis feature provided by miasm. This algorithm returns all the instructions affecting the state variable and therefore we call these as useless instructions. Read more about it here
The state variable is always located in one of the leaves in the graph, we can easily get all of its parents using the following code.
1 | def find_state_var_usedefs(ircfg, search_var): |
Patching and Reconstructing the Control Flow
While cleaning these useless instructions we have to keep in mind that the call instructions will get affected by this as they are based on relative offsets.
We can fix it using the following code :
1 | rel = lambda addr, patch_addr: hex(addr - patch_addr) |
At last we need to generate a patch for jumps and reconstruct the control flow.
For a simple relevant block we only need a single patch.
1 | asmb = lambda patch_str, loc_db: mn_x86.asm(mn_x86.fromstring(patch_str, loc_db, 32))[0] |
We have two instruction patches for a conditional relevant block. We replace the conditional move with a conditional jump to the true address and add another jump in succession to the false address.
1 | t_addr = link['true_next'] |
We can nop out the backbone as it is useless now.
1 | backbone_start, backbone_end = dispatcher, tail.offset + tail.l |
Final Results
Graph View
F5 View
Get MODeflattener
I’ve open sourced the tool on my github. I’ve added some samples to test it as well. Try it out!
https://github.com/mrT4ntr4/MODeflattener
Bonus
Tim Blazytko’s flattening heuristic script
While disassembling the specified function we can look out for other functions used by it and can make use of this script to automatically detect whether it is a flattened one and try to deobfuscate it. This has already been integrated into the tool!nop-hider idapython script
This script hides the nop instructions from IDA graph view as the backbone is converted into a long nop chain after deobfuscation.
References
Dissecting LLVM Obfuscator - RPISEC
Automated Detection of Control-flow Flattening - Tim Blazytko