Defeating Nested Virtualization with Miasm - FCSC21 CTF VMV

Suraj Malhotra

2021-05-19

CTF Writeups

miasm, reversing, vm

FCSC was a week long CTF and I usually like these long duration ctfs as I’m not that good of a speedrunner tbh :( Also I’m a bit late for this writeup but anyways it better late than never!

TL;DR

This challenge features a nested virtual machine with 5 levels of nesting and is great for learning the tooling aspect of reverse engineering. I solved it by writing a disassembler for it in miasm and then applying some pattern detection logic on it. The virtual machines are nearly similar but they differ in opcodes and some values used to decode an instruction, so we’ve to deal with that too. At the last level we find the key check algo implementation which is not that difficult and we write a solution script to get the key.

Challenge

Though FCSC had many interesting reversing challenges, I dedicated most of my time to VMV.
The challenge requires a 16 byte string as an argument.

└──╼ $time ./vmv helloworld123456
[fr] Ne quittez pas, un correspondant va prendre votre appel... [\fr]
[fr] Ne quittez pas, un correspondant va prendre votre appel... [\fr]
[fr] Ne quittez pas, un correspondant va prendre votre appel... [\fr]
[fr] Ne quittez pas, un correspondant va prendre votre appel... [\fr]
[fr] Ne quittez pas, un correspondant va prendre votre appel... [\fr]
[fr] Ne quittez pas, un correspondant va prendre votre appel... [\fr]
[fr] Ne quittez pas, un correspondant va prendre votre appel... [\fr]
[fr] Ne quittez pas, un correspondant va prendre votre appel... [\fr]
[fr] Ne quittez pas, un correspondant va prendre votre appel... [\fr]
[fr] Ne quittez pas, un correspondant va prendre votre appel... [\fr]
[fr] Ne quittez pas, un correspondant va prendre votre appel... [\fr]
[fr] Ne quittez pas, un correspondant va prendre votre appel... [\fr]
[fr] Ne quittez pas, un correspondant va prendre votre appel... [\fr]
[fr] Ne quittez pas, un correspondant va prendre votre appel... [\fr]
[fr] Ne quittez pas, un correspondant va prendre votre appel... [\fr]
[fr] Ne quittez pas, un correspondant va prendre votre appel... [\fr]
[fr] Ne quittez pas, un correspondant va prendre votre appel... [\fr]
[fr] Ne quittez pas, un correspondant va prendre votre appel... [\fr]
[fr] Ne quittez pas, un correspondant va prendre votre appel... [\fr]
Noooooo, damn you lost!

real    3m54.816s
user    3m54.554s
sys     0m0.224s

It prints the message :
“[fr] Ne quittez pas, un correspondant va prendre votre appel… [\fr]”
which translates to -
“Do not leave, a correspondent will take your call…”
and seems like this is a waiting message as it checks our key. Yeah its obvious that it’ll take time as it’s a VM(V) but this much… Oh boy!

The Main Function

There is a single function which takes various functions as arguments and random values?

Turns out it is initializing a hashmap with key and functions as its keys and values respectively. There are two base64 encrypted strings which when decoded are used as the bytecode.
But where has our input disappeared?
Our input is concatenated to the second base64 decoded buffer for now. Finally, the Dispatcher consists of a bytecode iterator and the same hashmap.

VM Analysis

After getting a basic idea of the VM Implementation, I figured out the registers and other variables used by the VM.

VM Context Structure

I defined the structure as follows:

0x0000000 vm_context      struc
0x0000000 stack_base      dq ?      //Stack used by the VM  
0x0000008 [r0-r6]         dd ?      
0x0000024 pc              dd ?      //Program Counter
0x0000028 vm_mem_ptr      dd ?      //Ptr/Idx to the VM Memory 
0x000002C lr              dd ?      //Link Register
0x0000030 [r10-r11]       dd ?
0x0000038 sp              dd ?      //Stack Pointer
0x000003C [r13-r19]       dd ?
0x0000058 bytecode_base   dq ?      //Base address of the Bytecode Buffer
0x0000060 vm_mem          dq ?      //VM Memory
0x0000068 vm_mem_next_ptr dd ?      //Next Ptr/Idx to the VM Memory 
0x000006C field_6C        dd ?      //null
0x0000070 rom             dq ?      //remaining bytecode buffer for next VM
0x0000078 rom_ptr         dq ?      //Ptr/Idx to the remaining buffer
0x0000080 loop_status     dd ?      //flag to end the loop if not set
0x0000084 exit_code       dd ?      //exit code
0x0000088 put_flag        db ?      //Bool flag set after putchar has been called
0x0000089 vm_context      ends

Opcodes

This VM consists of 24 opcodes in total.

I defined them as follows :

PUSH_REG, PUSH_IMM, POP
There are two variants of the push instruction : PUSH_REG(pushes a register) and PUSH_IMM(pushes an 32 bit immediate). As it is an ascending stack so the SP is incremented after a PUSH.
ADD, XOR, MUL, AND
The above arithmetic operations work by popping two values off the stack and then pushing their result back. The result also suffers a modulo operation with 0x7fffffff, except for XOR.
INC, DEC
Increments/Decrements the specified register.
JE, JNE
These are used for comparision and jumping at the same time here. They pop two values off the stack and compare them, later jump to the specified destination depending on the result.
JMP
Relative Jump where destination = PC + Immediate
CALL, RET
Before CALL, it assigns the value of current PC to LR(Link Register) and then calls a function. For example, in ARM, the BL instruction places the return address in the LR(Link Register). The opposite is done when returning from the function.
MEMSTORE, MEMFETCH, MEMSHIFT_REG, MEMSHIFT_IMM
The VM has its own memory which is initialized in the VM_CONTEXT_INIT function as an array of 0x2000000 32-bit values. VM_MEM_PTR register is used as an index to the memory and is used by MEMFETCH and MEMSTORE. MEMSTORE puts the value of a register in memory and MEMFETCH does the opposite. The MEMSHIFT_REG and MEMSHIFT_IMM are both used to save the current value of VM_MEM_NEXT_PTR in VM_MEM_PTR and shifts VM_MEM_NEXT_PTR by a specified value (REG/IMM).
ROMFETCH
Fetches 32 bits from the remaining bytecode treated as ROM here. Afterwards it increments the ROM_PTR which acts as an index.
PUTCHAR_REG, PUTCHAR_IMM
These are used to display a character and afterwards set the PUT_FLAG which is used to display the waiting message.
MOD
Pops and does a modulo operation on the value with the specified immediate. Later it pushes the result onto the stack.
EXIT_REG, EXIT_IMM
They set the EXIT_CODE register to a register or an immediate and exits the VM.

LAY(LAYERS)ERS

At this point I wrote a simple disassembler script but whatttt there is a vm inside a vm. Now after analysing it this much there is no going back ryt? I began looking for options and chose miasm for this task as it is a tool I’ve kept my eye on recently.
Also I’ve heard good things about it’s Intermediate Representation, so why not try it ourselves.

Writing a Custom Architecture - Miasm 101

So I needed to start from scratch now. Though this part took most of my time but it was worth it.

I want to thank my friend ArcherCreat for helping me out with miasm. After a little chat with him I got a basic intro to miasm features and how can I use them to solve challenges. Infact he is the one who inspired me at first, you can checkout some of his writeups here.

If you are new to miasm, you can checkout their blog and github for more examples.

Basically we need to create 6 files in order to write a custom architecture in miasm.

/home/user/.local/lib/python3.9/site-packages/miasm/
│  
├── arch
│   ├── vmv
│   │   ├── arch.py
│   │   ├── disasm.py
│   │   ├── ira.py
│   │   ├── jit.py
│   │   ├── regs.py
│   │   └── sem.py

It seems like a lot, but in the end we’ll mainly work with regs.py, arch.py, sem.py as the others are mostly inherited from the base class.

regs.py

We can describe the registers used by our architecture here. By default the registers are 32 bits.

reg_names32 = ["R0","R1","R2","R3","R4","R5","R6","PC","VM_MEM_PTR","LR","R10","R11","SP","R13","R14","R15","R16","R17","R18","R19"]
reg_exprs, reg_inits, reg_infos = gen_regs(reg_names32, globals())  # sz=32 bits (default)

extra_names64 = ["BYTECODE_BASE","VM_MEM","ROM","ROM_PTR"]
extra_exprs, extra_inits, extra_infos = gen_regs(extra_names64, globals(), 64)

vmnp_expr, vmnp_inits, vmnp_infos  = gen_regs(["VM_MEM_NEXT_PTR"], globals())

I added the General Purpose Registers, Control Status Registers, and other miscellaneous 32 and 64 bit registers used by our VM. As the registers are referenced by their index in the struct we need to define them as shown above.

arch.py

We start off by describing the instructions for the first level, adding their opcodes and argument type. This is done by the addop() function.
Miasm automatically extracts the amount of bits from the bytecode according to the operands specified while defining the instructions and updates the offset.

addop("MEMSHIFT_IMM", [bs32(0xba1116a9), imm32,])
addop("PUSH_REG",     [bs32(0x5991ba22), reg_idx,])
addop("POP",          [bs32(0x8960888a), reg_idx,])
addop("PUSH_IMM",     [bs32(0x5e64bb6c), imm32,])
addop("MEMSHIFT_REG", [bs32(0x1f0a8e6f), reg_idx,])
addop("ROMFETCH",     [bs32(0x8d67bae1), reg_idx,])
addop("ADD",          [bs32(0xadc52d), ])
addop("MEMSTORE",     [bs32(0xfb521a9c), reg_idx])
addop("JE",           [bs32(0x180bc12d), imm32])
addop("INC",          [bs32(0xf00bb6c1), reg_idx])
addop("CALL",         [bs32(0xfa83fa5e), imm32])
addop("JMP",          [bs32(0xed4e2cfb), imm32])
addop("EXIT_IMM",     [bs32(0x27497906), imm32])
addop("PUTCHAR_REG",  [bs32(0xd1450d67), reg_idx])
addop("DEC",          [bs32(0x8ea45b38), reg_idx])
addop("XOR",          [bs32(0x48c5ccc6), ])
addop("AND",          [bs32(0x542010a0), ])
addop("MUL",          [bs32(0x560729d),  ])
addop("MEMFETCH",     [bs32(0xc650f15d), reg_idx])
addop("JNE",          [bs32(0x5a0f38fc), imm32])
addop("EXIT_REG",     [bs32(0x818cd6b5), reg_idx])
addop("MOD",          [bs32(0x43ae1f53), reg_idx])
addop("RET",          [bs32(0xbdecfe55), ])

Miasm works on binary stream, and offers us bs8() which converts a single byte opcode to binary. Similarly we can just create a bs32() helper in our case.

class bs32(bs):
    prio = default_prio

    def __init__(self, v, cls=None, fname=None, **kargs):
        super(bs32, self).__init__(int2bin(swap32(v), 32), 32,
                                  cls=cls, fname=fname, **kargs)

This was easy as miasm already provides us with other basic helper functions such as int2bin(), swap32().

We define a global variable nest level to store the current nesting level while trying to disassemble the VM. We also need to initialize the register index and putchar immediate xor lists as these values differ across levels.

1
2
3

nest_level = 0
reg_xorlist = [0x7b]
putchar_xorlist = [0x0]

Now we need to describe the operands and map them to their classes.
We’ll mostly use these :

1
2
3

reg_idx = bs(l=32,   cls=(vmv_reg_idx,  vmv_arg))
imm32 = bs(l=32,  cls=(vmv_imm32, vmv_arg))
putchar_imm32 = bs(l=32,   cls=(vmv_putchar_imm32,  vmv_arg))

As the register isn’t specified normally and its index is decoded on the fly using the succeeding 32 bits from bytecode we need to do a workaround for it as follows:

class vmv_reg_idx(vmv_arg):

  reg_info = reg_infos  # the list of vmv registers defined in regs.py
  intsize = 32
  intmask = (1 << intsize) - 1

  def decode(self, v):
    v = swap_sint(self.l, v) & self.intmask 
    v = (v^reg_xorlist[nest_level])&0x3f
    if v >= len(self.reg_info.expr):
      return False
    self.expr = self.reg_info.expr[v]
    return True

In our class we define a custom decode function where we use the nest_level to get the xor value for a particular VM nesting level.
We do the same for PUTCHAR_IMM operation:

class vmv_putchar_imm32(imm_noarg, vmv_arg):
  intmask = 0xff
  
  def decodeval(self, v):
    return (putchar_xorlist[nest_level] ^ swap_sint(self.l, v)) & self.intmask

While building the CFG, miasm needs to know a few things such as :

When does it need to stop disassembling
When does it need to break the control flow

conditional_branch = ["JE","JNE"]
unconditional_branch = ["JMP"]

call_instr = ["CALL"]
breakflow = ["EXIT_IMM"] + conditional_branch + unconditional_branch + call_instr + ["RET"]

As the branch instructions use relative addressing and depends on PC(offset here) we can do the following to create a label in the expression.

def dstflow2label(self, loc_db):
  loc_arg = self.get_dst_num()
  expr = self.args[loc_arg]
  if not expr.is_int():
    return
  addr = ((int(expr)*4)+self.offset + 8) & 0xffffffff
  loc_key = loc_db.get_or_create_offset_location(addr)
  self.args[0] = ExprLoc(loc_key, expr.size)

sem.py

This is the main mastermind responsible for the Miasm IR.
We just need to describe each instruction and map them to the specific mnemonic defined in the arch.py.

For example, a PUSH_IMM instruction looks like the following :

def push_imm(ir, instr, imm):
  e = []
  int4 = ExprInt(4, 32)
  e += [ExprAssign(ExprMem(SP + int4, 32), imm)]
  e += [ExprAssign(SP, SP + int4)]
  return e, []

We put the specific immediate to [SP+4] and increase SP by 4 afterwards.
Jumps could be tricky, in our case we need to pop two values off the stack so we assign temporary variables to store them and then use the ExprOp FLAG_EQ_CMP to compare them. Later we can use ExprCond to use it as a condition for changing the IRDst. We also decrease the SP by 8 in the end.

def je(ir, instr, dst):
  e = []
  if dst.is_int():
    dst = ExprInt(dst, PC.size)
  elif dst.is_loc():
    dst = ExprLoc(dst.loc_key, PC.size)
  loc_next = ir.get_next_loc_key(instr)
  loc_next_expr = ExprLoc(loc_next, ir.IRDst.size)

  var_A = ExprMem(SP,32)
  var_B = ExprMem(SP - ExprInt(4, 32), 32)
  cmp_ = ExprOp("FLAG_EQ_CMP", var_A, var_B)

  e += [ExprAssign(PC, ExprCond(cmp_, dst, loc_next_expr))]
  e += [ExprAssign(ir.IRDst, ExprCond(cmp_, dst, loc_next_expr))]
  e += [ExprAssign(SP, SP - ExprInt(8,32))]
  return e, []

Analysing Nested VMs

I began scrolling through the CFG, and found some notable differences.
Now we don’t have the hashmap implementation and there is a simple jump table in every nested vm.

An interesting thing to note here is that a specific instruction has the same offset in every VM.

As the register position differs here we will actually see that disassembly is inconsistent in some places such as at function @ 0xdc0 which acts as a bytecode iterator.

For the first nested VM the VM_CONTEXT structure is stored at R3 and checking out 0xdc0 the first register looks like a program counter. R6 looks like the VM_MEM_PTR comparing the differences between the blocks.
So in a nutshell, it can be described as follows :

0x00 vm_context      struc
0x00 PC               dd ?
0x08 [R1-R4]          dd ?
0x28 VM_MEM_PTR       dd ?
0x30 [R6-R15]         dd ?
0x80 vm_context      ends

But these register positions change in other levels as well but this is not much of a issue here.

The ROM has the following format :

The size of the next bytecode buffer is read from the first 4 bytes and then those no. of 32 bit values are fetched from the ROM and stored in VM memory. We observe that the sizes of the bytecode are also similar here and our input key is at the end.

Intuition

So overall, the nested VMs are almost similar and we can make use of the same instruction offsets as follows :

Create a pattern detection helper to check for subsequent PUSH and JE operation and get their values and offsets respectively.
Map handlers to offsets in a dictionary as {handler : offset}
Keep on adding new opcodes at each level

Miasm allows us to define a callback function when it diassembles the blocks.
We can implement it as follows :

def get_opcodes(mdis, cur_bloc, offsets_to_dis):
    ...

mdis = machine.dis_engine(bytecode, loc_db=loc_db)
mdis.dis_block_callback = get_opcodes

Pattern 1

As obvious we need to extract the opcode and the offset.

Just adding an if statement does the job, where we look out for a block having 3 instructions PUSH_REG, PUSH_IMM, JE respectively and keep on adding the results to a dictionary.

if (len(cur_bloc.lines) == 3 and
        cur_bloc.lines[0].name == "PUSH_REG" and
        cur_bloc.lines[1].name == "PUSH_IMM" and
        cur_bloc.lines[2].name == "JE"):
    
    op = cur_bloc.lines[1].args[0]
    jmp_loc = cur_bloc.lines[2].args[0]
    loc = jmp_loc.loc_key
    offset = mdis.loc_db.get_location_offset(loc)
    if isinstance(op, ExprInt):
        op = int(op.arg)
    opcodes[op] = offset

Later we can just add the opcodes using the same old addop(), neat!

Pattern 2

We need another pattern for extracting the xor values for register indices, and putchar immediate xor values as well.

Here we check the block offset as the values are always initialized in that single block ie. 0xe8 and append those values to a global list.

1
2
3

if(mdis.loc_db.get_location_offset(cur_bloc.loc_key) == 0xe8):
    vmv_arch.putchar_xorlist.append(int(cur_bloc.lines[0].args[0]) & 0xff)
    vmv_arch.reg_xorlist.append(int(cur_bloc.lines[2].args[0]) & 0xff)

Drawbacks Of Current Logic

My logic would fail when :

Some of the opcodes are similar across levels but they map to a different instruction which would cause interference.
Pattern isn’t found successfully due to obfuscation or different dispatchers.
Offsets differ in a level

Although I could’ve done some workaround to deal with some of the issues but I didn’t bother doing it at the moment.

IR Simplification - SSA

Finally we get the Key checking algo unfolded, and we can make use of miasm to simplify the IR to SSA Expressions and extract the constraints easily.

ira = machine.ira(loc_db)
ircfg = ira.new_ircfg_from_asmcfg(asmcfg)
simp = IRCFGSimplifierSSA(ira)
simp.simplify(ircfg, loc)

Just keep in mind at this point we only have our input in the ROM left.
The key checking algo isn’t that difficult to solve.

The first constraint checks the first 4 bytes of our input key as:

1	(@[ROM + ROM_PTR] * 0x117052C0) % 0x7fffffff == 0x1

So, we just need to calculate the inverse modulo to get the first four bytes of our input here. The second one is easy as well, it checks the next 4 bytes as follows :

1 2	(@[ROM + ROM_PTR + 0x4] % 0x77f3) == 0x4926 (@[ROM + ROM_PTR + 0x4] % 0x7c49) == 0x3159

We can use the Chinese Remainder Theorem to solve it easily.

These operations are also carried out for the other two parts. A variable is set if all of these checks return true, ie. R16 here, and we get the failure or success message printed in the end.

Solution Script

Miasm also comes with features like symbolic execution and provides translators for SMT solvers like z3 but I don’t think it’ll help us here. A solution script using some basic math is enough.

import string
int_to_bytes = lambda val : "".join([chr((val & (0xff << i*8)) >> i*8) for i in range(4)])
printcheck = lambda lol : all(c in string.printable for c in lol)

#inverse modulo (python 3.8+)
def invmod(val, mod):
    return pow(val, -1, mod)

#implementation of chinese remainder theorem
def crt(r1, r2):
    n1 = 0x77f3
    n2 = 0x7c49
    x, y = invmod(n1, n2), invmod(n2, n1)
    m = n1 * n2
    n = r2 * x * n1 + r1 * y * n2
    q, r = (m, n % m)
    # return a printable result from first 5 solutions
    for k in range(5):
        res = q*k + r
        out = int_to_bytes(res)
        if printcheck(out):
            return out

if __name__ == "__main__":
    key = ""
    key += int_to_bytes(invmod(0x117052c0, 0x7fffffff))
    key += crt(0x4926,0x3159)
    key += int_to_bytes(invmod(0x278bce9d, 0x7fffffff))
    key += crt(0x28b2,0x44a9)

    print(key)

Finally, we get the flag as :

1	FCSC{w3NeEdT0gODe3p3r}

Solution Files

I’ve uploaded the architecture and other solution files along with the dot output files on my github here :
mrT4ntr4/Challenge-Solution-Files/FCSC21_CTF_VMV

Ahh, I still have more to learn about miasm, bye!