cea-sec/miasm

View on GitHub
doc/expression/expression.ipynb

Summary

Maintainability
Test Coverage
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Intermediate Representation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Miasm provides an [intermediate representation](https://en.wikipedia.org/wiki/Intermediate_representation)\n",
    "(*IR*) to represent the effects of a source code (for example, like LLVM). The benefits of using an IR are:\n",
    "* A unified representation that does not depend on the source architecture;\n",
    "* A minimal language;\n",
    "* The side effects are explicit, for example *A + B* will not implicitly update flags.\n",
    "\n",
    "Miasm's IR implementation is located in `miasm.expression.expression`, with `Expr*` objects. Rules of the language will be enumerated along this documentation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Rule: Each expression has a size (in bit). Some expressions need this size at creation time, others compute their size from their arguments.**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Language"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The basic words in the language are:\n",
    "* `ExprId` : represents an identifier. For example, the register `EAX` will be represented by an `ExprId` of size 32 bits.\n",
    "* `ExprInt` : represents an unsigned integer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a\n",
      "ExprId('a', 32)\n",
      "a\n",
      "32\n"
     ]
    }
   ],
   "source": [
    "from miasm.expression.expression import *\n",
    "\n",
    "a = ExprId(\"a\", 32)\n",
    "print(a)\n",
    "print(repr(a))\n",
    "\n",
    "# Identifier\n",
    "print(a.name)\n",
    "print(a.size)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0x10\n",
      "0xFFFFFFFF\n",
      "16\n"
     ]
    }
   ],
   "source": [
    "cst1 = ExprInt(16, 32)\n",
    "print(cst1)\n",
    "cst2 = ExprInt(-1, 32)\n",
    "print(cst2)\n",
    "\n",
    "# Show associated value\n",
    "print(int(cst1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The word `ExprMem` represents a memory access, of a given size in bits."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "@16[0x11223344]\n",
      "0x11223344\n"
     ]
    }
   ],
   "source": [
    "# Memory access of 16 bits, at address 0x11223344 on 32 bits\n",
    "addr = ExprInt(0x11223344, 32)\n",
    "mem1 = ExprMem(addr, 16)\n",
    "print(mem1)\n",
    "\n",
    "# Show memory address\n",
    "print(mem1.ptr)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The word `ExprOp` describes the n-ary operations between expressions. The operation is a string, so new operations can be created on the fly. Some operations (`+`, `*`, `|`, `parity`, ...) are already used by Miasm. An operation occurs between elements having the same size and has the same size as the arguments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a + 0x10\n",
      "(ExprId('a', 32), ExprInt(0x10, 32))\n",
      "MyCustomOp(a)\n"
     ]
    }
   ],
   "source": [
    "# Defining an operation\n",
    "op1 = ExprOp(\"+\", a, cst1)\n",
    "print(op1)\n",
    "\n",
    "# Accessing the arguments\n",
    "print(op1.args)\n",
    "\n",
    "# Creating a custom operation\n",
    "op2 = ExprOp(\"MyCustomOp\", a)\n",
    "print(op2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Rule: In _most_ cases, the arguments of an ExprOp must have the same size. The ExprOp size is then the size of its arguments. Some exceptions exist, for example the operator \"```==```\" has a size of 1 bit. Size exceptions are described here: https://github.com/cea-sec/miasm/blob/5ee76eed19b62983909d6a16966acc15b2c8726f/miasm/expression/expression.py#L1045**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Helpers ease the creation of common operations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a + 0x10\n",
      "a * 0x10\n",
      "-a\n",
      "a | 0x10\n",
      "a & 0x10\n"
     ]
    }
   ],
   "source": [
    "print(a + cst1)\n",
    "print(a * cst1)\n",
    "print(- a)\n",
    "print(a | cst1)\n",
    "print(a & cst1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Be careful, even though the Expressions can represent \"everything\", Miasm assumes some properties on certain operations:\n",
    "* the associative operations (`+`, `^`, `|`, ...) are n-ary operations ;\n",
    "* the `-` is always unary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a + -0x10\n"
     ]
    }
   ],
   "source": [
    "print(a - cst1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* `parity` has always a size 1, it's an exception"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "32\n",
      "1\n"
     ]
    }
   ],
   "source": [
    "p = ExprOp(\"parity\", a)\n",
    "print(a.size)\n",
    "print(p.size)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `=` operation is handled separately by the word `ExprAssign`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a = 0x10\n",
      "0x10\n",
      "a\n"
     ]
    }
   ],
   "source": [
    "assign = ExprAssign(a, cst1)\n",
    "print(assign)\n",
    "\n",
    "# Source, destination\n",
    "print(assign.src)\n",
    "print(assign.dst)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Rule: The ```ExprAssign``` is not really a word of the language: it can be considered as a statement. Thus, it cannot be an argument of another expression.**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The word `ExprCond` represents a ternary relation, equivalend to the Python `src1 if cond else src2`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a?(0x10,0xFFFFFFFF)\n",
      "a\n",
      "0x10\n",
      "0xFFFFFFFF\n"
     ]
    }
   ],
   "source": [
    "cond = ExprCond(a, cst1, cst2)\n",
    "print(cond)\n",
    "\n",
    "# Access to the elements\n",
    "print(cond.cond)\n",
    "print(cond.src1)\n",
    "print(cond.src2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Rule: The sizes of ```src1``` and ```src2``` must be equal, but can differ from the size of ```cond```.**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following words manipulate the sizes:\n",
    "* `ExprSlice`: extracts a bits slice of an expression;\n",
    "* `ExprCompose`: concatenates two expressions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a[6:8]\n",
      "2\n",
      "a\n",
      "6\n",
      "8\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sl = ExprSlice(a, 6, 8)\n",
    "print(sl)\n",
    "print(sl.size)\n",
    "\n",
    "# Access to the properties\n",
    "print(sl.arg)\n",
    "print(sl.start)\n",
    "print(sl.stop)\n",
    "\n",
    "# Simpler form\n",
    "sl == a[6:8]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Rule: The size of an ```ExprSlice``` is equal to ```stop - start```.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{a 0 32, 0x10 32 64}\n",
      "64\n",
      "(ExprId('a', 32), ExprInt(0x10, 32))\n",
      "[(0, ExprId('a', 32)), (32, ExprInt(0x10, 32))]\n"
     ]
    }
   ],
   "source": [
    "# Concatenation of a (bit 0 to 31) with cst1 (bit 32 to 63)\n",
    "comp = ExprCompose(a, cst1)\n",
    "print(comp)\n",
    "print(comp.size)\n",
    "\n",
    "# Access to the arguments\n",
    "print(comp.args)\n",
    "# Access to the starting bit and the associated argument\n",
    "print(list(comp.iter_args()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Rule: The size of an ```ExprCompose``` is equal to the sum of the sizes of its arguments.**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, the word `ExprLoc` represents a memory location.\n",
    "For example, it can represent the destination of a jump or a function call.\n",
    "\n",
    "A location is described by a unique element of type  `LocKey`. You can see the `LocKey` as a key that you can use to retrieve all the information associated with a location: its offset, its name (\"main\") etc.\n",
    "`ExprLoc` is a kind of `LocKey` container."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "loc_key_1\n"
     ]
    }
   ],
   "source": [
    "loc = ExprLoc(LocKey(1), 32)\n",
    "print(loc)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In summary, the different words are:\n",
    "\n",
    "| Word | Meaning |\n",
    "|-----|----------|\n",
    "|ExprAssign|A=B|\n",
    "|ExprInt|0x18|\n",
    "|ExprId|EAX|\n",
    "|ExprLoc|label_1|\n",
    "|ExprCond|A ? B : C|\n",
    "|ExprMem|@16[ESI]|\n",
    "|ExprOp|A + B|\n",
    "|ExprSlice|AH = EAX[8 :16]|\n",
    "|ExprCompose|AX = AH.AL|"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Common helpers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ExprInt(0xFFFFFFFF, 32)"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Proper size mask\n",
    "a.mask"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "32"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Expression size\n",
    "a.size"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a 0x10\n"
     ]
    }
   ],
   "source": [
    "# Printable version\n",
    "print(a, cst1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ExprId('a', 32) ExprOp('+', ExprId('a', 32), ExprInt(0x10, 32))\n"
     ]
    }
   ],
   "source": [
    "# Expr representation (can be copy-pasted in the code)\n",
    "print(repr(a), repr(a + cst1))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "zeroExt_64(0x10)\n",
      "signExt_64(0x10)\n"
     ]
    }
   ],
   "source": [
    "# Size extension (unsigned, signed)\n",
    "print(cst1.zeroExtend(64))\n",
    "print(cst1.signExtend(64))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a[31:32]\n"
     ]
    }
   ],
   "source": [
    "# Most significant bit\n",
    "print(a.msb())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a + a + 0x10\n",
      "0xFFFFFFFF + 0xFFFFFFFF + 0x10\n"
     ]
    }
   ],
   "source": [
    "# Replacement\n",
    "expr1 = a + a + cst1\n",
    "print(expr1)\n",
    "expr2 = expr1.replace_expr({a: cst2})\n",
    "print(expr2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n",
      "False\n",
      "True\n",
      "True\n",
      "True\n",
      "False\n"
     ]
    }
   ],
   "source": [
    "# Type test\n",
    "print(a.is_id())\n",
    "print(a.is_int())\n",
    "print(cst1.is_int())\n",
    "print(op1.is_op())\n",
    "print(op1.is_op(\"+\"))\n",
    "print(op1.is_op(\"&\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Expression represented by a graph"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Miasm IR expressions have a recursive structure and can be represented and manipulated as graphs.\n",
    "The graph object is a `DiGraph`, implemented in `miasm.core.graph`. It offers usual methods for manipulating graphs (node and vertex access, predecessors, successors, dominators, graphviz dot representation ...)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(a + 0x10) & 0xFFFFFFFF\n",
      "(a + 0x10) & 0xFFFFFFFF\n",
      "0x10\n",
      "0xFFFFFFFF\n",
      "a\n",
      "a + 0x10\n",
      "a + 0x10 -> a\n",
      "a + 0x10 -> 0x10\n",
      "(a + 0x10) & 0xFFFFFFFF -> a + 0x10\n",
      "(a + 0x10) & 0xFFFFFFFF -> 0xFFFFFFFF\n"
     ]
    }
   ],
   "source": [
    "expr3 = a + cst1 & cst2\n",
    "print(expr3)\n",
    "graph = expr3.graph()\n",
    "print(graph)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "dot = graph.dot()\n",
    "#from graphviz import Source\n",
    "#src = Source(dot)\n",
    "#src"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Expression simplification"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The expression simplification applies transformation rules to an expression until they can be applied. This process is done by an `ExpressionSimplifier` object, implemented in `miasm.expression.simplifications`.\n",
    "\n",
    "Some basic simplifications are already implemented and can be activated with the `expr_simp` instance in the module."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0x10 + 0xFFFFFFFF\n",
      "0xF\n",
      "0x10[4:5]\n",
      "0x1\n",
      "a + a + -a\n",
      "a\n",
      "a + 0x10\n",
      "0x10 + 0x10\n",
      "0x20\n",
      "a * 0x4\n"
     ]
    }
   ],
   "source": [
    "from miasm.expression.simplifications import expr_simp\n",
    "\n",
    "# 0x10 + (-1) = 0xF\n",
    "op3 = cst1 + cst2\n",
    "print(op3)\n",
    "cst3 = expr_simp(op3)\n",
    "print(cst3)\n",
    "\n",
    "# 5th bit of 0x10 = 1\n",
    "sl2 = cst1[4:5]\n",
    "print(sl2)\n",
    "cst4 = expr_simp(sl2)\n",
    "print(cst4)\n",
    "\n",
    "# a + a - a = a\n",
    "op4 = a + a - a\n",
    "print(op4)\n",
    "print(expr_simp(op4))\n",
    "assert expr_simp(op4) == a\n",
    "\n",
    "# Use to evaluate an expression (here a + 0x10 is evaluated with a = 0x10)\n",
    "print(op1)\n",
    "print(op1.replace_expr({a: cst1}))\n",
    "print(expr_simp(op1.replace_expr({a: cst1})))\n",
    "print(expr_simp(a + a +a + a))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Transformation rules can be added with the method `enable_passes`. A transformation rule is a function and is associated to an expression type.\n",
    "\n",
    "Below, the code transforms the booleans operation in an `ExprCond` to an operation of type `<`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "((x + -y) ^ ((x ^ y) & ((x + -y) ^ x)))[31:32]\n",
      "True\n",
      "False\n",
      "True\n"
     ]
    }
   ],
   "source": [
    "x = ExprId(\"x\", 32)\n",
    "y = ExprId(\"y\", 32)\n",
    "\n",
    "inf_signed = ((x - y) ^ ((x ^ y) & ((x - y) ^ x)))[31:32]\n",
    "print(inf_signed)\n",
    "\n",
    "def is_inf(x_val, y_val):\n",
    "    new_val = expr_simp(inf_signed.replace_expr({\n",
    "        x: x_val,\n",
    "        y: y_val,\n",
    "    }))\n",
    "    assert new_val.is_int()\n",
    "    return int(new_val) == 1\n",
    "\n",
    "# 0 < 10\n",
    "print(is_inf(ExprInt(0, 32), ExprInt(10, 32)))\n",
    "# 10 !< 10\n",
    "print(is_inf(ExprInt(10, 32), ExprInt(10, 32)))\n",
    "# -1 < 0\n",
    "print(is_inf(ExprInt(0xFFFFFFFF, 32), ExprInt(0, 32)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following code enables this transformation, which was already implemented but not enabled: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{<class 'miasm.expression.expression.ExprCond'>: [<function expr_simp_equal at 0x7f4dd825daf0>],\n",
      " <class 'miasm.expression.expression.ExprOp'>: [<function expr_simp_inverse at 0x7f4dd825da60>],\n",
      " <class 'miasm.expression.expression.ExprSlice'>: [<function expr_simp_inf_signed at 0x7f4dd825d940>,\n",
      "                                                   <function expr_simp_inf_unsigned_inversed at 0x7f4dd825d9d0>]}\n",
      "(((x ^ y) & (x ^ (x + -y))) ^ (x + -y))[31:32]\n",
      "x <s y\n"
     ]
    }
   ],
   "source": [
    "from pprint import pprint as pp\n",
    "from miasm.expression.simplifications import ExpressionSimplifier\n",
    "pp(ExpressionSimplifier.PASS_COND)\n",
    "\n",
    "print(expr_simp(inf_signed))\n",
    "expr_simp_cond = ExpressionSimplifier()\n",
    "expr_simp_cond.enable_passes(ExpressionSimplifier.PASS_COND)\n",
    "print(expr_simp_cond(inf_signed))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Going further"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Some functionalities relative to expressions but not detailed here:\n",
    "* `SymbolicExecutionEngine` : Symbolic execution;\n",
    "* `Translators` : expression translation to C, Python, \"Miasm like\" or z3 expressions;\n",
    "* `expr_range` : Range analysis of an expression possible values;\n",
    "* `AssignBlock`, `IRBlock`, `DiGraphDefUse`, `dead_simp`, ... : grouping expressions to describe a full program, associated analysis;\n",
    "* `miasm.arch.*.sem.py`, `SemBuilder` : architecture specific semantic descriptions, i.e. all the side effects of a mnemonic."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}