doc/expression/expression.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Intermediate Representation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Miasm provides an [intermediate representation](https://en.wikipedia.org/wiki/Intermediate_representation)\n",
"(*IR*) to represent the effects of a source code (for example, like LLVM). The benefits of using an IR are:\n",
"* A unified representation that does not depend on the source architecture;\n",
"* A minimal language;\n",
"* The side effects are explicit, for example *A + B* will not implicitly update flags.\n",
"\n",
"Miasm's IR implementation is located in `miasm.expression.expression`, with `Expr*` objects. Rules of the language will be enumerated along this documentation."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Rule: Each expression has a size (in bit). Some expressions need this size at creation time, others compute their size from their arguments.**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Language"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The basic words in the language are:\n",
"* `ExprId` : represents an identifier. For example, the register `EAX` will be represented by an `ExprId` of size 32 bits.\n",
"* `ExprInt` : represents an unsigned integer."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"a\n",
"ExprId('a', 32)\n",
"a\n",
"32\n"
]
}
],
"source": [
"from miasm.expression.expression import *\n",
"\n",
"a = ExprId(\"a\", 32)\n",
"print(a)\n",
"print(repr(a))\n",
"\n",
"# Identifier\n",
"print(a.name)\n",
"print(a.size)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0x10\n",
"0xFFFFFFFF\n",
"16\n"
]
}
],
"source": [
"cst1 = ExprInt(16, 32)\n",
"print(cst1)\n",
"cst2 = ExprInt(-1, 32)\n",
"print(cst2)\n",
"\n",
"# Show associated value\n",
"print(int(cst1))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The word `ExprMem` represents a memory access, of a given size in bits."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"@16[0x11223344]\n",
"0x11223344\n"
]
}
],
"source": [
"# Memory access of 16 bits, at address 0x11223344 on 32 bits\n",
"addr = ExprInt(0x11223344, 32)\n",
"mem1 = ExprMem(addr, 16)\n",
"print(mem1)\n",
"\n",
"# Show memory address\n",
"print(mem1.ptr)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The word `ExprOp` describes the n-ary operations between expressions. The operation is a string, so new operations can be created on the fly. Some operations (`+`, `*`, `|`, `parity`, ...) are already used by Miasm. An operation occurs between elements having the same size and has the same size as the arguments."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"a + 0x10\n",
"(ExprId('a', 32), ExprInt(0x10, 32))\n",
"MyCustomOp(a)\n"
]
}
],
"source": [
"# Defining an operation\n",
"op1 = ExprOp(\"+\", a, cst1)\n",
"print(op1)\n",
"\n",
"# Accessing the arguments\n",
"print(op1.args)\n",
"\n",
"# Creating a custom operation\n",
"op2 = ExprOp(\"MyCustomOp\", a)\n",
"print(op2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Rule: In _most_ cases, the arguments of an ExprOp must have the same size. The ExprOp size is then the size of its arguments. Some exceptions exist, for example the operator \"```==```\" has a size of 1 bit. Size exceptions are described here: https://github.com/cea-sec/miasm/blob/5ee76eed19b62983909d6a16966acc15b2c8726f/miasm/expression/expression.py#L1045**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Helpers ease the creation of common operations."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"a + 0x10\n",
"a * 0x10\n",
"-a\n",
"a | 0x10\n",
"a & 0x10\n"
]
}
],
"source": [
"print(a + cst1)\n",
"print(a * cst1)\n",
"print(- a)\n",
"print(a | cst1)\n",
"print(a & cst1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Be careful, even though the Expressions can represent \"everything\", Miasm assumes some properties on certain operations:\n",
"* the associative operations (`+`, `^`, `|`, ...) are n-ary operations ;\n",
"* the `-` is always unary"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"a + -0x10\n"
]
}
],
"source": [
"print(a - cst1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* `parity` has always a size 1, it's an exception"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"32\n",
"1\n"
]
}
],
"source": [
"p = ExprOp(\"parity\", a)\n",
"print(a.size)\n",
"print(p.size)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `=` operation is handled separately by the word `ExprAssign`."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"a = 0x10\n",
"0x10\n",
"a\n"
]
}
],
"source": [
"assign = ExprAssign(a, cst1)\n",
"print(assign)\n",
"\n",
"# Source, destination\n",
"print(assign.src)\n",
"print(assign.dst)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Rule: The ```ExprAssign``` is not really a word of the language: it can be considered as a statement. Thus, it cannot be an argument of another expression.**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The word `ExprCond` represents a ternary relation, equivalend to the Python `src1 if cond else src2`"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"a?(0x10,0xFFFFFFFF)\n",
"a\n",
"0x10\n",
"0xFFFFFFFF\n"
]
}
],
"source": [
"cond = ExprCond(a, cst1, cst2)\n",
"print(cond)\n",
"\n",
"# Access to the elements\n",
"print(cond.cond)\n",
"print(cond.src1)\n",
"print(cond.src2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Rule: The sizes of ```src1``` and ```src2``` must be equal, but can differ from the size of ```cond```.**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following words manipulate the sizes:\n",
"* `ExprSlice`: extracts a bits slice of an expression;\n",
"* `ExprCompose`: concatenates two expressions."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"a[6:8]\n",
"2\n",
"a\n",
"6\n",
"8\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sl = ExprSlice(a, 6, 8)\n",
"print(sl)\n",
"print(sl.size)\n",
"\n",
"# Access to the properties\n",
"print(sl.arg)\n",
"print(sl.start)\n",
"print(sl.stop)\n",
"\n",
"# Simpler form\n",
"sl == a[6:8]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Rule: The size of an ```ExprSlice``` is equal to ```stop - start```.**"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{a 0 32, 0x10 32 64}\n",
"64\n",
"(ExprId('a', 32), ExprInt(0x10, 32))\n",
"[(0, ExprId('a', 32)), (32, ExprInt(0x10, 32))]\n"
]
}
],
"source": [
"# Concatenation of a (bit 0 to 31) with cst1 (bit 32 to 63)\n",
"comp = ExprCompose(a, cst1)\n",
"print(comp)\n",
"print(comp.size)\n",
"\n",
"# Access to the arguments\n",
"print(comp.args)\n",
"# Access to the starting bit and the associated argument\n",
"print(list(comp.iter_args()))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Rule: The size of an ```ExprCompose``` is equal to the sum of the sizes of its arguments.**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, the word `ExprLoc` represents a memory location.\n",
"For example, it can represent the destination of a jump or a function call.\n",
"\n",
"A location is described by a unique element of type `LocKey`. You can see the `LocKey` as a key that you can use to retrieve all the information associated with a location: its offset, its name (\"main\") etc.\n",
"`ExprLoc` is a kind of `LocKey` container."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"loc_key_1\n"
]
}
],
"source": [
"loc = ExprLoc(LocKey(1), 32)\n",
"print(loc)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In summary, the different words are:\n",
"\n",
"| Word | Meaning |\n",
"|-----|----------|\n",
"|ExprAssign|A=B|\n",
"|ExprInt|0x18|\n",
"|ExprId|EAX|\n",
"|ExprLoc|label_1|\n",
"|ExprCond|A ? B : C|\n",
"|ExprMem|@16[ESI]|\n",
"|ExprOp|A + B|\n",
"|ExprSlice|AH = EAX[8 :16]|\n",
"|ExprCompose|AX = AH.AL|"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Common helpers"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"ExprInt(0xFFFFFFFF, 32)"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Proper size mask\n",
"a.mask"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"32"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Expression size\n",
"a.size"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"a 0x10\n"
]
}
],
"source": [
"# Printable version\n",
"print(a, cst1)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ExprId('a', 32) ExprOp('+', ExprId('a', 32), ExprInt(0x10, 32))\n"
]
}
],
"source": [
"# Expr representation (can be copy-pasted in the code)\n",
"print(repr(a), repr(a + cst1))"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"zeroExt_64(0x10)\n",
"signExt_64(0x10)\n"
]
}
],
"source": [
"# Size extension (unsigned, signed)\n",
"print(cst1.zeroExtend(64))\n",
"print(cst1.signExtend(64))"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"a[31:32]\n"
]
}
],
"source": [
"# Most significant bit\n",
"print(a.msb())"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"a + a + 0x10\n",
"0xFFFFFFFF + 0xFFFFFFFF + 0x10\n"
]
}
],
"source": [
"# Replacement\n",
"expr1 = a + a + cst1\n",
"print(expr1)\n",
"expr2 = expr1.replace_expr({a: cst2})\n",
"print(expr2)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"True\n",
"False\n",
"True\n",
"True\n",
"True\n",
"False\n"
]
}
],
"source": [
"# Type test\n",
"print(a.is_id())\n",
"print(a.is_int())\n",
"print(cst1.is_int())\n",
"print(op1.is_op())\n",
"print(op1.is_op(\"+\"))\n",
"print(op1.is_op(\"&\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Expression represented by a graph"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Miasm IR expressions have a recursive structure and can be represented and manipulated as graphs.\n",
"The graph object is a `DiGraph`, implemented in `miasm.core.graph`. It offers usual methods for manipulating graphs (node and vertex access, predecessors, successors, dominators, graphviz dot representation ...)."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(a + 0x10) & 0xFFFFFFFF\n",
"(a + 0x10) & 0xFFFFFFFF\n",
"0x10\n",
"0xFFFFFFFF\n",
"a\n",
"a + 0x10\n",
"a + 0x10 -> a\n",
"a + 0x10 -> 0x10\n",
"(a + 0x10) & 0xFFFFFFFF -> a + 0x10\n",
"(a + 0x10) & 0xFFFFFFFF -> 0xFFFFFFFF\n"
]
}
],
"source": [
"expr3 = a + cst1 & cst2\n",
"print(expr3)\n",
"graph = expr3.graph()\n",
"print(graph)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"dot = graph.dot()\n",
"#from graphviz import Source\n",
"#src = Source(dot)\n",
"#src"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Expression simplification"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The expression simplification applies transformation rules to an expression until they can be applied. This process is done by an `ExpressionSimplifier` object, implemented in `miasm.expression.simplifications`.\n",
"\n",
"Some basic simplifications are already implemented and can be activated with the `expr_simp` instance in the module."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0x10 + 0xFFFFFFFF\n",
"0xF\n",
"0x10[4:5]\n",
"0x1\n",
"a + a + -a\n",
"a\n",
"a + 0x10\n",
"0x10 + 0x10\n",
"0x20\n",
"a * 0x4\n"
]
}
],
"source": [
"from miasm.expression.simplifications import expr_simp\n",
"\n",
"# 0x10 + (-1) = 0xF\n",
"op3 = cst1 + cst2\n",
"print(op3)\n",
"cst3 = expr_simp(op3)\n",
"print(cst3)\n",
"\n",
"# 5th bit of 0x10 = 1\n",
"sl2 = cst1[4:5]\n",
"print(sl2)\n",
"cst4 = expr_simp(sl2)\n",
"print(cst4)\n",
"\n",
"# a + a - a = a\n",
"op4 = a + a - a\n",
"print(op4)\n",
"print(expr_simp(op4))\n",
"assert expr_simp(op4) == a\n",
"\n",
"# Use to evaluate an expression (here a + 0x10 is evaluated with a = 0x10)\n",
"print(op1)\n",
"print(op1.replace_expr({a: cst1}))\n",
"print(expr_simp(op1.replace_expr({a: cst1})))\n",
"print(expr_simp(a + a +a + a))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Transformation rules can be added with the method `enable_passes`. A transformation rule is a function and is associated to an expression type.\n",
"\n",
"Below, the code transforms the booleans operation in an `ExprCond` to an operation of type `<`."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"((x + -y) ^ ((x ^ y) & ((x + -y) ^ x)))[31:32]\n",
"True\n",
"False\n",
"True\n"
]
}
],
"source": [
"x = ExprId(\"x\", 32)\n",
"y = ExprId(\"y\", 32)\n",
"\n",
"inf_signed = ((x - y) ^ ((x ^ y) & ((x - y) ^ x)))[31:32]\n",
"print(inf_signed)\n",
"\n",
"def is_inf(x_val, y_val):\n",
" new_val = expr_simp(inf_signed.replace_expr({\n",
" x: x_val,\n",
" y: y_val,\n",
" }))\n",
" assert new_val.is_int()\n",
" return int(new_val) == 1\n",
"\n",
"# 0 < 10\n",
"print(is_inf(ExprInt(0, 32), ExprInt(10, 32)))\n",
"# 10 !< 10\n",
"print(is_inf(ExprInt(10, 32), ExprInt(10, 32)))\n",
"# -1 < 0\n",
"print(is_inf(ExprInt(0xFFFFFFFF, 32), ExprInt(0, 32)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following code enables this transformation, which was already implemented but not enabled: "
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{<class 'miasm.expression.expression.ExprCond'>: [<function expr_simp_equal at 0x7f4dd825daf0>],\n",
" <class 'miasm.expression.expression.ExprOp'>: [<function expr_simp_inverse at 0x7f4dd825da60>],\n",
" <class 'miasm.expression.expression.ExprSlice'>: [<function expr_simp_inf_signed at 0x7f4dd825d940>,\n",
" <function expr_simp_inf_unsigned_inversed at 0x7f4dd825d9d0>]}\n",
"(((x ^ y) & (x ^ (x + -y))) ^ (x + -y))[31:32]\n",
"x <s y\n"
]
}
],
"source": [
"from pprint import pprint as pp\n",
"from miasm.expression.simplifications import ExpressionSimplifier\n",
"pp(ExpressionSimplifier.PASS_COND)\n",
"\n",
"print(expr_simp(inf_signed))\n",
"expr_simp_cond = ExpressionSimplifier()\n",
"expr_simp_cond.enable_passes(ExpressionSimplifier.PASS_COND)\n",
"print(expr_simp_cond(inf_signed))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Going further"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Some functionalities relative to expressions but not detailed here:\n",
"* `SymbolicExecutionEngine` : Symbolic execution;\n",
"* `Translators` : expression translation to C, Python, \"Miasm like\" or z3 expressions;\n",
"* `expr_range` : Range analysis of an expression possible values;\n",
"* `AssignBlock`, `IRBlock`, `DiGraphDefUse`, `dead_simp`, ... : grouping expressions to describe a full program, associated analysis;\n",
"* `miasm.arch.*.sem.py`, `SemBuilder` : architecture specific semantic descriptions, i.e. all the side effects of a mnemonic."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.1"
}
},
"nbformat": 4,
"nbformat_minor": 1
}