python-security/pyt

View on GitHub
pyt/cfg/README.rst

Summary

Maintainability
Test Coverage
`make_cfg`_ is what `__main__.py`_ calls, it takes the `Abstract Syntax Tree`_, creates an `ExprVisitor`_ and returns a `Control Flow Graph`_.

.. _make_cfg: https://github.com/python-security/pyt/blob/re_organize_code/pyt/cfg/make_cfg.py#L22-L38
.. _\_\_main\_\_.py: https://github.com/python-security/pyt/blob/re_organize_code/pyt/__main__.py#L33-L106
.. _Abstract Syntax Tree: https://en.wikipedia.org/wiki/Abstract_syntax_tree
.. _Control Flow Graph: https://en.wikipedia.org/wiki/Control_flow_graph

`stmt_visitor.py`_ and `expr_visitor.py`_ mirror the `abstract grammar`_ of Python. Statements can contain expressions, but not the other way around. This is why `ExprVisitor`_ inherits from `StmtVisitor`_, which inherits from `ast.NodeVisitor`_; from the standard library.

.. _StmtVisitor: https://github.com/python-security/pyt/blob/re_organize_code/pyt/cfg/stmt_visitor.py#L55
.. _ExprVisitor: https://github.com/python-security/pyt/blob/re_organize_code/pyt/cfg/expr_visitor.py#L33

This is how `ast.NodeVisitor`_ works:

.. code-block:: python

  def visit(self, node):
    """Visit a node."""
    method = 'visit_' + node.__class__.__name__
    visitor = getattr(self, method, self.generic_visit)
    return visitor(node)


So as you'll see, there is a `visit\_` function for almost every AST node type. We keep track of all the nodes while we visit by adding them to self.nodes, connecting them via `ingoing and outgoing node attributes`_.

.. _ingoing and outgoing node attributes: https://github.com/python-security/pyt/blob/re_organize_code/pyt/core/node_types.py#L27-L48

The two most illustrative functions are `stmt_star_handler`_ and expr_star_handler. expr_star_handler has not been merged to master so let's talk about `stmt_star_handler`_.


Handling an if: statement 
=========================

Example code

.. code-block:: python

  if some_condition:
      x = 5

This is the relevant part of the `abstract grammar`_

.. code-block:: python

  If(expr test, stmt* body, stmt* orelse)
  # Note: stmt* means any number of statements. 


Upon visiting an if: statement we will enter `visit_If`_ in `stmt_visitor.py`_. Since we know that the test is just one expression, we can just call self.visit() on it. The body could be an infinite number of statements, so we use the `stmt_star_handler`_ function.

`stmt_star_handler`_ returns a namedtuple (`ConnectStatements`_) with the first statement, last_statements and break_statements of all of the statements that were in the body of the node. `stmt_star_handler`_ takes care of connecting each statement in the body to the next one.

We then connect the test node to the first node in the body (if some_condition -> x = 5) and return a namedtuple (`ControlFlowNode`_) with the test, last_statements and break_statements.


.. _visit\_If: https://github.com/python-security/pyt/blob/re_organize_code/pyt/cfg/stmt_visitor.py#L208-L232

.. _ConnectStatements: https://github.com/python-security/pyt/blob/re_organize_code/pyt/cfg/stmt_visitor_helper.py#L15

.. _ControlFlowNode: https://github.com/python-security/pyt/blob/re_organize_code/pyt/core/node_types.py#L7

.. _stmt\_visitor.py: https://github.com/python-security/pyt/blob/re_organize_code/pyt/cfg/stmt_visitor.py

.. _expr\_visitor.py: https://github.com/python-security/pyt/blob/re_organize_code/pyt/cfg/expr_visitor.py

.. _stmt_star_handler: https://github.com/python-security/pyt/blob/re_organize_code/pyt/cfg/stmt_visitor.py#L60-L121


.. code-block:: python

  def visit_If(self, node):
      test = self.append_node(IfNode(
          node.test,
          node,
          path=self.filenames[-1]
      ))

      body_connect_stmts = self.stmt_star_handler(node.body)
      # ...
      test.connect(body_connect_stmts.first_statement)

      if node.orelse:
          # ...
      else:
          # if there is no orelse, test needs an edge to the next_node
          body_connect_stmts.last_statements.append(test)

      last_statements = remove_breaks(body_connect_stmts.last_statements)

      return ControlFlowNode(
          test,
          last_statements,
          break_statements=body_connect_stmts.break_statements
      )


.. code-block:: python

  def stmt_star_handler(
      self,
      stmts
  ):
      """Handle stmt* expressions in an AST node.
      Links all statements together in a list of statements.
      Accounts for statements with multiple last nodes.
      """
      break_nodes = list()
      cfg_statements = list()

      first_node = None
      node_not_to_step_past = self.nodes[-1]

      for stmt in stmts:
          node = self.visit(stmt)

          if isinstance(node, ControlFlowNode):
              break_nodes.extend(node.break_statements)
          elif isinstance(node, BreakNode):
              break_nodes.append(node)

          cfg_statements.append(node)
          if not first_node:
              if isinstance(node, ControlFlowNode):
                  first_node = node.test
              else:
                  first_node = get_first_node(
                      node,
                      node_not_to_step_past
                  )

      connect_nodes(cfg_statements)

      if first_node:
          first_statement = first_node
      else:
          first_statement = get_first_statement(cfg_statements[0])

      last_statements = get_last_statements(cfg_statements)

      return ConnectStatements(
          first_statement=first_statement,
          last_statements=last_statements,
          break_statements=break_nodes
      )


.. _ast.NodeVisitor: https://docs.python.org/3/library/ast.html#ast.NodeVisitor
.. _abstract grammar: https://docs.python.org/3/library/ast.html#abstract-grammar

References
==========

For more information on AST nodes, see the `Green Tree Snakes`_ documentation.

.. _Green Tree Snakes: https://greentreesnakes.readthedocs.io/en/latest/nodes.html