{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`BpForms` is a toolkit for unambiguously describing the primary sequences of biopolymers such as DNA, RNA, and proteins, including their modified forms. `BpForms` represents each biopolymer as a sequence of linked monomeric forms.\n",
    "This tutorial illustrates how to use the `BpForms` Python library. See the second tutorial for more details and examples, and the [documentation](https://docs.karrlab.org/bpforms/) for more information about the `BpForms` grammar and for instructions on using the `BpForms` website, JSON REST API, and command-line interface."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Import library"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import bpforms"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create polymers from their string representations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Form of a DNA"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Parse a DNA from its BpForms string; the `| circular` suffix sets the `circular` attribute\n",
    "dna_1 = bpforms.DnaForm().from_str('ACGT | circular')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Form of an RNA"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Braces delimit a multi-character monomer code (here, the modified residue `01A`)\n",
    "rna_1 = bpforms.RnaForm().from_str('C{01A}GU')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Form of a protein"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "prot_1 = bpforms.ProteinForm().from_str(\n",
    "            'CVYT{U}C | x-link: [type: \"disulfide\"'\n",
    "            ' | l: 1 | r: 6]')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create the same polymers programmatically"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Form of a DNA"
   ]
  },
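  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "dna_2 = bpforms.DnaForm()\n",
    "# Look up each monomer in the DNA alphabet by its code and append it to the sequence\n",
    "for residue in ['A', 'C', 'G', 'T']:\n",
    "    dna_2.seq.append(bpforms.dna_alphabet.monomers[residue])\n",
    "# Mark the polymer as circular, matching `| circular` in the string form\n",
    "dna_2.circular = True"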
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Form of an RNA"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "rna_2 = bpforms.RnaForm()\n",
    "for residue in ['C', '01A', 'G', 'U']:\n",
    "    rna_2.seq.append(bpforms.rna_alphabet.monomers[residue])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Form of a protein"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "prot_2 = bpforms.ProteinForm()\n",
    "for residue in ['C', 'V', 'Y', 'T', 'U', 'C']:\n",
    "    prot_2.seq.append(bpforms.protein_alphabet.monomers[residue])\n",
    "# Crosslink residues 1 and 6 (the two cysteines) with a disulfide bond, as in the string form\n",
    "prot_2.crosslinks.add(bpforms.OntoBond(\n",
    "    type=bpforms.xlink.crosslinks_onto['disulfide'],\n",
    "    l_monomer=1, r_monomer=6))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Get properties of polymers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Circularity"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dna_1.circular"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Residue sequence"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<bpforms.core.Monomer at 0x7f74e9252910>,\n",
       " <bpforms.core.Monomer at 0x7f752e3e7cd0>,\n",
       " <bpforms.core.Monomer at 0x7f74e92572d0>,\n",
       " <bpforms.core.Monomer at 0x7f74e92610d0>]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rna_1.seq"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Crosslinks"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{<bpforms.core.OntoBond at 0x7f74e6820910>}"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "prot_1.crosslinks"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## String representation of a polymer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'ACGT | circular'"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "str(dna_2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Check the equality of polymers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dna_2.is_equal(dna_1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Calculate the properties of a polymer"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Atomic structure"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(<openbabel.OBMol; proxy of <Swig Object of type 'OpenBabel::OBMol *' at 0x7f752ebcb540> >,\n",
       " {1: {'monomer': {1: 1,\n",
       "    2: 2,\n",
       "    3: 3,\n",
       "    4: 4,\n",
       "    5: 5,\n",
       "    6: 6,\n",
       "    7: 7,\n",
       "    8: 8,\n",
       "    9: 9,\n",
       "    10: 10,\n",
       "    11: 11,\n",
       "    13: 12,\n",
       "    14: 13,\n",
       "    15: 14,\n",
       "    16: 15,\n",
       "    17: 16,\n",
       "    18: 17,\n",
       "    19: 18,\n",
       "    20: 19,\n",
       "    21: 20,\n",
       "    22: 21},\n",
       "   'backbone': {}},\n",
       "  2: {'monomer': {1: 22,\n",
       "    2: 23,\n",
       "    3: 24,\n",
       "    4: 25,\n",
       "    5: 26,\n",
       "    6: 27,\n",
       "    7: 28,\n",
       "    8: 29,\n",
       "    9: 30,\n",
       "    10: 31,\n",
       "    11: 32,\n",
       "    13: 33,\n",
       "    14: 34,\n",
       "    15: 35,\n",
       "    16: 36,\n",
       "    17: 37,\n",
       "    18: 38,\n",
       "    19: 39,\n",
       "    20: 40},\n",
       "   'backbone': {}},\n",
       "  3: {'monomer': {1: 41,\n",
       "    2: 42,\n",
       "    3: 43,\n",
       "    4: 44,\n",
       "    5: 45,\n",
       "    6: 46,\n",
       "    7: 47,\n",
       "    8: 48,\n",
       "    9: 49,\n",
       "    10: 50,\n",
       "    11: 51,\n",
       "    13: 52,\n",
       "    14: 53,\n",
       "    15: 54,\n",
       "    16: 55,\n",
       "    17: 56,\n",
       "    18: 57,\n",
       "    19: 58,\n",
       "    20: 59,\n",
       "    21: 60,\n",
       "    22: 61,\n",
       "    23: 62,\n",
       "    24: 63},\n",
       "   'backbone': {}},\n",
       "  4: {'monomer': {1: 64,\n",
       "    2: 65,\n",
       "    3: 66,\n",
       "    4: 67,\n",
       "    5: 68,\n",
       "    6: 69,\n",
       "    7: 70,\n",
       "    8: 71,\n",
       "    9: 72,\n",
       "    10: 73,\n",
       "    11: 74,\n",
       "    13: 75,\n",
       "    14: 76,\n",
       "    15: 77,\n",
       "    16: 78,\n",
       "    17: 79,\n",
       "    18: 80,\n",
       "    19: 81,\n",
       "    20: 82,\n",
       "    21: 83,\n",
       "    22: 84},\n",
       "   'backbone': {}}})"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
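   ],
   "source": [
    "# Returns the structure as an Open Babel molecule, plus a map of atom indices for each monomer\n",
    "dna_1.get_structure()"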
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## SMILES representation of the structure"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'O1C2CC(OC2COP(=O)([O-])OC2CC(OC2COP(=O)(OC2CC(OC2COP(=O)(OC2CC(OC2COP1(=O)[O-])n1ccc(nc1=O)N)[O-])n1cnc2c1nc(N)[nH]c2=O)[O-])n1cc(C)c(=O)[nH]c1=O)n1cnc2c1ncnc2N'"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dna_1.export('smiles')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Formula"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "AttrDefault(<class 'float'>, False, {'C': 39.0, 'H': 45.0, 'N': 15.0, 'O': 24.0, 'P': 4.0})"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dna_1.get_formula()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Charge"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "-4"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dna_1.get_charge()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Unmodified/canonical sequence"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'CAGU'"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rna_1.get_canonical_seq()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}