examples/1. Introductory tutorial.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`BpForms` is a toolkit for unambiguously describing the primary sequence of biopolymers such as DNA, RNA, and proteins, including modified DNA, RNA, and proteins. BpForms represents biopolymers as monomeric forms linked.\n",
"This tutorial illustrates how to use the `BpForms` Python library. Please see the second tutorial for more details and more examples. Please also see the [documentation](https://docs.karrlab.org/bpforms/) for more information about the `BpForms` grammar and more instructions for using the `BpForms` website, JSON REST API, and command line interface."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Import library"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import bpforms"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Create polymers from their string representations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Form of a DNA"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"dna_1 = bpforms.DnaForm().from_str('ACGT | circular')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Form an RNA"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"rna_1 = bpforms.RnaForm().from_str('C{01A}GU')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Form of a protein"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"prot_1 = bpforms.ProteinForm().from_str(\n",
" 'CVYT{U}C | x-link: [type: \"disulfide\"'\n",
" ' | l: 1 | r: 6]')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Create the same polymers programmatically"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Form of a DNA"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"dna_2 = bpforms.DnaForm()\n",
"for residue in ['A', 'C', 'G', 'T']:\n",
" dna_2.seq.append(bpforms.dna_alphabet.monomers[residue])\n",
"dna_2.circular = True"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Form of an RNA"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"rna_2 = bpforms.RnaForm()\n",
"for residue in ['C', '01A', 'G', 'U']:\n",
" rna_2.seq.append(bpforms.rna_alphabet.monomers[residue])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Form of a protein"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"prot_2 = bpforms.ProteinForm()\n",
"for residue in ['C', 'V', 'Y', 'T', 'U', 'C']:\n",
" prot_2.seq.append(bpforms.protein_alphabet.monomers[residue])\n",
"prot_2.crosslinks.add(bpforms.OntoBond(\n",
" type=bpforms.xlink.crosslinks_onto['disulfide'],\n",
" l_monomer=1, r_monomer=6))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Get properties of polymers"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Circularity"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dna_1.circular"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Residue sequence"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[<bpforms.core.Monomer at 0x7f74e9252910>,\n",
" <bpforms.core.Monomer at 0x7f752e3e7cd0>,\n",
" <bpforms.core.Monomer at 0x7f74e92572d0>,\n",
" <bpforms.core.Monomer at 0x7f74e92610d0>]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"rna_1.seq"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Crosslinks"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{<bpforms.core.OntoBond at 0x7f74e6820910>}"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"prot_1.crosslinks"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## String representation of a polymer"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'ACGT | circular'"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"str(dna_2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Check the equality of polymers"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dna_2.is_equal(dna_1) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Calculate the properties of a polymer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Atomic structure"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(<openbabel.OBMol; proxy of <Swig Object of type 'OpenBabel::OBMol *' at 0x7f752ebcb540> >,\n",
" {1: {'monomer': {1: 1,\n",
" 2: 2,\n",
" 3: 3,\n",
" 4: 4,\n",
" 5: 5,\n",
" 6: 6,\n",
" 7: 7,\n",
" 8: 8,\n",
" 9: 9,\n",
" 10: 10,\n",
" 11: 11,\n",
" 13: 12,\n",
" 14: 13,\n",
" 15: 14,\n",
" 16: 15,\n",
" 17: 16,\n",
" 18: 17,\n",
" 19: 18,\n",
" 20: 19,\n",
" 21: 20,\n",
" 22: 21},\n",
" 'backbone': {}},\n",
" 2: {'monomer': {1: 22,\n",
" 2: 23,\n",
" 3: 24,\n",
" 4: 25,\n",
" 5: 26,\n",
" 6: 27,\n",
" 7: 28,\n",
" 8: 29,\n",
" 9: 30,\n",
" 10: 31,\n",
" 11: 32,\n",
" 13: 33,\n",
" 14: 34,\n",
" 15: 35,\n",
" 16: 36,\n",
" 17: 37,\n",
" 18: 38,\n",
" 19: 39,\n",
" 20: 40},\n",
" 'backbone': {}},\n",
" 3: {'monomer': {1: 41,\n",
" 2: 42,\n",
" 3: 43,\n",
" 4: 44,\n",
" 5: 45,\n",
" 6: 46,\n",
" 7: 47,\n",
" 8: 48,\n",
" 9: 49,\n",
" 10: 50,\n",
" 11: 51,\n",
" 13: 52,\n",
" 14: 53,\n",
" 15: 54,\n",
" 16: 55,\n",
" 17: 56,\n",
" 18: 57,\n",
" 19: 58,\n",
" 20: 59,\n",
" 21: 60,\n",
" 22: 61,\n",
" 23: 62,\n",
" 24: 63},\n",
" 'backbone': {}},\n",
" 4: {'monomer': {1: 64,\n",
" 2: 65,\n",
" 3: 66,\n",
" 4: 67,\n",
" 5: 68,\n",
" 6: 69,\n",
" 7: 70,\n",
" 8: 71,\n",
" 9: 72,\n",
" 10: 73,\n",
" 11: 74,\n",
" 13: 75,\n",
" 14: 76,\n",
" 15: 77,\n",
" 16: 78,\n",
" 17: 79,\n",
" 18: 80,\n",
" 19: 81,\n",
" 20: 82,\n",
" 21: 83,\n",
" 22: 84},\n",
" 'backbone': {}}})"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dna_1.get_structure()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## SMILES representation of the structure"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'O1C2CC(OC2COP(=O)([O-])OC2CC(OC2COP(=O)(OC2CC(OC2COP(=O)(OC2CC(OC2COP1(=O)[O-])n1ccc(nc1=O)N)[O-])n1cnc2c1nc(N)[nH]c2=O)[O-])n1cc(C)c(=O)[nH]c1=O)n1cnc2c1ncnc2N'"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dna_1.export('smiles') "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Formula"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AttrDefault(<class 'float'>, False, {'C': 39.0, 'H': 45.0, 'N': 15.0, 'O': 24.0, 'P': 4.0})"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dna_1.get_formula()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Charge"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"-4"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dna_1.get_charge()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Unmodified/canonical sequence"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'CAGU'"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"rna_1.get_canonical_seq() "
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}