colab-notebooks/colab_fragmenstein.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "_FoJBuFPCWqw"
},
"source": [
"# Fragmenstein\n",
"Fragmenstein is a position-based fragment-merging python3 tool.\n",
"\n",
"<img src=\"https://github.com/matteoferla/Fragmenstein/raw/master/images/overview.png\" width=\"800\" alt=\"logo\">\n",
"\n",
"In its merging/linking operation, under the coordination of the class Victor,\n",
"the class Monster finds spatially overlapping atoms and stitches them together (with RDKit),\n",
"then the class Igor reanimates (minimises in PyRosetta) them within the protein site restraining the atoms to original positions.\n",
"As this compound may not be purchasable, one can use the placement operation to\n",
"make a stitched-together molecule based a template."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-z25QZ4mV7-M"
},
"source": [
"# Further details"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_ou9a06GV7-N"
},
"source": [
"## Details\n",
"\n",
"This notebook does sevaral operations as examples.\n",
"\n",
"* It optionally minimises the template structure, \n",
"* optionally extracts the hits from provided PDB structures.\n",
"* It combines combinatorially the provided hits\n",
"* It then searches for the most similar molecules to the user chosen molecule in the Enamine Real database (via the API of John Irwin's SmallWorld server)\n",
"* places them.\n",
"\n",
"NB Whereas Fragmenstein can deal with covalent ligands\n",
"and can interconvert a few cysteine reactive warheads\n",
"this notebook does not do any of that due to corner case mayhem. The demo data (MPro from the [Covid Moonshot](https://covid.postera.ai/covid) available from [Fragalysis](https://fragalysis.diamond.ac.uk/)) will use covalent residues.\n",
"\n",
"Fragmenstein can _partially_ work without PyRosetta, but this notebook does not work that way.\n",
"\n",
"This notebook from https://github.com/matteoferla/Fragmenstein.\n",
"\n",
"It is meant to be opened in Colabs via [https://colab.research.google.com/github/matteoferla/Fragmenstein/blob/master/colab-notebooks/colab_fragmenstein.ipynb](https://colab.research.google.com/github/matteoferla/Fragmenstein/blob/master/colab-notebooks/colab_fragmenstein.ipynb)\n",
"\n",
"See also:\n",
"\n",
"* https://github.com/matteoferla/Fragmenstein\n",
"* https://fragmenstein.readthedocs.io\n",
"* https://github.com/matteoferla/Python_SmallWorld_API\n",
"* https://github.com/matteoferla/pyrosetta_help"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "idggAKzIV7-O"
},
"source": [
"# Installation and initialisation"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "ieOQnFAkCWq1"
},
"outputs": [],
"source": [
"#@title Installation\n",
"#@markdown Press the play button on the top right hand side of this cell\n",
"#@markdown once you have checked the settings.\n",
"#@markdown You will be notified that this notebook is not from Google, that is normal.\n",
"\n",
"#@markdown Send error messages to errors.matteoferla.com for logging?\n",
"#@markdown See [notebook-error-reporter repo for more](https://github.com/matteoferla/notebook-error-reporter)\n",
"report_errors = False #@param {type:\"boolean\"}\n",
"if report_errors:\n",
" !pip install notebook-error-reporter\n",
" from notebook_error_reporter import ErrorServer\n",
"\n",
" es = ErrorServer(url='https://errors.matteoferla.com', notebook='fragmenstein')\n",
" es.enable()\n",
"\n",
"\n",
"!pip install -q py3Dmol rdkit rdkit-to-params biopython pyrosetta-help jsme_notebook\n",
"!pip install -q --upgrade plotly\n",
"\n",
"\n",
"import os\n",
"import py3Dmol\n",
"from importlib import reload\n",
"import pyrosetta_help as ph\n",
"from typing import *\n",
"\n",
"# Muppet-proofing: are we in colab?\n",
"assert ph.get_shell_mode() == 'colab', 'This is a colab notebook, if running in Jupyter notebook, the installation is different'\n",
"\n",
"# Colab still runs on 3.7\n",
"# hack to enable the backport:\n",
"import sys\n",
"if sys.version_info.major != 3 or sys.version_info.minor < 8:\n",
" !pip install -q singledispatchmethod\n",
" import functools\n",
" from singledispatchmethod import singledispatchmethod # noqa it's okay, PyCharm, I am not a technoluddite\n",
" functools.singledispatchmethod = singledispatchmethod\n",
" !pip install -q typing_extensions\n",
" import typing_extensions\n",
" import typing\n",
" for key, fun in typing_extensions.__dict__.items():\n",
" if key not in typing.__dict__:\n",
" setattr(typing, key, fun)\n",
"\n",
"# ============================================================================\n",
"#@markdown ### Use Google Drive\n",
"#@markdown Optionally store your results in your google drive.\n",
"#@markdown If `use_drive` is True, you will be prompted to give permission to use Google Drive\n",
"#@markdown (you may be prompted to follow a link and possibly authenticate and then copy a code into a box)\n",
"#@markdown —**_always_ remember to check strangers' code against data theft**:\n",
"#@markdown e.g. search and look for all instances of `http`, `requests` and `post` in the code, and\n",
"#@markdown make sure the creator is not typosquatting as someone else (e.g. username `Coogle`).\n",
"use_drive = True #@param {type:\"boolean\"}\n",
"if use_drive:\n",
" ph.mount_google_drive(use_drive)\n",
"\n",
"# ============================================================================\n",
"#@markdown ### PyRosetta installation\n",
"#@markdown This will install PyRosetta in this Colab notebook (~2–15 minutes depending on time of day),\n",
"#@markdown but you will require a [PyRosetta licence](https://www.pyrosetta.org/home/licensing-pyrosetta)\n",
"#@markdown (free for academics).\n",
"#@markdown to speed things up _next_ time you can download a release into your Google Drive.\n",
"#@markdown Use Google Drive for PyRosetta:\n",
"\n",
"#@markdown 👍 maybe faster next time\n",
"\n",
"#@markdown 👎 occupies some 10 GB, so you'll need to be on the 100 GB plan of Google Drive (it's one pound a month).\n",
"\n",
"download_to_drive = False #@param {type:\"boolean\"}\n",
"download_path = '.' #@param {type:\"string\"}\n",
"#@markdown If this is the next-time, `download_to_drive` and the credentials below will be ignored if\n",
"#@markdown there's a release in `download_path`.\n",
"\n",
"#@markdown The following is not the real username and password. However, the format is similar.\n",
"username = 'boltzmann' #@param {type:\"string\"}\n",
"username.strip().lower()\n",
"password = 'constant' #@param {type:\"string\"}\n",
"#@markdown If yours are not the academic credentials\n",
"#@markdown disable this:\n",
"hash_comparison_required = True #@param {type:\"boolean\"}\n",
"#@markdown 💡 THIS FLAG IS NOT PREVENTING YOU FROM USING PLAIN ROSETTA CREDENTIALS\n",
"#@markdown AS THE CUSTOM ERROR SAYS! **REGULAR ROSETTA CREDENTIALS DO NOT WORK FOR PYROSETTA.**\n",
"\n",
"if download_to_drive and not use_drive:\n",
" raise ValueError('You said False to `use_drive` and True to `download_to_drive`? Very funny.')\n",
"elif download_to_drive:\n",
" ph.download_pyrosetta(username=username,\n",
" password=password,\n",
" path=download_path,\n",
" hash_comparison_required=hash_comparison_required)\n",
"else:\n",
" pass\n",
"\n",
"ph.install_pyrosetta(username=username,\n",
" password=password,\n",
" path=download_path,\n",
" hash_comparison_required=hash_comparison_required)\n",
"reload(ph)\n",
"\n",
"# ??????? NOTE ??????????????????????????????????????????????????????????????????????\n",
"# ? Note to code spies\n",
"# ? this is a convoluted way to install pyrosetta via pyrosetta_help, due to options.\n",
"# ? the quicker way is:\n",
"# ?\n",
"# ? >>> pip install pyrosetta-help\n",
"# ? >>> install_pyrosetta -u xxx -p xxx\n",
"# ??????????????????????????????????????????????????????????????????????????????????\n",
"\n",
"from google.colab import output # noqa (It's a colaboratory specific repo)\n",
"output.enable_custom_widget_manager()\n",
"\n",
"\n",
"\n",
"# refresh imports\n",
"import site\n",
"site.main()\n",
"\n",
"# ================================================================\n",
"# ================================================================\n",
"# Fragmenstein\n",
"# custom for this notebook\n",
"!pip install smallworld_api fragmenstein\n",
"# Hey, future me, debugging are we? try:\n",
"# os.system('pip install git+https://github.com/matteoferla/Fragmenstein.git@dev-teo3')\n",
"\n",
"# make folders in _path\n",
"working_folder = 'fragmenstein_data'\n",
"input_folder = 'input'\n",
"output_folder = 'output'\n",
"if not os.path.exists(working_folder):\n",
" os.mkdir(working_folder)\n",
"os.chdir('fragmenstein_data')\n",
"for folder in (input_folder, output_folder):\n",
" if not os.path.exists(folder):\n",
" os.mkdir(folder)\n",
"\n",
"# ================================================================\n",
"# 3D viewer\n",
"\n",
"from typing import *\n",
"from rdkit import Chem\n",
"import py3Dmol\n",
"\n",
"\n",
"def stylize(representation: str, color: str) -> Dict[str, Dict[str, Dict[str, str]]]:\n",
" if 'carbon' in color.lower():\n",
" return dict(style={representation: {'colorscheme': color}})\n",
" else:\n",
" return dict(style={representation: {'color': color}})\n",
"\n",
"\n",
"def make_3Dview(template_pdbblock, colormols: Dict[str, List[Chem.Mol]]) -> py3Dmol.view:\n",
" \"\"\"\n",
" colormols is a diction of color/colorscheme to list of mols.\n",
" \"\"\"\n",
" view = py3Dmol.view(js=\"https://3dmol.org/build/3Dmol.js\")\n",
" view.addModel(template_pdbblock, \"pdb\", stylize('cartoon', 'gainsboro'))\n",
" view.setStyle(dict(hetflag=True), stylize('stick', 'whiteCarbon'))\n",
" for color, mols in colormols.items():\n",
" for mol in mols:\n",
" view.addModel(Chem.MolToMolBlock(mol), \"mol\", stylize('stick', color))\n",
" view.zoomTo(dict(hetflag=True))\n",
" return view\n",
"\n",
"\n",
"# ================================================================\n",
"# mol grid\n",
"\n",
"from rdkit.Chem import Draw\n",
"from rdkit.Chem import AllChem\n",
"from IPython.display import display\n",
"\n",
"\n",
"def display_mols(mols: Sequence[Chem.Mol],\n",
" molsPerRow=5,\n",
" subImgSize=(150,150),\n",
" useSVG=True):\n",
" \"\"\"\n",
" Rudimentary wrapper for calling ``display(Draw.MolsToGridImage``\n",
" \"\"\"\n",
" flattos = [AllChem.RemoveHs(mol) for mol in mols if isinstance(mol, Chem.Mol)]\n",
" for mol in flattos:\n",
" AllChem.Compute2DCoords(mol)\n",
" display(Draw.MolsToGridImage(flattos,\n",
" legends=[mol.GetProp('_Name') if mol.HasProp('_Name') else '-' for mol in mols],\n",
" subImgSize=subImgSize, useSVG=useSVG,\n",
" molsPerRow=molsPerRow))\n",
"\n",
"\n",
"# ================================================================"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "jkM21vbRCWq5"
},
"outputs": [],
"source": [
"#@title Start PyRosetta\n",
"#@markdown Leave alone and just run it. \n",
"#@markdown Only one that you might want to change is `ignore_waters`\n",
"#@markdown as merging a ligand with its watershell may be a reasonable thing to do\n",
"#@markdown in extreme circumstances —like not even Chuck Norris could find a followup.\n",
"import pyrosetta, logging\n",
"import pyrosetta_help as ph\n",
"\n",
"#@markdown Do not optimise hydrogen on loading:\n",
"no_optH = False #@param {type:\"boolean\"}\n",
"#@markdown Ignore (True) or raise error (False) if novel residue (e.g. ligand) — **don't tick this**.\n",
"ignore_unrecognized_res = False #@param {type:\"boolean\"}\n",
"#@markdown Use autogenerated PDB residues are often weird (bad geometry, wrong match, protonated etc.): —best do it properly and parameterise it, so **don't tick this**.\n",
"load_PDB_components = False #@param {type:\"boolean\"}\n",
"#@markdown Ignore all waters:\n",
"ignore_waters = True #@param {type:\"boolean\"}\n",
"\n",
"extra_options = ph.make_option_string(no_optH=no_optH,\n",
" ex1=None,\n",
" ex2=None,\n",
" mute='all',\n",
" ignore_unrecognized_res=ignore_unrecognized_res,\n",
" load_PDB_components=load_PDB_components,\n",
" ignore_waters=ignore_waters)\n",
"\n",
"# capture to log\n",
"# circuitous as I needed to debug...\n",
"logger = ph.configure_logger()\n",
"logger.handlers[0].setLevel(logging.WARNING) # logging.WARNING = 30\n",
"pyrosetta.init(extra_options=extra_options,\n",
" #set_logging_handler=True\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "LQJXJETfV7-Y"
},
"source": [
"# Uploads"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "iGekHmHTCWq5"
},
"outputs": [],
"source": [
"#@title Upload Template model\n",
"#@markdown Upload PDB file of the template model used for placing the merger.\n",
"#@markdown Suggestion: use a model bound to the native ligand or similar\n",
"#@markdown as the side chains will be poised for action!\n",
"#@markdown NB. this PDB and those of the hits need to be superposed.\n",
"\n",
"#@markdown Alternatively tick here for demo values (SARS-CoV-2 MPro):\n",
"\n",
"use_mpro = False #@param {type:\"boolean\"}\n",
"\n",
"if use_mpro:\n",
" from fragmenstein import mpro_data\n",
"\n",
" pdbblock = mpro_data.get_template()\n",
"else:\n",
" from google.colab import files # noqa\n",
"\n",
" uploaded = files.upload()\n",
" assert len(uploaded) == 1, 'Upload only one template. Hits later.'\n",
" pdbblock = list(uploaded.values())[0]\n",
" if isinstance(pdbblock, bytes): # dont recall what format are txt sent as...\n",
" pdbblock = pdbblock.decode('utf8') # noqa its bytes if it tested so"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "sNEf_L1ICWq8"
},
"outputs": [],
"source": [
"#@title Upload hits or bound hits\n",
"#@markdown As the decision is based on extension, please no random extensions...\n",
"#@markdown `mol.try2.mol` is fine, but `mol.molecule` or `mol.mol.txt` are not.\n",
"\n",
"#@markdown ### Option 1.\n",
"#@markdown If you have a single multientry sdf file, upload that.\n",
"\n",
"#@markdown ### Option 2.\n",
"#@markdown If you have multiple mol files, upload those.\n",
"#@markdown RDKit handles mdl-mol files better than mol2 files.\n",
"\n",
"#@markdown ### Option 3.\n",
"#@markdown If you require your ligand to be extracted from pdb files\n",
"#@markdown upload those and specify the three-letter code of the ligand\n",
"#@markdown (this is the 3-letter name within the provided PDB(s) not the intended name for the merger!):\n",
"ligand_residue_name = 'LIG' #@param {type:\"string\"}\n",
"#@markdown This latter option is über-recommended if you have a covalently bound ligand\n",
"#@markdown but your extracted mol files are not or lack bond order.\n",
"#@markdown Note that you'll have to also upload a SMILES file (`.smi`)\n",
"#@markdown —this is just a tab separated table of SMILES tab name.\n",
"\n",
"#@markdown If your PDB lacks `CONECT` entries (you monster), tick this:\n",
"proximityBonding = False #@param {type:\"boolean\"}\n",
"\n",
"#@markdown **Alternative** if to use demo data from covid moonshoot check use next cell.\n",
"\n",
"# Go!\n",
"import os\n",
"from google.colab import files # noqa\n",
"from rdkit import Chem\n",
"from typing import *\n",
"from warnings import warn\n",
"\n",
"# ======================================\n",
"# get\n",
"uploaded = files.upload()\n",
"# sort\n",
"uploaded_split = {k: [] for k in ('smi', 'sdf', 'mol', 'mol2', 'pdb')}\n",
"for filename in uploaded:\n",
" extension = os.path.splitext(filename)[1].lower()[1:]\n",
" if extension not in uploaded_split:\n",
" raise ValueError(f'The extension {extension} is not coded for')\n",
" data = uploaded[filename] # str or bytes??\n",
" if isinstance(data, bytes):\n",
" data = data.decode('utf8')\n",
" uploaded_split[extension].append(dict(filename=filename, data=data))\n",
"\n",
"# parse SMILES for PDB\n",
"if uploaded_split['smi']:\n",
" # smilesdex? yes, I am going to hell for using not only Hungarian notation in Python, \n",
" # but a Pokémon flavoured one.\n",
"\n",
" smilesdex: Dict[str, str] = dict(map(lambda line: line.split('\\t')[-1::-1],\n",
" data.replace('\\n\\n', '\\n').strip().split('\\n'))\n",
" )\n",
"else:\n",
" smilesdex = {}\n",
"\n",
"\n",
"# parsers\n",
"def read_sdf(filename: str, data: str) -> List[Chem.Mol]:\n",
" mols = []\n",
" # I am pretty sure there's no way to run `Chem.SDMolSupplier` on a block\n",
" with open(os.path.join(input_folder, filename), 'w') as fh:\n",
" fh.write(data)\n",
" with Chem.SDMolSupplier(filename, removeHs=False) as suppl:\n",
" for mol in suppl:\n",
" # do I need to add a name??\n",
" mols.append(mol)\n",
" return mols\n",
"\n",
"\n",
"def read_mol(filename: str, data: str, fun: Callable) -> Chem.Mol:\n",
" mol = fun(data)\n",
" name = os.path.splitext(filename)[0]\n",
" mol.SetProp('_Name', name)\n",
" return mol\n",
"\n",
"\n",
"def get_smiles(filename: str, smilesdex: Dict[str, str]) -> Union[None, str]:\n",
" name = os.path.splitext(filename)[0]\n",
" if name in smilesdex:\n",
" return smilesdex[name]\n",
" else:\n",
" warn(f'Could not find matching SMILES to {name}')\n",
" return None\n",
"\n",
"from fragmenstein import Victor\n",
"# parse molecules...\n",
"hits = []\n",
"for entry in uploaded_split['sdf']:\n",
" hits.extend(read_sdf(entry['filename'], entry['data']))\n",
"for entry in uploaded_split['mol']:\n",
" hits.append(read_mol(entry['filename'], entry['data'], Chem.MolFromMolBlock))\n",
"for entry in uploaded_split['mol2']:\n",
" hits.append(read_mol(entry['filename'], entry['data'], Chem.MolFromMol2Block))\n",
"for entry in uploaded_split['pdb']:\n",
" smiles: Union[None, str] = get_smiles(entry['filename'], smilesdex)\n",
" hits.append(Victor.extract_mol(name=entry['filename'],\n",
" block=entry['data'],\n",
" smiles=smiles,\n",
" ligand_resn=ligand_residue_name,\n",
" proximityBonding=proximityBonding,\n",
" throw_on_error=True)\n",
" )\n",
"\n",
"# pollute the directory with files\n",
"# with Chem.SDWriter(os.path.join(input_folder, 'provided.sdf')) as w:\n",
"# for hit in hits:\n",
"# w.write(hit)\n",
"\n",
"display_mols(hits)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "NEZBUhcNCWq9"
},
"outputs": [],
"source": [
"#@title Alternative source of hits: Demo data\n",
"#@markdown If using MPro template, you can use the MPro Covid moonshot data too.\n",
"n_hits = 5 #@param {type:\"integer\"}\n",
"#@markdown Merging Covid moonshot leads is kind of weird thing to do (cf. Lipinski rule). Max Dalton:\n",
"size_cutoff = 200 #@param {type:\"integer\"}\n",
"\n",
"hits = mpro_data.get_n_filtered_mols(amount=n_hits, mw_cutoff=size_cutoff)\n",
"display_mols(hits)"
]
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"#@title Alternative source of hits: custom code?\n",
"#@markdown Uncollapse this cell for code\n",
"\n",
"# some filter for either option:\n",
"\n",
"def some_filter(filename:str) -> bool:\n",
" name, extension = os.path.splitext(filename)\n",
" return extension != '.mol' and filename.count('-') > 1\n",
"\n",
"# ======== Example... local\n",
"\n",
"import os\n",
"\n",
"hits: List[Chem.Mol] = []\n",
"for filename in os.listdir('somefolder/molfiles/'):\n",
" if not some_filter(filename):\n",
" continue\n",
" mol: Chem.Mol = Chem.MolFromMolFile( f'somefolder/molfiles/{filename}' )\n",
" mol.SetProp('_Name', filename)\n",
" hits.append(mol)\n",
"display_mols(hits)\n",
"\n",
"# ======== Example... retrieve from GitHub\n",
"!pip install PyGithub\n",
"\n",
"from github import Github\n",
"from github.ContentFile import ContentFile\n",
"import os\n",
"from typing import Tuple, List\n",
"from rdkit import Chem\n",
"# -------------------------------------------------------\n",
"def _content2mol(content:ContentFile) -> Chem.Mol:\n",
" if content.path.count('-') > 1:\n",
" return\n",
" # \"molfiles/diamond-x0158_B.mol\"\n",
" filename = os.path.split(content.path)[1]\n",
" if not some_filter(filename):\n",
" return None\n",
" mol: Chem.Mol = Chem.MolFromMolBlock( content.decoded_content.decode('utf8') )\n",
" mol.SetProp('_Name', filename)\n",
" return mol\n",
"\n",
"def get_repo(username:str, repo_name:str) -> Github:\n",
" return Github().get_user(username).get_repo(repo_name)\n",
"\n",
"def hits_from_github(username:str, repo_name:str, folder:str) -> List[Chem.Mol]:\n",
" repo = get_repo(username, repo_name)\n",
" return list(filter(lambda x: x is not None, map(_content2mol, repo.get_contents(folder))))\n",
"\n",
"# ----------------- Change these:\n",
"all_hits:List[Chem.Mol] = hits_from_github(username='matteoferla',\n",
" repo_name='NSP3-macrodomain',\n",
" folder='molfiles')\n",
"hits = all_hits[:20]\n",
"pdbblock:str = get_repo(username='matteoferla', repo_name='NSP3-macrodomain').get_contents(\"molfiles/template.pdb\").decoded_content.decode('utf8')"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"metadata": {
"id": "0w9YB7lRV7-a"
},
"source": [
"# Tweaks"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "9T1_91mwCWq6"
},
"outputs": [],
"source": [
"#@title Optionally prepare it (1/3)\n",
"#@markdown This will be done by \n",
"\n",
"#@markdown 1. loading it in PyRosetta\n",
"#@markdown 2. Optionally energy minimising around a target\n",
"#@markdown 3. Optionally remove some molecules\n",
"\n",
"#@markdown ### Step 1\n",
"#@markdown If the model has novel ligands, they will be loaded.\n",
"#@markdown But to do this a residue type (=topology) needs to be made or loaded.\n",
"#@markdown These are saved as \"params files\".\n",
"#@markdown These following options control both the \"acceptor\" and \"donor\" poses (if uploaded).\n",
"#@markdown ### Params\n",
"#@markdown * Some compounds are parameterised in the database folder of rosetta,\n",
"#@markdown others in the PDB component database (if loaded).\n",
"#@markdown * Uses the params defined in the cell of the acceptor pose.\n",
"#@markdown * If there is no topology avalaible one will be made.\n",
"#@markdown * If a params file is present in the working folder it will use it.\n",
"#@markdown * See below or visit https://params.mutanalyst.com/ to generate them (upload the with the folder icon on the left).\n",
"\n",
"#@markdown This forces it (a bit silly):\n",
"force_parameterisation = False #@param {type:\"boolean\"}\n",
"#@markdown If it needs to be parameterised make it protonated for pH 7?\n",
"neutralize_params = True #@param {type:\"boolean\"}\n",
"save_params = True #@param {type:\"boolean\"}\n",
"\n",
"#@markdown If a params file is present in the working folder it will use it.\n",
"#@markdown Leave this blank... otherwise (comma separated w/ no rando spaces):\n",
"extra_params_files_to_use = '' #@param {type:\"string\"}\n",
"extra_params = [f for f in extra_params_files_to_use.split(',') if f]\n",
"use_all_folder_params = '' #@param {type:\"boolean\"}\n",
"if use_all_folder_params:\n",
" present_params = [filename for filename in os.listdir() if os.path.splitext(filename) == '.params']\n",
"else:\n",
" present_params = []\n",
"print('loading pose...')\n",
"template_pose = ph.ligands.load.parameterized_pose_from_pdbblock(pdbblock,\n",
" wanted_ligands=[],\n",
" force_parameterisation=force_parameterisation,\n",
" neutralise_params=neutralize_params,\n",
" save_params=save_params,\n",
" overriding_params=extra_params + present_params)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "3C6ULKwMCWq7"
},
"outputs": [],
"source": [
"#@title Optional energy minimisation around a target (2/3)\n",
"#@markdown (Requires previous cell run)\n",
"assert 'template_pose' in globals(), 'Step 1 was not run'\n",
"\n",
"#@markdown If use density map is true, you will be prompted to upload a density map.\n",
"#@markdown Upload a f0fc ccp4 or a mrc map. (not a ccp4 difference map, \n",
"#@markdown a mtz reciprocal space map or a pirate treasure map)\n",
"#@markdown The map needs to be in the same position as the template.\n",
"use_density_map = True #@param {type:\"boolean\"}\n",
"#@markdown The whole structure could be minimised, but that would be pointless costly timewise\n",
"#@markdown for this task.\n",
"#@markdown Specify what residue (amino acid or ligand) to centre around \n",
"center_residue_index = 1 #@param {type:\"integer\"}\n",
"center_residue_chain = 'A' #@param {type:\"string\"}\n",
"center_index: int = template_pose.pdb_info().pdb2pose(res=center_residue_index, chain=center_residue_chain)\n",
"assert center_index != 0, 'That residue does not exist!'\n",
"\n",
"#@markdown Specify which neighbouring residues to select in one of three ways:\n",
"\n",
"#@markdown (1) Cutoff distance for the neighbouring residues (in Ångströms) (centroid to centroid)?\n",
"#@markdown set to zero to not use.\n",
"neighborhood_radius = 1 #@param {type:\"integer\"}\n",
"#@markdown (2) Cutoff distance for the neighbouring residues (in Ångströms) (closest atom to closest atom)?\n",
"#@markdown set to zero to not use.\n",
"cc_neighborhood_radius = 0 #@param {type:\"integer\"}\n",
"#@markdown (3) Max number of neighbouring residues to choose?\n",
"#@markdown set to zero to not use.\n",
"n_neighbors = 0 #@param {type:\"integer\"}\n",
"\n",
"#@markdown ## Minimisation\n",
"#@markdown How many cycles of FastRelax to use? 3–15\n",
"cycles = 3 #@param {type:\"integer\"}\n",
"#@markdown to change scorefunctions and so forth edit the code.\n",
"\n",
"#@markdown Note. The class Igor has two classmethods that normally \n",
"#@markdown perform these template minimisation steps (`Igor.download_map` and `Igor.relax_with_ED`),\n",
"#@markdown but this a cruder and quicker minimisation (local).\n",
"\n",
"# Get map\n",
"if use_density_map:\n",
" map_filename = 'uploaded_map.ccp4'\n",
" uploaded = files.upload()\n",
" assert len(uploaded) == 1, 'wrong number of files (only one plz)'\n",
" filename = list(uploaded.keys())[0]\n",
" mapblock = list(uploaded.values())[0]\n",
" with open(os.path.join(input_folder, filename), 'wb') as fh:\n",
" fh.write(mapblock)\n",
" # this can be done with `Igor.relax_with_ED`, but I wanted the option here to do\n",
" # it with or without the map\n",
" ed = ph.prep_ED(template_pose, map_filename)\n",
" assert ed.matchPose(template_pose) > 0.5, 'This is a rubbish fit. Upload the right map.'\n",
"\n",
"# prep scorefunction\n",
"scorefxn = pyrosetta.get_fa_scorefxn()\n",
"if use_density_map:\n",
" scorefxn.set_weight(pyrosetta.rosetta.core.scoring.ScoreType.elec_dens_fast,\n",
" 30)\n",
"\n",
"selector = pyrosetta.rosetta.core.select.residue_selector\n",
"resi_sele = selector.ResidueIndexSelector(center_index)\n",
"if neighborhood_radius != 0:\n",
" neighbor_sele = selector.NeighborhoodResidueSelector(resi_sele,\n",
" distance=neighborhood_radius,\n",
" include_focus_in_subset=True)\n",
"elif cc_neighborhood_radius != 0:\n",
" neighbor_sele = selector.CloseContactResidueSelector()\n",
" neighbor_sele.central_residue_group_selector(resi_sele)\n",
" neighbor_sele.threshold(cc_neighborhood_radius)\n",
"elif n_neighbors != 0:\n",
" neighbor_sele = selector.NumNeighborsSelector(n_neighbors, 20)\n",
" # Ah. True. NumNeighborsSelector does not work in PyRosetta.\n",
" raise NotImplementedError\n",
"else:\n",
" raise ValueError\n",
"\n",
"# relax\n",
"movemap = pyrosetta.MoveMap()\n",
"movemap.set_bb(allow_bb=neighbor_sele.apply(template_pose))\n",
"movemap.set_chi(allow_chi=neighbor_sele.apply(template_pose))\n",
"relax = pyrosetta.rosetta.protocols.relax.FastRelax(scorefxn, cycles)\n",
"relax.set_movemap(movemap)\n",
"relax.apply(template_pose)\n",
"\n",
"pdbblock = ph.get_pdbstr(template_pose)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "cxgGbqxPCWq7"
},
"outputs": [],
"source": [
"#@title Optional remove specified stuff (3/3)\n",
"#@markdown (Requires the cell two back to be run)\n",
"import re\n",
"from warnings import warn\n",
"\n",
"unwanted_residue_names_raw = 'HOH' #@param {type:\"string\"}\n",
"unwanted_residue_names_raw = re.sub(r'[\\W]', ' ', unwanted_residue_names_raw)\n",
"unwanted_residue_names = unwanted_residue_names_raw.split()\n",
"selector = pyrosetta.rosetta.core.select.residue_selector\n",
"for unwanted_resn in unwanted_residue_names:\n",
" sele = selector.ResidueNameSelector()\n",
" sele.set_residue_name3(unwanted_resn)\n",
" try:\n",
" sele.apply(template_pose)\n",
" except RuntimeError:\n",
" # ResidueNameSelector: XYZ is not a valid residue type name.\n",
" warn(f'There is no residue {unwanted_resn}!')\n",
" for idx in reversed(list(selector.ResidueVector(selector.apply(template_pose)))):\n",
" template_pose.delete_residue_slow(idx)\n",
" assert len(selector.ResidueVector(sele.apply(template_pose))) == 0\n",
"\n",
"pdbblock = ph.get_pdbstr(template_pose)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "LuPnpI-_V7-b"
},
"source": [
"# Enter Victor Fragmenstein's laboratory!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "nCkqjcdQCWq-"
},
"outputs": [],
"source": [
"#@title Merge/link -> find similars -> place similars\n",
"#@markdown Three step process:\n",
"\n",
"#@markdown 1. the hits are combined pairwise\n",
"#@markdown 2. the mergers are queried in the SmallWorld server against the Enamine REAL DB\n",
"#@markdown 3. the purchasable similars are placed\n",
"\n",
"#@markdown In the documentation the example uses `sqlitedict.SqliteDict`\n",
"#@markdown as this avoids dramas from segfaults from `KeyboardInterrupt` or funky entries.\n",
"\n",
"# Okay, the code below contains some black magic.\n",
"# a Chem.Mol is sent down the pipe to the subprocess pickled.\n",
"# But this loses its properties (`mol.HasProp`).\n",
"# unless this dark ritual is performed:\n",
"# https://github.com/matteoferla/Fragmenstein/blob/master/documentation/mol_properties.md\n",
"\n",
"#@markdown ⚠ In this notebook, ligand efficiency against the filtered set is used for ranking.\n",
"#@markdown Ranking is a topic into itself. So only simple ranking options are presented here:\n",
"ranking = '∆∆G' #@param [\"LE\", \"∆∆G\", \"comRMSD\"]\n",
"#@markdown ∆∆G: this has the issue that a greater number of atoms will result in a lower score,\n",
"#@markdown even if each is not contributing much.\n",
"#@markdown LE: Ligand efficiency is the most correct way to rank, but will result in similarly sized compounds to the hits, which is not desirable in fragment building.\n",
"#@markdown comRMSD: by sorting by combined RMSD the most faithful hits will be placed first.\n",
"\n",
"#@markdown Angstrom distance\n",
"joining_cutoff = 5 #@param {type:\"integer\"}\n",
"#@markdown Angstrom distance\n",
"quick_reananimation = True #@param {type:\"boolean\"}\n",
"#@markdown Convalent residue (cysteine only out of the box).\n",
"#@markdown Set as '' if noncovalent. '145A' is for MPro demo data.\n",
"covalent_resi = '145A' #@param {type:\"string\"}\n",
"if covalent_resi in ('', 'none', 'None', 'False', 'false', '0'):\n",
" covalent_resi = None\n",
"\n",
"# ============================================================================================\n",
"# ## Define the process\n",
"\n",
"#@markdown The mergers may not be purchasable.\n",
"#@markdown As a result here the purchasable similars in Enamine Real can be sought\n",
"find_similars = True #@param {type:\"boolean\"}\n",
"topN_to_pick = 10 #@param {type:\"integer\"}\n",
"place_similars = True #@param {type:\"boolean\"}\n",
"#@markdown For the placement of similars use the original hits or the unminimised merger?\n",
"use_originals = False #@param {type:\"boolean\"}\n",
"\n",
"place_similars = find_similars and place_similars\n",
"\n",
"\n",
"import os, re\n",
"import pyrosetta, logging\n",
"import pandas as pd\n",
"from rdkit import Chem\n",
"from fragmenstein import Victor, Laboratory\n",
"\n",
"Victor.work_path = output_folder\n",
"Victor.monster_throw_on_discard = True # stop this merger if a fragment cannot be used.\n",
"Victor.monster_joining_cutoff = joining_cutoff # Å\n",
"Victor.quick_reanimation = quick_reananimation # for the impatient\n",
"Victor.error_to_catch = Exception # stop the whole laboratory otherwise\n",
"#Victor.enable_stdout(logging.ERROR)\n",
"Victor.enable_logfile(os.path.join(output_folder, 'demo.log'), logging.ERROR)\n",
"\n",
"# calculate !\n",
"lab = Laboratory(pdbblock=pdbblock, covalent_resi=covalent_resi)\n",
"n_cores = 1 #@param {type:\"integer\"}\n",
"combinations:pd.DataFrame = lab.combine(hits, n_cores=n_cores)\n",
"\n",
"\n",
"# =============================================================================================\n",
"# ## plot results\n",
"\n",
"\n",
"import plotly.express as px\n",
"from IPython.display import display\n",
"\n",
"fig = px.histogram(combinations,\n",
" x='outcome',\n",
" category_orders={'outcome': lab.category_labels},\n",
" title='Distribution of Combination outcome')\n",
"fig.show()\n",
"\n",
"# =============================================================================================\n",
"# ## Reverse the warhead...\n",
"# this is really unusual and janky way of doing it as one ought to know the metadata already...\n",
"\n",
"from rdkit.Chem import AllChem\n",
"from fragmenstein import Victor\n",
"from typing import *\n",
"from warnings import warn\n",
"\n",
"warhead_names = []\n",
"unreacted_smiles = []\n",
"for i, row in combinations.iterrows():\n",
" unrxn, wn = Victor.guess_warhead(row.smiles) #: Tuple[str, str]\n",
" warhead_names.append(wn)\n",
" unreacted_smiles.append(unrxn)\n",
"\n",
"combinations['unreacted_smiles'] = unreacted_smiles\n",
"combinations['warhead_type'] = warhead_names\n",
"\n",
"# save\n",
"\n",
"combinations.to_csv('combinations.csv')\n",
"combinations.to_pickle('combinations.p')\n",
"\n",
"# =============================================================================================\n",
"# ## top 10\n",
"best_combinations = combinations.loc[combinations.outcome == 'acceptable'].sort_values(ranking).reset_index(drop=True).head(topN_to_pick)\n",
"if len(best_combinations):\n",
" print(f'Top {topN_to_pick} mergers/linkers sorted by {ranking}')\n",
" #PandasTools.AddMoleculeColumnToFrame(best_combinations,'smiles','molecule',includeFingerprints=False)\n",
" display(best_combinations.drop(['unmin_binary', 'min_binary'], axis=1))\n",
"else:\n",
" display(combinations.error)\n",
" reasons = combinations.error.astype(str).str.split(r'^(\\w+)\\:', expand=True)[1].value_counts().to_dict()\n",
" raise RuntimeError(f'The combinations failed because {reasons}')\n",
"\n",
"# =============================================================================================\n",
"# ### Place purchasable similars\n",
"\n",
"from smallworld_api import SmallWorld\n",
"from warnings import warn\n",
"\n",
"sws = SmallWorld()\n",
"# this call requires an internet connection\n",
"chemical_databases:pd.DataFrame = sws.retrieve_databases()\n",
"\n",
"if find_similars:\n",
" similars = sws.search_many(best_combinations.unreacted_smiles,\n",
" dist=25,\n",
" db=sws.REAL_dataset,\n",
" tolerated_exceptions=Exception)\n",
"\n",
" similars['inspirations'] = similars.query_index.map( best_combinations.regarded.to_dict() )\n",
" similars['merger'] = similars.query_index.map( best_combinations.smiles.to_dict() )\n",
" similars['merger_∆∆G'] = similars.query_index.map( best_combinations['∆∆G'].to_dict() )\n",
" similars['inspiration_mols'] = similars.query_index.map( best_combinations.hit_mols.to_dict() )\n",
" similars['merger_unminimized_mol'] = similars.query_index.map( best_combinations.unminimized_mol.to_dict() )\n",
" similars['merger_minimized_mol'] = similars.query_index.map( best_combinations.minimized_mol.to_dict() )\n",
" similars.to_csv('similars.csv')\n",
" similars.to_pickle('similars.p')\n",
"\n",
" display(similars[['smiles', 'name', 'topodist', 'inspirations', 'merger', 'merger_∆∆G']])\n",
"\n",
"# ============ place the similars ==================\n",
"if place_similars:\n",
" if use_originals:\n",
" similars['hits'] = similars.inspiration_mols\n",
" else:\n",
" # make a list of one, the unminimised merger\n",
" similars['hits'] = similars.merger_unminimized_mol.apply(lambda m: [m])\n",
"\n",
" lab = Laboratory(pdbblock=pdbblock, covalent_resi=covalent_resi)\n",
" placements:pd.DataFrame = lab.place(similars, expand_isomers=False, n_cores=n_cores)\n",
" display(placements)\n",
" placements['const_ratio'] = placements['N_constrained_atoms'] / (\n",
" placements['N_constrained_atoms'] + placements['N_unconstrained_atoms'])\n",
"\n",
" from rdkit import Chem\n",
" from typing import *\n",
"\n",
" m = similars.drop_duplicates('name').set_index('name').to_dict()\n",
" placements['merger_∆∆G'] = placements['name'].map(m['merger_∆∆G'])\n",
" placements['merger_minimized_mol'] = placements['name'].map(m['merger_minimized_mol'])\n",
" placements['merger_unminimized_mol'] = placements['name'].map(m['merger_unminimized_mol'])\n",
" placements.rename(columns={'unminimized_mol': 'enamine_unminimized_mol',\n",
" 'minimized_mol': 'enamine_minimized_mol'}, inplace=True)\n",
" placements['merger_inspiration_names'] = placements['name'].map(m['inspirations'])\n",
" placements['merger_inspiration_mols'] = placements.hit_mols\n",
" nan_to_list = lambda value: value if isinstance(value, list) else []\n",
" placements['disregarded'] = placements.disregarded.apply(nan_to_list)\n",
" placements['regarded'] = placements.regarded.apply(nan_to_list)\n",
"\n",
" placements.to_csv('placements.csv')\n",
" placements.to_pickle('placements.p')\n",
"\n",
" # NB: more than 2 in 3 constrained is actually uncommon with enamine real for larger mergers.\n",
" # hence the 1 in 2\n",
"\n",
" best_placements = placements.loc[\n",
" (placements.outcome == 'acceptable')\n",
" & (placements.const_ratio > 1/2) ].sort_values(ranking).reset_index(drop=True).head(topN_to_pick)\n",
" if len(best_placements):\n",
" print(f'Top {topN_to_pick} placements sorted by {ranking}')\n",
" #PandasTools.AddMoleculeColumnToFrame(best_combinations,'smiles','molecule',includeFingerprints=False)\n",
" # noisy_fields = ['hit_mols', 'merger_unminimized_mol',\n",
" # 'merger_unminimized_mol',\n",
" # 'unmin_binary', 'min_binary']\n",
" noisy_fields = []\n",
" display( best_placements.drop(noisy_fields, axis=1) )\n",
" else:\n",
" display(placements.error)\n",
" reasons = placements.error.astype(str).str.split(r'^(\\w+)\\:', expand=True)[1].value_counts().to_dict()\n",
" raise RuntimeError(f'The placements failed because {reasons}')\n",
"\n",
"#PandasTools.AddMoleculeColumnToFrame(best_placements,'smiles','molecule',includeFingerprints=False)\n",
"\n",
"\n",
"# =============================================================================================\n",
"# ### Results redux\n",
"\n",
"from IPython.display import clear_output, HTML, display\n",
"from ipywidgets import interact, interactive, fixed, interact_manual\n",
"import ipywidgets as widgets\n",
"\n",
"headerify: Callable[[str], HTML] = lambda header: HTML(f'<h3>{header}</h3>')\n",
"\n",
"clear_output()\n",
"if place_similars:\n",
" fig = px.histogram(placements,\n",
" x='outcome',\n",
" category_orders={'outcome': lab.category_labels},\n",
" title='Distribution of Placement outcome')\n",
" fig.show()\n",
"\n",
"\n",
"display(headerify('Provided hits'))\n",
"display_mols(hits)\n",
"display(headerify('Step 1. Combine'))\n",
"fig.show()\n",
"print(f'Top {topN_to_pick} mergers/linkers sorted by {ranking}')\n",
"display(best_combinations.drop(['unmin_binary', 'min_binary'], axis=1))\n",
"display_mols(best_combinations.unminimized_mol)\n",
"\n",
"def show3D_combined(index_to_show):\n",
" row = best_placements.iloc[index_to_show]\n",
" print(f'Green: hits {row[\"merger_inspiration_names\"]}')\n",
" print('Cyan: merger')\n",
" display(make_3Dview(pdbblock, {'greenCarbon': row.hit_mols,\n",
" 'cyanCarbon': [row.minimized_mol]}))\n",
"\n",
"print('In the sorted combinations table (`best_combinations`) which index do you want to see:')\n",
"scale = widgets.IntSlider(min=0,max=len(best_combinations)-1, step=1, value=0)\n",
"interact(show3D_combined, index_to_show=scale)\n",
"#display(similars)\n",
"if place_similars:\n",
"\n",
" def show3D_placed(index_to_show):\n",
" row = best_placements.iloc[index_to_show]\n",
" print(f'Green: hits {row[\"merger_inspiration_names\"]}')\n",
" print('Cyan: merger')\n",
" print(f'Magenta: Enamine Real purchasable {row[\"name\"]}')\n",
" display(make_3Dview(pdbblock, {'greenCarbon': row.merger_inspiration_mols,\n",
" 'cyanCarbon': [row.merger_minimized_mol],\n",
" 'magentaCarbon': [row.enamine_minimized_mol]}))\n",
"\n",
" display(headerify('Step 2. Placement of purchasable similars'))\n",
" display(headerify(f'Top {topN_to_pick} Placements'))\n",
" display( best_placements.drop(noisy_fields, axis=1) )\n",
"\n",
" print('In the sorted combinations table (`best_combinations`) which index do you want to see:')\n",
" scale = widgets.IntSlider(min=0,max=len(best_placements)-1, step=1, value=0)\n",
" interact(show3D_placed, index_to_show=scale)\n",
"elif find_similars:\n",
" display(headerify('Purchasable similars'))\n",
" display(similars)"
]
}
],
"metadata": {
"colab": {
"name": "colab_fragmenstein.ipynb",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
}
},
"nbformat": 4,
"nbformat_minor": 1
}