matteoferla/Fragmenstein

View on GitHub
colab-notebooks/colab_fragmenstein_Wictor.ipynb

Summary

Maintainability
Test Coverage
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "_FoJBuFPCWqw"
   },
   "source": [
    "# Fragmenstein\n",
    "Fragmenstein is a position-based fragment-merging python3 tool.\n",
    "\n",
    "<img src=\"https://github.com/matteoferla/Fragmenstein/raw/master/images/overview.png\" width=\"800\" alt=\"logo\">\n",
    "\n",
    "In its merging/linking operation, under the coordination of the class Wictor,\n",
    "the class Monster finds spatially overlapping atoms and stitches them together (with RDKit),\n",
    "then the class Igor reanimates (minimises in PyRosetta) them within the protein site restraining the atoms to original positions.\n",
    "As this compound may not be purchasable, one can use the placement operation to\n",
    "make a stitched-together molecule based a template."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "-z25QZ4mV7-M"
   },
   "source": [
    "# Further details"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "_ou9a06GV7-N"
   },
   "source": [
    "## Details\n",
    "\n",
    "This notebook does sevaral operations as examples.\n",
    "\n",
    "*\n",
    "* optionally extracts the hits from provided PDB structures.\n",
    "* It combines combinatorially the provided hits\n",
    "* It then searches for the most similar molecules to the user chosen molecule in the Enamine Real database (via the API of John Irwin's SmallWorld server)\n",
    "* places them.\n",
    "\n",
    "There is no template minimisation, and it cannot deal with covalent compounds,\n",
    "for that use the PyRosetta version (regular Victor). This uses Wictor, (_w_ithout PyRosetta version).\n",
    "The demo data (MPro from the [Covid Moonshot](https://covid.postera.ai/covid) available from [Fragalysis](https://fragalysis.diamond.ac.uk/)) will use covalent residues.\n",
    "\n",
    "This notebook from https://github.com/matteoferla/Fragmenstein.\n",
    "\n",
    "It is meant to be opened in Colabs via [https://colab.research.google.com/github/matteoferla/Fragmenstein/blob/master/colab_fragmenstein.ipynb](https://colab.research.google.com/github/matteoferla/Fragmenstein/blob/master/colab_fragmenstein_Wictor.ipynb)\n",
    "\n",
    "See also:\n",
    "\n",
    "* https://github.com/matteoferla/Fragmenstein\n",
    "* https://fragmenstein.readthedocs.io\n",
    "* https://github.com/matteoferla/Python_SmallWorld_API"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "idggAKzIV7-O"
   },
   "source": [
    "# Installation and initialisation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "id": "ieOQnFAkCWq1"
   },
   "outputs": [],
   "source": [
    "#@title Installation\n",
    "import sys\n",
    "#@markdown Press the play button on the top right hand side of this cell\n",
    "#@markdown once you have checked the settings.\n",
    "#@markdown You will be notified that this notebook is not from Google, that is normal.\n",
    "\n",
    "#@markdown Send error messages to errors.matteoferla.com for logging?\n",
    "#@markdown See [notebook-error-reporter repo for more](https://github.com/matteoferla/notebook-error-reporter)\n",
    "report_errors = False #@param {type:\"boolean\"}\n",
    "if report_errors:\n",
    "    !pip install notebook-error-reporter\n",
    "    from notebook_error_reporter import ErrorServer  # noqa it is an option install\n",
    "\n",
    "    es = ErrorServer(url='https://errors.matteoferla.com', notebook='fragmenstein')\n",
    "    es.enable()\n",
    "\n",
    "!pip install  -U -q typing_extensions>=4.9\n",
    "!pip install  -U -q typeguard>=4.1.5\n",
    "# weird glitch of the week:\n",
    "try:\n",
    "    from typing_extensions import Buffer # needed by some external dependency somewhere\n",
    "except ImportError:\n",
    "    import typing_extensions\n",
    "    typing_extensions.Buffer = None\n",
    "\n",
    "!pip install -q rdkit rdkit-to-params biopython pyrosetta-help jsme_notebook\n",
    "!pip install -q --upgrade plotly\n",
    "\n",
    "# Installation for PLIP\n",
    "# this wrongly assumes Mamba and Colab will stay at 3.10 for ever...\n",
    "!wget -qnc https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh\n",
    "# Mamba at 3.10\n",
    "#!wget -qnc https://github.com/conda-forge/miniforge/releases/download/23.3.1-1/Mambaforge-23.3.1-1-Linux-x86_64.sh\n",
    "!bash Mambaforge-Linux-x86_64.sh -bfp /usr/local\n",
    "python_version = f'{sys.version_info.major}.{sys.version_info.minor}'\n",
    "sys.path.append(f'/usr/local/lib/python{python_version}/site-packages')\n",
    "!mamba config --set auto_update_conda false\n",
    "# PLIP gets confused. Weird install\n",
    "!mamba install -y -c conda-forge openbabel lxml\n",
    "!mamba install -y -c conda-forge --no-deps plip\n",
    "\n",
    "import os\n",
    "from importlib import reload\n",
    "import pyrosetta_help as ph\n",
    "from typing import *\n",
    "\n",
    "\n",
    "# Colab no longer runs on 3.7\n",
    "# keeping this just in case it's needed again\n",
    "# hack to enable the backport:\n",
    "import sys\n",
    "if sys.version_info.major != 3 or sys.version_info.minor < 8:\n",
    "    !pip install -q singledispatchmethod\n",
    "    import functools\n",
    "    from singledispatchmethod import singledispatchmethod  # noqa it's okay, PyCharm, I am not a technoluddite\n",
    "    functools.singledispatchmethod = singledispatchmethod\n",
    "    !pip install -q typing_extensions\n",
    "    import typing_extensions\n",
    "    import typing\n",
    "    for key, fun in typing_extensions.__dict__.items():\n",
    "      if key not in typing.__dict__:\n",
    "        setattr(typing, key, fun)\n",
    "\n",
    "# ============================================================================\n",
    "#@markdown ### Use Google Drive\n",
    "#@markdown Optionally store your results in your google drive.\n",
    "#@markdown If `use_drive` is True, you will be prompted to give permission to use Google Drive\n",
    "#@markdown (you may be prompted to follow a link and possibly authenticate and then copy a code into a box)\n",
    "#@markdown —**_always_ remember to check strangers' code against data theft**:\n",
    "#@markdown e.g. search and look for all instances of `http`, `requests` and `post` in the code, and\n",
    "#@markdown make sure the creator is not typosquatting as someone else (e.g. username `Coogle`).\n",
    "use_drive = False  #@param {type:\"boolean\"}\n",
    "if use_drive:\n",
    "    ph.mount_google_drive(use_drive)\n",
    "\n",
    "!pip install py3Dmol -q\n",
    "from google.colab import output  # noqa (It's a colaboratory specific repo)\n",
    "output.enable_custom_widget_manager()\n",
    "\n",
    "\n",
    "\n",
    "# refresh imports\n",
    "import site\n",
    "site.main()\n",
    "\n",
    "# ================================================================\n",
    "# ================================================================\n",
    "# Fragmenstein\n",
    "# custom for this notebook\n",
    "!pip install smallworld_api fragmenstein -q\n",
    "\n",
    "# make folders in _path\n",
    "working_folder = 'fragmenstein_data'\n",
    "input_folder = 'input'\n",
    "output_folder = 'output'\n",
    "if not os.path.exists(working_folder):\n",
    "    os.mkdir(working_folder)\n",
    "os.chdir('fragmenstein_data')\n",
    "for folder in (input_folder, output_folder):\n",
    "    if not os.path.exists(folder):\n",
    "        os.mkdir(folder)\n",
    "\n",
    "# ================================================================\n",
    "# 3D viewer\n",
    "\n",
    "from typing import *\n",
    "from rdkit import Chem\n",
    "import py3Dmol\n",
    "\n",
    "\n",
    "def stylize(representation: str, color: str) -> Dict[str, Dict[str, Dict[str, str]]]:\n",
    "    if 'carbon' in color.lower():\n",
    "        return dict(style={representation: {'colorscheme': color}})\n",
    "    else:\n",
    "        return dict(style={representation: {'color': color}})\n",
    "\n",
    "\n",
    "def make_3Dview(template_pdbblock, colormols: Dict[str, List[Chem.Mol]]) -> py3Dmol.view:\n",
    "    \"\"\"\n",
    "    colormols is a diction of color/colorscheme to list of mols.\n",
    "    \"\"\"\n",
    "    view = py3Dmol.view(js=\"https://3dmol.org/build/3Dmol.js\")\n",
    "    view.addModel(template_pdbblock, \"pdb\", stylize('cartoon', 'gainsboro'))\n",
    "    view.setStyle(dict(hetflag=True), stylize('stick', 'whiteCarbon'))\n",
    "    for color, mols in colormols.items():\n",
    "        for mol in mols:\n",
    "            view.addModel(Chem.MolToMolBlock(mol), \"mol\", stylize('stick', color))\n",
    "    view.zoomTo(dict(hetflag=True))\n",
    "    return view\n",
    "\n",
    "\n",
    "# ================================================================\n",
    "# mol grid\n",
    "\n",
    "from rdkit.Chem import Draw\n",
    "from rdkit.Chem import AllChem\n",
    "from IPython.display import display\n",
    "\n",
    "\n",
    "def display_mols(mols: Sequence[Chem.Mol],\n",
    "                 molsPerRow=5,\n",
    "                 subImgSize=(150,150),\n",
    "                 useSVG=True):\n",
    "    \"\"\"\n",
    "    Rudimentary wrapper for calling ``display(Draw.MolsToGridImage``\n",
    "    \"\"\"\n",
    "    flattos = [AllChem.RemoveHs(mol) for mol in mols if isinstance(mol, Chem.Mol)]\n",
    "    for mol in flattos:\n",
    "        AllChem.Compute2DCoords(mol)\n",
    "    display(Draw.MolsToGridImage(flattos,\n",
    "                         legends=[mol.GetProp('_Name') if mol.HasProp('_Name') else '-' for mol in mols],\n",
    "                         subImgSize=subImgSize, useSVG=useSVG,\n",
    "                         molsPerRow=molsPerRow))\n",
    "\n",
    "\n",
    "# ================================================================"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "LQJXJETfV7-Y"
   },
   "source": [
    "# Uploads"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "id": "iGekHmHTCWq5"
   },
   "outputs": [],
   "source": [
    "#@title Upload Template model\n",
    "#@markdown Upload PDB file of the template model used for placing the merger.\n",
    "#@markdown Suggestion: use a model bound to the native ligand or similar\n",
    "#@markdown as the side chains will be poised for action!\n",
    "#@markdown NB. this PDB and those of the hits need to be superposed.\n",
    "\n",
    "#@markdown Alternatively tick here for demo values (SARS-CoV-2 MPro):\n",
    "\n",
    "use_mpro = False #@param {type:\"boolean\"}\n",
    "\n",
    "if use_mpro:\n",
    "    from fragmenstein import mpro_data\n",
    "\n",
    "    pdbblock = mpro_data.get_template()\n",
    "else:\n",
    "    from google.colab import files  # noqa\n",
    "\n",
    "    uploaded = files.upload()\n",
    "    assert len(uploaded) == 1, 'Upload only one template. Hits later.'\n",
    "    pdbblock = list(uploaded.values())[0]\n",
    "    if isinstance(pdbblock, bytes):  # dont recall what format are txt sent as...\n",
    "        pdbblock = pdbblock.decode('utf8')  # noqa its bytes if it tested so"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "id": "sNEf_L1ICWq8"
   },
   "outputs": [],
   "source": [
    "#@title Upload hits or bound hits\n",
    "#@markdown As the decision is based on extension, please no random extensions...\n",
    "#@markdown `mol.try2.mol` is fine, but `mol.molecule` or `mol.mol.txt` are not.\n",
    "\n",
    "#@markdown ### Option 1.\n",
    "#@markdown If you have a single multientry sdf file, upload that.\n",
    "\n",
    "#@markdown ### Option 2.\n",
    "#@markdown If you have multiple mol files, upload those.\n",
    "#@markdown RDKit handles mdl-mol files better than mol2 files.\n",
    "\n",
    "#@markdown ### Option 3.\n",
    "#@markdown If you require your ligand to be extracted from pdb files\n",
    "#@markdown upload those and specify the three-letter code of the ligand\n",
    "#@markdown (this is the 3-letter name within the provided PDB(s) not the intended name for the merger!):\n",
    "ligand_residue_name = 'LIG'  #@param {type:\"string\"}\n",
    "#@markdown This latter option is über-recommended if you have a covalently bound ligand\n",
    "#@markdown but your extracted mol files are not or lack bond order.\n",
    "#@markdown Note that you'll have to also upload a SMILES file (`.smi`)\n",
    "#@markdown —this is just a tab separated table of SMILES tab name.\n",
    "\n",
    "#@markdown If your PDB lacks `CONECT` entries (you monster), tick this:\n",
    "proximityBonding = False  #@param {type:\"boolean\"}\n",
    "\n",
    "#@markdown **Alternative** if to use demo data from covid moonshoot check use next cell.\n",
    "\n",
    "# Go!\n",
    "import os\n",
    "from google.colab import files  # noqa\n",
    "from rdkit import Chem\n",
    "from typing import *\n",
    "from warnings import warn\n",
    "\n",
    "# ======================================\n",
    "# get\n",
    "uploaded = files.upload()\n",
    "# sort\n",
    "uploaded_split = {k: [] for k in ('smi', 'sdf', 'mol', 'mol2', 'pdb')}\n",
    "for filename in uploaded:\n",
    "    extension = os.path.splitext(filename)[1].lower()[1:]\n",
    "    if extension not in uploaded_split:\n",
    "        raise ValueError(f'The extension {extension} is not coded for')\n",
    "    data = uploaded[filename]  # str or bytes??\n",
    "    if isinstance(data, bytes):\n",
    "        data = data.decode('utf8')\n",
    "    uploaded_split[extension].append(dict(filename=filename, data=data))\n",
    "\n",
    "# parse SMILES for PDB\n",
    "if uploaded_split['smi']:\n",
    "    # smilesdex? yes, I am going to hell for using not only Hungarian notation in Python, \n",
    "    # but a Pokémon flavoured one.\n",
    "\n",
    "    smilesdex: Dict[str, str] = dict(map(lambda line: line.split('\\t')[-1::-1],\n",
    "                                         data.replace('\\n\\n', '\\n').strip().split('\\n'))\n",
    "                                     )\n",
    "else:\n",
    "    smilesdex = {}\n",
    "\n",
    "\n",
    "# parsers\n",
    "def read_sdf(filename: str, data: str) -> List[Chem.Mol]:\n",
    "    mols = []\n",
    "    # I am pretty sure there's no way to run `Chem.SDMolSupplier` on a block\n",
    "    with open(os.path.join(input_folder, filename), 'w') as fh:\n",
    "        fh.write(data)\n",
    "    with Chem.SDMolSupplier(filename, removeHs=False) as suppl:\n",
    "        for mol in suppl:\n",
    "            # do I need to add a name??\n",
    "            mols.append(mol)\n",
    "    return mols\n",
    "\n",
    "\n",
    "def read_mol(filename: str, data: str, fun: Callable) -> Chem.Mol:\n",
    "    mol = fun(data)\n",
    "    name = os.path.splitext(filename)[0]\n",
    "    mol.SetProp('_Name', name)\n",
    "    return mol\n",
    "\n",
    "\n",
    "def get_smiles(filename: str, smilesdex: Dict[str, str]) -> Union[None, str]:\n",
    "    name = os.path.splitext(filename)[0]\n",
    "    if name in smilesdex:\n",
    "        return smilesdex[name]\n",
    "    else:\n",
    "        warn(f'Could not find matching SMILES to {name}')\n",
    "        return None\n",
    "\n",
    "from fragmenstein import Wictor\n",
    "# parse molecules...\n",
    "hits = []\n",
    "for entry in uploaded_split['sdf']:\n",
    "    hits.extend(read_sdf(entry['filename'], entry['data']))\n",
    "for entry in uploaded_split['mol']:\n",
    "    hits.append(read_mol(entry['filename'], entry['data'], Chem.MolFromMolBlock))\n",
    "for entry in uploaded_split['mol2']:\n",
    "    hits.append(read_mol(entry['filename'], entry['data'], Chem.MolFromMol2Block))\n",
    "for entry in uploaded_split['pdb']:\n",
    "    smiles: Union[None, str] = get_smiles(entry['filename'], smilesdex)\n",
    "    hits.append(Wictor.extract_mol(name=entry['filename'],\n",
    "                                   block=entry['data'],\n",
    "                                   smiles=smiles,\n",
    "                                   ligand_resn=ligand_residue_name,\n",
    "                                   proximityBonding=proximityBonding,\n",
    "                                   throw_on_error=True)\n",
    "                )\n",
    "\n",
    "# pollute the directory with files\n",
    "# with Chem.SDWriter(os.path.join(input_folder, 'provided.sdf')) as w:\n",
    "#     for hit in hits:\n",
    "#         w.write(hit)\n",
    "\n",
    "display_mols(hits)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "id": "NEZBUhcNCWq9"
   },
   "outputs": [],
   "source": [
    "#@title Alternative source of hits: Demo data\n",
    "#@markdown If using MPro template, you can use the MPro Covid moonshot data too.\n",
    "n_hits = 5  #@param {type:\"integer\"}\n",
    "#@markdown Merging Covid moonshot leads is kind of weird thing to do (cf. Lipinski rule). Max Dalton:\n",
    "size_cutoff = 200  #@param {type:\"integer\"}\n",
    "\n",
    "from fragmenstein import mpro_data\n",
    "hits = mpro_data.get_n_filtered_mols(amount=n_hits, mw_cutoff=size_cutoff)\n",
    "display_mols(hits)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "#@title Alternative source of hits: custom code?\n",
    "#@markdown Uncollapse this cell for code\n",
    "\n",
    "# some filter for either option:\n",
    "\n",
    "def some_filter(filename:str) -> bool:\n",
    "    name, extension = os.path.splitext(filename)\n",
    "    return extension != '.mol' and filename.count('-') > 1\n",
    "\n",
    "# ======== Example... local\n",
    "\n",
    "import os\n",
    "\n",
    "hits: List[Chem.Mol] = []\n",
    "for filename in os.listdir('somefolder/molfiles/'):\n",
    "    if not some_filter(filename):\n",
    "        continue\n",
    "    mol: Chem.Mol = Chem.MolFromMolFile( f'somefolder/molfiles/{filename}' )\n",
    "    mol.SetProp('_Name', filename)\n",
    "    hits.append(mol)\n",
    "display_mols(hits)\n",
    "\n",
    "# ======== Example... retrieve from GitHub\n",
    "!pip install PyGithub\n",
    "\n",
    "from github import Github\n",
    "from github.ContentFile import ContentFile\n",
    "import os\n",
    "from typing import Tuple, List\n",
    "from rdkit import Chem\n",
    "# -------------------------------------------------------\n",
    "def _content2mol(content:ContentFile) -> Chem.Mol:\n",
    "    if content.path.count('-') > 1:\n",
    "        return\n",
    "    # \"molfiles/diamond-x0158_B.mol\"\n",
    "    filename = os.path.split(content.path)[1]\n",
    "    if not some_filter(filename):\n",
    "        return None\n",
    "    mol: Chem.Mol = Chem.MolFromMolBlock( content.decoded_content.decode('utf8') )\n",
    "    mol.SetProp('_Name', filename)\n",
    "    return mol\n",
    "\n",
    "def get_repo(username:str, repo_name:str) -> Github:\n",
    "    return Github().get_user(username).get_repo(repo_name)\n",
    "\n",
    "def hits_from_github(username:str, repo_name:str, folder:str) -> List[Chem.Mol]:\n",
    "    repo = get_repo(username, repo_name)\n",
    "    return list(filter(lambda x: x is not None, map(_content2mol, repo.get_contents(folder))))\n",
    "\n",
    "# ----------------- Change these:\n",
    "all_hits:List[Chem.Mol] = hits_from_github(username='matteoferla',\n",
    "                        repo_name='NSP3-macrodomain',\n",
    "                        folder='molfiles')\n",
    "hits = all_hits[:20]\n",
    "pdbblock:str = get_repo(username='matteoferla', repo_name='NSP3-macrodomain').get_contents(\"molfiles/template.pdb\").decoded_content.decode('utf8')"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "0w9YB7lRV7-a"
   },
   "source": [
    "# Tweaks\n",
    "Template polish is in the PyRosetta version only, sorry."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "LuPnpI-_V7-b"
   },
   "source": [
    "# Enter Victor Fragmenstein's laboratory!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "id": "nCkqjcdQCWq-"
   },
   "outputs": [],
   "source": [
    "#@title Merge/link -> find similars -> place similars\n",
    "#@markdown Three step process:\n",
    "\n",
    "#@markdown 1. the hits are combined pairwise\n",
    "#@markdown 2. the mergers are queried in the SmallWorld server against the Enamine REAL DB\n",
    "#@markdown 3. the purchasable similars are placed\n",
    "\n",
    "#@markdown In the documentation the example uses `sqlitedict.SqliteDict`\n",
    "#@markdown as this avoids dramas from segfaults from `KeyboardInterrupt` or funky entries.\n",
    "\n",
    "# Okay, the code below contains some black magic.\n",
    "# a Chem.Mol is sent down the pipe to the subprocess pickled.\n",
    "# But this loses its properties (`mol.HasProp`).\n",
    "# unless this dark ritual is performed:\n",
    "# https://github.com/matteoferla/Fragmenstein/blob/master/documentation/mol_properties.md\n",
    "\n",
    "#@markdown &#9888; In this notebook, ligand efficiency against the filtered set is used for ranking.\n",
    "#@markdown Ranking is a topic into itself. So only simple ranking options are presented here:\n",
    "ranking = '∆∆G' #@param [\"LE\", \"∆∆G\", \"comRMSD\"]\n",
    "#@markdown ∆∆G: this has the issue that a greater number of atoms will result in a lower score,\n",
    "#@markdown even if each is not contributing much.\n",
    "#@markdown LE: Ligand efficiency is the most correct way to rank, but will result in similarly sized compounds to the hits, which is not desirable in fragment building.\n",
    "#@markdown comRMSD: by sorting by combined RMSD the most faithful hits will be placed first.\n",
    "\n",
    "#@markdown Angstrom distance\n",
    "joining_cutoff = 5  #@param {type:\"integer\"}\n",
    "#@markdown Angstrom distance\n",
    "#@markdown Convalent residue (cysteine only out of the box).\n",
    "#@markdown Set as '' if noncovalent. '145A' is for MPro demo data.\n",
    "covalent_resi = '145A' #@param {type:\"string\"}\n",
    "if covalent_resi in ('', 'none', 'None', 'False', 'false', '0'):\n",
    "    covalent_resi = None\n",
    "\n",
    "# ============================================================================================\n",
    "# ## Define the process\n",
    "\n",
    "#@markdown The mergers may not be purchasable.\n",
    "#@markdown As a result here the purchasable similars in Enamine Real can be sought\n",
    "find_similars = True #@param {type:\"boolean\"}\n",
    "topN_to_pick = 10  #@param {type:\"integer\"}\n",
    "place_similars = True #@param {type:\"boolean\"}\n",
    "#@markdown For the placement of similars use the original hits or the unminimised merger?\n",
    "use_originals = False  #@param {type:\"boolean\"}\n",
    "\n",
    "place_similars = find_similars and place_similars\n",
    "\n",
    "\n",
    "import os, re, logging\n",
    "import pandas as pd\n",
    "from rdkit import Chem\n",
    "from fragmenstein import Wictor, Laboratory\n",
    "\n",
    "Wictor.work_path = output_folder\n",
    "Wictor.monster_throw_on_discard = True  # stop this merger if a fragment cannot be used.\n",
    "Wictor.monster_joining_cutoff = joining_cutoff  # Å\n",
    "Wictor.error_to_catch = Exception  # stop the whole laboratory otherwise\n",
    "Wictor.enable_stdout(logging.ERROR)\n",
    "Wictor.enable_logfile(os.path.join(output_folder, 'demo.log'), logging.ERROR)\n",
    "Laboratory.Victor = Wictor\n",
    "# calculate !\n",
    "lab = Laboratory(pdbblock=pdbblock, covalent_resi=covalent_resi)\n",
    "n_cores = 1  #@param {type:\"integer\"}\n",
    "combinations:pd.DataFrame = lab.combine(hits, n_cores=n_cores)\n",
    "\n",
    "\n",
    "# =============================================================================================\n",
    "# ## plot results\n",
    "\n",
    "\n",
    "import plotly.express as px\n",
    "from IPython.display import display\n",
    "\n",
    "fig = px.histogram(combinations,\n",
    "                   x='outcome',\n",
    "                   category_orders={'outcome': lab.category_labels},\n",
    "                   title='Distribution of Combination outcome')\n",
    "fig.show()\n",
    "\n",
    "# =============================================================================================\n",
    "# ## Reverse the warhead...\n",
    "# this is really unusual and janky way of doing it as one ought to know the metadata already...\n",
    "\n",
    "from rdkit.Chem import AllChem\n",
    "from fragmenstein import Wictor\n",
    "from typing import *\n",
    "from warnings import warn\n",
    "\n",
    "warhead_names = []\n",
    "unreacted_smiles = []\n",
    "for i, row in combinations.iterrows():\n",
    "    unrxn, wn = Wictor.guess_warhead(row.smiles) #: Tuple[str, str]\n",
    "    warhead_names.append(wn)\n",
    "    unreacted_smiles.append(unrxn)\n",
    "\n",
    "combinations['unreacted_smiles'] = unreacted_smiles\n",
    "combinations['warhead_type'] = warhead_names\n",
    "\n",
    "# save\n",
    "\n",
    "combinations.to_csv('combinations.csv')\n",
    "combinations.to_pickle('combinations.p')\n",
    "\n",
    "# =============================================================================================\n",
    "# ## top 10\n",
    "best_combinations = combinations.loc[combinations.outcome == 'acceptable'].sort_values(ranking).reset_index(drop=True).head(topN_to_pick)\n",
    "if len(best_combinations):\n",
    "    print(f'Top {topN_to_pick} mergers/linkers sorted by {ranking}')\n",
    "    #PandasTools.AddMoleculeColumnToFrame(best_combinations,'smiles','molecule',includeFingerprints=False)\n",
    "    display(best_combinations.drop(['unmin_binary', 'min_binary'], axis=1))\n",
    "else:\n",
    "    display(combinations.error)\n",
    "    reasons = combinations.error.astype(str).str.split(r'^(\\w+)\\:', expand=True)[1].value_counts().to_dict()\n",
    "    raise RuntimeError(f'The combinations failed because {reasons}')\n",
    "\n",
    "# =============================================================================================\n",
    "# ### Place purchasable similars\n",
    "\n",
    "from smallworld_api import SmallWorld\n",
    "from warnings import warn\n",
    "\n",
    "sws = SmallWorld()\n",
    "# this call requires an internet connection\n",
    "chemical_databases:pd.DataFrame = sws.retrieve_databases()\n",
    "\n",
    "if find_similars:\n",
    "    similars: pd.DataFrame = sws.search_many(best_combinations.unreacted_smiles,\n",
    "                               dist=25,\n",
    "                               db=sws.REAL_dataset,\n",
    "                               tolerated_exceptions=Exception)\n",
    "\n",
    "    similars['inspirations'] = similars.query_index.map( best_combinations.regarded.to_dict() )\n",
    "    similars['merger'] = similars.query_index.map( best_combinations.smiles.to_dict() )\n",
    "    similars['merger_∆∆G'] = similars.query_index.map( best_combinations['∆∆G'].to_dict() )\n",
    "    similars['inspiration_mols'] = similars.query_index.map( best_combinations.hit_mols.to_dict() )\n",
    "    similars['merger_unminimized_mol'] = similars.query_index.map( best_combinations.unminimized_mol.to_dict() )\n",
    "    similars['merger_minimized_mol'] = similars.query_index.map( best_combinations.minimized_mol.to_dict() )\n",
    "    similars.to_csv('similars.csv')\n",
    "    similars.to_pickle('similars.p')\n",
    "\n",
    "    display(similars[['smiles', 'name', 'topodist', 'inspirations', 'merger', 'merger_∆∆G']])  # noqa\n",
    "\n",
    "# ============ place the similars ==================\n",
    "if find_similars and place_similars:\n",
    "    if use_originals:\n",
    "        similars['hits'] = similars.inspiration_mols\n",
    "    else:\n",
    "        # make a list of one, the unminimised merger\n",
    "        similars['hits'] = similars.merger_unminimized_mol.apply(lambda m: [m])\n",
    "\n",
    "    lab = Laboratory(pdbblock=pdbblock, covalent_resi=covalent_resi)\n",
    "    placements:pd.DataFrame = lab.place(similars, expand_isomers=False, n_cores=n_cores)\n",
    "    display(placements)\n",
    "    placements['const_ratio'] = placements['N_constrained_atoms'] / (\n",
    "                placements['N_constrained_atoms'] + placements['N_unconstrained_atoms'])\n",
    "\n",
    "    from rdkit import Chem\n",
    "    from typing import *\n",
    "\n",
    "    m = similars.drop_duplicates('name').set_index('name').to_dict()\n",
    "    placements['merger_∆∆G'] = placements['name'].map(m['merger_∆∆G'])\n",
    "    placements['merger_minimized_mol'] = placements['name'].map(m['merger_minimized_mol'])\n",
    "    placements['merger_unminimized_mol'] = placements['name'].map(m['merger_unminimized_mol'])\n",
    "    placements.rename(columns={'unminimized_mol': 'enamine_unminimized_mol',\n",
    "                               'minimized_mol': 'enamine_minimized_mol'}, inplace=True)\n",
    "    placements['merger_inspiration_names'] = placements['name'].map(m['inspirations'])\n",
    "    placements['merger_inspiration_mols'] = placements.hit_mols\n",
    "    nan_to_list = lambda value: value if isinstance(value, list) else []\n",
    "    placements['disregarded'] = placements.disregarded.apply(nan_to_list)\n",
    "    placements['regarded'] = placements.regarded.apply(nan_to_list)\n",
    "\n",
    "    placements.to_csv('placements.csv')\n",
    "    placements.to_pickle('placements.p')\n",
    "\n",
    "    # NB: more than 2 in 3 constrained is actually uncommon with enamine real for larger mergers.\n",
    "    # hence the 1 in 2\n",
    "\n",
    "    best_placements = placements.loc[\n",
    "        (placements.outcome == 'acceptable')\n",
    "        & (placements.const_ratio > 1/2) ].sort_values(ranking).reset_index(drop=True).head(topN_to_pick)\n",
    "    if len(best_placements):\n",
    "        print(f'Top {topN_to_pick} placements sorted by {ranking}')\n",
    "        #PandasTools.AddMoleculeColumnToFrame(best_combinations,'smiles','molecule',includeFingerprints=False)\n",
    "        # noisy_fields = ['hit_mols', 'merger_unminimized_mol',\n",
    "        #                           'merger_unminimized_mol',\n",
    "        #                           'unmin_binary', 'min_binary']\n",
    "        noisy_fields = []\n",
    "        display( best_placements.drop(noisy_fields, axis=1) )\n",
    "    else:\n",
    "        display(placements.error)\n",
    "        reasons = placements.error.astype(str).str.split(r'^(\\w+)\\:', expand=True)[1].value_counts().to_dict()\n",
    "        raise RuntimeError(f'The placements failed because {reasons}')\n",
    "\n",
    "#PandasTools.AddMoleculeColumnToFrame(best_placements,'smiles','molecule',includeFingerprints=False)\n",
    "\n",
    "\n",
    "# =============================================================================================\n",
    "# ### Results redux\n",
    "\n",
    "from IPython.display import clear_output, HTML, display\n",
    "from ipywidgets import interact, interactive, fixed, interact_manual\n",
    "import ipywidgets as widgets\n",
    "\n",
    "headerify: Callable[[str], HTML] = lambda header: HTML(f'<h3>{header}</h3>')\n",
    "\n",
    "clear_output()\n",
    "if place_similars:\n",
    "    fig = px.histogram(placements,\n",
    "                       x='outcome',\n",
    "                       category_orders={'outcome': lab.category_labels},\n",
    "                       title='Distribution of Placement outcome')\n",
    "    fig.show()\n",
    "\n",
    "\n",
    "display(headerify('Provided hits'))\n",
    "display_mols(hits)\n",
    "display(headerify('Step 1. Combine'))\n",
    "fig.show()\n",
    "print(f'Top {topN_to_pick} mergers/linkers sorted by {ranking}')\n",
    "display(best_combinations.drop(['unmin_binary', 'min_binary'], axis=1))\n",
    "display_mols(best_combinations.unminimized_mol)\n",
    "\n",
    "def show3D_combined(index_to_show):\n",
    "    row = best_placements.iloc[index_to_show]\n",
    "    print(f'Green: hits {row[\"merger_inspiration_names\"]}')\n",
    "    print('Cyan: merger')\n",
    "    display(make_3Dview(pdbblock, {'greenCarbon': row.hit_mols,\n",
    "                           'cyanCarbon': [row.minimized_mol]}))\n",
    "\n",
    "print('In the sorted combinations table (`best_combinations`) which index do you want to see:')\n",
    "scale = widgets.IntSlider(min=0,max=len(best_combinations)-1, step=1, value=0)\n",
    "interact(show3D_combined, index_to_show=scale)\n",
    "#display(similars)\n",
    "if place_similars:\n",
    "\n",
    "    def show3D_placed(index_to_show):\n",
    "        row = best_placements.iloc[index_to_show]\n",
    "        print(f'Green: hits {row[\"merger_inspiration_names\"]}')\n",
    "        print('Cyan: merger')\n",
    "        print(f'Magenta: Enamine Real purchasable {row[\"name\"]}')\n",
    "        display(make_3Dview(pdbblock, {'greenCarbon': row.merger_inspiration_mols,\n",
    "                               'cyanCarbon': [row.merger_minimized_mol],\n",
    "                               'magentaCarbon': [row.enamine_minimized_mol]}))\n",
    "\n",
    "    display(headerify('Step 2. Placement of purchasable similars'))\n",
    "    display(headerify(f'Top {topN_to_pick} Placements'))\n",
    "    display( best_placements.drop(noisy_fields, axis=1) )\n",
    "\n",
    "    print('In the sorted combinations table (`best_combinations`) which index do you want to see:')\n",
    "    scale = widgets.IntSlider(min=0,max=len(best_placements)-1, step=1, value=0)\n",
    "    interact(show3D_placed, index_to_show=scale)\n",
    "elif find_similars:\n",
    "    display(headerify('Purchasable similars'))\n",
    "    display(similars)"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "name": "colab_fragmenstein.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}