examples/notebooks/Mapping_Molecules.ipynb from zincware/MDSuite

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a37fc0cd",
   "metadata": {},
   "source": [
    "# Mapping Water Molecules\n",
    "\n",
    "In this notebook we will cover how to map molecules in different ways and look at some of the things we can do with them."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a27c7e1c",
   "metadata": {},
   "source": [
    "In this demonstration, we will load data from a GROMACS simulation and therefore, we need to define a set of units and a file reader object to use. For this reason, we have changed the imports a little bit to keep the code to minimum."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9dafbf2b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import mdsuite as mds\n",
    "import mdsuite.file_io.chemfiles_read\n",
    "from mdsuite.utils import Units\n",
    "\n",
    "from zinchub import DataHub\n",
    "import shutil\n",
    "\n",
    "import h5py as hf\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c16e2e24",
   "metadata": {},
   "source": [
    "### Loading the data\n",
    "\n",
    "In this tutorial we are using 50 ns simulations of 14 water molecules in a continuum fluid performed with GROMACS. We will use pure atomistic naming as well as ligand naming, the topology files for which are contained on DataHub."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f98b8829",
   "metadata": {},
   "outputs": [],
   "source": [
    "water = DataHub(url=\"https://github.com/zincware/DataHub/tree/main/Water_14_Gromacs\", tag=\"v0.1.0\")\n",
    "water.get_file('./')\n",
    "file_paths = [\n",
    "        f for f in water.file_raw\n",
    "    ]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7572ee3f",
   "metadata": {},
   "source": [
    "### Project definition\n",
    "\n",
    "Here we create the project and define some custom units used by GROMACS."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "166ac397",
   "metadata": {},
   "outputs": [],
   "source": [
    "project = mds.Project(\"Mapping_Molecules\")\n",
    "\n",
    "gmx_units = Units(\n",
    "        time=1e-12,\n",
    "        length=1e-10,\n",
    "        energy=1.6022e-19,\n",
    "        NkTV2p=1.6021765e6,\n",
    "        boltzmann=8.617343e-5,\n",
    "        temperature=1,\n",
    "        pressure=100000,\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1ac6d863",
   "metadata": {},
   "source": [
    "### Mapping molecules with SMILES\n",
    "\n",
    "In this section we take a look at how one can map molecules using SMILES strings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a8f5f996",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "traj_path = file_paths[2]\n",
    "topol_path = file_paths[0]\n",
    "\n",
    "file_reader = mdsuite.file_io.chemfiles_read.ChemfilesRead(\n",
    "    traj_file_path=traj_path, topol_file_path=topol_path\n",
    ")\n",
    "\n",
    "water_chemical = project.add_experiment(\n",
    "    name=\"water_chemical\",\n",
    "    timestep=0.002,\n",
    "    temperature=300.0,\n",
    "    units=gmx_units,\n",
    "    simulation_data=file_reader,\n",
    "    update_with_pubchempy = True\n",
    ")\n",
    "water_chemical.sample_rate=5000\n",
    "water_chemical.run.CoordinateUnwrapper()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "41e9a007",
   "metadata": {},
   "source": [
    "The first thing we need to do is define the molecule that will be mapped using the in-built MDSuite molecule data-class"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ba411ff9",
   "metadata": {},
   "outputs": [],
   "source": [
    "chemical_water = mds.Molecule(\n",
    "    name='water',\n",
    "    smiles=\"[H]O[H]\", \n",
    "    cutoff=1.7, \n",
    "    amount=14, \n",
    "    mol_pbc=True\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fb70c81d",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "water_chemical.run.MolecularMap(\n",
    "    molecules=[chemical_water]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7eb09781",
   "metadata": {},
   "source": [
    "### Mapping Molecules with a reference dict\n",
    "\n",
    "If you do not have particles with chemical names but you nonetheless wish to construct groups out of particles, this can be achieved by using a reference dict.\n",
    "\n",
    "In this example, we use the ligand naming from GROMACS to construct water molecules."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cb6f6ee7",
   "metadata": {},
   "outputs": [],
   "source": [
    "traj_path = file_paths[2]\n",
    "topol_path = file_paths[1]\n",
    "\n",
    "file_reader = mdsuite.file_io.chemfiles_read.ChemfilesRead(\n",
    "    traj_file_path=traj_path, topol_file_path=topol_path\n",
    ")\n",
    "\n",
    "water_ligand = project.add_experiment(\n",
    "    name=\"water_ligand\",\n",
    "    timestep=0.002,\n",
    "    temperature=300.0,\n",
    "    units=gmx_units,\n",
    "    simulation_data=file_reader,\n",
    "    update_with_pubchempy = True\n",
    ")\n",
    "water_ligand.run.CoordinateUnwrapper()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d631fc86",
   "metadata": {},
   "source": [
    "Keep in mind, as the particles are not named from the periodic tables, important properties such as mass will need to be filled in manually."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3eaa89cb",
   "metadata": {},
   "outputs": [],
   "source": [
    "water_ligand.species['OW'].mass = [15.999]\n",
    "water_ligand.species['HW1'].mass = [1.00784]\n",
    "water_ligand.species['HW2'].mass = [1.00784]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ca81a37b",
   "metadata": {},
   "source": [
    "In this case, the molecule will be defined a little bit differently."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f490ae1a",
   "metadata": {},
   "outputs": [],
   "source": [
    "ligand_water = mds.Molecule(\n",
    "    name='water', \n",
    "    cutoff=1.7, \n",
    "    amount=14, \n",
    "    species_dict={\"HW1\": 1, \"OW\": 1, \"HW2\": 1},\n",
    "    mol_pbc=True\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d07fb41f",
   "metadata": {},
   "outputs": [],
   "source": [
    "water_ligand.run.MolecularMap(\n",
    "    molecules=[ligand_water]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bde229b5",
   "metadata": {},
   "source": [
    "### What information is stored?\n",
    "\n",
    "So the molecule mapping itself was quick and easy, but what information has been stored along the way?\n",
    "\n",
    "All meta-data about the molecules is stored in the experiment class under molecules. Let's take a look at what this contains."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9b55224a",
   "metadata": {},
   "outputs": [],
   "source": [
    "water_chemical.molecules.keys()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ccc0fcfe",
   "metadata": {},
   "source": [
    "This dict will contain all of the molecules that have been mapped, but this is not the information about the molecules, for that, we need to look at the water molecule."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fdf2091f",
   "metadata": {},
   "outputs": [],
   "source": [
    "water_chemical.molecules['water']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "656208df",
   "metadata": {},
   "source": [
    "Three of these are fairly trivial and we can look at them quickly, groups will require some more attention."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d024e00b",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"n_particles: {water_chemical.molecules['water'].n_particles}\")\n",
    "print(f\"mass: {water_chemical.molecules['water'].mass}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "15a6a3c2",
   "metadata": {},
   "source": [
    "Now let's take a look at the groups key."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "71efe836",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(water_chemical.molecules['water'].groups)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1dcdb0ef",
   "metadata": {},
   "source": [
    "The groups key contains direct information about which atoms belong to which molecule, for example, the 10th molecule of water (id=9) consists of Hydrogen atoms 18 and 19 and oxygen atom 10."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a59a2e7f",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(water_chemical.molecules['water'].groups['9'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c2b19cab",
   "metadata": {},
   "source": [
    "With this information you can compute values, e.g. diffusion coefficients with only the atoms belonging to a single molecule using the atom_select arguments in the calculator."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dd908e51",
   "metadata": {},
   "source": [
    "### Analysis with molecules\n",
    "\n",
    "Now that we have seen how we can build molecules and what information this gives is, let's look at what we can analyse using them."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2275128d",
   "metadata": {},
   "source": [
    "#### Angular Distribution Functions (ADFs)\n",
    "\n",
    "First things first, let's confirm we are working with water by looking at the angular distribution function of the atoms."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "55c668ef",
   "metadata": {},
   "outputs": [],
   "source": [
    "water_chemical.run.AngularDistributionFunction(\n",
    "    number_of_configurations=5000, number_of_bins=500, norm_power=8\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0b206ed9",
   "metadata": {},
   "source": [
    "Looking at the O_H_H ADF in he top right we see a strong max peak at 109.591 degrees corresponding well with the bond angle of an SPCE model (109.47) as was used in the simulation. It is also worth noting that the oxygen triplet angle looks similar to that measured in QM and experimental studies.\n",
    "\n",
    "When we want to study the molecular ADF we have two choices, we can either pass it as a species argument to the calculator if only one is desired, we we can call the calculator with the `molecules=True` keyword as we will do here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d6f7e8bf",
   "metadata": {},
   "outputs": [],
   "source": [
    "water_chemical.run.AngularDistributionFunction(\n",
    "    molecules=True, number_of_configurations=3000, number_of_bins=500, norm_power=8\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b4407de0",
   "metadata": {},
   "source": [
    "In this case we have increased the norm power to suppress the noise floor and highlight only the most dominant peaks.\n",
    "\n",
    "In the water molecule ADF it we do not see any clear stacking or structure suggesting there is not special organization of the molecules in these simulations."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b8581249",
   "metadata": {},
   "source": [
    "#### Radial Distribution Functions (RDFs)\n",
    "\n",
    "Now let's look at the radial structure and distribution of particles in space of both the atomistic system and the molecules. This is where molecule mapping can be very helpful as often we are more interested in the positions of the molecules themselves and not necessarily those of the atoms."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "be87957e",
   "metadata": {},
   "outputs": [],
   "source": [
    "water_chemical.run.RadialDistributionFunction(\n",
    "    number_of_configurations=4000, start=100, stop=5100, number_of_bins=500\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a13dc35e",
   "metadata": {},
   "source": [
    "In the case of the hydrogen-hydrogen and the oxygen-hydrogen we can see clear peaks where the bond distance is fixed. Using the cursor to hover over the points in the plot we can identify a bond distance between hydrogens of approximately 0.163 nm, in good agreement with experimental values. The oxygen-hydrogen bond sits around 0.09 nm, also in good agreement with experiment values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b4358b83",
   "metadata": {},
   "outputs": [],
   "source": [
    "water_ligand.run.RadialDistributionFunction(\n",
    "    number_of_configurations=5200, start=0, stop=5100, number_of_bins=500, molecules=True\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9a020f11",
   "metadata": {},
   "source": [
    "### Diffusion Coefficients\n",
    "\n",
    "Now let's start looking at the diffusion coefficients of the atoms and molecules."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ffaf74c3",
   "metadata": {},
   "outputs": [],
   "source": [
    "water_chemical.run.EinsteinDiffusionCoefficients(data_range=500)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "662a10e2",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "water_chemical.run.EinsteinDiffusionCoefficients(molecules=True, data_range=500)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ff0301e3",
   "metadata": {},
   "source": [
    "### Group-wise analysis\n",
    "\n",
    "Say we want to study a specific molecule. We only want the diffusion coefficients, ADFs, and RDFs of the atoms in that one molecule. This can be achieved with the MDSuite atom-selection command and is included here as a demonstration of the flexibility of the software.\n",
    "\n",
    "First things first, let's select a molecule group to study, say the first water molecule."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "214b0f33",
   "metadata": {},
   "outputs": [],
   "source": [
    "water_group = water_chemical.molecules['water'].groups['0']\n",
    "print(water_group)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "97206db4",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "water_chemical.run.RadialDistributionFunction(atom_selection={'H': [0, 1], 'O': [0]}, number_of_configurations=2517)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c6187526",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "water_chemical.run.AngularDistributionFunction(atom_selection=water_group, number_of_configurations=100)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aa8518d6",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "water_chemical.run.EinsteinDiffusionCoefficients(atom_selection=water_group, data_range=2500)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0bb6d22b",
   "metadata": {},
   "source": [
    "# BMIM-BF4 RTIL"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d040986f",
   "metadata": {},
   "outputs": [],
   "source": [
    "bmim_bf4 = DataHub(\n",
    "    url=\"https://github.com/zincware/DataHub/tree/main/Bmim_BF4\", tag=\"v0.1.0\"\n",
    ")\n",
    "bmim_bf4.get_file()\n",
    "bmim_file = bmim_bf4.file_raw"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "758e5287",
   "metadata": {},
   "outputs": [],
   "source": [
    "project.add_experiment(\"bmim_bf4\", simulation_data=bmim_file, update_with_pubchempy = True)\n",
    "project.experiments.bmim_bf4.run.CoordinateUnwrapper()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5096f76c",
   "metadata": {},
   "outputs": [],
   "source": [
    "bmim_molecule = mdsuite.Molecule(\n",
    "            name=\"bmim\",\n",
    "            species_dict={\"C\": 8, \"N\": 2, \"H\": 15},\n",
    "            amount=50,\n",
    "            cutoff=1.9,\n",
    "            reference_configuration_idx=100,\n",
    "        )\n",
    "bf_molecule = mdsuite.Molecule(\n",
    "    name=\"bf4\",\n",
    "    smiles=\"[B-](F)(F)(F)F\",\n",
    "    amount=50,\n",
    "    cutoff=2.4,\n",
    "    reference_configuration_idx=100,\n",
    ")\n",
    "project.experiments[\"bmim_bf4\"].run.MolecularMap(\n",
    "    molecules=[bmim_molecule, bf_molecule]\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "67a3f2ee",
   "metadata": {},
   "outputs": [],
   "source": [
    "project.experiments.bmim_bf4.run.RadialDistributionFunction(\n",
    "    number_of_configurations=300, number_of_bins=100\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a0197fc9",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "project.experiments.bmim_bf4.run.RadialDistributionFunction(\n",
    "    number_of_configurations=500, number_of_bins=100, molecules=True\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}