examples/examples.ipynb from migraf/fhir-kindling

{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# FHIR kindling examples & tutorial\n",
    "\n",
    "This notebook contains some examples on how to use the FHIR kindling library. If you don't have access to a development FHIR server instance, you can start a [Hapi JPA Server](https://hapifhir.io/hapi-fhir/docs/server_plain/server_types.html) container by running the following command on in the current working directory (requires docker and docker-compose to be installed and port 8082 to be available).\n",
    "```shell\n",
    "docker-compose up hapi\n",
    "```\n",
    "\n",
    "Make sure the library is installed in the current environment"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "!pip install fhir-kindling"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "\n",
    "The following examples assume that you have a FHIR server running on the localhost:8082. If you wish to connect to a remote server, you can find a more detailed description in the README.\n"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "outputs": [],
   "source": [
    "from fhir_kindling import FhirServer\n",
    "\n",
    "from fhir_kindling.generators import PatientGenerator"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "UsageError: Line magic function `%` not found.\n"
     ]
    }
   ],
   "source": [
    "% load_ext autoreload\n",
    "% autoreload 2"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Connecting to a FHIR server\n",
    "\n",
    "Connect to a fhir server on a given API endpoint. In this case, we only have to specify the API endpoint as the test API is not secured.\n"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "source": [
    "fhir_api_url = \"http://localhost:8082/fhir\"\n",
    "\n",
    "server = FhirServer(api_address=fhir_api_url)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "execution_count": 3,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "source": [
    "## First basic query using the server\n",
    "Attempt to query the first 100 patients from the server."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "resp = server.query(\"Patient\")\n",
    "result = resp.limit(100)\n",
    "result.resources"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "If this is the first time starting using the development server or there are no Patient resources in the server you are connecting to you. There will be no resources present on the server. So lets create some and upload them to the server."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "# create 100 randomly generated patients\n",
    "patients = PatientGenerator(n=100).generate()\n",
    "\n",
    "# upload the patients to the server\n",
    "response = server.add_all(patients)\n"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "now we can query the server for the newly created patients."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "query = server.query(\"Patient\").all()\n",
    "print(f\"Num patients: {len(query.resources)}\")\n",
    "print(query.resources[0])"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "there should now be 100 patients in the server.\n",
    "\n",
    "## Dataset generation\n",
    "Generating resource that retain referential integrity is a common difficulty while getting started with FHIR. This library provides a convenient way to generate a dataset of resources that can be uploaded to a FHIR server.\n",
    "More resources and references between resource on the server are also required to be able to showcase more complex queries.\n",
    "\n",
    "### Molecular sequence dataset based on a text file\n",
    "In this example there is we use text file containing molecular sequences to generate [FHIR molecular sequence resources](http://www.hl7.org/fhir/molecularsequence.html). There are example files in the `./hiv_sequences` folder.\n",
    "\n",
    "#### Reading the relevant data from the text file\n",
    "We want to extract the sequence and the variant from one of the example text files."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "outputs": [],
   "source": [
    "# read all lines from the text file\n",
    "with open(\"./hiv_sequences/sequences_1.txt\", \"r\") as sequence_file:\n",
    "    seq_lines = [line.strip() for line in sequence_file.readlines()]\n",
    "\n",
    "sequences = []\n",
    "variants = []\n",
    "for line in seq_lines:\n",
    "    # split the line on tab chars\n",
    "    _, sequence, variant = line.split(\"\\t\")\n",
    "    sequences.append(sequence)\n",
    "    # split the variants\n",
    "    variants.append(variant.split())"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "#### Creating the FHIR resources\n",
    "The best way to go about creating a ResourceGenerator is by first looking up the resource definition of the resource you want to create on the [FHIR website](http://www.hl7.org/fhir/resourcelist.html).\n",
    "Then you can import the corresponding fhir resource classes from the [fhir.resources library](https://github.com/nazrulworld/fhir.resources).\n",
    "And use them to build a ResourceGenerator.\n",
    "\n",
    "First import the resource and Generator classes.\n",
    "Field Generators can generate values based on weighted choice from a list of values or based on a generator function that returns the matching value for the field.\n",
    "For static values field values can be specified as a list of values."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "outputs": [],
   "source": [
    "from fhir_kindling.generators import (ResourceGenerator, FieldGenerator, FieldValue, DatasetGenerator,\n",
    "                                      GeneratorParameters)\n",
    "from fhir.resources.molecularsequence import MolecularSequence\n",
    "\n",
    "# create a generator for the MolecularSequence resource\n",
    "mol_seq_generator = ResourceGenerator(MolecularSequence)\n",
    "\n",
    "# turn the found variants and sequences into iterators\n",
    "variant_iter = iter(variants.copy())\n",
    "sequence_iter = iter(sequences.copy())\n",
    "\n",
    "#  write a generator function for the variants found in the text file\n",
    "def generate_variants():\n",
    "    # variants need to be a list according to the FHIR spec\n",
    "    generated_variants = []\n",
    "    # get the next variant from the variants iterator\n",
    "    # print(len(list(variant_iter)))\n",
    "    iter_var = next(variant_iter)\n",
    "\n",
    "    for v in iter_var:\n",
    "        # append a dictionary with the variant information\n",
    "        generated_variants.append(\n",
    "            {\n",
    "                \"observedAllele\": v,\n",
    "            }\n",
    "        )\n",
    "    return generated_variants\n",
    "\n",
    "\n",
    "# initialize a field generator instance for the variants using our generator function\n",
    "variant_generator = FieldGenerator(field=\"variant\", generator_function=generate_variants)\n",
    "\n",
    "# Create a Field generator for the observedSequence. It simply returns the next sequence from the sequence iterator\n",
    "sequence_generator = FieldGenerator(field=\"observedSeq\", generator_function=lambda: next(sequence_iter))\n",
    "\n",
    "# Static value for the coordinate system\n",
    "coordinate_value = FieldValue(field=\"coordinateSystem\", value=0)\n",
    "\n",
    "# Group the generators and values into GeneratorParameters\n",
    "params = GeneratorParameters(\n",
    "    count=len(sequences),\n",
    "    field_values=[coordinate_value],\n",
    "    field_generators=[variant_generator, sequence_generator],\n",
    ")\n",
    "\n",
    "# set the parameters\n",
    "mol_seq_generator.params = params\n",
    "\n",
    "# Generate the list of resources\n",
    "mol_seq = mol_seq_generator.generate()\n",
    "\n",
    "# mol_seg"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "#### Uploading the resources to the server\n",
    "\n",
    "To associate the generated sequences with generated patients we can use the DataSetGenerator."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "# reset the iterators\n",
    "variant_iter = iter(variants.copy())\n",
    "sequence_iter = iter(sequences.copy())\n",
    "\n",
    "# create a data set generator instance\n",
    "sequence_ds_generator = DatasetGenerator(n=len(sequences))\n",
    "# add our resource generator to after setting its count to None to let the data set generator\n",
    "# handle the distribution of resources\n",
    "mol_seq_generator.params.count = None\n",
    "sequence_ds_generator = sequence_ds_generator.add_resource(resource_generator=mol_seq_generator)\n",
    "\n",
    "# generate the data set\n",
    "sequence_ds = sequence_ds_generator.generate()"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n",
     "is_executing": true
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "This will generate a patient and add a reference to it for each molecular sequence."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "# output the data set\n",
    "sequence_ds"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n",
     "is_executing": true
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "Now we can add the generated data set to the server."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "resources, reference = sequence_ds.upload(server=server)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n",
     "is_executing": true
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "#### Accessing the generated resources\n",
    "Getting our newly generated resources from the server demonstrates the more complex query functionality of this library.\n",
    "Now we can use the include functionality to query the newly created MolecularSequence resources and include the associated Patient resources\n",
    "that were created by the DataSetGenerator."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Num Mol seqs: 168, Num Patients: 168\n"
     ]
    }
   ],
   "source": [
    "mol_query = server.query(\"MolecularSequence\")\n",
    "mol_query = mol_query.include(resource=\"MolecularSequence\", search_param=\"patient\")\n",
    "response = mol_query.all()\n",
    "\n",
    "print(f\"Num Mol seqs: {len(response.resources)}, Num Patients: {len(response.included_resources[0].resources)}\")"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}