kjappelbaum/pyepal

examples/quantile_regression.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Using quantile regression as uncertainty surrogate"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Usually, Bayesian models such as Gaussian processes are used to determine uncertainty intervals. But many other techniques can serve as uncertainty surrogates. For example, PyePAL implements [quantile regression](https://en.wikipedia.org/wiki/Quantile_regression) using [Gradient Boosted Decision Trees](https://en.wikipedia.org/wiki/Gradient_boosting)."
   ]
  },
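  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the idea concrete, here is a minimal, dependency-free sketch of the pinball (quantile) loss that quantile regression minimizes. It is an illustration only and not PyePAL's implementation; the helper names `pinball_loss` and `best_constant` are hypothetical. For `q=0.5` the best constant prediction is the median, while a small or large `q` gives a lower or upper bound, and the spread between the two bounds can act as an uncertainty estimate."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def pinball_loss(y_true, y_pred, q):\n",
    "    \"\"\"Mean pinball loss of the predictions y_pred for quantile level q.\"\"\"\n",
    "    total = 0.0\n",
    "    for yt, yp in zip(y_true, y_pred):\n",
    "        diff = yt - yp\n",
    "        # under-prediction costs q per unit, over-prediction costs (1 - q) per unit\n",
    "        total += q * diff if diff >= 0 else (q - 1) * diff\n",
    "    return total / len(y_true)\n",
    "\n",
    "\n",
    "def best_constant(y_true, q):\n",
    "    \"\"\"Return the observed value with the smallest pinball loss as a constant prediction.\"\"\"\n",
    "    return min(y_true, key=lambda c: pinball_loss(y_true, [c] * len(y_true), q))\n",
    "\n",
    "\n",
    "samples = [1.0, 2.0, 3.0, 4.0, 100.0]  # toy data with one outlier\n",
    "lower = best_constant(samples, 0.1)\n",
    "median = best_constant(samples, 0.5)\n",
    "upper = best_constant(samples, 0.9)"
   ]
  },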
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As in the other examples, we will use the [Binh-Korn test function](https://en.wikipedia.org/wiki/Test_functions_for_optimization)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyepal.models.gbdt import build_gbdt_tuple\n",
    "from pyepal import PALGBDT\n",
    "from pyepal.pal.utils import exhaust_loop\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt \n",
    "plt.style.use('ggplot')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def binh_korn(x, y):  # pylint:disable=invalid-name\n",
    "    \"\"\"https://en.wikipedia.org/wiki/Test_functions_for_optimization\"\"\"\n",
    "    obj1 = 4 * x ** 2 + 4 * y ** 2\n",
    "    obj2 = (x - 5) ** 2 + (y - 5) ** 2\n",
    "    return -obj1, -obj2\n",
    "\n",
    "def binh_korn_points():\n",
    "    \"\"\"Create a dataset based on the Binh-Korn test function\"\"\"\n",
    "    x = np.linspace(0, 5, 100)  # pylint:disable=invalid-name\n",
    "    y = np.linspace(0, 3, 100)  # pylint:disable=invalid-name\n",
    "    array = np.array([binh_korn(xi, yi) for xi, yi in zip(x, y)])\n",
    "    return np.hstack([x.reshape(-1, 1), y.reshape(-1, 1)]), array"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "x, points = binh_korn_points()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Again, we can start by plotting our objective space."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Text(0, 0.5, 'objective 2')"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.plot(points[:,0], points[:,1])\n",
    "plt.xlabel('objective 1')\n",
    "plt.ylabel('objective 2')"
   ]
  },
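  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Which of the plotted points are Pareto optimal can be checked by brute force. The sketch below assumes that every objective is maximized, which matches the negated values returned by `binh_korn`; `dominates` and `pareto_mask` are hypothetical helper names and not part of PyePAL. For the data in this notebook, `pareto_mask(points.tolist())` should give the reference classification to compare against."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def dominates(a, b):\n",
    "    \"\"\"Return True if a is at least as good as b everywhere and strictly better somewhere.\"\"\"\n",
    "    return all(ai >= bi for ai, bi in zip(a, b)) and any(ai > bi for ai, bi in zip(a, b))\n",
    "\n",
    "\n",
    "def pareto_mask(objectives):\n",
    "    \"\"\"Flag every point that no other point dominates (O(n^2) brute force).\"\"\"\n",
    "    return [\n",
    "        not any(dominates(other, point) for other in objectives)\n",
    "        for point in objectives\n",
    "    ]\n",
    "\n",
    "\n",
    "toy = [(-1.0, -5.0), (-2.0, -3.0), (-3.0, -4.0), (-4.0, -1.0)]\n",
    "mask = pareto_mask(toy)  # the third point is dominated by the second one"
   ]
  },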
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Building the models"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "PyePAL comes with helper functions to build the Gradient Boosted Decision Tree models."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {},
   "outputs": [],
   "source": [
    "objective_0_models = build_gbdt_tuple(n_estimators=50, num_leaves=10)\n",
    "objective_1_models = build_gbdt_tuple(n_estimators=50, num_leaves=10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Initializing PyePAL"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      " /Users/kevinmaikjablonka/Dropbox (LSMO)/Documents/open_source/PythonPAL/pyepal/pal/validate_inputs.py:117: UserWarning:Only one epsilon value provided,\n",
      "will automatically expand to use the same value in every dimension\n",
      " /Users/kevinmaikjablonka/Dropbox (LSMO)/Documents/open_source/PythonPAL/pyepal/pal/validate_inputs.py:145: UserWarning:No goals provided, will assume that every dimension should be maximized\n"
     ]
    }
   ],
   "source": [
    "palinstance = PALGBDT(x, [objective_0_models, objective_1_models], 2, coef_var_threshold=10, beta_scale=1/50)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before we can hand over the work to a loop, we need to initialize the `palinstance` with some measurements. Often, a diverse set is the best choice, and PyePAL provides utilities to calculate such a set (`get_kmeans_samples`, `get_maxmin_samples`). Here, we will use `get_kmeans_samples` to pick five points spread across the design space. The alternative, `get_maxmin_samples`, performs a greedy sampling of the farthest points in design space, initialized with the mean and using the Euclidean distance as distance metric (these are the defaults)."
   ]
  },
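  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an illustration of the greedy farthest-point strategy mentioned above, the following dependency-free sketch starts from the point closest to the mean and then repeatedly adds the point that is farthest from everything selected so far. It is a simplified stand-in and not the code behind `get_maxmin_samples`; the names `euclidean` and `greedy_maxmin` are hypothetical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "\n",
    "\n",
    "def euclidean(a, b):\n",
    "    \"\"\"Euclidean distance between two points given as sequences of floats.\"\"\"\n",
    "    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))\n",
    "\n",
    "\n",
    "def greedy_maxmin(design, n_samples):\n",
    "    \"\"\"Return the indices of n_samples points chosen by greedy farthest-point sampling.\"\"\"\n",
    "    n_samples = min(n_samples, len(design))\n",
    "    dim = len(design[0])\n",
    "    mean = [sum(p[d] for p in design) / len(design) for d in range(dim)]\n",
    "    # start from the point closest to the mean of the design space\n",
    "    first = min(range(len(design)), key=lambda i: euclidean(design[i], mean))\n",
    "    selected = [first]\n",
    "    # distance of every point to its nearest selected point\n",
    "    nearest = [euclidean(p, design[first]) for p in design]\n",
    "    while len(selected) < n_samples:\n",
    "        new = max(range(len(design)), key=lambda i: nearest[i])\n",
    "        selected.append(new)\n",
    "        nearest = [min(d, euclidean(p, design[new])) for d, p in zip(nearest, design)]\n",
    "    return selected\n",
    "\n",
    "\n",
    "grid = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]\n",
    "chosen = greedy_maxmin(grid, 3)"
   ]
  },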
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyepal import get_kmeans_samples"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {},
   "outputs": [],
   "source": [
    "indices = get_kmeans_samples(x, 5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "metadata": {},
   "outputs": [],
   "source": [
    "palinstance.update_train_set(indices, points[indices])\n",
    "palinstance.cross_validation_points = 0 # for performance reasons, we won't perform cross validation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Now we can explore the space"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To have more control over each step, we will write our own loop, but we could also just use `exhaust_loop`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "pyepal at iteration 2.         0 Pareto optimal points,         0 discarded points,         100 unclassified points.\n",
      "pyepal at iteration 3.         0 Pareto optimal points,         0 discarded points,         100 unclassified points.\n",
      "pyepal at iteration 4.         1 Pareto optimal points,         0 discarded points,         99 unclassified points.\n",
      "pyepal at iteration 5.         2 Pareto optimal points,         0 discarded points,         98 unclassified points.\n",
      "pyepal at iteration 6.         2 Pareto optimal points,         0 discarded points,         98 unclassified points.\n",
      "pyepal at iteration 7.         3 Pareto optimal points,         0 discarded points,         97 unclassified points.\n",
      "pyepal at iteration 8.         3 Pareto optimal points,         0 discarded points,         97 unclassified points.\n",
      "pyepal at iteration 9.         3 Pareto optimal points,         0 discarded points,         97 unclassified points.\n",
      "pyepal at iteration 10.         4 Pareto optimal points,         0 discarded points,         96 unclassified points.\n",
      "pyepal at iteration 11.         4 Pareto optimal points,         0 discarded points,         96 unclassified points.\n",
      "pyepal at iteration 12.         4 Pareto optimal points,         0 discarded points,         96 unclassified points.\n",
      "pyepal at iteration 13.         4 Pareto optimal points,         0 discarded points,         96 unclassified points.\n",
      "pyepal at iteration 14.         4 Pareto optimal points,         0 discarded points,         96 unclassified points.\n",
      "pyepal at iteration 15.         4 Pareto optimal points,         0 discarded points,         96 unclassified points.\n",
      "pyepal at iteration 16.         4 Pareto optimal points,         0 discarded points,         96 unclassified points.\n",
      "pyepal at iteration 17.         4 Pareto optimal points,         0 discarded points,         96 unclassified points.\n",
      "pyepal at iteration 18.         5 Pareto optimal points,         0 discarded points,         95 unclassified points.\n",
      "pyepal at iteration 19.         6 Pareto optimal points,         0 discarded points,         94 unclassified points.\n",
      "pyepal at iteration 20.         7 Pareto optimal points,         0 discarded points,         93 unclassified points.\n",
      "pyepal at iteration 21.         8 Pareto optimal points,         0 discarded points,         92 unclassified points.\n",
      "pyepal at iteration 22.         9 Pareto optimal points,         0 discarded points,         91 unclassified points.\n",
      "pyepal at iteration 23.         10 Pareto optimal points,         0 discarded points,         90 unclassified points.\n",
      "pyepal at iteration 24.         11 Pareto optimal points,         0 discarded points,         89 unclassified points.\n",
      "pyepal at iteration 25.         12 Pareto optimal points,         0 discarded points,         88 unclassified points.\n",
      "pyepal at iteration 26.         13 Pareto optimal points,         0 discarded points,         87 unclassified points.\n",
      "pyepal at iteration 27.         14 Pareto optimal points,         0 discarded points,         86 unclassified points.\n",
      "pyepal at iteration 28.         15 Pareto optimal points,         0 discarded points,         85 unclassified points.\n",
      "pyepal at iteration 29.         16 Pareto optimal points,         0 discarded points,         84 unclassified points.\n",
      "pyepal at iteration 30.         17 Pareto optimal points,         0 discarded points,         83 unclassified points.\n",
      "pyepal at iteration 31.         18 Pareto optimal points,         0 discarded points,         82 unclassified points.\n",
      "pyepal at iteration 32.         19 Pareto optimal points,         0 discarded points,         81 unclassified points.\n",
      "pyepal at iteration 33.         20 Pareto optimal points,         0 discarded points,         80 unclassified points.\n",
      "pyepal at iteration 34.         21 Pareto optimal points,         0 discarded points,         79 unclassified points.\n",
      "pyepal at iteration 35.         22 Pareto optimal points,         0 discarded points,         78 unclassified points.\n",
      "pyepal at iteration 36.         23 Pareto optimal points,         0 discarded points,         77 unclassified points.\n",
      "pyepal at iteration 37.         24 Pareto optimal points,         0 discarded points,         76 unclassified points.\n",
      "pyepal at iteration 38.         25 Pareto optimal points,         0 discarded points,         75 unclassified points.\n",
      "pyepal at iteration 39.         42 Pareto optimal points,         0 discarded points,         58 unclassified points.\n",
      "pyepal at iteration 40.         43 Pareto optimal points,         0 discarded points,         57 unclassified points.\n",
      "pyepal at iteration 41.         44 Pareto optimal points,         0 discarded points,         56 unclassified points.\n",
      "pyepal at iteration 42.         45 Pareto optimal points,         0 discarded points,         55 unclassified points.\n",
      "pyepal at iteration 43.         46 Pareto optimal points,         0 discarded points,         54 unclassified points.\n",
      "pyepal at iteration 44.         47 Pareto optimal points,         0 discarded points,         53 unclassified points.\n",
      "pyepal at iteration 45.         48 Pareto optimal points,         0 discarded points,         52 unclassified points.\n",
      "pyepal at iteration 46.         49 Pareto optimal points,         0 discarded points,         51 unclassified points.\n",
      "pyepal at iteration 47.         50 Pareto optimal points,         0 discarded points,         50 unclassified points.\n",
      "pyepal at iteration 48.         51 Pareto optimal points,         0 discarded points,         49 unclassified points.\n",
      "pyepal at iteration 49.         52 Pareto optimal points,         0 discarded points,         48 unclassified points.\n",
      "pyepal at iteration 50.         53 Pareto optimal points,         0 discarded points,         47 unclassified points.\n",
      "pyepal at iteration 51.         54 Pareto optimal points,         0 discarded points,         46 unclassified points.\n",
      "Done. No unclassified point left\n",
      "pyepal at iteration 51.         100 Pareto optimal points,         0 discarded points,         0 unclassified points.\n"
     ]
    }
   ],
   "source": [
    "while sum(palinstance.unclassified) > 0: \n",
    "    new_index = palinstance.run_one_step()\n",
    "    print(palinstance) # the string representation of the object will give basic information about the state\n",
    "    # if there is nothing to sample left, run_one_step() will return None\n",
    "    if new_index is not None: \n",
    "        palinstance.update_train_set(new_index, points[new_index])\n",
    "    else: \n",
    "        break"
   ]
  },
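  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The status lines printed by the loop report how many points are Pareto optimal, discarded, or still unclassified. The sketch below illustrates the kind of rule that PAL-style algorithms use for this, under simplifying assumptions: every objective is maximized, the epsilon tolerance is ignored, and each point is described by the pessimistic and the optimistic corner of its uncertainty box. The names `dominates` and `classify` are hypothetical, and this is not PyePAL's actual implementation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def dominates(a, b):\n",
    "    \"\"\"Return True if a is at least as good as b everywhere and strictly better somewhere.\"\"\"\n",
    "    return all(ai >= bi for ai, bi in zip(a, b)) and any(ai > bi for ai, bi in zip(a, b))\n",
    "\n",
    "\n",
    "def classify(pessimistic, optimistic):\n",
    "    \"\"\"Label every point from the two corners of its uncertainty box.\"\"\"\n",
    "    n = len(pessimistic)\n",
    "    labels = ['unclassified'] * n\n",
    "    # discard a point if even its optimistic corner is beaten by another pessimistic corner\n",
    "    for i in range(n):\n",
    "        if any(dominates(pessimistic[j], optimistic[i]) for j in range(n) if j != i):\n",
    "            labels[i] = 'discarded'\n",
    "    # a point is Pareto optimal if no remaining optimistic corner beats its pessimistic corner\n",
    "    for i in range(n):\n",
    "        if labels[i] == 'discarded':\n",
    "            continue\n",
    "        rivals = [j for j in range(n) if j != i and labels[j] != 'discarded']\n",
    "        if not any(dominates(optimistic[j], pessimistic[i]) for j in rivals):\n",
    "            labels[i] = 'Pareto optimal'\n",
    "    return labels\n",
    "\n",
    "\n",
    "pessimistic = [(-1.0, -3.0), (-6.0, -6.0), (-4.0, -1.0), (-3.0, -0.2)]\n",
    "optimistic = [(-0.5, -2.5), (-5.0, -5.0), (-1.5, -0.5), (-2.8, -0.1)]\n",
    "labels = classify(pessimistic, optimistic)"
   ]
  },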
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}