docs/tutorial/en/machine_learning/qboost.ipynb from OpenJij/OpenJij

docs/tutorial/en/machine_learning/qboost.ipynb
Summary

Maintainability

Test Coverage

Issues
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Ensemble Learning with Annealing (QBoost)\n",
    "\n",
    "[QBoost](https://arxiv.org/abs/0811.0416) is one of the ensemble learning with quantum annealing(QA).\n",
    "Ensemble learning is a method that prepares a large number of weak predictors and combines the predictions of each of the predictors to obtain a final accurate prediction result."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Overview and Principle\n",
    "\n",
    "QBoost is an algorithm whose goal is to accurately identify the nature of the input signal $\\boldsymbol{x}$.\n",
    "Let us consider the problem of which of the two values $\\pm 1$ should be assigned to the input signal.\n",
    "As an example, imagine a task where $\\boldsymbol{x}$ represents image data and we want to identify whether what we see in the image is a dog or a cat.\n",
    "In ensemble learning, the goal is to achieve better prediction accuracy (boosting) by using multiple predictors.\n",
    "Here, we have many predictors that do not perform well (weak predictors).\n",
    "Note that by poorly performing, it means that they often do not produce correct outputs for their inputs.\n",
    "Let the output of these predictors be $c_i (\\boldsymbol{x}) \\in \\{ -1, 1\\} \\ (i=0, 1, \\dots, N-1)$.\n",
    "The basic idea is that by summing the outputs of several weak predictors, we can make better predictions.\n",
    "This can be expressed as:\n",
    "\n",
    "$$\n",
    "C(\\boldsymbol{x}) \n",
    "= \\mathrm{sgn} \\left( \\sum_{i=0}^{N-1} w_i c_i (\\boldsymbol{x}) \\right) \n",
    "$$(1)\n",
    "\n",
    "$w_i \\in \\{0, 1\\}$ indicates whether the $i$th predictor is used or not.\n",
    "Let us identify which predictor gives better performance with as few weak predictors as possible.\n",
    "We will use supervised learning to find the optimal $\\{w_i\\}$ pairs.\n",
    "We have a large number of $(\\boldsymbol{x}^{(d)}, y^{(d)}) \\ (d= 0, 1, \\dots, D-1)$ of supervised data ($D \\gg 1$), and we adjust $\\{w_i\\}$ to reproduce them as closely as possible.\n",
    "In other words, we aim to minimize the following Hamiltonian for $\\{w_i\\}$:\n",
    "\n",
    "$$\n",
    "H(\\boldsymbol{w}) = \\sum_{d=0}^{D-1} \\left( \\frac{1}{N} \\sum_{i=0}^{N-1} w_i c_i (\\boldsymbol{x}^{(d)}) - y^{(d)}\\right)^2 + \\lambda \\sum_{i=0}^{N-1} w_i \n",
    "$$ (2)\n",
    "\n",
    "Through this minimization of the Hamiltonian, the difference from the supervised data $y^{(d)}$ is made as small as possible.\n",
    "If we use the right-hand side of equation (1) as it is, it cannot come down to the Ising model because it is not a quadratic form of $w_i$ due to the sign function.\n",
    "Therefore, the problem is made to minimize the square of the difference between the $1/N$ times the sign function argument $\\sum_i w_i c_i$ and the supervised data $y^{(d)}$.\n",
    "The coefficient of $1/N$ is to adjust the maximum value of $\\sum_i w_i c_i(\\boldsymbol{x})$ to be $N$ so that the difference between $y^{(d)}= \\pm 1$ is not too large.\n",
    "The term with $\\lambda (>0)$ applied represents the regularization term to efficiently construct a relatively small number of weak predictors without setting too many $w_i$ to 1."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Modeling by JijModeling\n",
    "\n",
    "### Variables\n",
    "Let us define the variables used in equation (2) as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import jijmodeling as jm\n",
    "\n",
    "# set problem\n",
    "problem = jm.Problem('QBoost')\n",
    "# define variables\n",
    "c = jm.Placeholder('c', dim=2)\n",
    "N = c.shape[0].set_latex('N')\n",
    "D = c.shape[1].set_latex('D')\n",
    "w = jm.Binary('w', shape=(N))\n",
    "y = jm.Placeholder('y', shape=(D))\n",
    "lam = jm.Placeholder('lam')\n",
    "i = jm.Element('i', (0, N))\n",
    "d = jm.Element('d', (0, D))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`c = jm.Placeholder('c', dim=2)` defines $c$ in equation (2).\n",
    "From the size of that list, we define the number of weak predictors $N$ and the number of supervised data $D$ as `N, D`, respectively.\n",
    "Using them, we define the binary variable `w` and the binary value `y` of the supervised data to be used for optimization.\n",
    "We define $\\lambda$ in equation (2) as `lam` and the subscripts `i, d`, used in equation (2)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Objective Function\n",
    "\n",
    "Let us implement equation (2). We start with the first term."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# set objective function 1: minimize the difference\n",
    "sum_i = jm.Sum(i, w[i]*c[i, d]) / N\n",
    "problem += jm.Sum(d, (sum_i-y[d])**2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We write $\\sum_{i=0}^{N-1} w_i c_i / N$ as `sum_i`.\n",
    "We also implement the second term as an objective function.\n",
    "Note that `w[:]` implements $\\sum_i w_i$ in a concise way."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/latex": [
       "$$\\begin{alignat*}{4}\\text{Problem} & \\text{: QBoost} \\\\\\min & \\quad \\sum_{ d = 0 }^{ D - 1 } \\left( \\frac{ \\sum_{ i = 0 }^{ N - 1 } w_{i} \\cdot c_{i,d} }{ N } - y_{d} \\right) ^ { 2 } + lam \\cdot \\sum_{ \\bar{i}_{0} = 0 }^{ N - 1 } w_{\\bar{i}_{0}} \\\\& w_{i_{0}} \\in \\{0, 1\\}\\end{alignat*}$$"
      ],
      "text/plain": [
       "<jijmodeling.problem.problem.Problem at 0x10a705dc0>"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# set objective function 2: minimize the number of weak classifiers\n",
    "problem += lam * w[:]\n",
    "problem"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Instance\n",
    "\n",
    "Let us set up the task to be executed.\n",
    "We use the decision stump (decision stock: one-layer decision tree) from [scikit-learn](https://scikit-learn.org/stable/) as the weak predictor.\n",
    "We also use the scikit-learn cancer identification dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn import datasets\n",
    "from sklearn.tree import DecisionTreeClassifier as DTC\n",
    "\n",
    "def prediction_from_train(N, X_train, y_train, X_test):\n",
    "    # set the number of ensembles to be taken out for one sample in the bootstrap sampling\n",
    "    sample_train = 40\n",
    "    # set model\n",
    "    models = [DTC(splitter=\"random\", max_depth=1) for i in range(N)]\n",
    "    for model in models:\n",
    "        # extract randomly\n",
    "        train_idx = np.random.choice(np.arange(X_train.shape[0]), sample_train)\n",
    "        # make decision tree with variables\n",
    "        model.fit(X=X_train[train_idx], y=y_train[train_idx])\n",
    "    y_pred_list_train = []\n",
    "    for model in models:\n",
    "        # execute prediction with model\n",
    "        y_pred_list_train.append(model.predict(X_train))\n",
    "    y_pred_list_train = np.asanyarray(y_pred_list_train)\n",
    "    y_pred_list_test = []\n",
    "    for model in models:\n",
    "        # execute with test data\n",
    "        y_pred_list_test.append(model.predict(X_test))\n",
    "    y_pred_list_test = np.array(y_pred_list_test)\n",
    "    return y_pred_list_train, y_pred_list_test\n",
    "\n",
    "# load data\n",
    "cancer_data = datasets.load_breast_cancer()\n",
    "# set the number of train data\n",
    "num_train = 200\n",
    "# add noise to feature\n",
    "noisy_data = np.concatenate((cancer_data.data, np.random.rand(cancer_data.data.shape[0], 30)), axis=1)\n",
    "# convert (0, 1) label to (-1, 1)\n",
    "labels = (cancer_data.target-0.5) * 2\n",
    "# divide the dataset to train and test\n",
    "X_train = noisy_data[:num_train, :]\n",
    "X_test = noisy_data[num_train:, :]\n",
    "y_train = labels[:num_train]\n",
    "y_test = labels[num_train:]\n",
    "# set the number of classifiers\n",
    "N = 20\n",
    "# predict from train data using decision tree classifier\n",
    "y_pred_list_train, y_pred_list_test = prediction_from_train(N, X_train, y_train, X_test)\n",
    "# set lambda (coefficient of 2nd term)\n",
    "lam = 3.0\n",
    "instance_data = {'y': y_train, 'c': y_pred_list_train, 'lam': lam, 'y_train': y_train, 'y_test': y_test, 'y_pred_list_test': y_pred_list_test}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For demonstration purposes, we use the data with the addition of noisy features.\n",
    "The `prediction_from_train` function is used to create weak predictors and the output $c_i (\\boldsymbol{x}^{(d)})$ from those predictors.\n",
    "Here the number of weak predictors is 20 and the number of supervised data is 200.\n",
    "The value of $\\lambda$ in equation (2) is set to 3.0."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Undetermined Multiplier\n",
    "Since there are no constraints, the dictionary to set the undefined multiplier is left empty."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# set multipliers\n",
    "multipliers = {}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Conversion to PyQUBO by JijModeling Transpiler\n",
    "JijModeling has executed all the implementations so far. By converting this to [PyQUBO](https://pyqubo.readthedocs.io/en/latest/), it is possible to perform combinatorial optimization calculations using OpenJij and other solvers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "from jijmodeling.transpiler.pyqubo import to_pyqubo\n",
    "\n",
    "# convert to pyqubo\n",
    "pyq_model, pyq_cache = to_pyqubo(problem, instance_data, {})\n",
    "qubo, bias = pyq_model.compile().to_qubo(feed_dict=multipliers)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The PyQUBO model is created by `to_pyqubo` with the `problem` created by JijModeling and the `instance_data` we set to a value as arguments. Next, we compile it into a QUBO model that can be computed by OpenJij or other solver."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Optimization by OpenJij\n",
    "This time, we will use OpenJij's simulated annealing to solve the optimization problem. We set the SASampler and input the qubo of the QUBO model into that sampler to get the result of the calculation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "import openjij as oj\n",
    "\n",
    "# set sampler\n",
    "sampler = oj.SASampler()\n",
    "# solve problem\n",
    "response = sampler.sample_qubo(qubo, num_reads=5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Decoding and Displaying the Solution\n",
    "Decode the returned results to facilitate analysis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "# decode solution\n",
    "result = pyq_cache.decode(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From the weak predictors selected by simulated annealing, let us look at the classification accuracy of the test data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Accuracy of QBoost is 0.9024390243902439\n"
     ]
    }
   ],
   "source": [
    "from sklearn import metrics\n",
    "\n",
    "y_pred_list_train = instance_data['c']\n",
    "y_pred_list_test = instance_data['y_pred_list_test']\n",
    "y_train = instance_data['y_train']\n",
    "y_test = instance_data['y_test']\n",
    "accs_train_oj = []\n",
    "accs_test_oj = []\n",
    "for solution in result.record.solution['w']:\n",
    "    idx_clf_oj = solution[0][0]  # [0, 1, 4, 5, 6, 7, 8, 11, 13, 14, 15, 16, 17, 18, 19]\n",
    "    y_pred_train_oj = np.sign(np.sum(y_pred_list_train[idx_clf_oj, :], axis=0))\n",
    "    y_pred_test_oj = np.sign(np.sum(y_pred_list_test[idx_clf_oj, :], axis=0))\n",
    "    acc_train_oj = metrics.accuracy_score(y_true=y_train, y_pred=y_pred_train_oj)\n",
    "    acc_test_oj = metrics.accuracy_score(y_true=y_test, y_pred=y_pred_test_oj)\n",
    "    accs_train_oj.append(acc_train_oj)\n",
    "    accs_test_oj.append(acc_test_oj)    \n",
    "print('Accuracy of QBoost is {}'.format(max(accs_test_oj)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We also check the average accuracy for the all 20 weak predictors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Average accuracy of all 20 weak predictors is 0.8269647696476966\n"
     ]
    }
   ],
   "source": [
    "accs = []\n",
    "for i in range(20):\n",
    "    y_pred_test = np.sign(np.sum(y_pred_list_test[[i], :], axis=0))\n",
    "    acc = metrics.accuracy_score(y_true=y_test, y_pred=y_pred_test)\n",
    "    accs.append(acc)\n",
    "accs_ave = sum(accs) / 20\n",
    "print('Average accuracy of all 20 weak predictors is {}'.format(accs_ave))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The classification accuracy on the test data is better than using all of the weak predictors."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.15"
  },
  "vscode": {
   "interpreter": {
    "hash": "2e8d7574d7ec71e14cb1575cf43673432d6fae464c836a7b3733d4f6c20243fb"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}