Lambda-School-Labs/Labs26-StorySquad-DS-TeamB

notebooks/squad_score_mvp.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Documentation notes on this notebook:\n",
    "\n",
    "##### Purpose\n",
    "- Used as an exploratory notebook to look at both API and human transcriptions, potential Squad Score formulas, and correlations with story rankings and textstat packages\n",
    "- Training data: 167 stories\n",
    "- Generate a pickled scaler from training data for use in production\n",
    "\n",
    "##### Outcomes\n",
    "- Ultimately, API transcriptions used over human transcriptions to generate scaler and formula for production, as it's closest to what will actually be seen in production.\n",
    "- MinMaxScaler was selected over StandardScaler to maintain positive score values\n",
    "- More features are explored in this notebook than were included in the Squad Score formula\n",
    "- Initial Squad Score includes only features generated with Python & Pandas that represent either features used in validated writing complexity metrics OR features requested by stakeholder, AND are versions of the feature that are impacted as minimally as possible by inevitable errors in spelling, handwriting, and transcription. \n",
    "- Weights are initialized at 1 for each feature.\n",
    "- This Squad Score results in a -.63 correlation coefficient with provided stakeholder rankings.\n",
    "   - 1.0 features: story_length, avg_word_len, quotes_num, unique_words_num (-.60 correlation)\n",
    "   - 1.1 features: story_length, avg_word_len, quotes_num, unique_words_num, adj_num (-.63 correlation)\n",
    "\n",
    "##### Future Considerations\n",
    "- Scaler should be re-fit with any newly provided training data\n",
    "- Major room for growth in this formula is to improve the weights to something other than 1s. This choice was made due to lack of labels, and reluctance to create an over-fit formula, but if different techniques are available or labels are provided, the weights should be adjusted accordingly to improve the formula."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import glob\n",
    "\n",
    "import joblib\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import nltk\n",
    "import seaborn as sns\n",
    "import textstat\n",
    "\n",
    "from sklearn.metrics import mean_absolute_error as mae\n",
    "from sklearn.metrics import mean_squared_error as mse\n",
    "from sklearn.preprocessing import MinMaxScaler"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Uncomment to download nltk modules\n",
    "# nltk.download('punkt')\n",
    "# nltk.download('averaged_perceptron_tagger')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import API-transcribed stories\n",
    "path = \"../../api_transcribed_stories.csv\"\n",
    "\n",
    "api_t = pd.read_csv(\n",
    "            path, \n",
    "            usecols=[1,2],\n",
    "            header=0,\n",
    "            names=[\"story_id\", \"transcription\"],\n",
    "            dtype=str\n",
    "            )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compile human-transcribed stories for comparison\n",
    "\n",
    "root = \"../../Stories Dataset/Transcribed Stories/\"\n",
    "\n",
    "transcriptions = []\n",
    "\n",
    "for file in glob.glob(root + '**/**/Story*[3000-5999]*'):\n",
    "    with open((file), 'r') as file:\n",
    "        story_id = file.name[-4:]\n",
    "        transcription = file.read().replace('\\n', ' ')\n",
    "        transcriptions.append((story_id, transcription))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate dataframe of transcriptions\n",
    "cols = ['story_id', 'transcription']\n",
    "human_t = pd.DataFrame(transcriptions, columns=cols)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "def calc_metrics(df):\n",
    "    \"\"\" \n",
    "    Cleans transcriptions and adds various metrics as columns.\n",
    "    \n",
    "    Features included:\n",
    "    - Grade level\n",
    "    - Length of story (in characters)\n",
    "    - Average word length (in chars)\n",
    "    - Number of quotation marks\n",
    "    - Number of unique words\n",
    "    - Number of adjectives\n",
    "    - Percentage of complex words\n",
    "    \n",
    "    Note that code generating proportional/ratioed features is commented out, as correlation\n",
    "    to rankings sharply decreased with this change in feature quality.\n",
    "    \"\"\"\n",
    "    # Avoid SettingWithCopy error\n",
    "    df = df.copy()\n",
    "    \n",
    "    # Clean potential transcription errors\n",
    "\n",
    "    # Strip leading or tailing spaces and integers \n",
    "    df[\"transcription\"] = df[\"transcription\"].str.strip().str.strip('/-0123456789')\n",
    "    \n",
    "    # Ensure all commas and periods are followed by a space\n",
    "    df[\"transcription\"] = df[\"transcription\"].str.replace(\".\", \". \").str.replace(\",\", \", \")\n",
    "    \n",
    "    # Remove any instances of multiple spaces\n",
    "    df[\"transcription\"] = df[\"transcription\"].str.split().str.join(\" \")\n",
    "    \n",
    "    \n",
    "    # Add metrics\n",
    "    \n",
    "    # Grade level\n",
    "    df[\"grade_level\"] = df[\"story_id\"].str[0].astype(int)\n",
    "\n",
    "    # Length of story\n",
    "    df[\"story_length\"] = df[\"transcription\"].str.len()\n",
    "    \n",
    "    # Average word length\n",
    "    word_count = (df[\"transcription\"].str.split()).str.len()\n",
    "    df[\"avg_word_len\"] = df[\"story_length\"] / word_count\n",
    "    \n",
    "    # Number of quotation marks\n",
    "    # quote_count = df[\"transcription\"].str.count('\"')\n",
    "    # df[\"quotes_ratio\"] = quote_count / df[\"story_length\"]\n",
    "    df[\"quotes_num\"] = df[\"transcription\"].str.count('\"') \n",
    "    \n",
    "    # Number of unique words, over 2 characters    \n",
    "    def over_two_chars(transcription):\n",
    "        \"\"\"Returns number of unique 2+ char words in transcription.\"\"\"\n",
    "        word_list = transcription.split()\n",
    "        word_set = set()\n",
    "        for x in word_list:\n",
    "            if len(x) > 2: \n",
    "                word_set.add(x)\n",
    "        return len(word_set)\n",
    "    \n",
    "    # unique_words_count = df[\"transcription\"].apply(over_two_chars)\n",
    "    # df[\"unique_word_ratio\"] = unique_words_count / df[\"story_length\"]\n",
    "    df[\"unique_words_num\"] = df[\"transcription\"].apply(over_two_chars)\n",
    "    \n",
    "    # Number of adjectives\n",
    "    def num_adj(transcription):\n",
    "        \"\"\"Returns number of adjectives in transcription.\"\"\"\n",
    "        tokens = nltk.word_tokenize(transcription)\n",
    "        pos = nltk.pos_tag(tokens)\n",
    "        adj_count = 1\n",
    "        for word in pos:\n",
    "            if word[1] == 'JJ':\n",
    "                adj_count += 1\n",
    "        return adj_count\n",
    "    \n",
    "    # adj_count = df[\"transcription\"].apply(num_adj)\n",
    "    # df[\"adj_ratio\"] = adj_count / df[\"story_length\"]\n",
    "    df[\"adj_num\"] = df[\"transcription\"].apply(num_adj)\n",
    "    \n",
    "    # Percentage of complex words, based on textstat's automated function\n",
    "    # Compares to list of 3000 most commonuly used English words\n",
    "    # Number of words in story that are not on that list are considered \"complex\"\n",
    "    # Note: prone to errors, may need to be iterated on to be production-grade\n",
    "    num_complex_words = df[\"transcription\"].apply(textstat.difficult_words)\n",
    "    df[\"percent_complex_words\"] = num_complex_words / word_count * 100\n",
    "    \n",
    "    return df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Calculate both API and human transcription metrics\n",
    "human_metrics = calc_metrics(human_t)\n",
    "api_metrics = calc_metrics(api_t)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(167, 9)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>story_id</th>\n",
       "      <th>transcription</th>\n",
       "      <th>grade_level</th>\n",
       "      <th>story_length</th>\n",
       "      <th>avg_word_len</th>\n",
       "      <th>quotes_num</th>\n",
       "      <th>unique_words_num</th>\n",
       "      <th>adj_num</th>\n",
       "      <th>percent_complex_words</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3132</td>\n",
       "      <td>Page 1 Once there was a little cheatah and the...</td>\n",
       "      <td>3</td>\n",
       "      <td>1378</td>\n",
       "      <td>5.122677</td>\n",
       "      <td>14</td>\n",
       "      <td>136</td>\n",
       "      <td>14</td>\n",
       "      <td>4.460967</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  story_id                                      transcription  grade_level  \\\n",
       "0     3132  Page 1 Once there was a little cheatah and the...            3   \n",
       "\n",
       "   story_length  avg_word_len  quotes_num  unique_words_num  adj_num  \\\n",
       "0          1378      5.122677          14               136       14   \n",
       "\n",
       "   percent_complex_words  \n",
       "0               4.460967  "
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(human_metrics.shape)\n",
    "human_metrics.head(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(167, 9)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>story_id</th>\n",
       "      <th>transcription</th>\n",
       "      <th>grade_level</th>\n",
       "      <th>story_length</th>\n",
       "      <th>avg_word_len</th>\n",
       "      <th>quotes_num</th>\n",
       "      <th>unique_words_num</th>\n",
       "      <th>adj_num</th>\n",
       "      <th>percent_complex_words</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3132</td>\n",
       "      <td>Page. I 3132 Once there was a little cheatah a...</td>\n",
       "      <td>3</td>\n",
       "      <td>1375</td>\n",
       "      <td>5.092593</td>\n",
       "      <td>6</td>\n",
       "      <td>138</td>\n",
       "      <td>15</td>\n",
       "      <td>4.814815</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  story_id                                      transcription  grade_level  \\\n",
       "0     3132  Page. I 3132 Once there was a little cheatah a...            3   \n",
       "\n",
       "   story_length  avg_word_len  quotes_num  unique_words_num  adj_num  \\\n",
       "0          1375      5.092593           6               138       15   \n",
       "\n",
       "   percent_complex_words  \n",
       "0               4.814815  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(api_metrics.shape)\n",
    "api_metrics.head(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['friendo',\n",
       " 'dylanto',\n",
       " 'familys',\n",
       " \"lion's\",\n",
       " 'usall',\n",
       " 'parents',\n",
       " 'thier',\n",
       " 'hunting',\n",
       " 'dylans',\n",
       " 'bacle',\n",
       " 'dylan',\n",
       " 'minutes',\n",
       " 'bylans']"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Look at output of complex words feature -- note how many are not real words\n",
    "sample_transcription = api_metrics.iloc[0,1]\n",
    "textstat.difficult_words_list(sample_transcription)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Page. I 3132 Once there was a little cheatah and the cheatah had a best friend lion the cheatah\\'s name was paws and the lion\\'s name was Dylan they always played with each other after they went hunting Dylan went to play with Paws after hunting with pack, and Paws wont to play after hunting with his mom. They always met up near the same rock arth the same lake, they would talk and have water fights here is what the talked about your parents and my pack might have a fight\" said Dylan \"We both don\\'t want that to happen we might have to fight against each other, said Paws After afew minutes it was time for them bacle to their family. Next morning they saw there family having a fight, Dylans family, won and Paus mom was hert. At thier usall he meeting time they talked aboutit. \"My mom got hert, It\\'s not fair\" said Paws I think the best way to fix this is to tell are family we are friendo\" said Dylan \"I think your right said Paws. So after there talk time and water fight they went back to there family and told We have been friends since 3 years ago said Paws and Dylanto their familys. Okay thier parents said I will let you goy swich spot for one day and we will see. to them. Pagea 3132 They were going to fet Paws spend time with Bylans pack and Dylan spend time with Paws mom it went well and everyone was happy and now the familys all went to the lake, The End'"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Compare to full transcription\n",
    "# Decision was made to keep compplex_words feature out of Squad Score for \n",
    "# unreliability in the limitations of this product\n",
    "sample_transcription"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Change this variable to alter which features are included in scaler and Squad Score\n",
    "scaled_features = [\"story_length\", \"avg_word_len\", \"quotes_num\", \"unique_words_num\", \"adj_num\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "MinMaxScaler()"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "scaler_h = MinMaxScaler()\n",
    "scaler_h.fit(human_metrics[scaled_features])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['../project/app/utils/complexity/MinMaxScaler.pkl']"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "scaler_api = MinMaxScaler()\n",
    "scaler_api.fit(api_metrics[scaled_features])\n",
    "\n",
    "# Uncomment line below to pickle scaler for future use\n",
    "joblib.dump(scaler_api, \"../project/app/utils/complexity/MinMaxScaler.pkl\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "def scale_row(row, transcription_source):\n",
    "    \"\"\"\n",
    "    Takes one row and scales it according to corresponding scaler.\n",
    "    \n",
    "    This function is integrated into the squad_score function\n",
    "    \n",
    "    row: full row of dataframe or a list\n",
    "    transcription_source (str): \"api\" or \"human\" based on source of transcription\n",
    "    \"\"\"\n",
    "    if transcription_source == \"api\":\n",
    "        scaler = scaler_api\n",
    "    elif transcription_source == \"human\":\n",
    "        scaler = scaler_h\n",
    "    else:\n",
    "        raise ValueError(\"Not a valid transcription source. Valid options: 'api' or 'human'.\")\n",
    "    \n",
    "    return scaler.transform([row[scaled_features]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[0.36490589, 0.45999584, 0.13953488, 0.34146341, 0.28888889]])"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Test on API transcription sample\n",
    "test_row = api_metrics.loc[0,scaled_features]\n",
    "scale_row(test_row, \"api\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[0.34876246, 0.47520045, 0.16666667, 0.35472973, 0.31707317]])"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Test on human transcription sample\n",
    "test_row = human_metrics.loc[0,scaled_features]\n",
    "scale_row(test_row, \"human\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "story_length           1378\n",
       "avg_word_len        5.12268\n",
       "quotes_num               14\n",
       "unique_words_num        136\n",
       "adj_num                  14\n",
       "Name: 0, dtype: object"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Check test_row\n",
    "test_row"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "def squad_score(row, transcription_source):\n",
    "    \"\"\"\n",
    "    Scales, weights, and adds all metrics for a given transcription.\n",
    "    \n",
    "    Scales according to transcription source (see scale_row function).\n",
    "    Weights according to presribed weights used only for Story Squad analysis.\n",
    "    \n",
    "    row: full row of dataframe or a list\n",
    "    transcription_source (str): \"api\" or \"human\" based on source of transcription\n",
    "    \"\"\"\n",
    "    \n",
    "    # Instantiate weights\n",
    "    weights = {\"grade_level\": 0,\n",
    "              \"story_length\": 1,\n",
    "              \"avg_word_len\": 1,\n",
    "              \"quotes_number\": 1,\n",
    "               \"unique_words\": 1,\n",
    "               \"adj_num\": 1,\n",
    "              \"percent_complex_words\": 0}\n",
    "    \n",
    "    # Scale needed metrics\n",
    "    scaled = scale_row(row, transcription_source)[0]\n",
    "    \n",
    "    # Generate scaler to create desired output range (~0-100)\n",
    "    range_scaler = 30\n",
    "    \n",
    "    # Weight values\n",
    "    # gl = weights[\"grade_level\"] * row[2] * range_scaler\n",
    "    sl = weights[\"story_length\"] * scaled[0] * range_scaler\n",
    "    awl = weights[\"avg_word_len\"] * scaled[1] * range_scaler\n",
    "    qn = weights[\"quotes_number\"] * scaled[2] * range_scaler\n",
    "    uw = weights[\"unique_words\"] * scaled[3] * range_scaler\n",
    "    an = weights[\"adj_num\"] * scaled[4] * range_scaler\n",
    "    # pcw = weights[\"percent_complex_words\"] * scaled[5] * range_scaler\n",
    "    \n",
    "    # Add all values\n",
    "    s_score = sl + awl + qn + uw + an\n",
    "    \n",
    "    return s_score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Calculate Squad Scores for all transcriptions and generate new DFs\n",
    "h_squad_score = human_metrics[[\"story_id\"]].copy()\n",
    "h_squad_score[\"squad_score\"] = human_metrics.apply(lambda x: squad_score(x, \"human\"), axis=1)\n",
    "\n",
    "api_squad_score = api_metrics[[\"story_id\"]].copy()\n",
    "api_squad_score[\"squad_score\"] = api_metrics.apply(lambda x: squad_score(x, \"api\"), axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>story_id</th>\n",
       "      <th>squad_score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3132</td>\n",
       "      <td>47.843668</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  story_id  squad_score\n",
       "0     3132    47.843668"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Check head\n",
    "api_squad_score.head(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>squad_score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>167.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>50.095676</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>20.209209</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>0.731707</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>35.305735</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>47.974608</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>60.715916</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>124.558085</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       squad_score\n",
       "count   167.000000\n",
       "mean     50.095676\n",
       "std      20.209209\n",
       "min       0.731707\n",
       "25%      35.305735\n",
       "50%      47.974608\n",
       "75%      60.715916\n",
       "max     124.558085"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "h_squad_score.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>squad_score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>167.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>47.399637</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>19.397492</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>1.668999</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>33.216693</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>43.596373</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>58.944506</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>119.110260</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       squad_score\n",
       "count   167.000000\n",
       "mean     47.399637\n",
       "std      19.397492\n",
       "min       1.668999\n",
       "25%      33.216693\n",
       "50%      43.596373\n",
       "75%      58.944506\n",
       "max     119.110260"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "api_squad_score.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Visualize distribution of Squad Score for human transcriptions\n",
    "ax = sns.distplot(h_squad_score[\"squad_score\"])\n",
    "plt.title(\"Distribution of Squad Scores in Human Transcriptions\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYAAAAEXCAYAAACkpJNEAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3dd3hc5ZX48e9R711uqu5GGGNsgyEEltBiqkmAhA5ZNiSbkGw25JdACklYkiybzqawJEAoobc4wQRCMN022MYFd9mWVSzb6r3r/P64V2YsVEb2SHc0cz7Po0czt7xz5s6de+593zvvK6qKMcaY8BPhdQDGGGO8YQnAGGPClCUAY4wJU5YAjDEmTFkCMMaYMGUJwBhjwpQlgGGIyD0i8r0AlZUvIs0iEuk+f01E/i0QZbvlvSgi1weqvBG87p0iUi0i+8f6tUci0NvbSyKyWUTO8DqO8exov9uh8BmEdQIQkRIRaRORJhGpF5F3ROSLInJou6jqF1X1v/ws6+yhllHVUlVNUtWeAMT+AxF5pF/556nqg0db9gjjyAduAYpUddIgy3xbRPa4ya9cRJ4Yyxj9ISJpInK/iOx394cdInKr13ENRlWPVdXXjnR9cewWkS0DzHtNRNrdz6taRJ4VkcnuvD+JyJ0DrHO1u3yz+53q9XnefKRxjiZ/v9sw8Ps+2s8gGIR1AnBdpKrJQAHw38C3gPsC/SIiEhXoMoNEPlCjqgcHmulekVwLnK2qScAi4J9jGJ+/fgkkAccAqcDFQHEgXyDI9oHTgQnANBE5cYD5N7uf1ywgDWf7DEpV/+ye3CQB5wH7+p670w7puwL2UjDEEBRUNWz/gBKcA5PvtJOAXmCu+/xPwJ3u4yzgb0A9UAu8iZNEH3bXaQOagW8ChYACNwKlwBs+06Lc8l4DfgK8CzQCfwEy3HlnAOUDxQssATqBLvf1NviU92/u4wjgu8Be4CDwEJDqzuuL43o3tmrgO0Nsp1R3/Sq3vO+65Z/tvudeN44/DbDub4BfDVH2VOB1oAn4h7v8I8NtA5/PaqX7eVS668b4LHsOsA1ocOe93rd9BojjA+CSIeI81o2vFjgAfNudHgv8Ctjn/v0KiPWNH+ekYr+7n0QAtwK7gBrgSZ/PPA54xJ1eD7wHTBxu3wV+4JbzkLsdNwOLhtn37wf+DDwL/KbfvEP7kfv8y8AH/b8PQ5R92OfmrvN7YDnQ4u43FwDv4+z3ZcAPfJYvZIj90/3c17jrHgB+4TPv48A77vYrA24YIoZD78Xns/q2+3olwNXuvJtwvmudOPv5Xwf4DPzZD27B+S5WAp/zifl8YIv72VUA3xizY+BYvVAw/jFAAnCnlwL/3n+HxzlY3wNEu3+nATJQWT478UNAIhDPwAmgApjrLvMM/h/8ftC3rM/81/gwAfwrzhnsNJwz22eBh/vF9gc3ruOBDuCYQbbTQzjJKdlddwdw42Bx9lv3GpyD5v/DOfuP7Dd/JfAL9wt0uvsl8HcbLAROBqLcuLYCX3PnZbllXeZ+Vv8JdDN4AvgjzoHzc8DMfvOScb60t+AcpJOBxe68O4BVOGfT2TgHn//yib8buMt9f/HAf7jL57rT/g94zF3+C8BfgQQg0n1/KcPtu+6+0I5zIInE2U9XDfGZJOAcPM8HLsU54PkmTt/9KAt41Wff+RNHlgAagFNxEmCcu8xx7vN5OAfyS/zZP9195lr3cRJwsvu4wP3Mr3Q/80xg/hAxHHovPp9V3774LziJYvZg77vfZ+DPfnCHG9f5QCuQ7s6vBE5zH6cDC8bqGGhVQAPbB2QMML0LmAwUqGqXqr6p7qc2hB+oaouqtg0y/2FV/UBVW4DvAZ8J0OXp1ThnRrtVtRm4DbiiXzXED1W1TVU3ABtwvmiHcWO5ArhNVZtUtQT4OU61zrBU9RHgK8Ancc7AD4rIt9yy84ETge+paoeqvoFz
APSLqq5V1VWq2u3G9X84X1xwvmSbVfVpVe3COSMbqpH6KzhnxDcDW0SkWETOc+ddCOxX1Z+raru7HVa7864G7lDVg6paBfyQw7dNL/B99/21AV/EOZstV9UOnIP3Ze7n0oVz0Jqhqj3u+2v0c3O8parL1WlfepgBPksfn8Y5oL4MvIBzULqg3zJ3i0g9zn5RCXzdzzgG8xdVfVtVe91t+JqqbnKfbwQe48PPrs9g+2cXMENEslS1WVVXudOvAl5R1cfc72eNqq4fLIZB4uzbF1/H2Taf8fP9DbcfdLnzu1R1Oc6VxGyfeUUikqKqdaq6zs/XPGqWAAaWg3PW2t9Pcc6qX3Yb0PxpJCwbwfy9OF/GLL+iHNoUtzzfsqOAiT7TfA+IrThnU/1luTH1LyvH30DUqR8+G6cu+YvAf4nIJ90Y69zk51u2X0Rkloj8zW24bQR+zIfbbgo+29ZN1IN+Fu6B5sequhDnIPwk8JSIZAB5OFU2AxloO0/xeV7V72BTADzn3nRQj3PV0oPzuTwMvAQ8LiL7ROR/RCR6mM3Qp/9nGTdEm8P1wJNu4mzHufLsf/fYV1U1TVVzVPVq96B2NA7b9iKyWERWiEiViDTg7Bf99/vB9s8bcdomtonIeyJyoTt9qM/pIzEMYKB9ccpgC/cz3H5Qo6rdPs9938+lOCcse0XkdRE5xc/XPGqWAPpxG8RygLf6z3PP/G5R1Wk4jYRfF5Gz+mYPUuRwVwh5Po/zcc4GqnEuPxN84orEubT0t9x9OAcb37K7cS61R6Lajal/WRUjLAf37OcpYCNOtVclkC4iif3K7jPcNvg9Th3/TFVNwam/FXdeJT7bVkSEw7f1UHH2JZNEnDaKMpyqtIEMtJ33+RbXb/ky4Dz34Nr3F6eqFe72+aGqFgEfw7nyuM6fmP0lIrnAmcA1buLcj1NNdr6IBOLEYzD9t8OjwDIgT1VTcapW5SNrDVSQ6k5VvRKnuuUu4Gl3HyoDpo8ghv4G2hf7Pssj+b7tG2TZw4NSfU9Vl+K8n+dxTj7GhCUAl4ikuGcSj+PUQW8aYJkLRWSGezBpwDlz63VnH2Dwg8RQrhGRIhFJwKkjfNq9jN+BcxZ3gXsW+F2cusk+B4BC31tW+3kM+E8RmSoiSTgHtCf6nYUMy43lSeBHIpIsIgU41QGPDL2mQ0RucN9DsohEuNUqxwKrVXUvTmPeD0UkRkQ+Dlzks/pw2yAZpy67WUTmAP/uM+8F4FgR+bR7JvxVYMDbVN04vyciJ7pxxOHU1dcD23Ea/ieLyNdEJNZ9L4vdVR8Dvisi2e4B9PZhts09ONuywH3dbBFZ6j7+hIgc5ya6RpzE2zt4UUfkWpztOhuY7/7NwmmkvDLArzWUZKBWVdtF5CSc6hu/iMg1IpKtqr04nxE42+nPwNki8hkRiRKRTBGZP8K4+vbF03AS8FPu9OG+3yPdD/reS4x7C22qW1XZSOA/80FZAoC/ikgTztnDd3AagT43yLIzgVdw6u9WAr9T1RXuvJ/g7AD1IvKNEbz+wzgNTPtxGqa+CqCqDcCXcBonK3DOhst91uvbMWtEZKA6w/vdst8A9uA0En5lBHH5+or7+rtxrowedcv3RyPOmXkpzpf1f3Aa2PuusK4CFuNUuX0fp8EZ8GsbfMNdvwmnwfAJn3Wrgctxbu2twfns3h4iTgUewLni2YdzB9EFbh1zk/v8IpzPaSfwCXe9O3GS2EZgE7DOnTaYX+Oc+b7s7ner3PcPToJ6GmebbcVpM3l4iLKOxPU4++1+3z+cxDSWPyL8EnCHuw1uZ2RnvUuAze7vC34NXOFW4ZXiVKXcgrM/rWfotpD+9gN1OJ//n4Evquo2d959OPX09SLy/ADrjnQ/8HUtUOJWY34Rpz1hTPTdwWJMUBCRH+A0gl7jdSwmfIjzi95HVDXX61jGkl0BGGNMmLIEYIwxYcqqgIwxJkzZFYAxxoSpYOqcalhZWVlaWFjodRjGGDOurF27
tlpVs/tPH1cJoLCwkDVr1ngdhjHGjCsiMuAv7K0KyBhjwpQlAGOMCVOWAIwxJkxZAjDGmDBlCcAYY8KUJQBjjAlTlgCMMSZM+ZUARGSJiGx3h8n7yChYbh/pT7jzV4tIoTv9HBFZKyKb3P9n+qzzmlvmevdvQqDelDHGmOEN+0Mwd3CK3+L0h14OvCciy1R1i89iN+IMpzZDRK7AGaXnszh9q1+kqvtEZC7OcHe+Qwlerar2yy5jjPGAP78EPgkoVtXdACLyOLAU8E0AS3EGtwZnQIvfiIio6vs+y2wG4kUk1h0M24SwR1eXHnUZVy3OH34hY8wR86cKKIfDB1Mu56MDgh9axh1ysAFnYG1flwLr+h38H3Crf77nDrP4ESJyk4isEZE1VVVHOy61McaYPmPSCCwix+JUC33BZ/LVqnoccJr7d+1A66rqvaq6SFUXZWd/pC8jY4wxR8ifBFAB5Pk8z3WnDbiMOwB3Ks44rIhILvAccJ2q7upbQVUr3P9NOGPMnnRkb8EYY8yR8CcBvAfMFJGpIhIDXIEzqLWvZXw4oPRlwKuqqiKSBrwA3KqqhwbkFpEoEclyH0cDFwIfHN1bMcYYMxLDJgC3Tv9mnDt4tgJPqupmEblDRC52F7sPyBSRYuDrQN+tojcDM4Db+93uGQu8JCIbgfU4VxB/COQbM8YYMzS/xgNQ1eXA8n7Tbvd53A5cPsB6dwJ3DlLsQv/DNMYYE2j2S2BjjAlTlgCMMSZMWQIwxpgwZQnAGGPC1LgaFN6EF+tOwpjRZVcAxhgTpiwBGGNMmLIEYIwxYcoSgDHGhClLAMYYE6YsARhjTJiyBGCMMWHKEoAxxoQpSwDGGBOm7JfAJqioKhX1bfT0KlGREcRFRZCRGMMgQ0YbY46CJQATNNo6e3hmXTlbKhsPm37M5BQ+fUIOibG2uxoTSPaNMkGhvK6Vx94tpaGti3OLJjIlLZ7uHmV/YxsrtlVx96s7uWxhLjMnJHsdqjEhwxKA8dzuqmYeeLuE5Lgobjp9OvkZCYfmFU1JYc6kFJ5YU8YDb5fwqRNyOLEww8NojQkd1ghsPNXV08tz71eQmhDNzWfOOOzg32dKWjxfPmMGMycksWz9PkprWz2I1JjQYwnAeGrF9oPUtHRyyfwcEmIGvyCNiYrgsyfmkRIfxaOr99LU3jWGURoTmiwBGM/sb2znjR1VnJCXxowJScMunxATxTUnF9DW1cPj75XR06tjEKUxocsSgPFEryrPv19BXHQk5x832e/1JqfGc8n8HPZUt7Bi+8FRjNCY0GcJwHhiY3k9pbWtnD938ohv7zwhP515uam8saOK2pbOUYrQmNBnCcB4YuWuGrKSYjkhP+2I1j9v7mQiRFi+qTLAkRkTPiwBmDFXUddGWV0bJ0/LOOJf+KbGR3PG7Gy2VDay82BTgCM0JjxYAjBjbvWeGqIjhRPy0o+qnI/PyCIjMYa/bai0BmFjjoAlADOm2jp72FBez/y8NOJjIo+qrKjICC48bjJVzR2s2l0ToAiNCR+WAMyYWldaR1ePsnhqZkDKmz0pmenZiby2o4rO7t6AlGlMuLAEYMaMqrJ6Tw156fFMSYsPSJkiwllzJtLS0c27JbUBKdOYcGEJwIyZXVUtVDd3cvK0wJz99ynMSmRadiJv7qiiq8euAozxlyUAM2Y2ltcTGxXB3JzUgJd95pwJNHV08+4euwowxl+WAMyY6FVla2UjsyclEx0Z+N1uWlYSU7MSeWOnXQUY4y+/vokiskREtotIsYjcOsD8WBF5wp2/WkQK3enniMhaEdnk/j/TZ52F7vRiEblbbMinkFZa00pLZw9Fk1NG7TXOnDOBpvZu1lhbgDF+GTYBiEgk8FvgPKAIuFJEivotdiNQp6ozgF8Cd7nTq4GLVPU44HrgYZ91fg98Hpjp/i05ivdhgtyWykYiI4RZE0dvQJdpWYkUZCTwVnG1/S7AGD/4cwVwElCsqrtVtRN4HFjab5ml
wIPu46eBs0REVPV9Vd3nTt8MxLtXC5OBFFVdpaoKPARcctTvxgQlVWVLZSPTsxOJiz66e/+HIiKcNjOLutaujwwraYz5KH8SQA5Q5vO83J024DKq2g00AP1v9bgUWKeqHe7y5cOUCYCI3CQia0RkTVVVlR/hmmBzoLGD2pZOjp0c+Mbf/uZMTiEjMYa3i6tH/bWMGe/GpBFYRI7FqRb6wkjXVdV7VXWRqi7Kzs4OfHBm1G2pbECAOZNHfzzfCBFOnZ5JaW0rpTUto/56xoxn/iSACiDP53muO23AZUQkCkgFatznucBzwHWqustn+dxhyjQhYktlI3kZCSTHRY/J6y0oSCcuOoK3dln3EMYMxZ8E8B4wU0SmikgMcAWwrN8yy3AaeQEuA15VVRWRNOAF4FZVfbtvYVWtBBpF5GT37p/rgL8c5XsxQaiutZN99e2jevdPf7FRkZxUmMnmigbKbPxgYwY1bAJw6/RvBl4CtgJPqupmEblDRC52F7sPyBSRYuDrQN+tojcDM4DbRWS9+zfBnfcl4I9AMbALeDFQb8oEj61uY2zRlLFLAACnTM9EBB54u2RMX9eY8cSvoZhUdTmwvN+0230etwOXD7DencCdg5S5Bpg7kmDN+LPzQDOZiTFkJcWO6eumxkczNyeVp9aW8Y1PzhpywHljwpX9EtiMmp5eZU9NC9P9GPB9NJwyLZOm9m7+sn7f8AsbE4YsAZhRU1HXSmd3L9OzvUkA+RkJzJmUzEMr9+L83MQY48sSgBk1u6qd2zCnZSV68voiwnWnFLK1spF1pXWexGBMMLMEYEbNroPNTE6NIzHWu/r3pfOnkBwbxUMr93oWgzHByhKAGRVdPb2U1rZ6Vv3TJzE2iksX5rJ8UyXVzR2exmJMsLEEYEbF3ppWunuV6dneVP/4uubkArp6lCfeKxt+YWPCiCUAMyp2VTUTIVCY6X0CmDEhiY9Nz+TR1aXWS6gxPiwBmFGxu6qZ3PQEYkex98+RuGpxPhX1bby50zoUNKaPJQATcI3tXZTXtQVF9U+fc4omkpEYw+PvWjWQMX0sAZiAe3d3LQqeNwD7io2K5LKFubyy9QAHm9q9DseYoGAJwATcO7tqiIoQ8jISvA7lMJ89MY/uXuXpteXDL2xMGLAEYALuvZJa8jISRmXw96MxPTuJxVMzePzdMnqtMdgYSwAmsFo6utlS2UhBkJ3997nypHxKa1tZudvGCjDGEoAJqA1l9fT0KgWZwZkAlsydRGp8NI++W+p1KMZ4zhKACai1e50+d/IzgucOIF9x0ZF8ekEOL2/eT21Lp9fhGOMpSwAmoNbsrWPWxCTiY4Lj/v+BfGZRHl09yvPv2yikJrxZAjAB09urrCutY2FBhtehDOmYySnMy03lyTVl1k20CWs2TJIJmB0Hm2hq72ZRQTod3b1ehwPAo6sHrusvzExk2YZ9/OylHeSkxw9ZxlWL80cjNGM8Z1cAJmDWlDj1/4sK0z2OZHjH56YRFSGs2VvrdSjGeMYSgAmYtXvryEqKJT9IbwH1FR8TydycVDaU19PVExxXK8aMNUsAJmDW7K1lUUE6IuJ1KH5ZWJBOe1cvm/c1eh2KMZ6wBGAC4mBjO2W1beOi+qfP1KxE0hOirRrIhC1LACYg1rj3/y8sGD8JIEKEBQXp7K5qoc5+E2DCkCUAExBrSuqIjYrg2CmpXocyIgvy0xGwQeNNWLIEYAJiXWkdx+emERM1vnap9IQYpmUnsq60jl77TYAJM+Pr22qCUkd3D1v2NTI/P83rUI7IwoJ06lq72FPd4nUoxowpSwDmqG2tbKKzp5f5eeMzARRNTiU2KoJ1e60ayIQXSwDmqK1368/HawKIiYpgXm4aH+xroL2rx+twjBkzlgDMUVtfVs+E5Fgmp8Z5HcoRW1iQTlePsqmiwetQjBkzlgDMUVtfVs/8vLRx8wOwgeSlx5OdFHuoO2tjwoElAHNU6lo6KalpHbcNwH1EhIUF6ZTW
tlLV1OF1OMaMCUsA5qisL68Hxm/9v68T8tOIEPtNgAkffiUAEVkiIttFpFhEbh1gfqyIPOHOXy0ihe70TBFZISLNIvKbfuu85pa53v2bEIg3ZMbW+tJ6RGBe7vhPAMlx0cyamMy60jp6bNB4EwaGTQAiEgn8FjgPKAKuFJGifovdCNSp6gzgl8Bd7vR24HvANwYp/mpVne/+HTySN2C8tb6snlkTkkmKDY2hJRYWpNPU3k3xwSavQzFm1PlzBXASUKyqu1W1E3gcWNpvmaXAg+7jp4GzRERUtUVV38JJBCbEqCobyutDovqnz+xJySTGRB7q28iYUOZPAsgBynyel7vTBlxGVbuBBiDTj7IfcKt/vifj+RaSMFVS00p9a9e4bwD2FRURwfy8NLZVNtHS0e11OMaMKi8bga9W1eOA09y/awdaSERuEpE1IrKmqqpqTAM0Q1tfNr5/ADaYhYUZ9Kiyvqze61CMGVX+JIAKIM/nea47bcBlRCQKSAVqhipUVSvc/03AozhVTQMtd6+qLlLVRdnZ2X6Ea8bK+tJ6EmIimTUx2etQAmpSShw5afGs3Vtng8abkOZPAngPmCkiU0UkBrgCWNZvmWXA9e7jy4BXdYhvjohEiUiW+zgauBD4YKTBG2+tL2/guJxUIiNCr/ZuYUE6+xvbqahv8zoUY0bNsAnArdO/GXgJ2Ao8qaqbReQOEbnYXew+IFNEioGvA4duFRWREuAXwA0iUu7eQRQLvCQiG4H1OFcQfwjc2zKjrbO7l637GkOu+qfP8blpREeKNQabkObXvXuquhxY3m/a7T6P24HLB1m3cJBiF/oXoglG2/Y30tnTGxL3/w8kPiaSuVNS2VBWT2tnNwkxoXGbqzG+7JfA5ohscBtIj88bXyOAjcSiwgw6untZvmm/16EYMyosAZgjsqG8gczEGHLS4r0OZdQUZiaQlRTDE++Veh2KMaPCEoA5IhvK6jl+nPcAOhwRYVFBBu+V1FF8sNnrcIwJOEsAZsSaO7oprmpmXm7oVv/0OSE/jagIsasAE5IsAZgR21TegCocH6J3APlKjovmnKKJPLOugs7uXq/DMSagLAGYEdvodgF9fIjeAdTfZ0/Mo7alk5c2W2OwCS2WAMyIbSivJy8jnozEGK9DGROnz8wmPyOBh1ft9ToUYwLKEoAZsQ1lDSF7//9AIiKEa07O5909tWzfb91Em9BhCcCMSHVzBxX1bRwfBg3Avi5fmEdMVASP2FWACSGWAMyIhFv9f5/0xBgunDeZ596voNm6iTYhwhKAGZENZQ1ECMzNCa8rAIBrTy6guaOb59/v3xmuMeOTJQAzIhvK65k5IZnEEBkCciTm56UxNyeFR1bttW6iTUiwBGD8pqpsLG8Iix+ADUREuPbkArbtb2L1nlqvwzHmqFkCMH4rr2ujtqUzLH4ANpiLj88hPSGaP765x+tQjDlqlgCM3/qGSAzVMQD8ER8TybUnF/DPbQfYXWX9A5nxzRKA8duGsnpioiKYPSm0hoAcqWtPKSQ6MoL73rKrADO+WQIwfttQXs/cKSlER4b3bpOdHMun5ufw9Npyals6vQ7HmCMW3t9k47funl42VTSEdf2/r387bSod3b32wzAzrlkCMH7ZebCZ9q7esK7/9zVzYjJnzM7moZUltHf1eB2OMUfEEoDxS98QkOHUB9BwPn/aNKqbO3l6bbnXoRhzRCwBGL9sKK8nJS6KwswEr0MJGh+bnskJ+Wn8bkWxjRVgxiVLAMYv68saQn4IyJESEb529iz2NbTz1Noyr8MxZsQsAZhhtXZ2s+NAk9X/D+D0mVmckJ/Gb1+1qwAz/lgCMMPavK+Rnl4Nux5A/WFXAWY8swRghnWoATgvPPsAGo5dBZjxKvy6dDRDenR16UemLduwj9T4aF7ZctCDiIJf31XA9fe/y2PvlnL9xwq9DskYv9gVgBlWeV0buenxXocR1E6fmcUp0zL51Ss7aGjt8jocY/xiCcAM
qbmjm9qWTvLS7fbPoYgI37uwiPq2Lu5+dafX4RjjF0sAZkhlta0A5GVYAhhO0ZQUrjgxjwffKbGeQs24YAnADKmstpUIgZw0qwLyx9fPmU1cdCQ/Xr7V61CMGZYlADOksrpWJqXGERNlu4o/spNjufnMGbyy9SCv76jyOhxjhmTfajOoXlXK69qs/n+EPndqIVOzEvnOc5to6ej2OhxjBuVXAhCRJSKyXUSKReTWAebHisgT7vzVIlLoTs8UkRUi0iwiv+m3zkIR2eSuc7dYHwNB52BTBx3dveRb/f+IxEZF8j+XzaOivo27/r7N63CMGdSwCUBEIoHfAucBRcCVIlLUb7EbgTpVnQH8ErjLnd4OfA/4xgBF/x74PDDT/VtyJG/AjB5rAD5yJxZmcP0phTy0ci+rdtd4HY4xA/LnCuAkoFhVd6tqJ/A4sLTfMkuBB93HTwNniYioaouqvoWTCA4RkclAiqquUlUFHgIuOZo3YgKvrLaV+OhIMhNjvA5lXPrmktnkZyTwrWc20tZpYwaY4ONPAsgBfDs5KXenDbiMqnYDDUDmMGX6dqI+UJnGY6W1reRnJFgPoEcoISaKuy6dx96aVu58YYvX4RjzEUHfFYSI3ATcBJCfn+9xNOGjvauHqqYO5uVa/z8DdY8xEqfNzOLPq0s5sTCDS06w8xwTPPy5AqgA8nye57rTBlxGRKKAVGCois8Kt5yhygRAVe9V1UWquig7O9uPcE0glNe1oVj9fyCcWzSJwswEbnt2EzsONHkdjjGH+JMA3gNmishUEYkBrgCW9VtmGXC9+/gy4FW3bn9AqloJNIrIye7dP9cBfxlx9GbUlNW1ImC3gAZAZIRwxUn5JMZG8cVH1tJst4aaIDFsAnDr9G8GXgK2Ak+q6mYRuUNELnYXuw/IFJFi4OvAoVtFRaQE+AVwg4iU+9xB9CXgj0AxsAt4MTBvyQRCWW0r2cmxxEVHeh1KSEiJi+Z/rzyBkuoW/vOJ9fT0Dnp+ZMyY8asNQFWXA8v7Tbvd53E7cPkg6xYOMn0NMNffQM3YUVVKa1s5ZlKK16GElFOmZ/L9i47l+8s2872/fMCPLplrDezGU0HfCGzGXk1zJ62dPeTbAPABd/3HCqlsaOee15gT+WkAABj0SURBVHcxKSWOr5410+uQTBizBGA+oqSmBYACSwCj4ltLZnOwqZ1f/GMHWUmxXLXY/7vbjvaOJGBEr2dCmyUA8xF7a1pJiIkkOynW61BCkohw16XzqG3p5NvPbQLsoGy8YZ3BmY8oqWmhwH4ANqqiIyO455qFnDlnAt9+bhMPvlPidUgmDFkCMIdp7uimpqWTgsxEr0MJeXHRkdxzzULOLZrI95dt5t43dnkdkgkzlgDMYfa69f+FVv8/JmKiIvjt1Qu4YN5kfrx8G3f8dQu9douoGSPWBmAOs7emlagIYYqNADZmoiMjuPuKE8hOiuX+t/dwoLGdn3/mePsNhhl1lgDMYfbWtJCbHk9UpF0cjqXICOH7FxWRkxbPj5Zv5WBTO/deu4h064nVjCL7lptD2jp7qKhvs/p/j4gInz99Gv975QlsKG/gU7972waXN6PKEoA5ZH1ZPb1q9/977aLjp/DY5xfT2N7Np373Dit32YAyZnRYAjCHrCmpBaAgw64AvLawIIPnv3QqWUkxXHf/apZt2Od1SCYEWRuAOWTN3jompsQSH2ONj6PhSH7Fe9VJBTy8qoSvPvY+f/9gPx+fkTUKkZlwZVcABoCeXmXd3jqr/w8y8TGRfO7UqRw7JYXlmyp5cVMlQ/S0bsyIWAIwAGzZ10hTRzeFlgCCTnRkBFeelM/J0zJ4s7iaFywJmACxKiADwMrd1QBMy7YEEIwiRLho3hQiRHhnVw0CnH/cZOuuwxwVSwAGgHd21TA9O5GUuGivQzGDEBEuOG4yqvD2rhoiRFgyd5IlAXPELAEYunp6eXdPLZcuyB1+YeMpEeHCeZNRlDeLq0lNiOZj061h
2BwZawMwbCxvoLWzh49Nz/Q6FOMHJwlMoWhyCi9srLSB5s0RswRgWLnLqf9fPM0SwHgRIcLli3KZlBrHY++WcqCx3euQzDhkCcDwzq4ajpmcQob1OzOuxEZFcu3JBURHRvDQyhLaOnu8DsmMM5YAwlx7Vw9r99ZZ9c84lZYQwzUnF9DQ1sXz6yvs9lAzIpYAwtz7pfV0dPdyilX/jFv5GQmcdcxENlU0sL6s3utwzDhiCSDMrdxdQ4TASdMyvA7FHIV/mZVNQWYCyzbso66l0+twzDhhCSDMrdxVzXG5aXb//zgXIcJnFuYB8OTaMnqtKsj4wRJAGGvt7GZ9Wb1V/4SI9MQYLj5+CntrWlm127qQNsOzBBDGVu6qoatHrYfJEDI/L42ZE5L4x5YDNLZ3eR2OCXKWAMLYiu0HSYyJ5MSp6V6HYgJERLjo+Cn09Covbqr0OhwT5CwBhClVZcW2Kk6dkUVslPX/H0qykmI5fVY2G8obKD5oQ0qawVkCCFPFB5upqG/jE3MmeB2KGQX/MiubjMQYlm3YR3dPr9fhmCBlCSBMrdh+EIAzZmd7HIkZDdGREVx8/BSqmztYaQ3CZhCWAMLUim1VzJmUzOTUeK9DMaNk1sRkZk1MYsX2g7R2dnsdjglClgDCUFN7F++V1Fr1TxhYMncyHV29rNh20OtQTBCyBBCG3tpZTXev8onZlgBC3aSUOBYWpLNqdy01zR1eh2OCjF8JQESWiMh2ESkWkVsHmB8rIk+481eLSKHPvNvc6dtF5JM+00tEZJOIrBeRNYF4M8Y/K7YfJCUuigX5aV6HYsbA2UUTiYiAl7cc8DoUE2SGTQAiEgn8FjgPKAKuFJGifovdCNSp6gzgl8Bd7rpFwBXAscAS4HdueX0+oarzVXXRUb8T4xdVZcX2Kk6flU1UpF0AhoOUuGhOm5nNpooGSmtbvQ7HBBF/jgAnAcWqultVO4HHgaX9llkKPOg+fho4S5yBSpcCj6tqh6ruAYrd8oxHPqhopKqpgzOs+iesnDYzi8TYKF7avN+6jDaH+JMAcoAyn+fl7rQBl1HVbqAByBxmXQVeFpG1InLTYC8uIjeJyBoRWVNVVeVHuGYoL2yqJCpCOPsYSwDhJDYqkk/MzmZPdQtv7qz2OhwTJLysA/i4qi7AqVr6soicPtBCqnqvqi5S1UXZ2XbP+tFQVf62cR8fn5lFWoKN/hVuTirMIC0hmp++tN2uAgzgXwKoAPJ8nue60wZcRkSigFSgZqh1VbXv/0HgOaxqaNRtLG+gvK6NC46b7HUoxgNRkRGcPccZOObFD/Z7HY4JAv4kgPeAmSIyVURicBp1l/VbZhlwvfv4MuBVdU4xlgFXuHcJTQVmAu+KSKKIJAOISCJwLvDB0b8dM5QXNlUSHSmcWzTJ61CMR+bnO72F/uzl7dZFhBk+Abh1+jcDLwFbgSdVdbOI3CEiF7uL3Qdkikgx8HXgVnfdzcCTwBbg78CXVbUHmAi8JSIbgHeBF1T174F9a8aXqvLCxkpOm5lNaoIN/hKuIkS45dzZ7K5q4Zl15V6HYzwW5c9CqrocWN5v2u0+j9uBywdZ90fAj/pN2w0cP9JgzZF7v6yeivo2bjl3ltehGI998tiJzM9L41ev7GTp/Bzioq032HBlN4KHiRc2VhITGcHZRRO9DsV4TET41pI5VDa08+A7JV6HYzxkCSAM9PYqyzdVcvqsbBv71wBwyvRMzpidze9e20VDm40cFq4sAYSBNXvrqGxo58J5dveP+dA3PzmHxvYu7nl9l9ehGI9YAggDj79bSnJsFOdY9Y/xUTQlhaXHT+GBt/ewv6Hd63CMBywBhLi6lk7+tqmSTy3IITHWrzZ/E0ZuOXc2Pb3Kz1/e7nUoxgOWAELcM+vK6ezu5arF+V6HYoJQXkYC/3rqVJ5aW86GsnqvwzFjzBJACFNVHl1dysKCdOZMSvE6HBOkbj5zBllJsfzwr5uti4gwYwkghK3cXcPu
6hautrN/M4TkuGi+uWQ260rr+cv6fV6HY8aQJYAQ9ufVpaQlRHO+9f1jhnHZglzm5abykxe30tJh4weHC0sAIaqqqYOXPtjPpQty7ZeeZlgREcL3LzqWA40d/O+rxV6HY8aIJYAQ9ad39tCjao2/xm8LC9L5zKJc7n1jlzUIhwlLACGotqWTP71dwvnHTWZ6dpLX4Zhx5DsXFDEhOY5vPLWB9q4er8Mxo8wSQAi6943dtHb18LWzZnodihlnUuOj+cmlx7HzYDN3/3On1+GYUWYJIMRUN3fw0MoSLpo3hZkTk70Ox4xDn5g9gcsX5nLP61YVFOosAYSYe9/YTXtXD1+1s39zFL57YRETU+K4+bF11LV0eh2OGSWWAEJIVZNz9r90fg4zJljdvzlyqfHR/PbqBRxo6ODmx9bZ6GEhyhJACLnr79vo6lG+cuYMr0MxIWBBfjo/+tRc3i6u4UfLt3odjhkF1jtYiHh9RxVPry3ny5+YzjS788cEyOWL8tha2cT9b+9h9sRkrjjJbisOJZYAQkBTexe3PbORGROS+MqZVvdvAuvb58+huKqZ257bRK9ivy0JIZYAQsBdf99GZWM7Xzh9Os+uq/A6HBNioiIjuPfahXzpz+v49nObaO7o4qbTp3sdlgkAawMY594uruaRVaXceOpU8jMSvA7HhKi46EjuuWYhF8ybzI+Xb+MnL26lyxqGxz1LAOPYzgNN/Psja5mencgt5872OhwT4mKiIrj7ihO4anE+//f6bi79/TsUH2z2OixzFCwBjFMHGtu54YH3iI2O5E+fO4n4GOvwzYy+yAjhx586jt9dvYCy2lYuuPtN7nl9F62d1oPoeGQJYBxqau/i+vvfpb61kwduOJE8q/oxY+z84ybz0tdO5+MzsvjvF7fx8btWcPc/d9LQ2uV1aGYErBF4nCmva+ULD6+l+GAz999wInNzUr0OyYSpCSlx3HfDiawpqeV3r+3iF//YwW9eLebUGZmcUzSJM+dMYFJq3JBlPLq69KjjsLuSjpwlgHHknV3V3Pzo+3R193LvdQs5fVa21yEZw6LCDO6/IYMt+xp5Zl05L2/Zz4rnNgEwITmWY6ekcMzkFPIyEshJi2dKWhyZibGkxkd7HLmxBDAOtHX28Ic3d/Prf+6kMDOBe69bZN08m6BTNCWFoilFfPeCY9h+oIm3dlazZV8jm/c18sbOanp6Dx9vOEKcu4vioyOJj4k89DghxvlLjI0iKTaKpLgoUuOiSY2PJirSaq0DyRJAEOvtVZ59v4KfvbSd/Y3tXDBvMv/96eNIjrMzJxO8RIQ5k1KYMynl0LTunl4ONHVQUddGZUMbtS2d1LZ0snpPLe1dPbR19tDW1UNdSydt7vOBhqdPio0iPSGajMQYMhJjyUqKoWhKCtOyE0mx78WIWQIIQpUNbTy7roKn15azp7qF43NT+fUV81k8LdPr0Iw5IlGREeSkxZOTFn/Y9MHaAHpVae3sobmjm+b2bhraumho66S+tYva1k721raysbwBBZ5aWw5AVlIs07ITmZ6dyLSsJKZlJzItO4m89Hi7chiEqA6UZ4PTokWLdM2aNV6HEXA9vcrPX95O8cFmdh5spqS6BQWmZiWyeGoGc3NSiRDxOkxjgkp3Ty+1LZ1UN3dQ1ez8r27qoKq5g9bOD0czixBIT4ghKymWzKQYMhJjyEx0HqcnxBAZISHfkCwia1V1Uf/pdgUwxnp6lZKaFrZVNrGxop6NZQ1sqmigucO5j3pyahxnzJ7Agvw0MpNiPY7WmOAVFRnBhJQ4JqR89E6j1o7uQ4mhprmD6hbn/57qFjp9fsEcIU7X1y9+UElhZiJTsxKZmp3I9KwkctLjiYwI7RMvSwCjpKdXKattZVeVc1a/40ATOw80s/NgE+1dzg4YHSkcMzmFS06YQme3Mj070er3jQmAhNgo8mOjyM9MPGy6qtLc0U1tSyc1zZ3UtHRS09JBQ1sXz79fQVPHhz9oi4mK
YGpmIjMmJDE9O5HpE5KYnu38hcoPL/1KACKyBPg1EAn8UVX/u9/8WOAhYCFQA3xWVUvcebcBNwI9wFdV9SV/yhwPenqVyoY2ymrbKK1tYU91KyXVLexx/3zPNCamxDJzQjLXLC5gzuQU5kxKZubEJGKjnB0pEPdDG2OGJiIkx0WTHBdNgU9yuGpxPqpKTUsne6pb2F3VzO6qFooPNrN5XwMvflCJ701MOWnxThtDViKFWYkUZCaQn5FAbnoCcdHjJzkMmwBEJBL4LXAOUA68JyLLVHWLz2I3AnWqOkNErgDuAj4rIkXAFcCxwBTgFRGZ5a4zXJljprunl45u56+1s5u2zh5aOntobOuisb3LaXhyLyGrmzvZ39jO/oZ2DjS20+2zV0RHCnkZCUzNTOSM2dnO2cKERGZMSLZ7no0JciJCVlIsWUmxnFiYcdi8ju4eSqqdK/pdB5sprmpmT3ULz6yrOFR92ycrKYZJqXFMSoknO9m5UykzMYb0xBhS4qJJiY8iMTaKhOgoEmKd219jIiOIjhRkjNv6/LkCOAkoVtXdACLyOLAU8D1YLwV+4D5+GviNOO9kKfC4qnYAe0Sk2C0PP8oMmCvvXcWuqmZ61bm7oKdX6e7ppbtX6erppdfPdvCUuCiykmKZlBrH4mkZTE6NIzc9gbz0BPIynDsc7G4DY0JPbFQksyclM3tS8mHTVZWq5g7KalvdmoBWKhva2N/QTnldK+vL6qht6fTrGCMC0ZERREWI8xcZQYRAhAgRIrz2/84I+NWFPwkgByjzeV4OLB5sGVXtFpEGINOdvqrfujnu4+HKBEBEbgJucp82i8h2P2L2lQVUj3CdsRKssVlcIxOscUHwxhZUcV394cOgistHVvx3jiqugoEmBn0jsKreC9x7pOuLyJqBbn8KBsEam8U1MsEaFwRvbBbXyIxWXP7UV1QAeT7Pc91pAy4jIlFAKk5j8GDr+lOmMcaYUeRPAngPmCkiU0UkBqdRd1m/ZZYB17uPLwNeVecXZsuAK0QkVkSmAjOBd/0s0xhjzCgatgrIrdO/GXgJ55bN+1V1s4jcAaxR1WXAfcDDbiNvLc4BHXe5J3Ead7uBL6tqD8BAZQb+7QFHUX00BoI1NotrZII1Lgje2CyukRmVuMZVVxDGGGMCx+5ZNMaYMGUJwBhjwlTIJgAR+amIbBORjSLynIik+cy7TUSKRWS7iHzSg9iWuK9dLCK3jvXr+8SRJyIrRGSLiGwWkf9wp2eIyD9EZKf7P92j+CJF5H0R+Zv7fKqIrHa32xPuDQRexJUmIk+7+9dWETklGLaZiPyn+zl+ICKPiUicV9tMRO4XkYMi8oHPtAG3kTjudmPcKCILxjguz48VA8XlM+8WEVERyXKfB2x7hWwCAP4BzFXVecAO4DYAObx7iiXA78Tp7mJMyIdda5wHFAFXujF5oRu4RVWLgJOBL7ux3Ar8U1VnAv90n3vhP4CtPs/vAn6pqjOAOpwuSLzwa+DvqjoHOB4nRk+3mYjkAF8FFqnqXJybK/q6ZfFim/0J5/vla7BtdB7OHYIzcX70+fsxjisYjhUDxYWI5AHnAr6dhQVse4VsAlDVl1W1r5OOVTi/NQCf7ilUdQ/g2z3FWDjUtYaqdgJ93WCMOVWtVNV17uMmnANZjhvPg+5iDwKXjHVsIpILXAD80X0uwJk4XY14GVcqcDrOnW+oaqeq1hME2wznrr5497c4CUAlHm0zVX0D545AX4Nto6XAQ+pYBaSJyOSxiisYjhWDbC+AXwLfhMMGSAvY9grZBNDPvwIvuo8H6toi5yNrjB6vX39AIlIInACsBiaqaqU7az8w0YOQfoWz4/d1qZoJ1Pt8Ub3ablOBKuABt3rqjyKSiMfbTFUrgJ/hnClWAg3AWoJjm/UZbBsF03ciaI4VIrIUqFDVDf1mBSyucZ0AROQVt76z/99Sn2W+g1PV8WfvIg1uIpIEPAN8TVUbfee5P+gb03uFReRC4KCq
rh3L1/VTFLAA+L2qngC00K+6x6Ntlo5zZjgVp+fdRAaoUggWXmyj4QTTsUJEEoBvA7eP5usEfV9AQ1HVs4eaLyI3ABcCZ+mHP3jwuhsKr1//MCISjXPw/7OqPutOPiAik1W10r20PDjGYZ0KXCwi5wNxQApOvXuaiES5Z7RebbdyoFxVV7vPn8ZJAF5vs7OBPapaBSAiz+Jsx2DYZn0G20aefyeC8FgxHSeZb3BqP8kF1onISYGMa1xfAQxFnAFnvglcrKqtPrMG655irARNNxhuvfp9wFZV/YXPLN+uPa4H/jKWcanqbaqaq6qFONvnVVW9GliB09WIJ3G5se0HykRktjvpLJxfunu6zXCqfk4WkQT3c+2Ly/Nt5mOwbbQMuM69u+VkoMGnqmjUBeOxQlU3qeoEVS10vwflwAJ3/wvc9lLVkPzDabApA9a7f/f4zPsOsAvYDpznQWzn49xtsAv4jofb6OM4l+EbfbbT+Tj17f8EdgKvABkexngG8Df38TScL2Ax8BQQ61FM84E17nZ7HkgPhm0G/BDYBnwAPAzEerXNgMdw2iK63IPXjYNtI0Bw7ozbBWzCuZNpLOPy/FgxUFz95pcAWYHeXtYVhDHGhKmQrQIyxhgzNEsAxhgTpiwBGGNMmLIEYIwxYcoSgDHGhClLAMYYE6YsARhzhETkBhH5jddxGHOkLAEYM06MZbflJjxYAjAhRUQSReQFEdngdgz4WXEG4NkmIuvcgTT6Bpj5gYh8w2fdD9xeURGR50VkrTgDrNzks8znRGSHiLyL09fOULFc7pa5QUTecKdFisjP3OkbReQr7vSz3N5FN7mDg8S600tE5C4RWQdcLiLnishK97085XbkZ8wRGdedwRkzgCXAPlW9AA713/8BTr/4xcATfpbzr6paKyLxwHsi8gwQg9PdwkKc7pZXAO8PUcbtwCdVtUI+HGXqJqAQmK+q3eKMkhWHMyDIWaq6Q0QeAv4dp0tsgBpVXSDOiFDPAmeraouIfAv4OnCHn+/JmMPYFYAJNZuAc9yz5tNwelTco6o71en35BE/y/mqiGzAGSAkD6cjsMXAa6papc5gPsMlk7eBP4nI53FG6AKn187/U7ePflWtBWa7Me5wl3kQZ9CZPn2vczLOKHJvi8h6nA7VCvx8P8Z8hF0BmJDinkEvwOnU7k6czscG083hJ0FxACJyBs6B+hRVbRWR1/rmjTCWL4rIYpyRzdaKyMKRluFqcf8L8A9VvfIIyzHmMHYFYEKKiEwBWlX1EeCnwMeAQhGZ7i7ie/AswRncBTdpTHWnpwJ17sF/Ds6ZNzijpf2LiGS64yhcPkws01V1tarejjOKWB7O+LNfEGfYRkQkA6enyUIRmeGuei3w+gBFrgJO7VvObe+YNexGMWYQdgVgQs1xwE9FpBena91/B7KAF0SkFXgTSHaXfQanX/XNOAf3viqYvwNfFJGtOAfnVeCMoSwiPwBWAvU4XQcP5aciMhPnzP2fwAac9ohZwEYR6QL+oKq/EZHPAU+5ieE94J7+halqlTtwyWN9jcTAd33iNmZErDtoE1bc6p1vqOqFXsdijNesCsgYY8KUXQEYc5TEGUy8f3vAU6r6Iy/iMcZflgCMMSZMWRWQMcaEKUsAxhgTpiwBGGNMmLIEYIwxYer/A24R6I0WH0UNAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Visualize distribution of Squad Score for API transcriptions\n",
    "# Note that API distribution is similar to human distribution\n",
    "ax = sns.distplot(api_squad_score[\"squad_score\"])\n",
    "plt.title(\"Distribution of Squad Scores in API Transcriptions\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Visualize distributions for all potential features of the API transcriptions\n",
    "\n",
    "def feature_distplots(df):\n",
    "    \"\"\"Outputs a distplot for each feature (column) in the provided df.\"\"\"\n",
    "    # DataFrame.items() replaces iteritems(), which was removed in pandas 2.0\n",
    "    for col, data in df.items():\n",
    "        sns.distplot(data)\n",
    "        plt.title(f\"Distribution of {col}\")\n",
    "        plt.show()\n",
    "\n",
    "feature_distplots(api_metrics.iloc[:,2:])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create dictionary of hand rankings of 25 stories, provided by stakeholder\n",
    "rankings = {\"3130\": 1,  # key == Story ID, value == rank\n",
    "             \"3112\": 2,\n",
    "             \"5109\": 3,\n",
    "             \"3118\": 4,\n",
    "             \"3122\": 5,\n",
    "             \"3108\": 6,\n",
    "             \"5108\": 7,\n",
    "             \"3121\": 8,\n",
    "             \"3101\": 9,\n",
    "             \"3128\": 10,\n",
    "             \"3106\": 11,\n",
    "             \"3126\": 12,\n",
    "             \"3129\": 13,\n",
    "             \"3105\": 14,\n",
    "             \"5104\": 15,\n",
    "             \"5101\": 16,\n",
    "             \"5103\": 17,\n",
    "             \"3125\": 18,\n",
    "             \"3111\": 19,\n",
    "             \"3119\": 20,\n",
    "             \"3110\": 21,\n",
    "             \"3117\": 22,\n",
    "             \"3127\": 23,\n",
    "             \"3131\": 24,\n",
    "             \"3120\": 25} "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Subset dataframe for ranked human transcriptions\n",
    "ranked_stories_h = human_metrics[human_metrics[\"story_id\"].isin(rankings.keys())].reset_index(drop=True)\n",
    "# Add stakeholder rankings by looking up each story ID in the rankings dict\n",
    "ranked_stories_h[\"ranking\"] = ranked_stories_h[\"story_id\"].map(rankings)\n",
    "\n",
    "# Subset dataframe for ranked API transcriptions\n",
    "ranked_stories_api = api_metrics[api_metrics[\"story_id\"].isin(rankings.keys())].reset_index(drop=True)\n",
    "# Add stakeholder rankings by looking up each story ID in the rankings dict\n",
    "ranked_stories_api[\"ranking\"] = ranked_stories_api[\"story_id\"].map(rankings)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Add Squad Score based on squad_score function (the last column, \"ranking\", is excluded)\n",
    "ranked_stories_h[\"squad_score\"] = ranked_stories_h.iloc[:,:-1].apply(lambda x: squad_score(x, \"human\"), axis=1)\n",
    "# Add the rank based on the Squad Score, so that the highest score receives rank 1\n",
    "ranked_stories_h[\"squad_score_rank\"] = ranked_stories_h[\"squad_score\"].rank(ascending=False)\n",
    "\n",
    "# Repeat for API transcriptions\n",
    "ranked_stories_api[\"squad_score\"] = ranked_stories_api.iloc[:,:-1].apply(lambda x: squad_score(x, \"api\"), axis=1)\n",
    "ranked_stories_api[\"squad_score_rank\"] = ranked_stories_api[\"squad_score\"].rank(ascending=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>story_id</th>\n",
       "      <th>ranking</th>\n",
       "      <th>squad_score_rank</th>\n",
       "      <th>squad_score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3117</td>\n",
       "      <td>22</td>\n",
       "      <td>25.0</td>\n",
       "      <td>16.196703</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>3105</td>\n",
       "      <td>14</td>\n",
       "      <td>12.0</td>\n",
       "      <td>37.525256</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3129</td>\n",
       "      <td>13</td>\n",
       "      <td>7.0</td>\n",
       "      <td>47.511298</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3111</td>\n",
       "      <td>19</td>\n",
       "      <td>14.0</td>\n",
       "      <td>32.073730</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>3118</td>\n",
       "      <td>4</td>\n",
       "      <td>13.0</td>\n",
       "      <td>35.631001</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  story_id  ranking  squad_score_rank  squad_score\n",
       "0     3117       22              25.0    16.196703\n",
       "1     3105       14              12.0    37.525256\n",
       "2     3129       13               7.0    47.511298\n",
       "3     3111       19              14.0    32.073730\n",
       "4     3118        4              13.0    35.631001"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Look at ranking outcome for human transcriptions\n",
    "ranked_stories_h[[\"story_id\", \"ranking\", \"squad_score_rank\", \"squad_score\"]].head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>story_id</th>\n",
       "      <th>ranking</th>\n",
       "      <th>squad_score_rank</th>\n",
       "      <th>squad_score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3117</td>\n",
       "      <td>22</td>\n",
       "      <td>25.0</td>\n",
       "      <td>17.120997</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>3105</td>\n",
       "      <td>14</td>\n",
       "      <td>11.0</td>\n",
       "      <td>34.042006</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3129</td>\n",
       "      <td>13</td>\n",
       "      <td>5.0</td>\n",
       "      <td>43.835181</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3111</td>\n",
       "      <td>19</td>\n",
       "      <td>14.0</td>\n",
       "      <td>31.448316</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>3118</td>\n",
       "      <td>4</td>\n",
       "      <td>6.0</td>\n",
       "      <td>40.267507</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  story_id  ranking  squad_score_rank  squad_score\n",
       "0     3117       22              25.0    17.120997\n",
       "1     3105       14              11.0    34.042006\n",
       "2     3129       13               5.0    43.835181\n",
       "3     3111       19              14.0    31.448316\n",
       "4     3118        4               6.0    40.267507"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Look at ranking outcome for API transcriptions\n",
    "ranked_stories_api[[\"story_id\", \"ranking\", \"squad_score_rank\", \"squad_score\"]].head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Human transcriptions:\n",
      "MAE: 4.8 rankings\n",
      "MSE: 38.0\n",
      "\n",
      "API transcriptions:\n",
      "MAE: 4.96 rankings\n",
      "MSE: 39.68\n"
     ]
    }
   ],
   "source": [
    "# Calculate MAE & MSE for difference between \n",
    "# stakeholder ranking and ranking generated by Squad Score\n",
    "y_true_h = ranked_stories_h[\"ranking\"]\n",
    "y_pred_h = ranked_stories_h[\"squad_score_rank\"]\n",
    "mae_h = mae(y_true_h, y_pred_h)\n",
    "mse_h = mse(y_true_h, y_pred_h)\n",
    "print(\"Human transcriptions:\")\n",
    "print(f\"MAE: {mae_h} rankings\")\n",
    "print(f\"MSE: {mse_h}\")\n",
    "\n",
    "y_true_api = ranked_stories_api[\"ranking\"]\n",
    "y_pred_api = ranked_stories_api[\"squad_score_rank\"]\n",
    "mae_api = mae(y_true_api, y_pred_api)\n",
    "mse_api = mse(y_true_api, y_pred_api)\n",
    "print(\"\\nAPI transcriptions:\")\n",
    "print(f\"MAE: {mae_api} rankings\")\n",
    "print(f\"MSE: {mse_api}\")"
   ]
  },
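  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A small worked example of the two error metrics used above, computed directly with NumPy on made-up ranks (not the stakeholder data). It is only a sketch of what the numbers mean: MAE is the average number of rank positions by which the two orderings disagree, while MSE weights large disagreements more heavily. All `toy_` names are hypothetical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Hypothetical ranks for five stories (not taken from this notebook's data)\n",
    "toy_true_rank = np.array([1, 2, 3, 4, 5])\n",
    "toy_pred_rank = np.array([3, 1, 2, 4, 5])\n",
    "\n",
    "rank_diff = toy_true_rank - toy_pred_rank  # [-2, 1, 1, 0, 0]\n",
    "toy_mae = np.abs(rank_diff).mean()  # average number of positions off\n",
    "toy_mse = (rank_diff ** 2).mean()  # squares the misses, so large ones dominate\n",
    "print(f\"MAE: {toy_mae} rankings\")  # MAE: 0.8 rankings\n",
    "print(f\"MSE: {toy_mse}\")  # MSE: 1.2"
   ]
  },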
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>grade_level</th>\n",
       "      <th>story_length</th>\n",
       "      <th>avg_word_len</th>\n",
       "      <th>quotes_num</th>\n",
       "      <th>unique_words_num</th>\n",
       "      <th>adj_num</th>\n",
       "      <th>percent_complex_words</th>\n",
       "      <th>ranking</th>\n",
       "      <th>squad_score</th>\n",
       "      <th>squad_score_rank</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>grade_level</th>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.133356</td>\n",
       "      <td>0.106430</td>\n",
       "      <td>0.099504</td>\n",
       "      <td>0.121204</td>\n",
       "      <td>-0.019996</td>\n",
       "      <td>0.102992</td>\n",
       "      <td>-0.097073</td>\n",
       "      <td>0.126602</td>\n",
       "      <td>-0.249615</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>story_length</th>\n",
       "      <td>0.133356</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.170846</td>\n",
       "      <td>0.288018</td>\n",
       "      <td>0.985773</td>\n",
       "      <td>0.699843</td>\n",
       "      <td>0.161942</td>\n",
       "      <td>-0.547640</td>\n",
       "      <td>0.877277</td>\n",
       "      <td>-0.837752</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>avg_word_len</th>\n",
       "      <td>0.106430</td>\n",
       "      <td>0.170846</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.280723</td>\n",
       "      <td>0.179889</td>\n",
       "      <td>0.147636</td>\n",
       "      <td>0.483906</td>\n",
       "      <td>-0.318545</td>\n",
       "      <td>0.496456</td>\n",
       "      <td>-0.469200</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>quotes_num</th>\n",
       "      <td>0.099504</td>\n",
       "      <td>0.288018</td>\n",
       "      <td>0.280723</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.301507</td>\n",
       "      <td>0.324317</td>\n",
       "      <td>0.199650</td>\n",
       "      <td>-0.339448</td>\n",
       "      <td>0.579847</td>\n",
       "      <td>-0.506412</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>unique_words_num</th>\n",
       "      <td>0.121204</td>\n",
       "      <td>0.985773</td>\n",
       "      <td>0.179889</td>\n",
       "      <td>0.301507</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.749681</td>\n",
       "      <td>0.222655</td>\n",
       "      <td>-0.536605</td>\n",
       "      <td>0.894578</td>\n",
       "      <td>-0.852126</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>adj_num</th>\n",
       "      <td>-0.019996</td>\n",
       "      <td>0.699843</td>\n",
       "      <td>0.147636</td>\n",
       "      <td>0.324317</td>\n",
       "      <td>0.749681</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.161838</td>\n",
       "      <td>-0.578159</td>\n",
       "      <td>0.789394</td>\n",
       "      <td>-0.657188</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>percent_complex_words</th>\n",
       "      <td>0.102992</td>\n",
       "      <td>0.161942</td>\n",
       "      <td>0.483906</td>\n",
       "      <td>0.199650</td>\n",
       "      <td>0.222655</td>\n",
       "      <td>0.161838</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>-0.185505</td>\n",
       "      <td>0.340646</td>\n",
       "      <td>-0.328446</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ranking</th>\n",
       "      <td>-0.097073</td>\n",
       "      <td>-0.547640</td>\n",
       "      <td>-0.318545</td>\n",
       "      <td>-0.339448</td>\n",
       "      <td>-0.536605</td>\n",
       "      <td>-0.578159</td>\n",
       "      <td>-0.185505</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>-0.634993</td>\n",
       "      <td>0.618462</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>squad_score</th>\n",
       "      <td>0.126602</td>\n",
       "      <td>0.877277</td>\n",
       "      <td>0.496456</td>\n",
       "      <td>0.579847</td>\n",
       "      <td>0.894578</td>\n",
       "      <td>0.789394</td>\n",
       "      <td>0.340646</td>\n",
       "      <td>-0.634993</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>-0.918517</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>squad_score_rank</th>\n",
       "      <td>-0.249615</td>\n",
       "      <td>-0.837752</td>\n",
       "      <td>-0.469200</td>\n",
       "      <td>-0.506412</td>\n",
       "      <td>-0.852126</td>\n",
       "      <td>-0.657188</td>\n",
       "      <td>-0.328446</td>\n",
       "      <td>0.618462</td>\n",
       "      <td>-0.918517</td>\n",
       "      <td>1.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                       grade_level  story_length  avg_word_len  quotes_num  \\\n",
       "grade_level               1.000000      0.133356      0.106430    0.099504   \n",
       "story_length              0.133356      1.000000      0.170846    0.288018   \n",
       "avg_word_len              0.106430      0.170846      1.000000    0.280723   \n",
       "quotes_num                0.099504      0.288018      0.280723    1.000000   \n",
       "unique_words_num          0.121204      0.985773      0.179889    0.301507   \n",
       "adj_num                  -0.019996      0.699843      0.147636    0.324317   \n",
       "percent_complex_words     0.102992      0.161942      0.483906    0.199650   \n",
       "ranking                  -0.097073     -0.547640     -0.318545   -0.339448   \n",
       "squad_score               0.126602      0.877277      0.496456    0.579847   \n",
       "squad_score_rank         -0.249615     -0.837752     -0.469200   -0.506412   \n",
       "\n",
       "                       unique_words_num   adj_num  percent_complex_words  \\\n",
       "grade_level                    0.121204 -0.019996               0.102992   \n",
       "story_length                   0.985773  0.699843               0.161942   \n",
       "avg_word_len                   0.179889  0.147636               0.483906   \n",
       "quotes_num                     0.301507  0.324317               0.199650   \n",
       "unique_words_num               1.000000  0.749681               0.222655   \n",
       "adj_num                        0.749681  1.000000               0.161838   \n",
       "percent_complex_words          0.222655  0.161838               1.000000   \n",
       "ranking                       -0.536605 -0.578159              -0.185505   \n",
       "squad_score                    0.894578  0.789394               0.340646   \n",
       "squad_score_rank              -0.852126 -0.657188              -0.328446   \n",
       "\n",
       "                        ranking  squad_score  squad_score_rank  \n",
       "grade_level           -0.097073     0.126602         -0.249615  \n",
       "story_length          -0.547640     0.877277         -0.837752  \n",
       "avg_word_len          -0.318545     0.496456         -0.469200  \n",
       "quotes_num            -0.339448     0.579847         -0.506412  \n",
       "unique_words_num      -0.536605     0.894578         -0.852126  \n",
       "adj_num               -0.578159     0.789394         -0.657188  \n",
       "percent_complex_words -0.185505     0.340646         -0.328446  \n",
       "ranking                1.000000    -0.634993          0.618462  \n",
       "squad_score           -0.634993     1.000000         -0.918517  \n",
       "squad_score_rank       0.618462    -0.918517          1.000000  "
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Look at correlations\n",
    "# Note that the correlation between Squad Score and stakeholder ranking is -.63\n",
    "ranked_stories_api.iloc[:,2:].corr()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Calculate complexity ratings from all textstat complexity metrics\n",
    "# For use in comparing to Squad Score correlations\n",
    "\n",
    "def get_textstat_metrics(transcription):\n",
    "    \"\"\"\n",
    "    Return 8 complexity metrics from the textstat package for a transcription.\n",
    "    \n",
    "    Metrics:\n",
    "    Flesch-Kincaid Grade Level, Fog Scale, SMOG Index,\n",
    "    Automated Readability Index, Coleman-Liau Index, Linsear Write Formula,\n",
    "    Dale-Chall Readability Score, and a compiled \"Readability Consensus\"\n",
    "    \"\"\"\n",
    "    \n",
    "    row = []\n",
    "    \n",
    "    metric_functions = [\"smog_index\", \"flesch_kincaid_grade\", \"coleman_liau_index\", \n",
    "                        \"automated_readability_index\", \"dale_chall_readability_score\",\n",
    "                        \"linsear_write_formula\", \"gunning_fog\", \"text_standard\"]\n",
    "    \n",
    "    # Add all textstat functions to row \n",
    "    for metric in metric_functions:\n",
    "        row.append(getattr(textstat, metric)(transcription))\n",
    "    \n",
    "    return row"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Build a dataframe of textstat metrics, one row per story\n",
    "metric_functions = [\"smog_index\", \"flesch_kincaid_grade\", \"coleman_liau_index\", \n",
    "                    \"automated_readability_index\", \"dale_chall_readability_score\",\n",
    "                    \"linsear_write_formula\", \"gunning_fog\", \"text_standard\"]\n",
    "\n",
    "textstat_output = ranked_stories_api[\"transcription\"].apply(get_textstat_metrics)\n",
    "\n",
    "textstat_df = pd.DataFrame.from_records(textstat_output, \n",
    "                          index=ranked_stories_api[\"story_id\"],\n",
    "                          columns=metric_functions)\n",
    "\n",
    "# Add rankings to df\n",
    "textstat_df[\"ranking\"] = ranked_stories_api[\"ranking\"].tolist()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>smog_index</th>\n",
       "      <th>flesch_kincaid_grade</th>\n",
       "      <th>coleman_liau_index</th>\n",
       "      <th>automated_readability_index</th>\n",
       "      <th>dale_chall_readability_score</th>\n",
       "      <th>linsear_write_formula</th>\n",
       "      <th>gunning_fog</th>\n",
       "      <th>ranking</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>smog_index</th>\n",
       "      <td>1.000000</td>\n",
       "      <td>-0.611097</td>\n",
       "      <td>0.076767</td>\n",
       "      <td>-0.615090</td>\n",
       "      <td>-0.420572</td>\n",
       "      <td>-0.565149</td>\n",
       "      <td>-0.605051</td>\n",
       "      <td>-0.323771</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>flesch_kincaid_grade</th>\n",
       "      <td>-0.611097</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.213118</td>\n",
       "      <td>0.999333</td>\n",
       "      <td>0.659239</td>\n",
       "      <td>0.913174</td>\n",
       "      <td>0.998816</td>\n",
       "      <td>0.063661</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>coleman_liau_index</th>\n",
       "      <td>0.076767</td>\n",
       "      <td>0.213118</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.231851</td>\n",
       "      <td>0.343967</td>\n",
       "      <td>0.235261</td>\n",
       "      <td>0.183021</td>\n",
       "      <td>-0.195179</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>automated_readability_index</th>\n",
       "      <td>-0.615090</td>\n",
       "      <td>0.999333</td>\n",
       "      <td>0.231851</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.660066</td>\n",
       "      <td>0.910650</td>\n",
       "      <td>0.997681</td>\n",
       "      <td>0.062233</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>dale_chall_readability_score</th>\n",
       "      <td>-0.420572</td>\n",
       "      <td>0.659239</td>\n",
       "      <td>0.343967</td>\n",
       "      <td>0.660066</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.420005</td>\n",
       "      <td>0.645076</td>\n",
       "      <td>-0.007742</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>linsear_write_formula</th>\n",
       "      <td>-0.565149</td>\n",
       "      <td>0.913174</td>\n",
       "      <td>0.235261</td>\n",
       "      <td>0.910650</td>\n",
       "      <td>0.420005</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.913920</td>\n",
       "      <td>0.106353</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>gunning_fog</th>\n",
       "      <td>-0.605051</td>\n",
       "      <td>0.998816</td>\n",
       "      <td>0.183021</td>\n",
       "      <td>0.997681</td>\n",
       "      <td>0.645076</td>\n",
       "      <td>0.913920</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.066445</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ranking</th>\n",
       "      <td>-0.323771</td>\n",
       "      <td>0.063661</td>\n",
       "      <td>-0.195179</td>\n",
       "      <td>0.062233</td>\n",
       "      <td>-0.007742</td>\n",
       "      <td>0.106353</td>\n",
       "      <td>0.066445</td>\n",
       "      <td>1.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                              smog_index  flesch_kincaid_grade  \\\n",
       "smog_index                      1.000000             -0.611097   \n",
       "flesch_kincaid_grade           -0.611097              1.000000   \n",
       "coleman_liau_index              0.076767              0.213118   \n",
       "automated_readability_index    -0.615090              0.999333   \n",
       "dale_chall_readability_score   -0.420572              0.659239   \n",
       "linsear_write_formula          -0.565149              0.913174   \n",
       "gunning_fog                    -0.605051              0.998816   \n",
       "ranking                        -0.323771              0.063661   \n",
       "\n",
       "                              coleman_liau_index  automated_readability_index  \\\n",
       "smog_index                              0.076767                    -0.615090   \n",
       "flesch_kincaid_grade                    0.213118                     0.999333   \n",
       "coleman_liau_index                      1.000000                     0.231851   \n",
       "automated_readability_index             0.231851                     1.000000   \n",
       "dale_chall_readability_score            0.343967                     0.660066   \n",
       "linsear_write_formula                   0.235261                     0.910650   \n",
       "gunning_fog                             0.183021                     0.997681   \n",
       "ranking                                -0.195179                     0.062233   \n",
       "\n",
       "                              dale_chall_readability_score  \\\n",
       "smog_index                                       -0.420572   \n",
       "flesch_kincaid_grade                              0.659239   \n",
       "coleman_liau_index                                0.343967   \n",
       "automated_readability_index                       0.660066   \n",
       "dale_chall_readability_score                      1.000000   \n",
       "linsear_write_formula                             0.420005   \n",
       "gunning_fog                                       0.645076   \n",
       "ranking                                          -0.007742   \n",
       "\n",
       "                              linsear_write_formula  gunning_fog   ranking  \n",
       "smog_index                                -0.565149    -0.605051 -0.323771  \n",
       "flesch_kincaid_grade                       0.913174     0.998816  0.063661  \n",
       "coleman_liau_index                         0.235261     0.183021 -0.195179  \n",
       "automated_readability_index                0.910650     0.997681  0.062233  \n",
       "dale_chall_readability_score               0.420005     0.645076 -0.007742  \n",
       "linsear_write_formula                      1.000000     0.913920  0.106353  \n",
       "gunning_fog                                0.913920     1.000000  0.066445  \n",
       "ranking                                    0.106353     0.066445  1.000000  "
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Look at correlations\n",
    "# Note that the strongest correlation with ranking is smog_index, at only -.32\n",
    "# Because of this, the decision was made that Squad Score is a better fit for our use case\n",
    "# than any textstat metric\n",
    "# (text_standard does not appear in the output, presumably because it is non-numeric)\n",
    "textstat_df.corr()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>story_id</th>\n",
       "      <th>story_length</th>\n",
       "      <th>avg_word_len</th>\n",
       "      <th>quotes_num</th>\n",
       "      <th>unique_words_num</th>\n",
       "      <th>adj_num</th>\n",
       "      <th>squad_score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3132</td>\n",
       "      <td>1375</td>\n",
       "      <td>5.092593</td>\n",
       "      <td>6</td>\n",
       "      <td>138</td>\n",
       "      <td>15</td>\n",
       "      <td>47.843668</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  story_id  story_length  avg_word_len  quotes_num  unique_words_num  adj_num  \\\n",
       "0     3132          1375      5.092593           6               138       15   \n",
       "\n",
       "   squad_score  \n",
       "0    47.843668  "
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Build dataframe of story_id, the metrics used for Squad Score, and the Squad Score itself\n",
    "csv_features = scaled_features.copy()\n",
    "csv_features.insert(0, \"story_id\")\n",
    "squad_score_metrics = api_metrics[csv_features].merge(api_squad_score)\n",
    "\n",
    "squad_score_metrics.head(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>story_id</th>\n",
       "      <th>story_length</th>\n",
       "      <th>avg_word_len</th>\n",
       "      <th>quotes_num</th>\n",
       "      <th>unique_words_num</th>\n",
       "      <th>adj_num</th>\n",
       "      <th>squad_score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>3229</td>\n",
       "      <td>296</td>\n",
       "      <td>4.169014</td>\n",
       "      <td>0</td>\n",
       "      <td>32</td>\n",
       "      <td>2</td>\n",
       "      <td>1.668999</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>66</th>\n",
       "      <td>3240</td>\n",
       "      <td>173</td>\n",
       "      <td>5.088235</td>\n",
       "      <td>0</td>\n",
       "      <td>26</td>\n",
       "      <td>3</td>\n",
       "      <td>14.401436</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3117</td>\n",
       "      <td>439</td>\n",
       "      <td>4.877778</td>\n",
       "      <td>1</td>\n",
       "      <td>56</td>\n",
       "      <td>3</td>\n",
       "      <td>17.120997</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51</th>\n",
       "      <td>3202</td>\n",
       "      <td>535</td>\n",
       "      <td>4.652174</td>\n",
       "      <td>3</td>\n",
       "      <td>64</td>\n",
       "      <td>7</td>\n",
       "      <td>19.418121</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>3127</td>\n",
       "      <td>548</td>\n",
       "      <td>4.849558</td>\n",
       "      <td>2</td>\n",
       "      <td>56</td>\n",
       "      <td>6</td>\n",
       "      <td>20.389726</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>132</th>\n",
       "      <td>5219</td>\n",
       "      <td>2148</td>\n",
       "      <td>5.290640</td>\n",
       "      <td>42</td>\n",
       "      <td>222</td>\n",
       "      <td>17</td>\n",
       "      <td>91.975460</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>78</th>\n",
       "      <td>5254</td>\n",
       "      <td>3467</td>\n",
       "      <td>5.317485</td>\n",
       "      <td>4</td>\n",
       "      <td>354</td>\n",
       "      <td>26</td>\n",
       "      <td>95.950853</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>102</th>\n",
       "      <td>5213</td>\n",
       "      <td>2928</td>\n",
       "      <td>5.030928</td>\n",
       "      <td>34</td>\n",
       "      <td>299</td>\n",
       "      <td>29</td>\n",
       "      <td>104.660013</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>108</th>\n",
       "      <td>5235</td>\n",
       "      <td>3207</td>\n",
       "      <td>4.995327</td>\n",
       "      <td>32</td>\n",
       "      <td>302</td>\n",
       "      <td>32</td>\n",
       "      <td>107.548101</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>117</th>\n",
       "      <td>5234</td>\n",
       "      <td>2989</td>\n",
       "      <td>5.555762</td>\n",
       "      <td>26</td>\n",
       "      <td>295</td>\n",
       "      <td>47</td>\n",
       "      <td>119.110260</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>167 rows × 7 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    story_id  story_length  avg_word_len  quotes_num  unique_words_num  \\\n",
       "34      3229           296      4.169014           0                32   \n",
       "66      3240           173      5.088235           0                26   \n",
       "3       3117           439      4.877778           1                56   \n",
       "51      3202           535      4.652174           3                64   \n",
       "10      3127           548      4.849558           2                56   \n",
       "..       ...           ...           ...         ...               ...   \n",
       "132     5219          2148      5.290640          42               222   \n",
       "78      5254          3467      5.317485           4               354   \n",
       "102     5213          2928      5.030928          34               299   \n",
       "108     5235          3207      4.995327          32               302   \n",
       "117     5234          2989      5.555762          26               295   \n",
       "\n",
       "     adj_num  squad_score  \n",
       "34         2     1.668999  \n",
       "66         3    14.401436  \n",
       "3          3    17.120997  \n",
       "51         7    19.418121  \n",
       "10         6    20.389726  \n",
       "..       ...          ...  \n",
       "132       17    91.975460  \n",
       "78        26    95.950853  \n",
       "102       29   104.660013  \n",
       "108       32   107.548101  \n",
       "117       47   119.110260  \n",
       "\n",
       "[167 rows x 7 columns]"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "squad_score_metrics.sort_values(\"squad_score\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [],
   "source": [
    "squad_score_metrics.to_csv(\"squad_score_metrics.csv\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "StorySquad (Python 3.8)",
   "language": "python",
   "name": "storysquad"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}