{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Converting Profiles from whylogs v0 to v1"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you have profiles generated with whylogs v0 (Python or Java) and want to work with them in whylogs v1, you can use the converters in `whylogs.migration.converters`.\n",
    "\n",
    "Once converted, a v0 profile can be used like any other v1 profile view (`DatasetProfileView`).\n",
    "\n",
    "This short example is divided into two parts:\n",
    "\n",
    "- Download a sample v0 profile and write it to disk\n",
    "- Read the v0 profile and convert it to a v1 profile view\n",
    "\n",
    "Let's get to it!"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Installing whylogs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Note: you may need to restart the kernel to use updated packages.\n",
    "# This example was tested with whylogs 1.1.22, so require at least that version.\n",
    "%pip install 'whylogs>=1.1.22'"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Downloading v0 profile\n",
    "\n",
    "First, we need a sample v0 profile to demonstrate how to convert it.\n",
    "\n",
    "To do so, we'll download a v0 profile from S3 and write it to disk."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Download the sample v0 profile from a public S3 bucket and write it to disk\n",
    "from urllib.request import urlopen\n",
    "url = \"https://whylabs-public.s3.us-west-2.amazonaws.com/whylogs_examples/dataset_profile_v0.bin\"\n",
    "profile_name = \"dataset_profile_v0.bin\"\n",
    "\n",
    "# Download from URL\n",
    "with urlopen(url) as file:\n",
    "    content = file.read()\n",
    "\n",
    "# Save to file\n",
    "with open(profile_name, 'wb') as download:\n",
    "    download.write(content)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Convert serialized v0 profile to v1 profile view\n",
    "\n",
    "The converter reads a serialized v0 profile and converts it to a v1 profile view.\n",
    "\n",
    "Because the result is a profile view (`DatasetProfileView`), you can use it for tasks such as visualization, analysis, and merging. However, you can't use it to log new data.\n",
    "\n",
    "To perform the conversion, we'll use the `read_v0_to_view` utility from `whylogs.migration.converters`.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "For this example to work you need to have a recent version of whylogs (tested with 1.1.22), you are currently running: whylogs==1.1.19\n",
      "Reading v0 file from disk: dataset_profile_v0.bin\n",
      "Converted: dataset_profile_v0.bin to a v1 DatasetProfileView\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>cardinality/est</th>\n",
       "      <th>cardinality/lower_1</th>\n",
       "      <th>cardinality/upper_1</th>\n",
       "      <th>counts/inf</th>\n",
       "      <th>counts/n</th>\n",
       "      <th>counts/nan</th>\n",
       "      <th>counts/null</th>\n",
       "      <th>distribution/max</th>\n",
       "      <th>distribution/mean</th>\n",
       "      <th>distribution/median</th>\n",
       "      <th>...</th>\n",
       "      <th>distribution/stddev</th>\n",
       "      <th>frequent_items/frequent_strings</th>\n",
       "      <th>ints/max</th>\n",
       "      <th>ints/min</th>\n",
       "      <th>type</th>\n",
       "      <th>types/boolean</th>\n",
       "      <th>types/fractional</th>\n",
       "      <th>types/integral</th>\n",
       "      <th>types/object</th>\n",
       "      <th>types/string</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>column</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>animal</th>\n",
       "      <td>3.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>3.00015</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>[FrequentItem(value='cat', est=2, upper=2, low...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>SummaryType.COLUMN</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>legs</th>\n",
       "      <td>3.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>3.00015</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>2.5</td>\n",
       "      <td>4.0</td>\n",
       "      <td>...</td>\n",
       "      <td>1.914854</td>\n",
       "      <td>[FrequentItem(value='4', est=2, upper=2, lower...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>SummaryType.COLUMN</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>weight</th>\n",
       "      <td>3.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>3.00015</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4.3</td>\n",
       "      <td>3.4</td>\n",
       "      <td>4.1</td>\n",
       "      <td>...</td>\n",
       "      <td>1.389244</td>\n",
       "      <td>[FrequentItem(value='1.8', est=1, upper=1, low...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>SummaryType.COLUMN</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>3 rows × 30 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        cardinality/est  cardinality/lower_1  cardinality/upper_1  counts/inf  \\\n",
       "column                                                                          \n",
       "animal              3.0                  3.0              3.00015           0   \n",
       "legs                3.0                  3.0              3.00015           0   \n",
       "weight              3.0                  3.0              3.00015           0   \n",
       "\n",
       "        counts/n  counts/nan  counts/null  distribution/max  \\\n",
       "column                                                        \n",
       "animal         4           0            0               NaN   \n",
       "legs           4           0            0               4.0   \n",
       "weight         4           0            0               4.3   \n",
       "\n",
       "        distribution/mean  distribution/median  ...  distribution/stddev  \\\n",
       "column                                          ...                        \n",
       "animal                0.0                  NaN  ...             0.000000   \n",
       "legs                  2.5                  4.0  ...             1.914854   \n",
       "weight                3.4                  4.1  ...             1.389244   \n",
       "\n",
       "                          frequent_items/frequent_strings  ints/max  ints/min  \\\n",
       "column                                                                          \n",
       "animal  [FrequentItem(value='cat', est=2, upper=2, low...         0         0   \n",
       "legs    [FrequentItem(value='4', est=2, upper=2, lower...         0         0   \n",
       "weight  [FrequentItem(value='1.8', est=1, upper=1, low...         0         0   \n",
       "\n",
       "                      type  types/boolean  types/fractional  types/integral  \\\n",
       "column                                                                        \n",
       "animal  SummaryType.COLUMN              0                 0               0   \n",
       "legs    SummaryType.COLUMN              0                 0               4   \n",
       "weight  SummaryType.COLUMN              0                 3               0   \n",
       "\n",
       "        types/object  types/string  \n",
       "column                              \n",
       "animal             0             4  \n",
       "legs               0             0  \n",
       "weight             0             0  \n",
       "\n",
       "[3 rows x 30 columns]"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from whylogs.migration.converters import (\n",
    "    read_v0_to_view\n",
    ")\n",
    "from whylogs.core import DatasetProfileView\n",
    "\n",
    "import whylogs as why\n",
    "\n",
    "print(f\"For this example to work you need to have a recent version of whylogs (tested with 1.1.22), you are currently running: whylogs=={why.__version__}\")\n",
    "\n",
    "profile_file_path_v0 = \"dataset_profile_v0.bin\"\n",
    "\n",
    "print(f\"Reading v0 file from disk: {profile_file_path_v0}\")\n",
    "view: DatasetProfileView = read_v0_to_view(profile_file_path_v0)\n",
    "print(f\"Converted: {profile_file_path_v0} to a v1 DatasetProfileView\")\n",
    "view.to_pandas()"
   ]
  },
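  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick check that the converted view behaves like any other v1 view, the next cell is a minimal sketch of merging it with `DatasetProfileView.merge`. It assumes `view` was created by the previous cell. The view is merged with itself only for illustration. In practice you would merge views that cover different data, such as several converted v0 profiles. Counts are additive, so `counts/n` should go from 4 to 8 for each column, while statistics such as the max and mean should not change."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Merge the converted view with itself, purely to illustrate merging.\n",
    "# Assumes `view` was created by read_v0_to_view in the previous cell.\n",
    "merged_view = view.merge(view)\n",
    "\n",
    "# Counts are additive, so counts/n should double for every column,\n",
    "# while summary statistics such as the max and mean should stay the same.\n",
    "merged_view.to_pandas()[['counts/n', 'distribution/max', 'distribution/mean']]"
   ]
  },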
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And there you have it!\n",
    "\n",
    "You can now use the profile for tasks such as:\n",
    "\n",
    "- [Visualization](https://nbviewer.org/github/whylabs/whylogs/blob/mainline/python/examples/basic/Notebook_Profile_Visualizer.ipynb)\n",
    "- [Data Validation](https://nbviewer.org/github/whylabs/whylogs/blob/mainline/python/examples/basic/Constraints_Suite.ipynb)\n",
    "- [Merging](https://nbviewer.org/github/whylabs/whylogs/blob/mainline/python/examples/basic/Merging_Profiles.ipynb)\n",
    "- [Writing to WhyLabs](https://nbviewer.org/github/whylabs/whylogs/blob/mainline/python/examples/integrations/writers/Writing_to_WhyLabs.ipynb)\n",
    "- etc.\n",
    "\n",
    "Head to our [examples page](https://github.com/whylabs/whylogs/tree/mainline/python/examples) to see more!"
   ]
  }
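  ,
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you expect to reuse the converted profile, one option is to write it back to disk in the v1 format so the v0 conversion only has to run once. The next cell is a minimal sketch that assumes `view` from the conversion cell above. The output filename `dataset_profile_v1.bin` is an arbitrary choice for this example. The cell round-trips the view with `DatasetProfileView.write` and `DatasetProfileView.read`, and reading the v1 file back does not need the v0 converter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from whylogs.core import DatasetProfileView\n",
    "\n",
    "# Arbitrary output path for the converted profile, stored in the v1 binary format.\n",
    "profile_file_path_v1 = 'dataset_profile_v1.bin'\n",
    "\n",
    "# Write the converted view to disk (assumes `view` from the conversion cell above).\n",
    "view.write(profile_file_path_v1)\n",
    "\n",
    "# Read it back as a regular v1 DatasetProfileView; no v0 converter is involved here.\n",
    "reloaded_view = DatasetProfileView.read(profile_file_path_v1)\n",
    "reloaded_view.to_pandas()[['counts/n', 'types/integral', 'types/string']]"
   ]
  }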
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "5dd5901cadfd4b29c2aaf95ecd29c0c3b10829ad94dcfe59437dbee391154aea"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}