whylabs/whylogs-python

View on GitHub
python/examples/integrations/writers/Getting_Started_with_WhyLabsV1.ipynb

Summary

Maintainability
Test Coverage
{
  "cells": [
    {
      "attachments": {},
      "cell_type": "markdown",
      "id": "013a0fd4-31f9-4f3f-bf3a-1efc9640422f",
      "metadata": {
        "id": "013a0fd4-31f9-4f3f-bf3a-1efc9640422f"
      },
      "source": [
        "\n",
        "![alt text](https://whylabs-public.s3.us-west-2.amazonaws.com/assets/whylabs-logo-night-blue.svg)\n",
        "\n",
        "*Run AI with Certainty*\n",
        "\n",
        "# **Getting Started with WhyLabs** "
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "id": "dFKGE4P7N06M",
      "metadata": {
        "id": "dFKGE4P7N06M"
      },
      "source": [
        "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/whylogs/blob/mainline/python/examples/integrations/writers/Getting_Started_with_WhyLabsV1.ipynb)\n"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "id": "7iwPGGTl1Uuz",
      "metadata": {
        "id": "7iwPGGTl1Uuz"
      },
      "source": [
        "### 🚩 **Step 1: Create a WhyLabs account** \n",
        "In order to use this example notebook, you'll first need to head to [WhyLabs](https://www.whylabs.ai/free) and signup for a free account.\n",
        "\n",
        "**You can skip the onboarding code example if you are using this noteboook**\n",
        "\n",
        "As part of the onboarding workflow, you will receive an **organization ID** for your account. This is the identifier for your account.\n",
        "\n",
        "You'll also need to create an access token as part of the onboarding flow.\n",
        "\n",
        "#### 🔑 *If you already have a WhyLabs account* \n",
        "Please go to *Settings* -> *Access Tokens* to generate tokens.\n",
        "\n",
        "\n",
        "\n",
        "---\n",
        "\n",
        "\n"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "id": "l8o9dKSU1X6H",
      "metadata": {
        "id": "l8o9dKSU1X6H"
      },
      "source": [
        "### 🛠 **Step 2: Install whylogs and import dependencies** \n",
        "To begin, uncomment the cell below and install the **[whylogs](https://github.com/whylabs/whylogs)** library.\n",
        "\n",
        "[![License](http://img.shields.io/:license-Apache%202-blue.svg)](https://github.com/whylabs/whylogs-python/blob/mainline/LICENSE)\n",
        "[![PyPI version](https://badge.fury.io/py/whylogs.svg)](https://badge.fury.io/py/whylogs)\n",
        "[![Coverage Status](https://coveralls.io/repos/github/whylabs/whylogs/badge.svg?branch=mainline)](https://coveralls.io/github/whylabs/whylogs?branch=mainline)\n",
        "[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/python/black)\n",
        "[![CII Best Practices](https://bestpractices.coreinfrastructure.org/projects/4490/badge)](https://bestpractices.coreinfrastructure.org/projects/4490)\n",
        "[![PyPi Downloads](https://pepy.tech/badge/whylogs)](https://pepy.tech/project/whylogs)\n",
        "![CI](https://github.com/whylabs/whylogs-python/workflows/whylogs%20CI/badge.svg)\n",
        "[![Maintainability](https://api.codeclimate.com/v1/badges/442f6ca3dca1e583a488/maintainability)](https://codeclimate.com/github/whylabs/whylogs-python/maintainability)\n",
        "\n",
        "✅ The `whylogs` library profiles data in real time, collecting thousands of metrics from structured data, unstructured data, and ML model predictions with zero configuration.\n",
        "\n",
        "\n",
        "✅ This library runs locally on your machine and collects relevant metrics in dataset profiles that can both be logged to disk and uploaded to the WhyLabs Platform for monitoring."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "ad907ce3-0c3b-49e4-86f1-eae9de934f7b",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "ad907ce3-0c3b-49e4-86f1-eae9de934f7b",
        "jupyter": {
          "outputs_hidden": true
        },
        "outputId": "cbf178c0-9028-4002-ae01-568502d30b17",
        "tags": []
      },
      "outputs": [],
      "source": [
        "# Note: you may need to restart the kernel to use updated packages.\n",
        "### The following WhyLabs Platform integration example requires the latest whylogs version: \n",
        "%pip install whylogs"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "id": "a244145c-ea35-4ab6-b03e-cf5f864ed94c",
      "metadata": {
        "id": "a244145c-ea35-4ab6-b03e-cf5f864ed94c"
      },
      "source": [
        "### 📝 **Step 3: Load example data batches**\n",
        "\n",
        "The example data is prepared from our public S3 bucket. Here in the example we have prepared a few examples CSVs for the example."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "id": "b78028ea-c7cb-494f-a303-071f1c345dfc",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "b78028ea-c7cb-494f-a303-071f1c345dfc",
        "outputId": "6acdedee-c4fd-4377-b525-beeaec390a2e"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_1.csv\n",
            "Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_2.csv\n",
            "Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_3.csv\n",
            "Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_4.csv\n",
            "Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_5.csv\n",
            "Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_6.csv\n",
            "Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_7.csv\n"
          ]
        }
      ],
      "source": [
        "import pandas as pd\n",
        "\n",
        "pdfs = []\n",
        "for i in range(1, 8):\n",
        "    path = f\"https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_{i}.csv\"\n",
        "    print(f\"Loading data from {path}\")\n",
        "    df = pd.read_csv(path)\n",
        "    pdfs.append(df)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "id": "67b81ab4-a456-4d2d-9547-ad0d772e0aaa",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 394
        },
        "id": "67b81ab4-a456-4d2d-9547-ad0d772e0aaa",
        "outputId": "6c2268d7-43fe-4d1d-a4dc-738db5c747cd"
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Unnamed: 0</th>\n",
              "      <th>id</th>\n",
              "      <th>member_id</th>\n",
              "      <th>loan_amnt</th>\n",
              "      <th>funded_amnt</th>\n",
              "      <th>funded_amnt_inv</th>\n",
              "      <th>int_rate</th>\n",
              "      <th>installment</th>\n",
              "      <th>annual_inc</th>\n",
              "      <th>desc</th>\n",
              "      <th>...</th>\n",
              "      <th>hardship_loan_status</th>\n",
              "      <th>orig_projected_additional_accrued_interest</th>\n",
              "      <th>hardship_payoff_balance_amount</th>\n",
              "      <th>hardship_last_payment_amount</th>\n",
              "      <th>debt_settlement_flag_date</th>\n",
              "      <th>settlement_status</th>\n",
              "      <th>settlement_date</th>\n",
              "      <th>settlement_amount</th>\n",
              "      <th>settlement_percentage</th>\n",
              "      <th>settlement_term</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>count</th>\n",
              "      <td>407.000000</td>\n",
              "      <td>4.070000e+02</td>\n",
              "      <td>0.0</td>\n",
              "      <td>407.000000</td>\n",
              "      <td>407.000000</td>\n",
              "      <td>407.000000</td>\n",
              "      <td>407.000000</td>\n",
              "      <td>407.000000</td>\n",
              "      <td>407.000000</td>\n",
              "      <td>0.0</td>\n",
              "      <td>...</td>\n",
              "      <td>0.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>mean</th>\n",
              "      <td>12548.717445</td>\n",
              "      <td>1.158631e+08</td>\n",
              "      <td>NaN</td>\n",
              "      <td>14203.746929</td>\n",
              "      <td>14203.746929</td>\n",
              "      <td>14202.948403</td>\n",
              "      <td>13.514054</td>\n",
              "      <td>418.020344</td>\n",
              "      <td>78818.956069</td>\n",
              "      <td>NaN</td>\n",
              "      <td>...</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>std</th>\n",
              "      <td>125.354772</td>\n",
              "      <td>1.207642e+06</td>\n",
              "      <td>NaN</td>\n",
              "      <td>9351.142374</td>\n",
              "      <td>9351.142374</td>\n",
              "      <td>9350.997874</td>\n",
              "      <td>5.446881</td>\n",
              "      <td>271.096531</td>\n",
              "      <td>55864.939403</td>\n",
              "      <td>NaN</td>\n",
              "      <td>...</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>min</th>\n",
              "      <td>12325.000000</td>\n",
              "      <td>1.121538e+08</td>\n",
              "      <td>NaN</td>\n",
              "      <td>1000.000000</td>\n",
              "      <td>1000.000000</td>\n",
              "      <td>1000.000000</td>\n",
              "      <td>5.320000</td>\n",
              "      <td>34.220000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>NaN</td>\n",
              "      <td>...</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>25%</th>\n",
              "      <td>12442.500000</td>\n",
              "      <td>1.150769e+08</td>\n",
              "      <td>NaN</td>\n",
              "      <td>7000.000000</td>\n",
              "      <td>7000.000000</td>\n",
              "      <td>7000.000000</td>\n",
              "      <td>9.930000</td>\n",
              "      <td>235.580000</td>\n",
              "      <td>43325.000000</td>\n",
              "      <td>NaN</td>\n",
              "      <td>...</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>50%</th>\n",
              "      <td>12550.000000</td>\n",
              "      <td>1.157004e+08</td>\n",
              "      <td>NaN</td>\n",
              "      <td>12000.000000</td>\n",
              "      <td>12000.000000</td>\n",
              "      <td>12000.000000</td>\n",
              "      <td>12.620000</td>\n",
              "      <td>357.250000</td>\n",
              "      <td>63300.000000</td>\n",
              "      <td>NaN</td>\n",
              "      <td>...</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>75%</th>\n",
              "      <td>12653.500000</td>\n",
              "      <td>1.168245e+08</td>\n",
              "      <td>NaN</td>\n",
              "      <td>20000.000000</td>\n",
              "      <td>20000.000000</td>\n",
              "      <td>20000.000000</td>\n",
              "      <td>16.020000</td>\n",
              "      <td>553.515000</td>\n",
              "      <td>95000.000000</td>\n",
              "      <td>NaN</td>\n",
              "      <td>...</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>max</th>\n",
              "      <td>12862.000000</td>\n",
              "      <td>1.181592e+08</td>\n",
              "      <td>NaN</td>\n",
              "      <td>40000.000000</td>\n",
              "      <td>40000.000000</td>\n",
              "      <td>40000.000000</td>\n",
              "      <td>30.990000</td>\n",
              "      <td>1417.710000</td>\n",
              "      <td>495000.000000</td>\n",
              "      <td>NaN</td>\n",
              "      <td>...</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "<p>8 rows × 126 columns</p>\n",
              "</div>"
            ],
            "text/plain": [
              "         Unnamed: 0            id  member_id     loan_amnt   funded_amnt  \\\n",
              "count    407.000000  4.070000e+02        0.0    407.000000    407.000000   \n",
              "mean   12548.717445  1.158631e+08        NaN  14203.746929  14203.746929   \n",
              "std      125.354772  1.207642e+06        NaN   9351.142374   9351.142374   \n",
              "min    12325.000000  1.121538e+08        NaN   1000.000000   1000.000000   \n",
              "25%    12442.500000  1.150769e+08        NaN   7000.000000   7000.000000   \n",
              "50%    12550.000000  1.157004e+08        NaN  12000.000000  12000.000000   \n",
              "75%    12653.500000  1.168245e+08        NaN  20000.000000  20000.000000   \n",
              "max    12862.000000  1.181592e+08        NaN  40000.000000  40000.000000   \n",
              "\n",
              "       funded_amnt_inv    int_rate  installment     annual_inc  desc  ...  \\\n",
              "count       407.000000  407.000000   407.000000     407.000000   0.0  ...   \n",
              "mean      14202.948403   13.514054   418.020344   78818.956069   NaN  ...   \n",
              "std        9350.997874    5.446881   271.096531   55864.939403   NaN  ...   \n",
              "min        1000.000000    5.320000    34.220000       0.000000   NaN  ...   \n",
              "25%        7000.000000    9.930000   235.580000   43325.000000   NaN  ...   \n",
              "50%       12000.000000   12.620000   357.250000   63300.000000   NaN  ...   \n",
              "75%       20000.000000   16.020000   553.515000   95000.000000   NaN  ...   \n",
              "max       40000.000000   30.990000  1417.710000  495000.000000   NaN  ...   \n",
              "\n",
              "       hardship_loan_status  orig_projected_additional_accrued_interest  \\\n",
              "count                   0.0                                         0.0   \n",
              "mean                    NaN                                         NaN   \n",
              "std                     NaN                                         NaN   \n",
              "min                     NaN                                         NaN   \n",
              "25%                     NaN                                         NaN   \n",
              "50%                     NaN                                         NaN   \n",
              "75%                     NaN                                         NaN   \n",
              "max                     NaN                                         NaN   \n",
              "\n",
              "       hardship_payoff_balance_amount  hardship_last_payment_amount  \\\n",
              "count                             0.0                           0.0   \n",
              "mean                              NaN                           NaN   \n",
              "std                               NaN                           NaN   \n",
              "min                               NaN                           NaN   \n",
              "25%                               NaN                           NaN   \n",
              "50%                               NaN                           NaN   \n",
              "75%                               NaN                           NaN   \n",
              "max                               NaN                           NaN   \n",
              "\n",
              "       debt_settlement_flag_date  settlement_status  settlement_date  \\\n",
              "count                        0.0                0.0              0.0   \n",
              "mean                         NaN                NaN              NaN   \n",
              "std                          NaN                NaN              NaN   \n",
              "min                          NaN                NaN              NaN   \n",
              "25%                          NaN                NaN              NaN   \n",
              "50%                          NaN                NaN              NaN   \n",
              "75%                          NaN                NaN              NaN   \n",
              "max                          NaN                NaN              NaN   \n",
              "\n",
              "       settlement_amount  settlement_percentage  settlement_term  \n",
              "count                0.0                    0.0              0.0  \n",
              "mean                 NaN                    NaN              NaN  \n",
              "std                  NaN                    NaN              NaN  \n",
              "min                  NaN                    NaN              NaN  \n",
              "25%                  NaN                    NaN              NaN  \n",
              "50%                  NaN                    NaN              NaN  \n",
              "75%                  NaN                    NaN              NaN  \n",
              "max                  NaN                    NaN              NaN  \n",
              "\n",
              "[8 rows x 126 columns]"
            ]
          },
          "execution_count": 3,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "pdfs[0].describe()"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "id": "834d6471-8490-48ea-bb52-6be31662dc97",
      "metadata": {
        "id": "834d6471-8490-48ea-bb52-6be31662dc97"
      },
      "source": [
        "### ⚙️ **Step 4: Configure whylogs** \n",
        "\n",
        "`whylogs`, by default, does not send statistics to WhyLabs.\n",
        "\n",
        "There are a few small steps you need to set up. If you haven't got the access key, please onboard with WhyLabs and generate an API key https://hub.whylabsapp.com/settings/access-tokens.\n",
        "\n",
        "**WhyLabs only requires whylogs profiles - your raw data never leaves your machine.**"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "31371bc6-4ec8-4518-84a0-4718b19d1506",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "31371bc6-4ec8-4518-84a0-4718b19d1506",
        "outputId": "ddb7451b-5354-4318-e2eb-294ab4d4a7e2"
      },
      "outputs": [],
      "source": [
        "import whylogs as why\n",
        "\n",
        "# Create a model in the dashboard and use that model id as the default dataset id in the prompt here. It will be\n",
        "# saved in your whylogs conifg for future use. You can optionally supply reinit=True to reset your conifg. \n",
        "why.init()"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "49d0e0af",
      "metadata": {},
      "source": [
        "You can run this init from the command line as well with.\n",
        "\n",
        "```bash\n",
        "python -m whylogs.api.whylabs.session.why_init\n",
        "```\n",
        "\n",
        "You can use this to reset your config if you want to change your api key or default dataset it."
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "id": "8d7bbb9c-9b18-4e12-bab2-88a5176eeba7",
      "metadata": {
        "id": "8d7bbb9c-9b18-4e12-bab2-88a5176eeba7"
      },
      "source": [
        "### 📬 **Step 5: Logging to WhyLabs** \n",
        "\n",
        "Ensure you have a **model ID** (also called **dataset ID**) before you start!\n",
        "\n",
        "#### Dataset Timestamp\n",
        "* To avoid confusion, it's recommended that you use **[aware datetime](https://docs.python.org/3/library/datetime.html#:~:text=For%20applications%20requiring,is%20in%20effect.)** with `whylogs`\n",
        "* If you don't set `dataset_timestamp` parameter, it'll default to `UTC` now\n",
        "* WhyLabs supports real time visualization when the timestamp is **within the last 7 days**. Anything older than than will be picked up when we run our batch processing\n",
        "* **If you log two profiles for the same day with different timestamps (12:00AM vs 12:01AM), they are merged to the same batch**\n",
        "\n",
        "#### Logging Different Batches of Data\n",
        "* We'll give the profiles different **dates**\n",
        "* Create a new logger for each date. Note that the logger needs to be closed to flush out the data (automatically with the context manager in the example"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "id": "3a006a2b-8403-477f-a1f5-a37ea33b160c",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "3a006a2b-8403-477f-a1f5-a37ea33b160c",
        "outputId": "7cbbb771-2936-4fbb-b927-d23a485e29fd"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "\n",
            "✅ Aggregated 407 rows into profile \n",
            "\n",
            "Visualize and explore this profile with one-click\n",
            "🔍 https://hub.whylabsapp.com/resources/model-62/profiles?profile=1694131200000\n",
            "\n",
            "✅ Aggregated 390 rows into profile \n",
            "\n",
            "Visualize and explore this profile with one-click\n",
            "🔍 https://hub.whylabsapp.com/resources/model-62/profiles?profile=1694044800000\n",
            "\n",
            "✅ Aggregated 382 rows into profile \n",
            "\n",
            "Visualize and explore this profile with one-click\n",
            "🔍 https://hub.whylabsapp.com/resources/model-62/profiles?profile=1693958400000\n",
            "\n",
            "✅ Aggregated 371 rows into profile \n",
            "\n",
            "Visualize and explore this profile with one-click\n",
            "🔍 https://hub.whylabsapp.com/resources/model-62/profiles?profile=1693872000000\n",
            "\n",
            "✅ Aggregated 301 rows into profile \n",
            "\n",
            "Visualize and explore this profile with one-click\n",
            "🔍 https://hub.whylabsapp.com/resources/model-62/profiles?profile=1693785600000\n",
            "\n",
            "✅ Aggregated 392 rows into profile \n",
            "\n",
            "Visualize and explore this profile with one-click\n",
            "🔍 https://hub.whylabsapp.com/resources/model-62/profiles?profile=1693699200000\n",
            "\n",
            "✅ Aggregated 283 rows into profile \n",
            "\n",
            "Visualize and explore this profile with one-click\n",
            "🔍 https://hub.whylabsapp.com/resources/model-62/profiles?profile=1693612800000\n"
          ]
        }
      ],
      "source": [
        "from whylogs.api.writer.whylabs import WhyLabsWriter\n",
        "import whylogs as why\n",
        "import datetime\n",
        "\n",
        "for i, df in enumerate(pdfs):\n",
        "    # walking backwards. Each dataset has to map to a date to show up as a different batch\n",
        "    # in WhyLabs\n",
        "    dt = datetime.datetime.now(tz=datetime.timezone.utc) - datetime.timedelta(days=i)\n",
        "\n",
        "    # log each day's data and set the date on the profile\n",
        "    profile = why.log(df, dataset_timestamp=dt).profile()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "ScXRPmxJVjju",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 34
        },
        "id": "ScXRPmxJVjju",
        "outputId": "5d9e7f45-5c0b-4bd6-d369-3f7bb94c0f65"
      },
      "outputs": [],
      "source": [
        "from whylogs.api.whylabs.session.session_manager import get_current_session\n",
        "from IPython.core.display import HTML\n",
        "\n",
        "session = get_current_session()\n",
        "model_id = session.config.get_default_dataset_id()\n",
        "\n",
        "HTML(f'To view your statistics, go to the <a href=\"https://hub.whylabsapp.com/models/{model_id}/summary\" target=\"_blank\">model dashboard</a>')"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "id": "b2c81d63-f420-4a36-8960-ef093b2f895f",
      "metadata": {
        "id": "b2c81d63-f420-4a36-8960-ef093b2f895f"
      },
      "source": [
        "### 📈 **Step 6: Inspect statistics in WhyLabs** \n",
        "\n",
        "WhyLabs stores the follow statistics, from what is configured in `whylogs`\n",
        "\n",
        "* Simple counters: boolean, null values, data types.\n",
        "* Summary statistics: sum, min, max, median, variance.\n",
        "* Unique value counter or cardinality: tracks an approximate unique value of your feature using HyperLogLog algorithm.\n",
        "* Histograms for numerical features. whyLogs binary output can be queried to with dynamic binning based on the shape of your data.\n",
        "* Top frequent items (default is 128). Note that this configuration affects the memory footprint, especially for text features.\n",
        "\n",
        "Notice that these statistics are organized in batches. So if you run the above cells again, you'll see the statistics changed. \n",
        "\n",
        "* Now check the application to see if your **statistics** \n",
        "* Also, run the above cell again for the same model ID, do you see the statistics changes in WhyLabs? Especially the counters?"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "id": "HO8KS8ep1Npe",
      "metadata": {
        "id": "HO8KS8ep1Npe"
      },
      "source": [
        "### 📝 **Step 7: Run WhyLabs with your data** \n",
        "\n",
        "To go further, visit our [documentation](https://docs.whylabs.ai/) for more detailed of everything that you can do to start monitoring your ML and data pipelines.\n",
        "\n",
        "You can also join our [Community Slack Channel](http://join.slack.whylabs.ai/) for questions related to `whylogs` or [cut us a ticket](https://support.whylabs.ai/) if you encounter issues with Whylabs onboarding.\n"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3.8.10 ('.venv': poetry)",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.9.16"
    },
    "vscode": {
      "interpreter": {
        "hash": "d39f874c9b8a97550ecbd783714b95e79c9b905449b34f44c40e3bf053b54b41"
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}