{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_klgn5JO0oqh"
      },
      "source": [
        ">### 🚩 *Create a free WhyLabs account to get more value out of whylogs!*<br>\n",
        ">*Did you know you can store, visualize, and monitor whylogs profiles with the [WhyLabs Observability Platform](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Schema_Configuration)? Sign up for a [free WhyLabs account](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Schema_Configuration) to leverage the power of whylogs and WhyLabs together!*"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "UjrYEE9H0oqj"
      },
      "source": [
        "# Schema Configuration for Tracking Metrics"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Z6wkDgOL0oqk"
      },
      "source": [
        "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/whylogs/blob/mainline/python/examples/basic/Schema_Configuration.ipynb)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "LPUPlTjC0oqk"
      },
      "source": [
        "When logging data, whylogs outputs certain metrics according to the column type. While whylogs provides a default behaviour, you can configure it to track only the metrics that are important to you.\n",
        "\n",
        "In this example, we'll see how you can configure the schema at the dataset level to control which metrics are calculated.\n",
        "We'll see how to specify metrics:\n",
        "\n",
        "1. Per data type\n",
        "\n",
        "2. Per column name\n",
        "\n",
        "\n",
        "But first, let's talk briefly about whylogs' data types and basic metrics."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_FnYGJyu0oql"
      },
      "source": [
        "## Installing whylogs"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "id": "4nmNldIc0oql",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "d519ace0-fa87-402a-8c04-adfea326f868"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Installing collected packages: whylogs-sketching, types-urllib3, types-requests, whylabs-client, whylogs\n",
            "Successfully installed types-requests-2.31.0.2 types-urllib3-1.26.25.14 whylabs-client-0.5.4 whylogs-1.3.0 whylogs-sketching-3.4.1.dev3\n"
          ]
        }
      ],
      "source": [
        "# Note: you may need to restart the kernel to use updated packages.\n",
        "%pip install whylogs"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "CknnJlhl0oqm"
      },
      "source": [
        "## whylogs DataTypes"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "iScoxXG40oqm"
      },
      "source": [
        "whylogs maps different data types, such as NumPy arrays, lists, and integers, to specific whylogs data types. The three most important whylogs data types are:\n",
        "\n",
        "- Integral\n",
        "- Fractional\n",
        "- String"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Dal4ykoM0oqn"
      },
      "source": [
        "Anything that doesn't match one of the above types is assigned the `AnyType` type.\n",
        "\n",
        "To check which whylogs type a given Python type maps to, you can use the `StandardTypeMapper`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "bU1Ehh6O0oqo",
        "outputId": "ef4d4245-0870-4785-8210-b9d696179dfc"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "<whylogs.core.datatypes.AnyType at 0x7dde641a70d0>"
            ]
          },
          "metadata": {},
          "execution_count": 2
        }
      ],
      "source": [
        "from whylogs.core.datatypes import StandardTypeMapper\n",
        "\n",
        "type_mapper = StandardTypeMapper()\n",
        "\n",
        "type_mapper(list)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0G5JtXSw0oqp"
      },
      "source": [
        "## Basic Metrics"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ULgE1qA_0oqp"
      },
      "source": [
        "The standard metrics available in whylogs are grouped in __namespaces__. They are:\n",
        "\n",
        "- __counts__: Counters, such as number of samples and null values\n",
        "- __types__: Inferred types, such as boolean, string or fractional\n",
        "- __ints__: Maximum and minimum values\n",
        "- __distribution__: Min, max, median, and quantile values\n",
        "- __cardinality__: Number of distinct values\n",
        "- __frequent_items__: Most common values\n",
        "- __unicode_range__: Count of characters used in string values\n",
        "- __condition_count__: Count how often values meet specified conditions"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RTWtbmj60oqp"
      },
      "source": [
        "## Configuring Metrics in the Dataset Schema"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "WI1kMCg_0oqq"
      },
      "source": [
        "Now, let's see how we can control which metrics are tracked according to the column's type or column name."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YL9oBPgX0oqq"
      },
      "source": [
        "### Metrics per Type"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-dho9xYo0oqq"
      },
      "source": [
        "Let's assume you're not interested in every metric listed above, and you have a performance-critical application, so you'd like to do as few calculations as possible.\n",
        "\n",
        "For example, you might only be interested in:\n",
        "\n",
        "- Counts/Types metrics for every data type\n",
        "- Distribution metrics for Fractional\n",
        "- Frequent Items for Integral\n",
        "\n",
        "Let's see how we can configure our Schema to track only the above metrics for the related types."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6zQWkC0m0oqr"
      },
      "source": [
        "Let's create a sample dataframe to illustrate:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "mlaGnadP0oqr"
      },
      "outputs": [],
      "source": [
        "# Install pandas if you don't have it already\n",
        "%pip install pandas\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {
        "id": "l_OiNyWx0oqr"
      },
      "outputs": [],
      "source": [
        "import pandas as pd\n",
        "d = {\"col1\": [1, 2, 3], \"col2\": [3.0, 4.0, 5.0], \"col3\": [\"a\", \"b\", \"c\"], \"col4\": [3.0, 4.0, 5.0]}\n",
        "df = pd.DataFrame(data=d)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "CfmFqAaB0oqs"
      },
      "source": [
        "whylogs uses `Resolvers` to define how a column name or data type gets mapped to different metrics.\n",
        "\n",
        "We will create a custom `Resolver` subclass to customize this mapping."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {
        "id": "wRHf46IA0oqs"
      },
      "outputs": [],
      "source": [
        "from whylogs.core.resolvers import Resolver\n",
        "from whylogs.core.datatypes import DataType, Fractional, Integral\n",
        "from typing import Dict, List\n",
        "from whylogs.core.metrics import StandardMetric\n",
        "from whylogs.core.metrics.metrics import Metric\n",
        "\n",
        "class MyCustomResolver(Resolver):\n",
        "    \"\"\"Resolver that keeps distribution metrics for Fractional and frequent items for Integral, and counters and types metrics for all data types.\"\"\"\n",
        "\n",
        "    def resolve(self, name: str, why_type: DataType, column_schema) -> Dict[str, Metric]:\n",
        "        # Track counts and types for every column, whatever its type.\n",
        "        metrics: List[StandardMetric] = [StandardMetric.counts, StandardMetric.types]\n",
        "        # Add type-specific namespaces on top of the common ones.\n",
        "        if isinstance(why_type, Fractional):\n",
        "            metrics.append(StandardMetric.distribution)\n",
        "        if isinstance(why_type, Integral):\n",
        "            metrics.append(StandardMetric.frequent_items)\n",
        "\n",
        "        # Create an empty metric object for each selected namespace.\n",
        "        result: Dict[str, Metric] = {}\n",
        "        for m in metrics:\n",
        "            result[m.name] = m.zero(column_schema.cfg)\n",
        "        return result\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "pfpijk4F0oqs"
      },
      "source": [
        "In the case above, the `name` parameter is not used: only `why_type` determines which metrics are tracked, so the column name is irrelevant.\n",
        "\n",
        "We initialize `metrics` with the `counts` and `types` namespaces regardless of the data type. Then we check the whylogs data type to add the desired metric namespace (`distribution` for __Fractional__ columns and `frequent_items` for __Integral__ columns)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "O6WvUsIx0oqt"
      },
      "source": [
        "Now we can proceed with the normal process of logging a dataframe. Resolvers are passed to whylogs through a `DatasetSchema`, so we pass a `DatasetSchema` object to the `schema` parameter of `why.log` as follows:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 258
        },
        "id": "X7ohGb5E0oqt",
        "outputId": "41a798e0-f82a-42df-d33e-9ab5e63057c5"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "WARNING:whylogs.api.whylabs.session.session_manager:No session found. Call whylogs.init() to initialize a session and authenticate. See https://docs.whylabs.ai/docs/whylabs-whylogs-init for more information.\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "        counts/inf  counts/n  counts/nan  counts/null  \\\n",
              "column                                                  \n",
              "col1             0         3           0            0   \n",
              "col2             0         3           0            0   \n",
              "col3             0         3           0            0   \n",
              "col4             0         3           0            0   \n",
              "\n",
              "                          frequent_items/frequent_strings                type  \\\n",
              "column                                                                          \n",
              "col1    [FrequentItem(value='1', est=1, upper=1, lower...  SummaryType.COLUMN   \n",
              "col2                                                  NaN  SummaryType.COLUMN   \n",
              "col3                                                  NaN  SummaryType.COLUMN   \n",
              "col4                                                  NaN  SummaryType.COLUMN   \n",
              "\n",
              "        types/boolean  types/fractional  types/integral  types/object  \\\n",
              "column                                                                  \n",
              "col1                0                 0               3             0   \n",
              "col2                0                 3               0             0   \n",
              "col3                0                 0               0             0   \n",
              "col4                0                 3               0             0   \n",
              "\n",
              "        types/string  types/tensor  distribution/max  distribution/mean  \\\n",
              "column                                                                    \n",
              "col1               0             0               NaN                NaN   \n",
              "col2               0             0               5.0                4.0   \n",
              "col3               3             0               NaN                NaN   \n",
              "col4               0             0               5.0                4.0   \n",
              "\n",
              "        distribution/median  distribution/min  distribution/n  \\\n",
              "column                                                          \n",
              "col1                    NaN               NaN             NaN   \n",
              "col2                    4.0               3.0             3.0   \n",
              "col3                    NaN               NaN             NaN   \n",
              "col4                    4.0               3.0             3.0   \n",
              "\n",
              "        distribution/q_01  distribution/q_05  distribution/q_10  \\\n",
              "column                                                            \n",
              "col1                  NaN                NaN                NaN   \n",
              "col2                  3.0                3.0                3.0   \n",
              "col3                  NaN                NaN                NaN   \n",
              "col4                  3.0                3.0                3.0   \n",
              "\n",
              "        distribution/q_25  distribution/q_75  distribution/q_90  \\\n",
              "column                                                            \n",
              "col1                  NaN                NaN                NaN   \n",
              "col2                  3.0                5.0                5.0   \n",
              "col3                  NaN                NaN                NaN   \n",
              "col4                  3.0                5.0                5.0   \n",
              "\n",
              "        distribution/q_95  distribution/q_99  distribution/stddev  \n",
              "column                                                             \n",
              "col1                  NaN                NaN                  NaN  \n",
              "col2                  5.0                5.0                  1.0  \n",
              "col3                  NaN                NaN                  NaN  \n",
              "col4                  5.0                5.0                  1.0  "
            ]
          },
          "metadata": {},
          "execution_count": 6
        }
      ],
      "source": [
        "import whylogs as why\n",
        "from whylogs.core import DatasetSchema\n",
        "result = why.log(df, schema=DatasetSchema(resolvers=MyCustomResolver()))\n",
        "prof = result.profile()\n",
        "prof_view = prof.view()\n",
        "pd.set_option(\"display.max_columns\", None)\n",
        "prof_view.to_pandas()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JIJu86Am0oqt"
      },
      "source": [
        "Notice we have `counts` and `types` metrics for every type, `distribution` metrics only for `col2` and `col4` (floats) and `frequent_items` only for `col1` (ints).\n",
        "\n",
        "That's precisely what we wanted."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "zvUgPw0M0oqu"
      },
      "source": [
        "### Metrics per Column"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "fst55b2Z0oqu"
      },
      "source": [
        "Now, suppose we don't want to specify the tracked metrics per data type, but rather per specific column.\n",
        "\n",
        "For example, we might want to track:"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "imjPxKFe0oqu"
      },
      "source": [
        "- Count metrics for `col1`\n",
        "- Distribution Metrics for `col2`\n",
        "- Cardinality for `col3`\n",
        "- Distribution Metrics + Cardinality for `col4`\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xyEU8lFR0oqu"
      },
      "source": [
        "The process is similar to the previous case. We only need to change the if clauses to check for the `name` instead of `why_type`, like this:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {
        "id": "hjUhBECv0oqv"
      },
      "outputs": [],
      "source": [
        "from whylogs.core.resolvers import Resolver\n",
        "from whylogs.core.datatypes import DataType, Fractional, Integral\n",
        "from typing import Dict, List\n",
        "from whylogs.core.metrics import StandardMetric\n",
        "from whylogs.core.metrics.metrics import Metric\n",
        "\n",
        "class MyCustomResolver(Resolver):\n",
        "    \"\"\"Resolver that keeps distribution metrics for Fractional and frequent items for Integral, and counters and types metrics for all data types.\"\"\"\n",
        "\n",
        "    def resolve(self, name: str, why_type: DataType, column_schema) -> Dict[str, Metric]:\n",
        "        metrics = []\n",
        "        if name=='col1':\n",
        "            metrics.append(StandardMetric.counts)\n",
        "        if name=='col2':\n",
        "            metrics.append(StandardMetric.distribution)\n",
        "        if name=='col3':\n",
        "            metrics.append(StandardMetric.cardinality)\n",
        "        if name=='col4':\n",
        "            metrics.append(StandardMetric.distribution)\n",
        "            metrics.append(StandardMetric.cardinality)\n",
        "\n",
        "\n",
        "\n",
        "        result: Dict[str, Metric] = {}\n",
        "        for m in metrics:\n",
        "            result[m.name] = m.zero(column_schema.cfg)\n",
        "        return result\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "LCyHU22O0oqv"
      },
      "source": [
        "Since there's no common metrics for all columns, we can initialize `metrics` as an empty list, and then append the relevant metrics for each columns.\n",
        "\n",
        "Now, we create a custom schema, just like before:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 255
        },
        "id": "jpTGcNNV0oqv",
        "outputId": "99a422ee-8dfe-4811-d137-91b6a18947a5"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "        counts/inf  counts/n  counts/nan  counts/null                type  \\\n",
              "column                                                                      \n",
              "col1           0.0       3.0         0.0          0.0  SummaryType.COLUMN   \n",
              "col2           NaN       NaN         NaN          NaN  SummaryType.COLUMN   \n",
              "col3           NaN       NaN         NaN          NaN  SummaryType.COLUMN   \n",
              "col4           NaN       NaN         NaN          NaN  SummaryType.COLUMN   \n",
              "col5           NaN       NaN         NaN          NaN  SummaryType.COLUMN   \n",
              "\n",
              "        distribution/max  distribution/mean  distribution/median  \\\n",
              "column                                                             \n",
              "col1                 NaN                NaN                  NaN   \n",
              "col2                 5.0                4.0                  4.0   \n",
              "col3                 NaN                NaN                  NaN   \n",
              "col4                 5.0                4.0                  4.0   \n",
              "col5                 NaN                NaN                  NaN   \n",
              "\n",
              "        distribution/min  distribution/n  distribution/q_01  \\\n",
              "column                                                        \n",
              "col1                 NaN             NaN                NaN   \n",
              "col2                 3.0             3.0                3.0   \n",
              "col3                 NaN             NaN                NaN   \n",
              "col4                 3.0             3.0                3.0   \n",
              "col5                 NaN             NaN                NaN   \n",
              "\n",
              "        distribution/q_05  distribution/q_10  distribution/q_25  \\\n",
              "column                                                            \n",
              "col1                  NaN                NaN                NaN   \n",
              "col2                  3.0                3.0                3.0   \n",
              "col3                  NaN                NaN                NaN   \n",
              "col4                  3.0                3.0                3.0   \n",
              "col5                  NaN                NaN                NaN   \n",
              "\n",
              "        distribution/q_75  distribution/q_90  distribution/q_95  \\\n",
              "column                                                            \n",
              "col1                  NaN                NaN                NaN   \n",
              "col2                  5.0                5.0                5.0   \n",
              "col3                  NaN                NaN                NaN   \n",
              "col4                  5.0                5.0                5.0   \n",
              "col5                  NaN                NaN                NaN   \n",
              "\n",
              "        distribution/q_99  distribution/stddev  cardinality/est  \\\n",
              "column                                                            \n",
              "col1                  NaN                  NaN              NaN   \n",
              "col2                  5.0                  1.0              NaN   \n",
              "col3                  NaN                  NaN              3.0   \n",
              "col4                  5.0                  1.0              3.0   \n",
              "col5                  NaN                  NaN              NaN   \n",
              "\n",
              "        cardinality/lower_1  cardinality/upper_1  \n",
              "column                                            \n",
              "col1                    NaN                  NaN  \n",
              "col2                    NaN                  NaN  \n",
              "col3                    3.0              3.00015  \n",
              "col4                    3.0              3.00015  \n",
              "col5                    NaN                  NaN  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-3fd40544-0669-4500-a843-e4912ed74454\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>counts/inf</th>\n",
              "      <th>counts/n</th>\n",
              "      <th>counts/nan</th>\n",
              "      <th>counts/null</th>\n",
              "      <th>type</th>\n",
              "      <th>distribution/max</th>\n",
              "      <th>distribution/mean</th>\n",
              "      <th>distribution/median</th>\n",
              "      <th>distribution/min</th>\n",
              "      <th>distribution/n</th>\n",
              "      <th>distribution/q_01</th>\n",
              "      <th>distribution/q_05</th>\n",
              "      <th>distribution/q_10</th>\n",
              "      <th>distribution/q_25</th>\n",
              "      <th>distribution/q_75</th>\n",
              "      <th>distribution/q_90</th>\n",
              "      <th>distribution/q_95</th>\n",
              "      <th>distribution/q_99</th>\n",
              "      <th>distribution/stddev</th>\n",
              "      <th>cardinality/est</th>\n",
              "      <th>cardinality/lower_1</th>\n",
              "      <th>cardinality/upper_1</th>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>column</th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>col1</th>\n",
              "      <td>0.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>SummaryType.COLUMN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>col2</th>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>SummaryType.COLUMN</td>\n",
              "      <td>5.0</td>\n",
              "      <td>4.0</td>\n",
              "      <td>4.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>5.0</td>\n",
              "      <td>5.0</td>\n",
              "      <td>5.0</td>\n",
              "      <td>5.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>col3</th>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>SummaryType.COLUMN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.00015</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>col4</th>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>SummaryType.COLUMN</td>\n",
              "      <td>5.0</td>\n",
              "      <td>4.0</td>\n",
              "      <td>4.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>5.0</td>\n",
              "      <td>5.0</td>\n",
              "      <td>5.0</td>\n",
              "      <td>5.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.00015</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>col5</th>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>SummaryType.COLUMN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-3fd40544-0669-4500-a843-e4912ed74454')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-3fd40544-0669-4500-a843-e4912ed74454 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-3fd40544-0669-4500-a843-e4912ed74454');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "<div id=\"df-0bb6c348-d86d-40e5-9a4c-032ee175f173\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-0bb6c348-d86d-40e5-9a4c-032ee175f173')\"\n",
              "            title=\"Suggest charts.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "    background-color: #E8F0FE;\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: #1967D2;\n",
              "    height: 32px;\n",
              "    padding: 0 0 0 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: #E2EBFA;\n",
              "    box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: #174EA6;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "    background-color: #3B4455;\n",
              "    fill: #D2E3FC;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart:hover {\n",
              "    background-color: #434B5C;\n",
              "    box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "    filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "    fill: #FFFFFF;\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const charts = await google.colab.kernel.invokeFunction(\n",
              "          'suggestCharts', [key], {});\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-0bb6c348-d86d-40e5-9a4c-032ee175f173 button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "    </div>\n",
              "  </div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 8
        }
      ],
      "source": [
        "import whylogs as why\n",
        "from whylogs.core import DatasetSchema\n",
        "df['col5'] = 0\n",
        "result = why.log(df, schema=DatasetSchema(resolvers=MyCustomResolver()))\n",
        "prof = result.profile()\n",
        "prof_view = prof.view()\n",
        "pd.set_option(\"display.max_columns\", None)\n",
        "prof_view.to_pandas()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "HzEKQywx0oqw"
      },
      "source": [
        "Note that existing columns that are not specified in your custom resolver won't have any metrics tracked. In the example above, we added a `col5` column, but since we didn't link any metrics to it, all of the metrics are `NaN`s.\n",
        "\n",
        "## Declarative Schema\n",
        "\n",
        "In the previous section, we created subclasses of `Resolver` and implemented its `resolve()` method using control flow. The `DeclarativeSchema` allows us to customize the metrics present in a column by simply listing the metrics we want by data type or column name without implementing a `Resolver` subclass.\n",
        "\n",
        "### Declarative Schema Specification\n",
        "\n",
        "A `ResolverSpec` specifies a list of metrics to use for columns that match it. We can match columns by name or by type. The column name takes precedence if both are given. Each `ResolverSpec` has a list of `MetricSpec` that specify the `Metric`s (and optionally custom configurations) to apply to matching metrics. For example:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 258
        },
        "id": "V2x8PhFUh1ep",
        "outputId": "65d91834-e5cd-443d-963b-a2be7a48428e"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "        condition_count/above 42  condition_count/below 42  \\\n",
              "column                                                       \n",
              "col1                         0.0                       3.0   \n",
              "col2                         NaN                       NaN   \n",
              "col3                         NaN                       NaN   \n",
              "col4                         NaN                       NaN   \n",
              "\n",
              "        condition_count/total  distribution/max  distribution/mean  \\\n",
              "column                                                               \n",
              "col1                      3.0               3.0                2.0   \n",
              "col2                      NaN               NaN                NaN   \n",
              "col3                      3.0               NaN                NaN   \n",
              "col4                      NaN               NaN                NaN   \n",
              "\n",
              "        distribution/median  distribution/min  distribution/n  \\\n",
              "column                                                          \n",
              "col1                    2.0               1.0             3.0   \n",
              "col2                    NaN               NaN             NaN   \n",
              "col3                    NaN               NaN             NaN   \n",
              "col4                    NaN               NaN             NaN   \n",
              "\n",
              "        distribution/q_01  distribution/q_05  distribution/q_10  \\\n",
              "column                                                            \n",
              "col1                  1.0                1.0                1.0   \n",
              "col2                  NaN                NaN                NaN   \n",
              "col3                  NaN                NaN                NaN   \n",
              "col4                  NaN                NaN                NaN   \n",
              "\n",
              "        distribution/q_25  distribution/q_75  distribution/q_90  \\\n",
              "column                                                            \n",
              "col1                  1.0                3.0                3.0   \n",
              "col2                  NaN                NaN                NaN   \n",
              "col3                  NaN                NaN                NaN   \n",
              "col4                  NaN                NaN                NaN   \n",
              "\n",
              "        distribution/q_95  distribution/q_99  distribution/stddev  \\\n",
              "column                                                              \n",
              "col1                  3.0                3.0                  1.0   \n",
              "col2                  NaN                NaN                  NaN   \n",
              "col3                  NaN                NaN                  NaN   \n",
              "col4                  NaN                NaN                  NaN   \n",
              "\n",
              "                      type  condition_count/alpha  condition_count/digit  \\\n",
              "column                                                                     \n",
              "col1    SummaryType.COLUMN                    NaN                    NaN   \n",
              "col2    SummaryType.COLUMN                    NaN                    NaN   \n",
              "col3    SummaryType.COLUMN                    3.0                    0.0   \n",
              "col4    SummaryType.COLUMN                    NaN                    NaN   \n",
              "\n",
              "                          frequent_items/frequent_strings  \n",
              "column                                                     \n",
              "col1                                                  NaN  \n",
              "col2                                                  NaN  \n",
              "col3    [FrequentItem(value='c', est=1, upper=1, lower...  \n",
              "col4                                                  NaN  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-b2aeca45-1fc7-4f5c-b150-df8064e747b3\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>condition_count/above 42</th>\n",
              "      <th>condition_count/below 42</th>\n",
              "      <th>condition_count/total</th>\n",
              "      <th>distribution/max</th>\n",
              "      <th>distribution/mean</th>\n",
              "      <th>distribution/median</th>\n",
              "      <th>distribution/min</th>\n",
              "      <th>distribution/n</th>\n",
              "      <th>distribution/q_01</th>\n",
              "      <th>distribution/q_05</th>\n",
              "      <th>distribution/q_10</th>\n",
              "      <th>distribution/q_25</th>\n",
              "      <th>distribution/q_75</th>\n",
              "      <th>distribution/q_90</th>\n",
              "      <th>distribution/q_95</th>\n",
              "      <th>distribution/q_99</th>\n",
              "      <th>distribution/stddev</th>\n",
              "      <th>type</th>\n",
              "      <th>condition_count/alpha</th>\n",
              "      <th>condition_count/digit</th>\n",
              "      <th>frequent_items/frequent_strings</th>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>column</th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>col1</th>\n",
              "      <td>0.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>2.0</td>\n",
              "      <td>2.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>SummaryType.COLUMN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>col2</th>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>SummaryType.COLUMN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>col3</th>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>3.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>SummaryType.COLUMN</td>\n",
              "      <td>3.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>[FrequentItem(value='c', est=1, upper=1, lower...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>col4</th>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>SummaryType.COLUMN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-b2aeca45-1fc7-4f5c-b150-df8064e747b3')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-b2aeca45-1fc7-4f5c-b150-df8064e747b3 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-b2aeca45-1fc7-4f5c-b150-df8064e747b3');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "<div id=\"df-05a7b497-d858-4076-ada6-12f4b111cecd\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-05a7b497-d858-4076-ada6-12f4b111cecd')\"\n",
              "            title=\"Suggest charts.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "    background-color: #E8F0FE;\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: #1967D2;\n",
              "    height: 32px;\n",
              "    padding: 0 0 0 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: #E2EBFA;\n",
              "    box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: #174EA6;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "    background-color: #3B4455;\n",
              "    fill: #D2E3FC;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart:hover {\n",
              "    background-color: #434B5C;\n",
              "    box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "    filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "    fill: #FFFFFF;\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const charts = await google.colab.kernel.invokeFunction(\n",
              "          'suggestCharts', [key], {});\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-05a7b497-d858-4076-ada6-12f4b111cecd button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "    </div>\n",
              "  </div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 9
        }
      ],
      "source": [
        "from whylogs.core.metrics.condition_count_metric import (\n",
        "    Condition,\n",
        "    ConditionCountConfig,\n",
        "    ConditionCountMetric,\n",
        ")\n",
        "from whylogs.core.relations import Predicate\n",
        "from whylogs.core.resolvers import COLUMN_METRICS, MetricSpec, ResolverSpec\n",
        "from whylogs.core.schema import DeclarativeSchema\n",
        "from whylogs.core.datatypes import AnyType, DataType, Fractional, Integral, String\n",
        "\n",
        "X = Predicate()\n",
        "\n",
        "\n",
        "schema = DeclarativeSchema(\n",
        "    [\n",
        "        ResolverSpec(\n",
        "            column_name=\"col1\",\n",
        "            metrics=[\n",
        "                MetricSpec(StandardMetric.distribution.value),\n",
        "                MetricSpec(\n",
        "                    ConditionCountMetric,\n",
        "                    ConditionCountConfig(\n",
        "                        conditions={\n",
        "                            \"below 42\": Condition(lambda x: x < 42),\n",
        "                            \"above 42\": Condition(lambda x: x > 42),\n",
        "                        }\n",
        "                    ),\n",
        "                ),\n",
        "            ],\n",
        "        ),\n",
        "        ResolverSpec(\n",
        "            column_type=String,\n",
        "            metrics=[\n",
        "                MetricSpec(StandardMetric.frequent_items.value),\n",
        "                MetricSpec(\n",
        "                    ConditionCountMetric,\n",
        "                    ConditionCountConfig(\n",
        "                        conditions={\n",
        "                            \"alpha\": Condition(X.matches(\"[a-zA-Z]+\")),\n",
        "                            \"digit\": Condition(X.matches(\"[0-9]+\")),\n",
        "                        }\n",
        "                    ),\n",
        "                ),\n",
        "            ],\n",
        "        ),\n",
        "    ]\n",
        ")\n",
        "\n",
        "d = {\"col1\": [1, 2, 3], \"col2\": [3.0, 4.0, 5.0], \"col3\": [\"a\", \"b\", \"c\"], \"col4\": [3.0, 4.0, 5.0]}\n",
        "df = pd.DataFrame(data=d)\n",
        "result = why.log(df, schema=schema)\n",
        "prof_view = result.profile().view()\n",
        "prof_view.to_pandas()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JiMlsvSIh2m7"
      },
      "source": [
        "We can now pass `schema` to `why.log()` to log data according to the schema. Note that we pass the `Metric` class to the the `MetricSpec` constructor, not an instance. In this example, `col1` will have a `ConditionCountMetric` that tracks how often the column entries are above or below 42. Any string column will track how many entries are alphabetic and how many are numeric.\n",
        "\n",
        "`whylogs.core.resolvers.COLUMN_METRICS` is a list of `MetricSpec`s for the metrics WhyLabs expects in each column. There are also some predefined `ResolverSpec` lists to cover common use cases. For example, `STANDARD_RESOLVER` specifies the same metrics as the `StandardResolver`:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "metadata": {
        "id": "6A4qR1lFpbjR"
      },
      "outputs": [],
      "source": [
        "STANDARD_RESOLVER = [\n",
        "    ResolverSpec(\n",
        "        column_type=Integral,\n",
        "        metrics=COLUMN_METRICS\n",
        "        + [\n",
        "            MetricSpec(StandardMetric.distribution.value),\n",
        "            MetricSpec(StandardMetric.ints.value),\n",
        "            MetricSpec(StandardMetric.cardinality.value),\n",
        "            MetricSpec(StandardMetric.frequent_items.value),\n",
        "        ],\n",
        "    ),\n",
        "    ResolverSpec(\n",
        "        column_type=Fractional,\n",
        "        metrics=COLUMN_METRICS\n",
        "        + [\n",
        "            MetricSpec(StandardMetric.distribution.value),\n",
        "            MetricSpec(StandardMetric.cardinality.value),\n",
        "        ],\n",
        "    ),\n",
        "    ResolverSpec(\n",
        "        column_type=String,\n",
        "        metrics=COLUMN_METRICS\n",
        "        + [\n",
        "            MetricSpec(StandardMetric.unicode_range.value),\n",
        "            MetricSpec(StandardMetric.distribution.value),\n",
        "            MetricSpec(StandardMetric.cardinality.value),\n",
        "            MetricSpec(StandardMetric.frequent_items.value),\n",
        "        ],\n",
        "    ),\n",
        "    ResolverSpec(column_type=AnyType, metrics=COLUMN_METRICS),\n",
        "]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "87x9SdHlP2DF"
      },
      "source": [
        "There are also declarations for\n",
        "*   `LIMITED_TRACKING_RESOLVER` just tracks the metrics required by WhyLogs, plus the distribution metric for numeric columns.\n",
        "*   `NO_FI_RESOLVER` is the same as `STANDARD_RESOLVER` but omits the frequent item metrics.\n",
        "*   `HISTOGRAM_COUNTING_TRACKING_RESOLVER` tracks only the distribution metric for each column.\n",
        "\n",
        "These provide handy starting places if we just want to add one or two metrics to one of these standard schema using the `add_resolver()` method:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 11,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 292
        },
        "id": "Dlwjc70uQNi-",
        "outputId": "007ffa75-a29f-41f1-aed8-a6e32cab82a3"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "WARNING:whylogs.core.resolvers:Conflicting resolvers for distribution metric in column 'col1' of type int\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "        cardinality/est  cardinality/lower_1  cardinality/upper_1  \\\n",
              "column                                                              \n",
              "col1                3.0                  3.0              3.00015   \n",
              "col2                3.0                  3.0              3.00015   \n",
              "col3                3.0                  3.0              3.00015   \n",
              "col4                3.0                  3.0              3.00015   \n",
              "\n",
              "        condition_count/above 42  condition_count/below 42  \\\n",
              "column                                                       \n",
              "col1                         0.0                       3.0   \n",
              "col2                         NaN                       NaN   \n",
              "col3                         NaN                       NaN   \n",
              "col4                         NaN                       NaN   \n",
              "\n",
              "        condition_count/total  counts/inf  counts/n  counts/nan  counts/null  \\\n",
              "column                                                                         \n",
              "col1                      3.0           0         3           0            0   \n",
              "col2                      NaN           0         3           0            0   \n",
              "col3                      NaN           0         3           0            0   \n",
              "col4                      NaN           0         3           0            0   \n",
              "\n",
              "        distribution/max  distribution/mean  distribution/median  \\\n",
              "column                                                             \n",
              "col1                 3.0                2.0                  2.0   \n",
              "col2                 5.0                4.0                  4.0   \n",
              "col3                 NaN                0.0                  NaN   \n",
              "col4                 5.0                4.0                  4.0   \n",
              "\n",
              "        distribution/min  distribution/n  distribution/q_01  \\\n",
              "column                                                        \n",
              "col1                 1.0               3                1.0   \n",
              "col2                 3.0               3                3.0   \n",
              "col3                 NaN               0                NaN   \n",
              "col4                 3.0               3                3.0   \n",
              "\n",
              "        distribution/q_05  distribution/q_10  distribution/q_25  \\\n",
              "column                                                            \n",
              "col1                  1.0                1.0                1.0   \n",
              "col2                  3.0                3.0                3.0   \n",
              "col3                  NaN                NaN                NaN   \n",
              "col4                  3.0                3.0                3.0   \n",
              "\n",
              "        distribution/q_75  distribution/q_90  distribution/q_95  \\\n",
              "column                                                            \n",
              "col1                  3.0                3.0                3.0   \n",
              "col2                  5.0                5.0                5.0   \n",
              "col3                  NaN                NaN                NaN   \n",
              "col4                  5.0                5.0                5.0   \n",
              "\n",
              "        distribution/q_99  distribution/stddev  \\\n",
              "column                                           \n",
              "col1                  3.0                  1.0   \n",
              "col2                  5.0                  1.0   \n",
              "col3                  NaN                  0.0   \n",
              "col4                  5.0                  1.0   \n",
              "\n",
              "                          frequent_items/frequent_strings  ints/max  ints/min  \\\n",
              "column                                                                          \n",
              "col1    [FrequentItem(value='1', est=1, upper=1, lower...       3.0       1.0   \n",
              "col2                                                  NaN       NaN       NaN   \n",
              "col3    [FrequentItem(value='c', est=1, upper=1, lower...       NaN       NaN   \n",
              "col4                                                  NaN       NaN       NaN   \n",
              "\n",
              "                      type  types/boolean  types/fractional  types/integral  \\\n",
              "column                                                                        \n",
              "col1    SummaryType.COLUMN              0                 0               3   \n",
              "col2    SummaryType.COLUMN              0                 3               0   \n",
              "col3    SummaryType.COLUMN              0                 0               0   \n",
              "col4    SummaryType.COLUMN              0                 3               0   \n",
              "\n",
              "        types/object  types/string  types/tensor  \n",
              "column                                            \n",
              "col1               0             0             0  \n",
              "col2               0             0             0  \n",
              "col3               0             3             0  \n",
              "col4               0             0             0  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-85ba5e58-5b4c-497c-852b-d6c28abf1d8d\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>cardinality/est</th>\n",
              "      <th>cardinality/lower_1</th>\n",
              "      <th>cardinality/upper_1</th>\n",
              "      <th>condition_count/above 42</th>\n",
              "      <th>condition_count/below 42</th>\n",
              "      <th>condition_count/total</th>\n",
              "      <th>counts/inf</th>\n",
              "      <th>counts/n</th>\n",
              "      <th>counts/nan</th>\n",
              "      <th>counts/null</th>\n",
              "      <th>distribution/max</th>\n",
              "      <th>distribution/mean</th>\n",
              "      <th>distribution/median</th>\n",
              "      <th>distribution/min</th>\n",
              "      <th>distribution/n</th>\n",
              "      <th>distribution/q_01</th>\n",
              "      <th>distribution/q_05</th>\n",
              "      <th>distribution/q_10</th>\n",
              "      <th>distribution/q_25</th>\n",
              "      <th>distribution/q_75</th>\n",
              "      <th>distribution/q_90</th>\n",
              "      <th>distribution/q_95</th>\n",
              "      <th>distribution/q_99</th>\n",
              "      <th>distribution/stddev</th>\n",
              "      <th>frequent_items/frequent_strings</th>\n",
              "      <th>ints/max</th>\n",
              "      <th>ints/min</th>\n",
              "      <th>type</th>\n",
              "      <th>types/boolean</th>\n",
              "      <th>types/fractional</th>\n",
              "      <th>types/integral</th>\n",
              "      <th>types/object</th>\n",
              "      <th>types/string</th>\n",
              "      <th>types/tensor</th>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>column</th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>col1</th>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.00015</td>\n",
              "      <td>0.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>0</td>\n",
              "      <td>3</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>2.0</td>\n",
              "      <td>2.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>3</td>\n",
              "      <td>1.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>[FrequentItem(value='1', est=1, upper=1, lower...</td>\n",
              "      <td>3.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>SummaryType.COLUMN</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>3</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>col2</th>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.00015</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>0</td>\n",
              "      <td>3</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>5.0</td>\n",
              "      <td>4.0</td>\n",
              "      <td>4.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>5.0</td>\n",
              "      <td>5.0</td>\n",
              "      <td>5.0</td>\n",
              "      <td>5.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>SummaryType.COLUMN</td>\n",
              "      <td>0</td>\n",
              "      <td>3</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>col3</th>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.00015</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>0</td>\n",
              "      <td>3</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>0.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>0.0</td>\n",
              "      <td>[FrequentItem(value='c', est=1, upper=1, lower...</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>SummaryType.COLUMN</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>3</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>col4</th>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.00015</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>0</td>\n",
              "      <td>3</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>5.0</td>\n",
              "      <td>4.0</td>\n",
              "      <td>4.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>5.0</td>\n",
              "      <td>5.0</td>\n",
              "      <td>5.0</td>\n",
              "      <td>5.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>SummaryType.COLUMN</td>\n",
              "      <td>0</td>\n",
              "      <td>3</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-85ba5e58-5b4c-497c-852b-d6c28abf1d8d')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-85ba5e58-5b4c-497c-852b-d6c28abf1d8d button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-85ba5e58-5b4c-497c-852b-d6c28abf1d8d');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "<div id=\"df-c72f1131-6f66-4f2f-86b2-d49df67626bc\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-c72f1131-6f66-4f2f-86b2-d49df67626bc')\"\n",
              "            title=\"Suggest charts.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "    background-color: #E8F0FE;\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: #1967D2;\n",
              "    height: 32px;\n",
              "    padding: 0 0 0 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: #E2EBFA;\n",
              "    box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: #174EA6;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "    background-color: #3B4455;\n",
              "    fill: #D2E3FC;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart:hover {\n",
              "    background-color: #434B5C;\n",
              "    box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "    filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "    fill: #FFFFFF;\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const charts = await google.colab.kernel.invokeFunction(\n",
              "          'suggestCharts', [key], {});\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-c72f1131-6f66-4f2f-86b2-d49df67626bc button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "    </div>\n",
              "  </div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 11
        }
      ],
      "source": [
        "from whylogs.core.resolvers import STANDARD_RESOLVER\n",
        "\n",
        "schema = DeclarativeSchema(STANDARD_RESOLVER)\n",
        "extra_metric = ResolverSpec(\n",
        "    column_name=\"col1\",\n",
        "    metrics=[\n",
        "        MetricSpec(StandardMetric.distribution.value),\n",
        "        MetricSpec(\n",
        "            ConditionCountMetric,\n",
        "            ConditionCountConfig(\n",
        "                conditions={\n",
        "                    \"below 42\": Condition(lambda x: x < 42),\n",
        "                    \"above 42\": Condition(lambda x: x > 42),\n",
        "                }\n",
        "            ),\n",
        "        ),\n",
        "    ],\n",
        ")\n",
        "schema.add_resolver(extra_metric)\n",
        "\n",
        "result = why.log(df, schema=schema)\n",
        "prof_view = result.profile().view()\n",
        "prof_view.to_pandas()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JGAkb97WVxSP"
      },
      "source": [
        "This example adds a condition count metric to `col1` in addition to the usual default metrics.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Default Resolver\n",
        "\n",
        "If you instantiate a `DeclarativeResolver` without passing it a list of `ResolverSpec`s, it will use the value of the variable `whylogs.core.resovlers.DEFAULT_RESOLVER`. Initially this has the value of `STANDARD_RESOLVER` which matches whylog's default behavior. You can set the value to one of the other pre-defined resolver lists or your own custom resolver list to customize the default resolving behavior.\n",
        "\n",
        "Similarly, there is a `whylogs.experimental.core.metrics.udf_metric.DEFAULT_UDF_RESOLVER` variable that specifies the default resolvers for the submetrics in a `UdfMetric`.\n",
        "\n",
        "## Excluding Metrics\n",
        "\n",
        "The `ResolverSpec` has an `exclude` field. If this is set to true, the metrics listed in the `ResolverSpec` are excluded from columns that match it. This can be handy for preventing sensitive information from \"leaking\" via a frequent items metric:"
      ],
      "metadata": {
        "id": "qXzLhIvtt0vF"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from whylogs.core.resolvers import DEFAULT_RESOLVER\n",
        "\n",
        "data = pd.DataFrame({\"Sensitive\": [\"private\", \"secret\"], \"Boring\": [\"normal\", \"stuff\"]})\n",
        "schema = DeclarativeSchema(\n",
        "    DEFAULT_RESOLVER + [ResolverSpec(\n",
        "        column_name = \"Sensitive\",\n",
        "        metrics = [MetricSpec(StandardMetric.frequent_items.value)],\n",
        "        exclude = True\n",
        "    )]\n",
        ")\n",
        "result = why.log(data, schema=schema)\n",
        "result.profile().view().to_pandas()[\"frequent_items/frequent_strings\"]"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "laYXvD3GKQ1-",
        "outputId": "10635d62-c4d1-4513-d9f3-908ce022d680"
      },
      "execution_count": 15,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "column\n",
              "Boring       [FrequentItem(value='normal', est=1, upper=1, ...\n",
              "Sensitive                                                  NaN\n",
              "Name: frequent_items/frequent_strings, dtype: object"
            ]
          },
          "metadata": {},
          "execution_count": 15
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "The frequent items metrics has been excluded from the `Sensitive` column without affecting the `DEFAULT_RESOLVER`'s treatment of other columns."
      ],
      "metadata": {
        "id": "RV-C3NkKNiRk"
      }
    }
  ],
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3.8.10 ('.venv': poetry)",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.10"
    },
    "orig_nbformat": 4,
    "vscode": {
      "interpreter": {
        "hash": "8430e7bcc333486e417258c6fadac662061ebd166d9f3c5ccb12c1968aa41625"
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}