python/examples/advanced/Custom_Metrics.ipynb from whylabs/whylogs-python

{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        ">### 🚩 *Create a free WhyLabs account to get more value out of whylogs!*<br> \n",
        ">*Did you know you can store, visualize, and monitor whylogs profiles with the [WhyLabs Observability Platform](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Custom_Metrics)? Sign up for a [free WhyLabs account](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Custom_Metrics) to leverage the power of whylogs and WhyLabs together!*"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "uNhjtINB2EaL"
      },
      "source": [
        "# Custom Metrics\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/whylogs/blob/mainline/python/examples/advanced/Custom_Metrics.ipynb)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "WT8XU-Df065F"
      },
      "outputs": [],
      "source": [
        "# Note: you may need to restart the kernel to use updated packages.\n",
        "%pip install whylogs"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "j4zsdhVU2yc2"
      },
      "source": [
        "If all of a metric's state can be represented by subclasses of `MetricComponent`, creating a new metric only requires declaring those components as fields and implementing a few methods. Several standard metric components are defined in [metric_components.py](https://github.com/whylabs/whylogs/blob/mainline/python/whylogs/core/metrics/metric_components.py). You can also create new components by subclassing `CustomComponent`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "id": "OGF-g7wA50Lt"
      },
      "outputs": [],
      "source": [
        "from dataclasses import dataclass\n",
        "from typing import Any, Dict, List\n",
        "import pickle\n",
        "\n",
        "import whylogs_sketching as ds  # type: ignore\n",
        "from whylogs.core.configs import SummaryConfig\n",
        "from whylogs.core.dataset_profile import DatasetProfile\n",
        "from whylogs.core.datatypes import DataType\n",
        "from whylogs.core.metrics.metric_components import KllComponent\n",
        "from whylogs.core.metrics.metrics import CustomMetricBase, Metric, MetricConfig, OperationResult\n",
        "from whylogs.core.preprocessing import PreprocessedColumn\n",
        "from whylogs.core.resolvers import Resolver\n",
        "from whylogs.core.schema import DatasetSchema\n",
        "from whylogs.core.proto import MetricMessage, MetricComponentMessage\n",
        "\n",
        "\n",
        "# Metric classes should be decorated with @dataclass\n",
        "@dataclass(frozen=True)\n",
        "class HistogramMetric(Metric):\n",
        "    histogram: KllComponent  # all fields are subclasses of MetricComponent\n",
        "\n",
        "    # you must implement namespace returning a unique string to identify your metric\n",
        "    @property\n",
        "    def namespace(self) -> str:\n",
        "        return \"histogram\"\n",
        "\n",
        "    # you must implement to_summary_dict returning a summary of your metric\n",
        "    def to_summary_dict(self, cfg: SummaryConfig) -> Dict[str, Any]:\n",
        "        if self.histogram.value.get_n() == 0:\n",
        "            quantiles = [None, None, None, None, None]\n",
        "        else:\n",
        "            quantiles = self.histogram.value.get_quantiles([0.1, 0.25, 0.5, 0.75, 0.9])\n",
        "        return {\n",
        "            \"n\": self.histogram.value.get_n(),\n",
        "            \"max\": self.histogram.value.get_max_value(),\n",
        "            \"min\": self.histogram.value.get_min_value(),\n",
        "            \"q_10\": quantiles[0],\n",
        "            \"q_25\": quantiles[1],\n",
        "            \"median\": quantiles[2],\n",
        "            \"q_75\": quantiles[3],\n",
        "            \"q_90\": quantiles[4],\n",
        "        }\n",
        "\n",
        "    # columnar_update updates your metric as data is logged\n",
        "    def columnar_update(self, data: PreprocessedColumn) -> OperationResult:\n",
        "        successes = 0\n",
        "\n",
        "        if data.numpy.len > 0:\n",
        "            for arr in [data.numpy.floats, data.numpy.ints]:\n",
        "                if arr is not None:\n",
        "                    self.histogram.value.update(arr)\n",
        "                    successes += len(arr)  # count every value added to the sketch\n",
        "\n",
        "        for lst in [data.list.ints, data.list.floats]:\n",
        "            if lst is not None and len(lst) > 0:\n",
        "                self.histogram.value.update_list(lst)\n",
        "                successes += len(lst)\n",
        "\n",
        "        return OperationResult.ok(successes)\n",
        "\n",
        "    # The zero method returns an \"empty\" instance of your metric ready to start tracking data\n",
        "    # If your metric needs configuration, create a subclass of MetricConfig containing your\n",
        "    # parameters.\n",
        "    @classmethod\n",
        "    def zero(cls, config: MetricConfig) -> \"HistogramMetric\":\n",
        "        return cls(histogram=KllComponent(ds.kll_doubles_sketch(k=config.kll_k)))\n"
      ]
    },
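    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The metric can also be exercised directly, without a profile. The cell below is a hedged sketch rather than part of the original example: it assumes `MetricConfig()` and `SummaryConfig()` can be constructed with their defaults, that `PreprocessedColumn.apply` accepts a plain Python list, and that `Metric` supports `+` for merging metrics built from `MetricComponent` fields. The names `batch_a`, `batch_b`, and `merged` are illustrative only."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Illustrative sketch: feed two batches into separate metrics, then merge them\n",
        "batch_a = HistogramMetric.zero(MetricConfig())\n",
        "batch_b = HistogramMetric.zero(MetricConfig())\n",
        "batch_a.columnar_update(PreprocessedColumn.apply([1.0, 2.0, 3.0]))\n",
        "batch_b.columnar_update(PreprocessedColumn.apply([4.0, 5.0]))\n",
        "\n",
        "# Assumption: adding two metrics merges each MetricComponent field\n",
        "merged = batch_a + batch_b\n",
        "merged.to_summary_dict(SummaryConfig())"
      ]
    },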
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DLXIn7m471TF"
      },
      "source": [
        "If you prefer not to use `MetricComponent` fields for your metric, you can instead make your metric a subclass of `CustomMetricBase`. All fields whose names don't start with `_` will be included in the metric summary and serialized via protobuf."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "id": "8ie0jJoI9FHv"
      },
      "outputs": [],
      "source": [
        "@dataclass\n",
        "class StructMetric(CustomMetricBase):\n",
        "    x: int\n",
        "    s: str\n",
        "    _private: float = 3.14159  # excluded from summary and protobuf\n",
        "\n",
        "    @property\n",
        "    def namespace(self) -> str:\n",
        "        return \"struct\"\n",
        "\n",
        "    # you must implement your own merge method\n",
        "    def merge(self, other: \"StructMetric\") -> \"StructMetric\":\n",
        "        return StructMetric(self.x + other.x, self.s + other.s)\n",
        "\n",
        "    def columnar_update(self, data: PreprocessedColumn) -> OperationResult:\n",
        "        self.x += 1\n",
        "        self.s += \"a\"\n",
        "        return OperationResult.ok(1)\n",
        "\n",
        "    @classmethod\n",
        "    def zero(cls, config: MetricConfig) -> \"StructMetric\":\n",
        "        return cls(0, \"\")\n"
      ]
    },
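    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To see what the `merge` method and the underscore rule do in practice, the cell below merges two hand-built instances and prints the summary. This is a minimal sketch, not part of the original example: the values and the names `left`, `right`, and `merged` are illustrative, and it assumes the `to_summary_dict` inherited from `CustomMetricBase` can be called without a `SummaryConfig`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Illustrative sketch: merge two StructMetric instances by hand\n",
        "left = StructMetric(1, \"a\")\n",
        "right = StructMetric(2, \"b\")\n",
        "merged = left.merge(right)  # uses the merge method defined above\n",
        "\n",
        "# Assumption: the inherited to_summary_dict needs no arguments;\n",
        "# _private should be absent because its name starts with an underscore\n",
        "merged.to_summary_dict()"
      ]
    },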
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-zZnKNCF9oUj"
      },
      "source": [
        "## Using Your Metric\n",
        "To use your metric, create a `Resolver` that maps each column to the metrics it should track, and pass that resolver to a `DatasetSchema`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {
        "id": "HlFNH-H6-qKg"
      },
      "outputs": [],
      "source": [
        "from whylogs.core import ColumnSchema\n",
        "\n",
        "class TestResolver(Resolver):\n",
        "    def resolve(self, name: str, why_type: DataType, column_schema: ColumnSchema) -> Dict[str, Metric]:\n",
        "        # build both metrics through zero() so they start from an empty state\n",
        "        return {\"histogram\": HistogramMetric.zero(column_schema.cfg),\n",
        "                \"struct\": StructMetric.zero(column_schema.cfg)}\n",
        "\n",
        "\n",
        "schema = DatasetSchema(types={\"col1\": float}, resolvers=TestResolver())\n",
        "prof = DatasetProfile(schema)\n",
        "row = {\"col1\": 1.2}\n",
        "prof.track(row=row)"
      ]
    },
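    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Individual metrics can be pulled out of the profile view by the names returned from `TestResolver.resolve`. The cell below is a hedged sketch rather than part of the original example: it assumes `get_column` on the profile view and `get_metric` on the column view behave as in current whylogs v1 releases. The variable names are illustrative."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Illustrative sketch: look up each custom metric on the column's view\n",
        "column_view = prof.view().get_column(\"col1\")\n",
        "histogram_metric = column_view.get_metric(\"histogram\")\n",
        "struct_metric = column_view.get_metric(\"struct\")\n",
        "\n",
        "# HistogramMetric.to_summary_dict requires a SummaryConfig (see its definition above)\n",
        "histogram_metric.to_summary_dict(SummaryConfig()), struct_metric.to_summary_dict()"
      ]
    },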
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>histogram/n</th>\n",
              "      <th>histogram/max</th>\n",
              "      <th>histogram/min</th>\n",
              "      <th>histogram/q_10</th>\n",
              "      <th>histogram/q_25</th>\n",
              "      <th>histogram/median</th>\n",
              "      <th>histogram/q_75</th>\n",
              "      <th>histogram/q_90</th>\n",
              "      <th>struct/x</th>\n",
              "      <th>struct/s</th>\n",
              "      <th>type</th>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>column</th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>col1</th>\n",
              "      <td>1</td>\n",
              "      <td>1.2</td>\n",
              "      <td>1.2</td>\n",
              "      <td>1.2</td>\n",
              "      <td>1.2</td>\n",
              "      <td>1.2</td>\n",
              "      <td>1.2</td>\n",
              "      <td>1.2</td>\n",
              "      <td>2</td>\n",
              "      <td>aa</td>\n",
              "      <td>SummaryType.COLUMN</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "        histogram/n  histogram/max  histogram/min  histogram/q_10  \\\n",
              "column                                                              \n",
              "col1              1            1.2            1.2             1.2   \n",
              "\n",
              "        histogram/q_25  histogram/median  histogram/q_75  histogram/q_90  \\\n",
              "column                                                                     \n",
              "col1               1.2               1.2             1.2             1.2   \n",
              "\n",
              "        struct/x struct/s                type  \n",
              "column                                         \n",
              "col1           2       aa  SummaryType.COLUMN  "
            ]
          },
          "execution_count": 5,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "prof.view().to_pandas()"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
      "name": "Untitled0.ipynb",
      "provenance": []
    },
    "kernelspec": {
      "display_name": ".venv",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.10"
    },
    "vscode": {
      "interpreter": {
        "hash": "5dd5901cadfd4b29c2aaf95ecd29c0c3b10829ad94dcfe59437dbee391154aea"
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}