whylabs/whylogs-python

View on GitHub
python/examples/experimental/Condition_Validator_UDF.ipynb

Summary

Maintainability
Test Coverage
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ">### 🚩 *Create a free WhyLabs account to get more value out of whylogs!*<br> \n",
    ">*Did you know you can store, visualize, and monitor whylogs profiles with the [WhyLabs Observability Platform](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Condition_Validators_udf)? Sign up for a [free WhyLabs account](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Condition_Validators_udf) to leverage the power of whylogs and WhyLabs together!*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Condition Validator UDFs\n",
    "\n",
    "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/whylogs/blob/mainline/python/examples/experimental/Condition_Validator_UDF.ipynb)\n",
    "\n",
    "In this example, we will show how you can create condition validators in a simplified way by using the `condition_validator` decorator. This will allow you to easily create a condition validator based on a user-defined function (UDF).\n",
    "\n",
    "## Example\n",
    "\n",
    "Let's say you are logging a numerical column `col1`, and you want to trigger an action whenever the evaluated row value for this column is greater than 4. To do so, we'll define two functions: an action and a condition. We will then decorate the condition function with the `condition_validator` decorator, and pass the action function as an argument."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Note: you may need to restart the kernel to use updated packages.\n",
    "%pip install whylogs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "No session found. Call whylogs.init() to initialize a session and authenticate. See https://docs.whylabs.ai/docs/whylabs-whylogs-init for more information.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Validator: less_than_four\n",
      "    Condition name less_than_four failed for value 7\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<whylogs.core.view.dataset_profile_view.DatasetProfileView at 0x7fe2a606a520>"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "from typing import Any\n",
    "from whylogs.experimental.core.validators import condition_validator\n",
    "from whylogs.experimental.core.udf_schema import udf_schema\n",
    "import whylogs as why\n",
    "\n",
    "data = pd.DataFrame({\"col1\": [1, 3, 7]})\n",
    "\n",
    "def do_something_important(validator_name, condition_name: str, value: Any, column_id=None):\n",
    "    print(\"Validator: {}\\n    Condition name {} failed for value {}\".format(validator_name, condition_name, value))\n",
    "    return\n",
    "\n",
    "@condition_validator([\"col1\"], condition_name=\"less_than_four\", actions=[do_something_important])\n",
    "def lt_4(x):\n",
    "    return x < 4\n",
    "\n",
    "schema = udf_schema()\n",
    "why.log(data, schema=schema).view()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can see that the action was triggered once for the value 7.\n",
    "\n",
    "Condition Validators are compatible with Dataset UDFs. Through Dataset UDFs, you can create new columns based on the values of other columns. In this example, we will create a new column `add5` that is equal to `col1` + 5. We will then assign a condition validator to the newly created column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Validator: less_than_four\n",
      "    Condition name less_than_four failed for value 7\n",
      "Validator: less_than_four\n",
      "    Condition name less_than_four failed for value 6\n",
      "Validator: less_than_four\n",
      "    Condition name less_than_four failed for value 8\n",
      "Validator: less_than_four\n",
      "    Condition name less_than_four failed for value 12\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<whylogs.core.view.dataset_profile_view.DatasetProfileView at 0x7fe3038f8040>"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from typing import Dict, List, Union\n",
    "from whylogs.experimental.core.udf_schema import register_dataset_udf\n",
    "\n",
    "\n",
    "@register_dataset_udf([\"col1\"])\n",
    "def add5(x: Union[Dict[str, List], pd.DataFrame]) -> Union[List, pd.Series]:\n",
    "    return [xx + 5 for xx in x[\"col1\"]]\n",
    "\n",
    "@condition_validator([\"add5\"], condition_name=\"less_than_four\", actions=[do_something_important])\n",
    "def lt_4(x):\n",
    "    return x < 4\n",
    "\n",
    "schema = udf_schema()\n",
    "why.log(data, schema=schema).view()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, our action was triggered 4 times: once for `col1`'s value 7, and 3 times for `add5`'s values 6, 8 and 12.\n",
    "\n",
    "You can access the assigned condition validators through the schema object. In the following code snippet, we can see that there's one condition validator assigned to `col1` and one to `add5`, both being named `less_than_four`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "defaultdict(list,\n",
       "            {'col1': [ConditionValidator(name='less_than_four', conditions={'less_than_four': <function lt_4 at 0x7fe2a6059af0>}, actions=[<function do_something_important at 0x7fe2ea7c6d30>], total=0, failures={'less_than_four': 2}, enable_sampling=True, _samples=[], _sampler=<whylogs_sketching.var_opt_sketch object at 0x7fe2a6067130>, sample_size=10)],\n",
       "             'add5': [ConditionValidator(name='less_than_four', conditions={'less_than_four': <function lt_4 at 0x7fe2a4d531f0>}, actions=[<function do_something_important at 0x7fe2ea7c6d30>], total=0, failures={'less_than_four': 3}, enable_sampling=True, _samples=[], _sampler=<whylogs_sketching.var_opt_sketch object at 0x7fe2a4d461b0>, sample_size=10)]})"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "schema.validators"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can get a sample of the data that failed the condition. Let's do that for the first (and only) condition validator for the `add5` column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[6, 8, 12]"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "schema.validators[\"add5\"][0].get_samples()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}