python/examples/basic/Getting_Started_with_UDFs.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
">### 🚩 *Create a free WhyLabs account to get more value out of whylogs!*<br> \n",
">*Did you know you can store, visualize, and monitor whylogs profiles with the [WhyLabs Observability Platform](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Getting_Started)? Sign up for a [free WhyLabs account](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Getting_Started) to leverage the power of whylogs and WhyLabs together!*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Getting Started"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/whylogs/blob/mainline/python/examples/basic/Getting_Started_with_UDFs.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Table of Content"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this example, we'll explore the basics of logging data with whylogs and a user defined function or UDF"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Installing whylogs"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install whylogs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Loading a Pandas DataFrame"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before showing how we can log data, we first need the data itself. Let's create a simple Pandas DataFrame:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"data = {\n",
" \"animal\": [\"cat\", \"hawk\", \"clam\", \"cat\", \"mongoose\", \"octopus\"],\n",
" \"legs\": [4, 2, 1, 4, 4, 8],\n",
" \"weight\": [4.3, 1.8, 1.3, 4.1, 5.4, 3.2],\n",
"}\n",
"\n",
"df = pd.DataFrame(data)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Defining a simple metric UDF\n",
"Here we use a metric UDF targeting a named column `animal` as an example to show how we can add features to a dataframe for custom monitoring. In this example we model some custom logic for if the animal has a cool name. This is a toy example that just checks if the name is longer than 4 characters, and does a binary classification, but you could return a score based on values in a column too."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import whylogs as why\n",
"from whylogs.experimental.core.udf_schema import udf_schema\n",
"from whylogs.experimental.core.metrics.udf_metric import register_metric_udf\n",
"\n",
"\n",
"@register_metric_udf(col_name=\"animal\")\n",
"def has_cool_animal_name(text):\n",
" if len(text) > 4: # long names are cool\n",
" return 1\n",
" else:\n",
" return 0\n",
" \n",
"custom_schema = udf_schema()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Profiling with whylogs + UDFs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To obtain a profile of your data, you can simply use whylogs' `log` call with your UDF schema defined earlier. This will attach a feature named `animal.has_cool_animal_name` which you can then see in WhyLabs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import whylogs as why\n",
"\n",
"why.init()\n",
"\n",
"results = why.log(df, name=\"udf_demo\", schema=custom_schema)\n",
"results.view().to_pandas()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Going Further with UDFs\n",
"Unlike metric UDFs, **dataset UDFs** can take multiple columns as input. Dataset UDFs create a new column in your pandas dataframe, which then is profiled along with your inputs."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"from whylogs.experimental.core.udf_schema import register_dataset_udf\n",
"import pandas as pd\n",
"\n",
"@register_dataset_udf([\"legs\", \"weight\"])\n",
"def weight_per_leg(data: pd.DataFrame) -> pd.Series:\n",
" return data[\"weight\"] / data[\"legs\"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"custom_schema2 = udf_schema()\n",
"results = why.log(df, schema=custom_schema2)\n",
"results.view().to_pandas()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For more details on the different kinds of UDFs (say you wanted to calculate a score based on multiple columns) see this example:\n",
"* https://github.com/whylabs/whylogs/blob/mainline/python/examples/experimental/whylogs_UDF_examples.ipynb"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "5dd5901cadfd4b29c2aaf95ecd29c0c3b10829ad94dcfe59437dbee391154aea"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}