zincware/ZnTrack

View on GitHub
examples/docs/07_functions.ipynb

Summary

Maintainability
Test Coverage
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "tags": []
   },
   "source": [
    "# Functions\n",
    "\n",
    "So far we have only considered converting a Python class into a ZnTrack Node.\n",
    "Whilst ZnTrack classes are the more powerful tool a lightweight alternative is wrapping a Python function with `@zntrack.nodify` to gain access to a subset of the available ZnTrack tools.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from zntrack import config\n",
    "\n",
    "# When using ZnTrack we can write our code inside a Jupyter notebook.\n",
    "# We can make use of this functionality by setting the `nb_name` config as follows:\n",
    "config.nb_name = \"07_functions.ipynb\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "nbsphinx": "hidden",
    "tags": []
   },
   "outputs": [],
   "source": [
    "from zntrack.utils import cwd_temp_dir\n",
    "\n",
    "temp_dir = cwd_temp_dir()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "nbsphinx": "hidden",
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Initialized empty Git repository in /tmp/tmp0c_h9t_3/.git/\n",
      "Initialized DVC repository.\n",
      "\n",
      "You can now commit the changes to git.\n",
      "\n",
      "+---------------------------------------------------------------------+\n",
      "|                                                                     |\n",
      "|        DVC has enabled anonymous aggregate usage analytics.         |\n",
      "|     Read the analytics documentation (and how to opt-out) here:     |\n",
      "|             <https://dvc.org/doc/user-guide/analytics>              |\n",
      "|                                                                     |\n",
      "+---------------------------------------------------------------------+\n",
      "\n",
      "What's next?\n",
      "------------\n",
      "- Check out the documentation: <https://dvc.org/doc>\n",
      "- Get help and share ideas: <https://dvc.org/chat>\n",
      "- Star us on GitHub: <https://github.com/iterative/dvc>\n"
     ]
    }
   ],
   "source": [
    "!git init\n",
    "\n",
    "!dvc init"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the following example we will create an output file and write some parameters to it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "from zntrack import nodify, NodeConfig\n",
    "import pathlib\n",
    "\n",
    "\n",
    "@nodify(outs=pathlib.Path(\"outs.txt\"), params={\"text\": \"Lorem Ipsum\"})\n",
    "def write_text(cfg: NodeConfig):\n",
    "    cfg.outs.write_text(cfg.params.text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `@nodify` allows us to define all available DVC run options such as `outs` or `deps` together with a parameter dictionary.\n",
    "The params are cast into a `DotDict` which allows us to access them either via `cfg.params[\"text\"]` or directly via `cfg.params.text`.\n",
    "Running the function will only create the Node for us and not execute the function. We can circumvent that by telling DVC to run the method via `run=True`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Jupyter support is an experimental feature! Please save your notebook before running this command!\n",
      "Submit issues to https://github.com/zincware/ZnTrack.\n",
      "[NbConvertApp] Converting notebook 07_functions.ipynb to script\n",
      "[NbConvertApp] Writing 2495 bytes to 07_functions.py\n",
      "Running DVC command: 'stage add -n write_text --force ...'\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Creating 'dvc.yaml'\n",
      "Adding stage 'write_text' in 'dvc.yaml'\n",
      "\n",
      "To track the changes with git, run:\n",
      "\n",
      "\tgit add dvc.yaml .gitignore\n",
      "\n",
      "To enable auto staging, run:\n",
      "\n",
      "\tdvc config core.autostage true\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Running DVC command: 'repro write_text'\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Running stage 'write_text':\n",
      "> zntrack run src.write_text.write_text\n",
      "Generating lock file 'dvc.lock'\n",
      "Updating lock file 'dvc.lock'\n",
      "\n",
      "To track the changes with git, run:\n",
      "\n",
      "\tgit add dvc.lock\n",
      "\n",
      "To enable auto staging, run:\n",
      "\n",
      "\tdvc config core.autostage true\n",
      "Use `dvc push` to send your updates to remote storage.\n"
     ]
    }
   ],
   "source": [
    "cfg = write_text(run=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Lorem Ipsum'"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cfg.outs.read_text()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "This also allows us to build DAGs by adding the output files as dependencies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "@nodify(\n",
    "    deps=pathlib.Path(\"outs.txt\"),\n",
    "    outs=[pathlib.Path(\"part_1.txt\"), pathlib.Path(\"part_2.txt\")],\n",
    ")\n",
    "def split_text(cfg: NodeConfig):\n",
    "    text = cfg.deps.read_text()\n",
    "    for text_part, outs_file in zip(text.split(\" \"), cfg.outs):\n",
    "        outs_file.write_text(text_part)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[NbConvertApp] Converting notebook 07_functions.ipynb to script\n",
      "[NbConvertApp] Writing 2495 bytes to 07_functions.py\n",
      "Running DVC command: 'stage add -n split_text --force ...'\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Adding stage 'split_text' in 'dvc.yaml'\n",
      "\n",
      "To track the changes with git, run:\n",
      "\n",
      "\tgit add dvc.yaml .gitignore\n",
      "\n",
      "To enable auto staging, run:\n",
      "\n",
      "\tdvc config core.autostage true\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Running DVC command: 'repro split_text'\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Stage 'write_text' didn't change, skipping\n",
      "Running stage 'split_text':\n",
      "> zntrack run src.split_text.split_text\n",
      "Updating lock file 'dvc.lock'\n",
      "\n",
      "To track the changes with git, run:\n",
      "\n",
      "\tgit add dvc.lock\n",
      "\n",
      "To enable auto staging, run:\n",
      "\n",
      "\tdvc config core.autostage true\n",
      "Use `dvc push` to send your updates to remote storage.\n"
     ]
    }
   ],
   "source": [
    "_ = split_text(run=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Lorem\n",
      "Ipsum\n"
     ]
    }
   ],
   "source": [
    "print(pathlib.Path(\"part_1.txt\").read_text())\n",
    "print(pathlib.Path(\"part_2.txt\").read_text())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Pros and Cons\n",
    "\n",
    "Wrapping a Python function and converting it into Node is closer to the original DVC API. It provides all the basic functionality and can be nicely applied to compact methods.\n",
    "The ZnTrack class API provides more powerful tools such as the `zn.<method>` and can be used without configuring any file names.\n",
    "Personal preferences allow everyone to use either method or combine them to get maximum benefit from ZnTrack and DVC."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}