zincware/ZnTrack

examples/docs/01_Intro.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "# Overview\n",
    "\n",
    "ZnTrack is a user-friendly framework that simplifies the creation and tracking of experiments.\n",
    "It's built on top of DVC, a powerful tool for version controlling machine learning projects.\n",
    "If you're not familiar with DVC, we highly recommend reading the [Getting Started guide](https://dvc.org/doc/start) to learn more about it.\n",
    "\n",
    "While DVC provides all the necessary functionality, it was designed to be language independent.\n",
    "Using it from Python therefore often requires writing custom scripts, managing dependencies, and working with configuration files.\n",
    "ZnTrack addresses these challenges by providing a Python-specific interface that's easy to use and well-integrated with Python workflows.\n",
    "\n",
    "Just as Git was originally designed as a low-level version control engine on top of which others could build front ends, ZnTrack is designed as a Python front end built on top of DVC.\n",
    "By doing so, it provides a more feature-rich, user-friendly interface for Python developers.\n",
    "You can think of it as similar to using Django or SQLAlchemy to make working with SQL easier and more tailored to Python.\n",
    "With ZnTrack, you can streamline the steps involved in experiment tracking and management."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## Jupyter Notebook Support\n",
    "\n",
    "ZnTrack can extract Nodes defined in Jupyter Notebooks.\n",
    "It tries to extract each Node definition and write it into a Python file.\n",
    "To do so, it needs to know the name of the notebook.\n",
    "\n",
    "For more complex workflows, it is recommended to define the Nodes inside Python files and import them into Jupyter Notebooks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from zntrack import config\n",
    "\n",
    "# When using ZnTrack we can write our code inside a Jupyter notebook.\n",
    "# We can make use of this functionality by setting the `nb_name` config as follows:\n",
    "config.nb_name = \"01_Intro.ipynb\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup\n",
    "Every project starts inside an empty directory.\n",
    "We can initialize a new project by running `git init` followed by `dvc init` inside the directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from zntrack.utils import cwd_temp_dir\n",
    "\n",
    "temp_dir = cwd_temp_dir()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Initialized empty Git repository in /tmp/tmpkyrcn10i/.git/\n",
      "Initialized DVC repository.\n",
      "\n",
      "You can now commit the changes to git.\n",
      "\n",
      "+---------------------------------------------------------------------+\n",
      "|                                                                     |\n",
      "|        DVC has enabled anonymous aggregate usage analytics.         |\n",
      "|     Read the analytics documentation (and how to opt-out) here:     |\n",
      "|             <https://dvc.org/doc/user-guide/analytics>              |\n",
      "|                                                                     |\n",
      "+---------------------------------------------------------------------+\n",
      "\n",
      "What's next?\n",
      "------------\n",
      "- Check out the documentation: <https://dvc.org/doc>\n",
      "- Get help and share ideas: <https://dvc.org/chat>\n",
      "- Star us on GitHub: <https://github.com/iterative/dvc>\n"
     ]
    }
   ],
   "source": [
    "!git init\n",
    "\n",
    "!dvc init"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Nodes\n",
    "\n",
    "In DVC, a pipeline is organized into multiple stages. In ZnTrack, a stage is created by inheriting from ``zntrack.Node`` and implementing a ``run()`` method.\n",
    "\n",
    "The ``run()`` method defines the logic of your pipeline stage, which is later executed by the pipeline manager (e.g. ``dvc repro``).\n",
    "\n",
    "As an example, let's create a ``RandomNumber`` Node that generates a random integer from 0 up to, but not including, a parameterized maximum value. To do this, we'll use ``zn.outs()`` and ``zn.params()`` from the ``zntrack`` module to define our Node's outputs and parameters:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
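    "from random import randrange\n",
    "\n",
    "from zntrack import Node, Project, zn\n",
    "\n",
    "\n",
    "class RandomNumber(Node):\n",
    "    number = zn.outs()  # output, stored by ZnTrack after the Node has run\n",
    "    maximum = zn.params()  # parameter, tracked by DVC\n",
    "\n",
    "    def run(self):\n",
    "        # randrange(n) returns an integer in the range 0 <= value < n\n",
    "        self.number = randrange(self.maximum)"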
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `Node` base class automatically generates an `__init__` method that accepts all `zn.params` and other inputs.\n",
    "When writing a custom `__init__`, it is important to call `super().__init__(**kwargs)` for ZnTrack to work.\n",
    "```python\n",
    "class RandomNumber(Node):\n",
    "    def __init__(self, maximum=None, **kwargs):\n",
    "        super().__init__(**kwargs)\n",
    "        self.maximum = maximum\n",
    "```\n",
    "\n",
    "In most cases, a ZnTrack Node behaves just like a normal Python class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "14\n"
     ]
    }
   ],
   "source": [
    "random_number = RandomNumber(maximum=512)\n",
    "random_number.run()\n",
    "print(random_number.number)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To add the Node to the DVC pipeline, we create it inside a `Project` context manager and then call `project.run()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Running DVC command: 'stage add --name RandomNumber --force ...'\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Creating 'dvc.yaml'\n",
      "Adding stage 'RandomNumber' in 'dvc.yaml'\n",
      "\n",
      "To track the changes with git, run:\n",
      "\n",
      "\tgit add dvc.yaml nodes/RandomNumber/.gitignore\n",
      "\n",
      "To enable auto staging, run:\n",
      "\n",
      "\tdvc config core.autostage true\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Jupyter support is an experimental feature! Please save your notebook before running this command!\n",
      "Submit issues to https://github.com/zincware/ZnTrack.\n",
      "[NbConvertApp] Converting notebook 01_Intro.ipynb to script\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Running stage 'RandomNumber':\n",
      "> zntrack run src.RandomNumber.RandomNumber --name RandomNumber\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[NbConvertApp] Writing 4644 bytes to 01_Intro.py\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Generating lock file 'dvc.lock'\n",
      "Updating lock file 'dvc.lock'\n",
      "\n",
      "To track the changes with git, run:\n",
      "\n",
      "\tgit add dvc.lock\n",
      "\n",
      "To enable auto staging, run:\n",
      "\n",
      "\tdvc config core.autostage true\n",
      "Use `dvc push` to send your updates to remote storage.\n"
     ]
    }
   ],
   "source": [
    "with Project() as project:\n",
    "    node = RandomNumber(maximum=512)\n",
    "\n",
    "project.run()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To access the results, we can load the Node by calling `load()` on it and then look at its `number` attribute."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "354"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "node.load()\n",
    "node.number"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "Instead of passing parameters directly, you can also pass a parameter file. A list of all supported file formats (e.g. JSON or YAML) can be found in the [DVC params documentation](https://dvc.org/doc/command-reference/params).\n",
    "To do so, use `zntrack.dvc.params(<param_file>)`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "nbsphinx": "hidden",
    "tags": []
   },
   "outputs": [],
   "source": [
    "temp_dir.cleanup()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}