docs/guide/batch.rst from lenskit/lkpy

docs/guide/batch.rst
Summary

Maintainability

Test Coverage

Issues
.. _batch:

Batch-Running Pipelines
=======================

.. py:currentmodule:: lenskit.batch

.. highlight:: python

Offline recommendation experiments require *batch-running* a pipeline over a set
of test users, sessions, or other recommendation requests.  LensKit supports this
through the facilities in the :py:mod:`lenskit.batch` module.

By default, the batch facilities operate in parallel over the test users; this
can be controlled by environment variables (see :ref:`parallel-config`) or
through an ``n_jobs`` keyword argument to the various functions and classes.

.. admonition:: Import Protection
    :class: important

    Scripts using batch pipeline operations must be *protected*; see
    :ref:`parallel-protecting`.

Simple Runs
-----------

If you have a pipeline and want to simply generate recommendations for a batch
of test users, you can do this with the :py:func:`recommend` function.

For an example, let's start with importing things to run a quick batch:

    >>> from lenskit.basic import PopScorer
    >>> from lenskit.pipeline import topn_pipeline
    >>> from lenskit.batch import recommend
    >>> from lenskit.data import load_movielens
    >>> from lenskit.splitting import sample_users, SampleN
    >>> from lenskit.metrics import RunAnalysis, RBP

Load and split some data:

    >>> data = load_movielens('data/ml-100k.zip')
    >>> split = sample_users(data, 150, SampleN(5, rng=1024), rng=42)

Configure and train the model:

    >>> model = PopScorer()
    >>> pop_pipe = topn_pipeline(model, n=20)
    >>> pop_pipe.train(split.train)

Generate recommendations:

    >>> recs = recommend(pop_pipe, split.test.keys(), n_jobs=1)
    >>> recs.to_df()
              user_id  item_id     score  rank
    0 ...                                    1
    ...
    [3000 rows x 4 columns]

And measure their results:

    >>> measure = RunAnalysis()
    >>> measure.add_metric(RBP())
    >>> scores = measure.compute(recs, split.test)
    >>> scores.list_summary()    # doctest: +ELLIPSIS
              mean    median     std
    metric
    RBP    0.09...    0.0...  0.1...


The :py:func:`predict` function works similarly, but for rating predictions;
instead of a simple list of user IDs, it takes a dictionary mapping user IDs to
lists of test items (as :py:class:`~lenskit.data.ItemList`).

General Batch Pipeline Runs
---------------------------

The :py:func:`recommend` and :py:func:`predict` functions are convenience
wrappers around a more general facility, the :py:class:`BatchPipelineRunner`.