Getting Started with recnn
==========================
A Colab version is available here (clickable badge):

.. image:: https://colab.research.google.com/assets/colab-badge.svg
   :target: https://colab.research.google.com/drive/1xWX4JQvlcx3mizwL4gB0THEyxw6LsXTL
The offline example lives in RecNN/examples/[Library Basics]/1. Getting Started.ipynb
Let's do some imports::

    import recnn
    import torch
    import torch.nn as nn

    from tqdm.auto import tqdm
    tqdm.pandas()

    from jupyterthemes import jtplot
    jtplot.style(theme='grade3')
Environments
++++++++++++
The library's main abstraction for datasets is called an environment, similar to how other reinforcement learning libraries name it. This interface provides SARSA-like input for your RL models. When working with a recommendation environment you have two choices: static-length inputs (say, 10 items) or dynamic-length time series with a sequential encoder (many-to-one RNN). Static length is provided via FrameEnv, while dynamic length, along with a sequential state representation encoder, is implemented in SeqEnv. Let's take a look at FrameEnv first.
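Before setting one up, here is a rough, purely illustrative sketch (my own toy example, not the library's internal code) of how a static frame of 10 items maps onto a SARSA-like transition: the state is the last frame_size items a user interacted with together with their ratings, the action is the item that came next, and its rating serves as the reward::

    # Illustration only: toy item ids and ratings in watch order.
    frame_size = 10
    items = list(range(15))
    ratings = [5, 3, 4, 5, 1, 2, 4, 5, 3, 4, 5, 2, 1, 4, 3]

    t = frame_size                                    # position of the "next" item
    state = list(zip(items[:t], ratings[:t]))         # last 10 (item, rating) pairs
    action = items[t]                                 # the item the user picked next
    reward = ratings[t]                               # its rating acts as the reward
    next_state = list(zip(items[1:t + 1], ratings[1:t + 1]))  # window shifted by one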
To initialize an env, you need to provide paths to the embeddings and the ratings::

    frame_size = 10
    batch_size = 25
    # embeddings: https://drive.google.com/open?id=1EQ_zXBR3DKpmJR3jBgLvt-xoOvArGMsL
    dirs = recnn.data.env.DataPath(
        base="../../../data/",
        embeddings="embeddings/ml20_pca128.pkl",
        ratings="ml-20m/ratings.csv",
        cache="cache/frame_env.pkl",  # the cache is generated after the first run
        use_cache=True
    )
    env = recnn.data.env.FrameEnv(dirs, frame_size, batch_size)

    train = env.train_batch()
    test = env.test_batch()
    state, action, reward, next_state, done = recnn.data.get_base_batch(train, device=torch.device('cpu'))

    print(state)

    # State
    tensor([[  5.4261,  -4.6243,   2.3351,  ...,   3.0000,   4.0000,   1.0000],
            [  6.2052,  -1.8592,  -0.3248,  ...,   4.0000,   1.0000,   4.0000],
            [  3.2902,  -5.0021, -10.7066,  ...,   1.0000,   4.0000,   2.0000],
            ...,
            [  3.0571,  -4.1390,  -2.7344,  ...,   3.0000,  -3.0000,  -1.0000],
            [  0.8177,  -7.0827,  -0.6607,  ...,  -3.0000,  -1.0000,   3.0000],
            [  9.0742,   0.3944,  -6.4801,  ...,  -1.0000,   3.0000,  -1.0000]])
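As a quick sanity check on the dimensions (my own arithmetic, not output from the library): each state row concatenates the embeddings of the frame_size items in the frame with their ratings, so with 128-dimensional embeddings and a frame of 10 that is 10 * 128 + 10 = 1290 numbers per row. This is where the input size of the networks in the next section comes from::

    frame_size, emb_dim = 10, 128
    state_dim = frame_size * emb_dim + frame_size   # 1280 embedding features + 10 ratings
    print(state_dim)                                # 1290
    print(state.shape[-1])                          # should also be 1290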
Recommending
++++++++++++
Let's initialize the main networks and recommend something! ::
    value_net = recnn.nn.Critic(1290, 128, 256, 54e-2)
    policy_net = recnn.nn.Actor(1290, 128, 256, 6e-1)

    recommendation = policy_net(state)
    value = value_net(state, recommendation)

    print(recommendation)
    print(value)

    # Output:
    tensor([[ 1.5302, -2.3658,  1.6439,  ...,  0.1297,  2.2236,  2.9672],
            [ 0.8570, -1.3491, -0.3350,  ..., -0.8712,  5.8390,  3.0899],
            [-3.3727, -3.6797, -3.9109,  ...,  3.2436,  1.2161, -1.4018],
            ...,
            [-1.7834, -0.4289,  0.9808,  ..., -2.3487, -5.8386,  3.5981],
            [ 2.3813, -1.9076,  4.3054,  ...,  5.2221,  2.3165, -0.0192],
            [-3.8265,  1.8143, -1.8106,  ...,  3.3988, -3.1845,  0.7432]],
           grad_fn=<AddmmBackward>)
    tensor([[-1.0065],
            [ 0.3728],
            [ 2.1063],
            ...,
            [-2.1382],
            [ 0.3330],
            [ 5.4069]], grad_fn=<AddmmBackward>)
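The recommendation above is a batch of 128-dimensional action vectors living in the same space as the item embeddings, so one simple way to turn an action into a concrete top-K list is to rank items by similarity to it. A minimal sketch, assuming you have built an item_embeddings tensor of shape (n_items, 128) and a parallel item_ids list from the embeddings pickle (both names are mine, not part of the API)::

    import torch.nn.functional as F

    def top_k_items(action_vector, item_embeddings, item_ids, k=10):
        # Score every item against the action and keep the k most similar ones.
        scores = F.cosine_similarity(item_embeddings, action_vector.unsqueeze(0), dim=1)
        top = torch.topk(scores, k).indices
        return [item_ids[i] for i in top]

    # Rank items for the first state in the batch.
    print(top_k_items(recommendation[0].detach(), item_embeddings, item_ids))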
Algo classes
++++++++++++
Algo is a high-level abstraction for an RL algorithm. You need two networks
(policy and value) to initialize it; later on you can tweak its parameters
and components in place.

Important: you can set writer to a torch.utils.tensorboard.SummaryWriter to get debug output.

Tweak it however you want::
    ddpg = recnn.nn.DDPG(policy_net, value_net)
    print(ddpg.params)
    ddpg.params['gamma'] = 0.9
    ddpg.params['policy_step'] = 3
    ddpg.optimizers['policy_optimizer'] = torch.optim.Adam(ddpg.nets['policy_net'].parameters(), your_lr)
    ddpg.writer = torch.utils.tensorboard.SummaryWriter('./runs')
    ddpg = ddpg.to(torch.device('cuda'))
ddpg.loss_layout is also handy; it lets you see what the loss dictionary should look like ::
    # test function
    def run_tests():
        batch = next(iter(env.test_dataloader))
        loss = ddpg.update(batch, learn=False)
        return loss

    value_net = recnn.nn.Critic(1290, 128, 256, 54e-2)
    policy_net = recnn.nn.Actor(1290, 128, 256, 6e-1)

    cuda = torch.device('cuda')
    ddpg = recnn.nn.DDPG(policy_net, value_net)
    ddpg = ddpg.to(cuda)

    plotter = recnn.utils.Plotter(ddpg.loss_layout, [['value', 'policy']])

    from torch.utils.tensorboard import SummaryWriter
    ddpg.writer = SummaryWriter(log_dir='./runs')

    from IPython.display import clear_output
    import matplotlib.pyplot as plt
    %matplotlib inline

    plot_every = 50
    n_epochs = 2

    def learn():
        for epoch in range(n_epochs):
            for batch in tqdm(env.train_dataloader):
                loss = ddpg.update(batch, learn=True)
                plotter.log_losses(loss)
                ddpg.step()
                if ddpg._step % plot_every == 0:
                    clear_output(True)
                    print('step', ddpg._step)
                    test_loss = run_tests()
                    plotter.log_losses(test_loss, test=True)
                    plotter.plot_loss()
                if ddpg._step > 1000:
                    return

    learn()
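Once training finishes you will probably want to keep the weights. The networks you passed in are regular torch modules, so plain torch.save / load_state_dict works; the file names below are arbitrary examples::

    # Save the trained weights (file names are just examples).
    torch.save(policy_net.state_dict(), 'ddpg_policy.pt')
    torch.save(value_net.state_dict(), 'ddpg_value.pt')

    # Later: rebuild a network with the same sizes and load the weights back.
    policy_net = recnn.nn.Actor(1290, 128, 256, 6e-1)
    policy_net.load_state_dict(torch.load('ddpg_policy.pt'))
    policy_net.eval()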
Update Functions
++++++++++++++++
Basically, the Algo class is a high-level wrapper around an update function. The code for that is pretty messy,
so if you want to dig into it, I explain it in the Colab notebook linked at the top.
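To give a feeling for the shape such a function takes, here is a simplified, hypothetical sketch of a DDPG-style update (argument names and details are illustrative, not the library's exact code): it receives a batch together with the networks and optimizers, computes the value (TD) loss and the policy loss, optionally applies the gradients, and returns the losses as a dictionary with the same 'value' / 'policy' keys the plotter above expects::

    import torch.nn.functional as F

    # Hypothetical sketch of a DDPG-style update; not recnn's actual implementation.
    def ddpg_update_sketch(batch, nets, optimizers, gamma=0.99, learn=True):
        state, action, reward, next_state, done = batch
        reward = reward.float().view(-1, 1)
        done = done.float().view(-1, 1)

        # Critic: regress Q(s, a) towards the one-step TD target.
        with torch.no_grad():
            next_action = nets['target_policy_net'](next_state)
            target_q = reward + (1 - done) * gamma * nets['target_value_net'](next_state, next_action)
        value_loss = F.mse_loss(nets['value_net'](state, action), target_q)

        # Actor: push predicted actions towards higher critic values.
        policy_loss = -nets['value_net'](state, nets['policy_net'](state)).mean()

        if learn:
            optimizers['value_optimizer'].zero_grad()
            value_loss.backward()
            optimizers['value_optimizer'].step()

            optimizers['policy_optimizer'].zero_grad()
            policy_loss.backward()
            optimizers['policy_optimizer'].step()

        return {'value': value_loss.item(), 'policy': policy_loss.item()}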