# Reinforcement Learning Tutorials

>Please bear in mind that SLM Lab is written in PyTorch. Many of these tutorials use other neural network libraries such as TensorFlow. Feel free to reference them, but we ask that code be written in PyTorch.

## Introductory

- Deep Reinforcement Learning, Pieter Abbeel. [video](https://www.youtube.com/watch?v=qaMdN6LS9rA), [slides](https://drive.google.com/file/d/0BxXI_RttTZAhVXBlMUVkQ1BVVDQ/view)
- [Deep Reinforcement Learning](https://www.youtube.com/watch?v=aUrX-rP_ss4), John Schulman.

## PyTorch

- [60 minute blitz](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html), Soumith Chintala.

## Q-Learning

- [Let's make a DQN](https://jaromiru.com/2016/09/27/lets-make-a-dqn-theory/), Jaromír Janisch.

## Policy gradient

- [Learning Pong from Pixels](http://karpathy.github.io/2016/05/31/rl/), Andrej Karpathy.
- [Asynchronous Advantage Actor Critic, A3C](https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-8-asynchronous-actor-critic-agents-a3c-c88f72a5e9f2), Arthur Juliani.
- [Generalized Advantage Estimation](http://www.breloff.com/DeepRL-OnlineGAE/), Tom Breloff.

## Going further

- [Deep RL Bootcamp](https://sites.google.com/view/deep-rl-bootcamp/lectures)
- [CS294](https://www.youtube.com/playlist?list=PLkFD6_40KJIznC9CDbVTjAF2oyt8_VAe3), UC Berkeley, Sergey Levine. [Course page](http://rail.eecs.berkeley.edu/deeprlcourse/)
- David Silver's [RL course](https://www.youtube.com/watch?v=2pWv7GOvuf0&t=17s)
- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf), Sutton and Barto. Canonical textbook on RL.
- [Making sense of the bias-variance tradeoff in deep RL](https://medium.com/mlreview/making-sense-of-the-bias-variance-trade-off-in-deep-reinforcement-learning-79cf1e83d565), Arthur Juliani.

## Papers

- [DQN](https://arxiv.org/abs/1312.5602)
- [Double DQN](https://arxiv.org/abs/1509.06461) (DDQN)
- [Dueling DQN](https://arxiv.org/abs/1511.06581)
- [Prioritized Experience Replay](https://arxiv.org/abs/1511.05952) (PER)
- [Combined Experience Replay](https://arxiv.org/abs/1712.01275) (CER)
- [Hindsight Experience Replay](https://arxiv.org/abs/1707.01495) (HER)
- [QT-Opt](https://arxiv.org/abs/1806.10293)
- [Asynchronous Advantage Actor Critic](https://arxiv.org/abs/1602.01783) (A3C)
- [Generalized Advantage Estimation](https://arxiv.org/abs/1506.02438) (GAE)
- [Proximal Policy Optimization](https://arxiv.org/abs/1707.06347) (PPO)
- [Self Imitation Learning](https://arxiv.org/abs/1806.05635) (SIL)

## SLM Lab tutorials

>If you write about using SLM Lab, let us know so we can include it below.

- [Fast Implementation of Self-Imitation Learning](https://medium.com/@kengz/fast-implementation-of-self-imitation-learning-ffcd0b2b6c6b): a showcase of how an algorithm can be implemented quickly in the Lab.
- [Deep Reinforcement Learning with SLM Lab](https://medium.com/@kengz/deep-reinforcement-learning-with-slm-lab-bebf87f531ac): introduction to the features and demo run-through of v2.x.
- [pip module for RL agents in SLM Lab](https://medium.com/@kengz/pip-module-for-rl-agents-in-slm-lab-50e73872445d): how you can import SLM Lab as a pip module in your application to use the algorithms and components.
- [Multi-inheritance magic in SLM Lab](https://medium.com/@kengz/multi-inheritance-magic-in-slm-lab-35c666739b03): how the modular design of SLM Lab allows us to implement many things with little or no new code.