Nimish Sanghi
Bangalore, India
Any source code or other supplementary material referenced by the author in this book is available to readers on GitHub via the book's product page, located at www.apress.com/978-1-4842-6808-7. For more detailed information, please visit www.apress.com/source-code.
ISBN 978-1-4842-6808-7 e-ISBN 978-1-4842-6809-4
https://doi.org/10.1007/978-1-4842-6809-4
© Nimish Sanghi 2021
Apress Standard
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Distributed to the book trade worldwide by Springer Science+Business Media New York, 1 New York Plaza, Suite 4600, New York, NY 10004-1562, USA. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail orders-ny@springer-sbm.com, or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.
Introduction
This book is about reinforcement learning and takes you from the basics through advanced topics. Though this book assumes no prior knowledge of the field of reinforcement learning, it expects you to be familiar with the basics of machine learning and specifically supervised learning. Have you coded in Python before? Are you comfortable working with libraries such as NumPy and scikit-learn? Have you heard of deep learning and explored the basic building blocks of training simple models in PyTorch or TensorFlow? You should answer yes to all of these questions to get the best out of this book. If not, I suggest you review these concepts first; any introductory online tutorial or book from Apress on these topics would be sufficient.
This book walks you through the basics of reinforcement learning, spending a lot of time explaining the concepts in the initial chapters. If you have prior knowledge of reinforcement learning, you can go through the first four chapters at a fast pace. Starting in Chapter 5, the book picks up the pace as it explores advanced topics that combine deep learning with reinforcement learning. The accompanying code hosted on GitHub forms an integral part of this book. While the book includes listings of the relevant code, the Jupyter notebooks in the code repository provide additional insights and practical tips on coding these algorithms. You will be best served by reading a chapter and going through the explanations first and then working through the code in the Jupyter notebooks. You are also encouraged to rewrite the code and train agents on additional environments found in the OpenAI Gym library.
For a subject like this, math is unavoidable. However, we have tried our best to keep it to a minimum. The book cites many research papers and gives short explanations of the approaches they take. Readers wanting to gain a deeper understanding of the theory should go through these research papers. This book's purpose is to introduce practitioners to the motivation and high-level approach behind many of the latest techniques in this field. However, it is by no means meant to provide a complete theoretical understanding of these techniques, which is best gained by reading the original papers.
The book is organized into ten chapters.
Chapter 1, Introduction to Deep Reinforcement Learning, introduces the topic, setting the background and motivating readers with real-world examples of how reinforcement learning is changing the landscape of intelligent machines. It also covers the installation of Python and related libraries so you can run the code accompanying this book.
Chapter 2, Markov Decision Processes, defines in detail the problem that we are trying to solve in the field of reinforcement learning. We talk about the agent and the environment. We go in depth into what constitutes a reward, value functions, a model, and a policy. We look at various flavors of Markov processes. We establish the Bellman equations, formulated by Richard Bellman as part of dynamic programming.
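As a preview of the notation developed in that chapter, the Bellman expectation equation for the state-value function of a policy π can be written in the standard form found in the literature (the chapter derives it step by step):

v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \, \bigl[ r + \gamma \, v_\pi(s') \bigr]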
Chapter 3, Model-Based Algorithms, focuses on the setup in which the agent has access to a model of the environment and plans its actions for an optimal outcome. We also explore the OpenAI Gym library, which implements many of the common environments that we will use for coding and testing algorithms throughout the book. Finally, we explore the value iteration and policy iteration approaches to planning.
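To give a flavor of the Gym interface used throughout the book, here is a minimal sketch of a random agent interacting with an environment. The environment name and the exact reset()/step() signatures depend on the Gym version you have installed, so treat this as illustrative rather than as a listing from the chapter.

import gym

# Minimal random-agent loop using the classic Gym API.
# Newer Gym/Gymnasium releases return extra values from reset() and step().
env = gym.make("FrozenLake-v1")    # a small grid world; use "FrozenLake-v0" on older Gym releases
state = env.reset()
done = False
while not done:
    action = env.action_space.sample()              # sample a random action
    state, reward, done, info = env.step(action)    # apply it and observe the result
env.close()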
Chapter 4, Model-Free Approaches, talks about model-free learning methods. Under this setup, the agent has no knowledge of the environment/model. It interacts with the environment and uses the rewards to learn an optimal policy through a trial-and-error approach. We specifically look at the Monte Carlo (MC) approach and the temporal difference (TD) approach to learning. We first study them individually and then combine the two under the concepts of n-step returns and eligibility traces.
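As a taste of what these learning rules look like, the following is a minimal sketch of the tabular TD(0) value update in Python; the function name and the use of a plain dictionary for V are illustrative choices, not the book's listings.

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular TD(0) update: nudge V(s) toward the bootstrapped target r + gamma * V(s')."""
    V[s] = V[s] + alpha * (r + gamma * V[s_next] - V[s])
    return V

# Example: a two-state value table updated after observing reward 1.0 on the transition 0 -> 1.
V = {0: 0.0, 1: 0.0}
V = td0_update(V, s=0, r=1.0, s_next=1)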
Chapter 5, Function Approximation, extends the setup from the discrete state spaces seen in the previous chapters to continuous state spaces. We study how to use parameterized functions to represent the state and bring scalability. First, we talk about the traditional approach of handcrafted function approximation, especially linear approximators. Then, we introduce the concept of using deep learning-based models as nonlinear function approximators.
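For illustration, a linear value-function approximator represents V(s) as a dot product between a weight vector and a handcrafted feature vector x(s). The sketch below shows the standard semi-gradient TD(0) weight update under that assumption; the function names and step sizes are illustrative, not taken from the chapter.

import numpy as np

def v_hat(w, x_s):
    # Linear approximation: V(s) is approximated as the dot product w . x(s).
    return np.dot(w, x_s)

def semi_gradient_td0(w, x_s, r, x_s_next, alpha=0.01, gamma=0.99):
    # Move the weights along the feature vector, scaled by the TD error.
    td_error = r + gamma * v_hat(w, x_s_next) - v_hat(w, x_s)
    return w + alpha * td_error * x_s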
Chapter 6, Deep Q-Learning, dives deep into deep Q-networks (DQN), the approach DeepMind used to successfully demonstrate how deep learning and reinforcement learning can be combined to design agents that learn to play video games such as Atari games. In this chapter, we explore how DQN works and what tweaks are required to make it learn. We then survey the various flavors of DQN, complete with detailed code examples, both in PyTorch and TensorFlow.
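As a minimal sketch of the kind of network such agents use (layer sizes and names here are illustrative, not the book's listings), a Q-network simply maps a state vector to one Q-value per action:

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per discrete action."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state):
        return self.net(state)

q_net = QNetwork(state_dim=4, n_actions=2)    # e.g., CartPole-sized state and action spaces
q_values = q_net(torch.zeros(1, 4))           # Q-values for a dummy state; argmax gives the greedy action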
Chapter 7, Policy Gradient Algorithms, switches focus to explore the approach of learning a good policy directly in a model-free setup. The approaches in the preceding chapters are based on first learning value functions and then using these value functions to optimize the policy. In this chapter, we first talk about the theoretical foundations of the direct policy optimization approach. After establishing the foundations, we discuss various approaches, including some very recent and highly successful algorithms, complete with implementations in PyTorch and TensorFlow.
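The core result behind these methods is the policy gradient theorem. In its simplest REINFORCE form (standard notation from the literature, with G_t denoting the return from time t), the gradient of the objective J(θ) is

\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\!\left[ \nabla_\theta \log \pi_\theta(a_t \mid s_t) \, G_t \right]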