Sayon Dutta - Reinforcement Learning With TensorFlow: A Beginner’s Guide to Designing Self-Learning Systems With TensorFlow and OpenAI Gym


Sayon Dutta Reinforcement Learning With TensorFlow: A Beginner’s Guide to Designing Self-Learning Systems With TensorFlow and OpenAI Gym
  • Book:
    Reinforcement Learning With TensorFlow: A Beginner’s Guide to Designing Self-Learning Systems With TensorFlow and OpenAI Gym
  • Author:
    Sayon Dutta
  • Publisher:
    Packt Publishing
  • Year:
    2018
Reinforcement Learning With TensorFlow: A Beginner’s Guide to Designing Self-Learning Systems With TensorFlow and OpenAI Gym: summary, description and annotation


Leverage the power of reinforcement learning techniques to develop self-learning systems using TensorFlow

Key Features
  • Explore reinforcement learning concepts and their implementation using TensorFlow
  • Discover different problem-solving methods for reinforcement learning
  • Apply reinforcement learning to autonomous driving cars, robobrokers, and more
Book Description

Reinforcement learning (RL) allows you to develop smart, quick, and self-learning systems in your business surroundings. It's an effective method for training learning agents and solving a variety of problems in Artificial Intelligence, from games, self-driving cars, and robots to enterprise applications such as data center energy saving (cooling data centers) and smart warehousing solutions.

The book covers major advancements and successes achieved in deep reinforcement learning by synergizing deep neural network architectures with reinforcement learning. You'll also be introduced to the concept of reinforcement learning, its advantages, and the reasons why it's gaining so much popularity. You'll explore MDPs, Monte Carlo tree searches, dynamic programming such as policy and value iteration, and temporal difference learning such as Q-learning and SARSA. You will use TensorFlow and OpenAI Gym to build simple neural network models that learn from their own actions. You will also see how reinforcement learning algorithms play a role in games, image processing, and NLP.

By the end of this book, you will have gained a firm understanding of what reinforcement learning is and understand how to put your knowledge to practical use by leveraging the power of TensorFlow and OpenAI Gym.

What you will learn
  • Implement state-of-the-art reinforcement learning algorithms from the basics
  • Discover various reinforcement learning techniques such as MDP, Q Learning, and more
  • Explore the applications of reinforcement learning in advertisement, image processing, and NLP
  • Teach a reinforcement learning model to play a game using TensorFlow and OpenAI Gym
  • Understand how reinforcement learning applications are used in robotics
Who This Book Is For

If you want to get started with reinforcement learning using TensorFlow in the most practical way, this book will be a useful resource. The book assumes prior knowledge of machine learning and neural network programming concepts, as well as some understanding of the TensorFlow framework. No previous experience of reinforcement learning is required.

Table of Contents
  1. Deep Learning - Architectures and Frameworks
  2. Training Reinforcement Learning Agents Using OpenAI Gym
  3. Markov Decision Process (MDP)
  4. Policy Gradients
  5. Q-Learning & Deep Q Networks
  6. Asynchronous Methods
  7. Robo Everything - Real Strategy Gaming
  8. AlphaGo - Reinforcement Learning at its Best
  9. Reinforcement Learning in Autonomous Driving
  10. Financial Portfolio Management
  11. Reinforcement Learning in Robotics
  12. Deep Reinforcement Learning in AdTech
  13. Reinforcement Learning in Image Processing
  14. Deep Reinforcement Learning in NLP
  15. Appendix 1: Further Topics in Reinforcement Learning

TD(λ) rule

The TD(1) and TD(0) rules give rise to a generalized rule, TD(λ), that is, TD (lambda), where λ ∈ [0, 1] and which should satisfy the following conditions:

  • If λ = 0, TD(λ) tends to TD(0)
  • If λ = 1, TD(λ) tends to TD(1)

Both TD(0) and TD(1) have updates based on differences between temporally successive predictions.

Therefore, the pseudo code of TD(λ) is as follows:

Episode T
For all s, at the start of the episode: e(s) = 0 and V_T(s) = V_{T-1}(s)
After s_{t-1} --(r_t)--> s_t (at step t): e(s_{t-1}) = e(s_{t-1}) + 1

For all s:
    V_T(s) = V_T(s) + α_T (r_t + γ V_{T-1}(s_t) − V_{T-1}(s_{t-1})) e(s)
    e(s) = λ γ e(s)

This satisfies the preceding two conditions and can incorporate any value for λ.
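As a concrete illustration of the update above, here is a minimal tabular TD(λ) sketch with accumulating eligibility traces. The toy three-state chain, step size, and discount are assumptions made for the example, not taken from the book:

```python
import numpy as np

def td_lambda_episode(transitions, V, alpha=0.1, gamma=0.9, lam=0.8):
    """Update value table V in place from one episode of
    (state, reward, next_state) transitions using TD(lambda)
    with accumulating eligibility traces."""
    e = np.zeros_like(V)                        # e(s) = 0 at episode start
    for s, r, s_next in transitions:
        delta = r + gamma * V[s_next] - V[s]    # one-step TD error
        e[s] += 1.0                             # bump trace of visited state
        V += alpha * delta * e                  # credit all recently visited states
        e *= gamma * lam                        # decay every trace
    return V

# Toy 3-state chain: 0 -> 1 -> 2 (terminal), reward 1 on reaching state 2.
V = np.zeros(3)
for _ in range(200):
    td_lambda_episode([(0, 0.0, 1), (1, 1.0, 2)], V)
```

With λ = 0 the trace vanishes after one step (pure TD(0)); with λ = 1 every visited state keeps full γ-discounted credit, recovering TD(1).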

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.

Further topics in Reinforcement Learning

In this appendix, we will give an introductory overview of some topics that were beyond the scope of this book. We will cover them briefly and end each topic with external links for you to explore further. This book has already covered most of the advanced topics in deep reinforcement learning theory, as well as the active research domains.

Using a baseline to reduce variance

In addition to our initial effort to use an actor-critic method to reduce variance, we can also reduce variance by subtracting a baseline function b(s) from the policy gradient. This reduces the variance without affecting the expectation, as shown in the following:

∇_θ J(θ) = E_π [∇_θ log π_θ(a|s) (Q^π(s, a) − b(s))]

This holds because E_π [∇_θ log π_θ(a|s) b(s)] = 0 for any baseline that depends only on the state.

There are many options for choosing a baseline function, but the state value function is regarded as a good baseline. Therefore:

b(s) = V^π(s)

Thus, we can rewrite the policy gradient formula by subtracting the baseline function as follows:

∇_θ J(θ) = E_π [∇_θ log π_θ(a|s) (Q^π(s, a) − V^π(s))]

Here, A^π(s, a) = Q^π(s, a) − V^π(s) is termed the advantage function. Therefore, the policy gradient formula becomes the following:

∇_θ J(θ) = E_π [∇_θ log π_θ(a|s) A^π(s, a)]

Thus, by using a baseline function, the variance is lowered without any change in the expected direction of the gradient.
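A quick numerical check of this claim: the sketch below (the softmax policy, its parameters, and the Q-values are toy assumptions for illustration) estimates the single-state policy gradient with and without the state-value baseline. The two estimates agree in expectation, while the baselined version has much lower variance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single-state setup: softmax policy over 3 actions with known Q-values.
theta = np.array([0.2, -0.1, 0.3])
Q = np.array([1.0, 3.0, 2.0])

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

pi = softmax(theta)
V = (pi * Q).sum()                  # state value, used as baseline b(s)

def grad_log_pi(a):
    # d/dtheta log softmax(theta)[a] = one_hot(a) - pi
    g = -pi.copy()
    g[a] += 1.0
    return g

# Monte Carlo estimates of the policy gradient, with and without the baseline.
samples_plain, samples_base = [], []
for _ in range(20000):
    a = rng.choice(3, p=pi)
    samples_plain.append(grad_log_pi(a) * Q[a])
    samples_base.append(grad_log_pi(a) * (Q[a] - V))   # advantage estimate

mean_plain = np.mean(samples_plain, axis=0)
mean_base = np.mean(samples_base, axis=0)
var_plain = np.var(samples_plain, axis=0).sum()
var_base = np.var(samples_base, axis=0).sum()
```

The means agree to within sampling noise, while the total variance of the baselined estimator is several times smaller, which is exactly why the advantage form is preferred in practice.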

Asynchronous one-step Q-learning

The architecture of asynchronous one-step Q-learning is very similar to DQN. An agent in DQN is represented by a set of primary and target networks, where the one-step loss is calculated as the square of the difference between the state-action value of the current state s predicted by the primary network and the target state-action value of the current state calculated by the target network. The gradient of the loss is calculated with respect to the parameters of the policy network, and then the loss is minimized using a gradient descent optimizer, leading to parameter updates of the primary network.

The difference in asynchronous one-step Q-learning is that there are multiple such learning agents, that is, learners running and calculating this loss in parallel. Thus, the gradient calculation also occurs in parallel in different threads, where each learning agent interacts with its own copy of the environment. The accumulation of these gradients in different threads over multiple time steps is used to update the policy network parameters after a fixed number of time steps, or when an episode is over. The accumulation of gradients is preferred over immediate policy network parameter updates because this avoids overwriting the changes performed by the other learner agents.

Moreover, assigning a different exploration policy to each thread makes the learning diverse and robust. This improves performance owing to better exploration, because each of the learning agents in different threads is subjected to a different exploration policy. Though there are many ways to do this, a simple approach is to use a different sample of epsilon for each thread while using an ε-greedy policy.

The pseudo-code for asynchronous one-step Q-learning is shown as follows. Here, the following are the global parameters:

  • θ: the parameters (weights and biases) of the policy network
  • θ⁻: the parameters (weights and biases) of the target network
  • T: overall time step counter
// Globally shared parameters θ, θ⁻ and T
// θ is initialized arbitrarily
// T is initialized to 0
pseudo-code for each learner running in parallel in each of the threads:
Initialize thread-level time step counter t = 0
Initialize θ⁻ = θ
Initialize network gradients dθ = 0
Start with the initial state s
repeat until T > T_max:
    Choose action a with ε-greedy policy such that:
        a = random action with probability ε, otherwise argmax_a' Q(s, a'; θ)
    Perform action a
    Receive new state s' and reward r
    Compute target y:
        y = r, if s' is a terminal state
        y = r + γ max_a' Q(s', a'; θ⁻), otherwise
    Compute the loss, L = (y − Q(s, a; θ))²
    Accumulate the gradient w.r.t. θ: dθ = dθ + ∂L/∂θ
    s = s'
    T = T + 1
    t = t + 1
    if T mod I_target == 0:
        Update the parameters of the target network: θ⁻ = θ
        # After every I_target time steps the parameters of the target network are updated
    if t mod I_AsyncUpdate == 0 or s is a terminal state:
        Perform an asynchronous update of θ using dθ
        Clear gradients: dθ = 0
        # At every I_AsyncUpdate time step in the thread, or if s is a terminal state,
        # update θ using the accumulated gradients
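The pseudo-code above can be mirrored in a compact, single-threaded tabular sketch. The toy chain MDP, the interval values I_target and I_AsyncUpdate, and all hyperparameters are illustrative assumptions; a real implementation would run several such workers in parallel threads against shared network parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy deterministic chain MDP: states 0..3, state 3 terminal.
# Action 0 moves right (+1), action 1 moves left (-1, floored at 0).
# Reward 1 on entering the terminal state, else 0.
def step(s, a):
    s2 = min(s + 1, 3) if a == 0 else max(s - 1, 0)
    return s2, (1.0 if s2 == 3 else 0.0), s2 == 3

theta = np.zeros((4, 2))        # "policy network": tabular Q(s, a)
theta_target = theta.copy()     # target network, synced periodically
d_theta = np.zeros_like(theta)  # accumulated gradient
gamma, alpha, eps = 0.9, 0.1, 0.1
I_target, I_async = 50, 5       # sync / accumulation intervals (example values)

T, t, s = 0, 0, 0
while T < 20000:
    # epsilon-greedy action selection
    a = rng.integers(2) if rng.random() < eps else int(theta[s].argmax())
    s2, r, done = step(s, a)
    y = r if done else r + gamma * theta_target[s2].max()   # one-step target
    # For squared loss (y - Q)^2, the descent direction on the visited
    # entry is proportional to (y - Q); accumulate it instead of applying.
    d_theta[s, a] += y - theta[s, a]
    s = 0 if done else s2
    T += 1
    t += 1
    if T % I_target == 0:
        theta_target = theta.copy()          # periodic target-network sync
    if t % I_async == 0 or done:
        theta += alpha * d_theta             # apply accumulated update
        d_theta[:] = 0                       # clear gradients
```

After training, the tabular Q-values approach the discounted optimal values of the chain (1.0, 0.9, 0.81 for moving right from states 2, 1, 0), showing that the delayed, accumulated updates still converge.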
Deep Reinforcement Learning in Ad Tech

So far in this part of the book, covering reinforcement learning application and research domains, we have seen how reinforcement learning is disrupting robotics, autonomous driving, financial portfolio management, and games of extremely high complexity, such as Go. Another important domain that is likely to be disrupted by reinforcement learning is advertisement technology.

Before getting into the details of the problem statement and its solution based on reinforcement learning, let's understand the challenges, business models, and bidding strategies involved, which will serve as a basic prerequisite for understanding the problem that we will try to solve using a reinforcement learning framework. The topics that we will be covering in this chapter are as follows:

  • Computational advertising challenges and bidding strategies

  • Real-time bidding by reinforcement learning in display advertising

Reinforcement learning in robotics

Robotics is associated with a high level of complexity in terms of behavior, which is difficult to hand-engineer and too varied to cover exhaustively with supervised learning. Thus, reinforcement learning provides the kind of framework needed to capture such complex behavior.

Any task related to robotics is represented by high-dimensional, continuous state and action spaces, and the environmental state is not fully observable. Learning in simulation alone is not enough to say that a reinforcement learning agent is ready for the real world: the agent should experience the uncertainty of real-world scenarios, but such experience is difficult and expensive to obtain and reproduce.

Robustness is the highest priority for robotics: unlike in normal analytics or traditional machine learning problems, minor errors in data, pre-processing, or algorithms can result in significant changes in behavior, especially for dynamic tasks. Thus, robust algorithms are required that can capture real-world details. The next challenge for robot reinforcement learning is the reward function. Since the reward function plays the most important role in optimized learning, a domain-specific reward function is needed that helps the learning agent adapt to the real world as quickly as possible. Thus, domain knowledge is the key to devising a good reward function, which is again a hard task in robot machine learning.

Here, we will discuss the types of tasks in the field of robotics that can be achieved with the reinforcement learning algorithms we have studied in this book, and try to connect them together to build a promising approach.

