Deep Reinforcement Learning Hands-On

Table of Contents
Why subscribe?
PacktPub.com
Contributors
About the author
About the reviewers
Packt is Searching for Authors Like You
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
1. What is Reinforcement Learning?
Learning – supervised, unsupervised, and reinforcement
RL formalisms and relations
Reward
The agent
The environment
Actions
Observations
Markov decision processes
Markov process
Markov reward process
Markov decision process
Summary
2. OpenAI Gym
The anatomy of the agent
Hardware and software requirements
OpenAI Gym API
Action space
Observation space
The environment
Creation of the environment
The CartPole session
The random CartPole agent
The extra Gym functionality – wrappers and monitors
Wrappers
Monitor
Summary
3. Deep Learning with PyTorch
Tensors
Creation of tensors
Scalar tensors
Tensor operations
GPU tensors
Gradients
Tensors and gradients
NN building blocks
Custom layers
Final glue – loss functions and optimizers
Loss functions
Optimizers
Monitoring with TensorBoard
TensorBoard 101
Plotting stuff
Example – GAN on Atari images
Summary
4. The Cross-Entropy Method
Taxonomy of RL methods
Practical cross-entropy
Cross-entropy on CartPole
Cross-entropy on FrozenLake
Theoretical background of the cross-entropy method
Summary
5. Tabular Learning and the Bellman Equation
Value, state, and optimality
The Bellman equation of optimality
Value of action
The value iteration method
Value iteration in practice
Q-learning for FrozenLake
Summary
6. Deep Q-Networks
Real-life value iteration
Tabular Q-learning
Deep Q-learning
Interaction with the environment
SGD optimization
Correlation between steps
The Markov property
The final form of DQN training
DQN on Pong
Wrappers
DQN model
Training
Running and performance
Your model in action
Summary
7. DQN Extensions
The PyTorch Agent Net library
Agent
Agent's experience
Experience buffer
Gym env wrappers
Basic DQN
N-step DQN
Implementation
Double DQN
Implementation
Results
Noisy networks
Implementation
Results
Prioritized replay buffer
Implementation
Results
Dueling DQN
Implementation
Results
Categorical DQN
Implementation
Results
Combining everything
Implementation
Results
Summary
References
8. Stocks Trading Using RL
Trading
Data
Problem statements and key decisions
The trading environment
Models
Training code
Results
The feed-forward model
The convolution model
Things to try
Summary
9. Policy Gradients – An Alternative
Values and policy
Why policy?
Policy representation
Policy gradients
The REINFORCE method
The CartPole example
Results
Policy-based versus value-based methods
REINFORCE issues
Full episodes are required
High gradients variance
Exploration
Correlation between samples
PG on CartPole
Results
PG on Pong
Results
Summary
10. The Actor-Critic Method
Variance reduction
CartPole variance
Actor-critic
A2C on Pong
A2C on Pong results
Tuning hyperparameters
Learning rate
Entropy beta
Count of environments
Batch size
Summary
11. Asynchronous Advantage Actor-Critic
Correlation and sample efficiency
Adding an extra A to A2C
Multiprocessing in Python
A3C – data parallelism
Results
A3C – gradients parallelism
Results
Summary
12. Chatbots Training with RL
Chatbots overview
Deep NLP basics
Recurrent Neural Networks
Embeddings
Encoder-Decoder
Training of seq2seq
Log-likelihood training
Bilingual evaluation understudy (BLEU) score
RL in seq2seq
Self-critical sequence training
The chatbot example
The example structure
Modules: cornell.py and data.py
BLEU score and utils.py
Model
Training: cross-entropy
Running the training
Checking the data
Testing the trained model
Training: SCST
Running the SCST training
Results
Telegram bot
Summary
13. Web Navigation
Web navigation
Browser automation and RL
Mini World of Bits benchmark
OpenAI Universe
Installation
Actions and observations
Environment creation
MiniWoB stability
Simple clicking approach
Grid actions
Example overview
Model
Training code
Starting containers
Training process
Checking the learned policy
Issues with simple clicking
Human demonstrations
Recording the demonstrations
Recording format
Training using demonstrations
Results
TicTacToe problem
Adding text description
Results
Things to try
Summary
14. Continuous Action Space
Why a continuous space?
Action space
Environments
The Actor-Critic (A2C) method
Implementation
Results
Using models and recording videos
Deterministic policy gradients
Exploration
Implementation
Results
Recording videos
Distributional policy gradients
Architecture
Implementation
Results
Things to try
Summary
15. Trust Regions – TRPO, PPO, and ACKTR
Introduction
Roboschool
A2C baseline
Results
Videos recording
Proximal Policy Optimization
Implementation
Results
Trust Region Policy Optimization
Implementation
Results
A2C using ACKTR
Implementation
Results
Summary
16. Black-Box Optimization in RL
Black-box methods
Evolution strategies
ES on CartPole
Results
ES on HalfCheetah
Results
Genetic algorithms
GA on CartPole
Results
GA tweaks
Deep GA
Novelty search
GA on Cheetah
Results
Summary
References
17. Beyond Model-Free – Imagination
Model-based versus model-free
Model imperfections
Imagination-augmented agent
The environment model
The rollout policy
The rollout encoder
Paper results
I2A on Atari Breakout
The baseline A2C agent
EM training
The imagination agent
The I2A model
The Rollout encoder
Training of I2A
Experiment results
The baseline agent
Training EM weights
Training with the I2A model
Summary
References
18. AlphaGo Zero
Board games
The AlphaGo Zero method
Overview
Monte-Carlo Tree Search
Self-play
Training and evaluation
Connect4 bot
Game model
Implementing MCTS
Model
Training
Testing and comparison
Connect4 results
Summary
References
Book summary
Other Books You May Enjoy
Leave a review - let other readers know what you think
Index