Grokking Deep Reinforcement Learning MEAP V11
Copyright
Welcome
Brief contents
Chapter 1: Introduction to deep reinforcement learning
What is deep reinforcement learning?
Deep reinforcement learning is a machine learning approach to artificial intelligence
Deep reinforcement learning is concerned with creating computer programs
Deep reinforcement learning agents can solve problems that require intelligence
Deep reinforcement learning agents improve their behavior through trial-and-error learning
Deep reinforcement learning agents learn from sequential feedback
Deep reinforcement learning agents learn from evaluative feedback
Deep reinforcement learning agents learn from sampled feedback
Deep reinforcement learning agents utilize powerful non-linear function approximation
The past, present, and future of deep reinforcement learning
Recent history of artificial intelligence and deep reinforcement learning
Artificial intelligence winters
The current state of artificial intelligence
Progress in deep reinforcement learning
Opportunities ahead
The suitability of deep reinforcement learning
What are the pros and cons?
Deep reinforcement learning's strengths
Deep reinforcement learning's weaknesses
Setting clear two-way expectations
What to expect from the book?
How to get the most out of the book?
Deep reinforcement learning development environment
Summary
Chapter 2: Mathematical foundations of reinforcement learning
Components of reinforcement learning
Examples of problems, agents, and environments
The agent: The decision-maker
The environment: Everything else
Agent-environment interaction cycle
MDPs: The engine of the environment
States: Specific configurations of the environment
Actions: A mechanism to influence the environment
Transition function: Consequences of agent actions
Reward signal: Carrots and sticks
Horizon: Time changes what's optimal
Discount: The future is uncertain, value it less
Extensions to MDPs
Putting it all together
Summary
Chapter 3: Balancing immediate and long-term goals
The objective of a decision-making agent
Policies: Per-state action prescriptions
State-value function: What to expect from here?
Action-value function: What to expect from here if I do this?
Action-advantage function: How much better if I do that?
Optimality
Planning optimal sequences of actions
Policy Evaluation: Rating policies
Policy Improvement: Using ratings to get better
Policy Iteration: Improving upon improved behaviors
Value Iteration: Improving behaviors early
Summary
Chapter 4: Balancing the gathering and utilization of information
The challenge of interpreting evaluative feedback
Bandits: Single state decision problems
Regret: The cost of exploration
Approaches to solving MAB environments
Greedy: Always exploit
Random: Always explore
Epsilon-Greedy: Almost always greedy and sometimes random
Decaying Epsilon-Greedy: First maximize exploration, then exploitation
Optimistic Initialization: Start off believing it's a wonderful world
Strategic exploration
SoftMax: Select actions randomly in proportion to their estimates
UCB: It's not about just optimism; it's about realistic optimism
Thompson Sampling: Balancing reward and risk
Summary
Chapter 5: Evaluating agents' behaviors
Learning to estimate the value of policies
First-visit Monte-Carlo: Improving estimates after each episode
Every-visit Monte-Carlo: A different way of handling state visits
Temporal-Difference Learning: Improving estimates after each step
Learning to estimate from multiple steps
N-step TD Learning: Improving estimates after a couple of steps
Forward-view TD(λ): Improving estimates of all visited states
TD(λ): Improving estimates of all visited states after each step
Summary
Chapter 6: Improving agents' behaviors
The anatomy of reinforcement learning agents
Most agents gather experience samples
Most agents estimate something
Most agents improve a policy
Generalized Policy Iteration
Learning to improve policies of behavior
Monte-Carlo Control: Improving policies after each episode
Sarsa: Improving policies after each step
Decoupling behavior from learning
Q-Learning: Learning to act optimally, even if we choose not to
Double Q-Learning: A max of estimates for an estimate of a max
Summary
Chapter 7: Achieving goals more effectively and efficiently
Learning to improve policies using robust targets
Sarsa(λ): Improving policies after each step based on multi-step estimates
Watkins's Q(λ): Decoupling behavior from learning, again
Agents that interact, learn and plan
Dyna-Q: Learning sample models
Trajectory Sampling: Making plans for the immediate future
Summary
Chapter 8: Introduction to value-based deep reinforcement learning
The kind of feedback deep reinforcement learning agents use
Deep reinforcement learning agents deal with sequential feedback
But, if it is not sequential, what is it?
Deep reinforcement learning agents deal with evaluative feedback
But, if it is not evaluative, what is it?
Deep reinforcement learning agents deal with sampled feedback
But, if it is not sampled, what is it?
Introduction to function approximation for reinforcement learning
Reinforcement learning problems can have high-dimensional state and action spaces
Reinforcement learning problems can have continuous state and action spaces
There are advantages when using function approximation
NFQ: The first attempt at value-based deep reinforcement learning
First decision point: Selecting a value function to approximate
Second decision point: Selecting a neural network architecture
Third decision point: Selecting what to optimize
Fourth decision point: Selecting the targets for policy evaluation
Fifth decision point: Selecting an exploration strategy
Sixth decision point: Selecting a loss function
Seventh decision point: Selecting an optimization method
Things that could (and do) go wrong
Summary
Chapter 9: More stable value-based methods
DQN: Making reinforcement learning more like supervised learning
Common problems in value-based deep reinforcement learning
Using target networks
Using larger networks
Using experience replay
Using other exploration strategies
Double DQN: Mitigating the overestimation of action-value functions
The problem of overestimation, take two
Separating action selection and action evaluation
A solution
A more practical solution
A more forgiving loss function
Things we can still improve on
Summary
Chapter 10: Sample-efficient value-based methods
Dueling DDQN: A reinforcement-learning-aware neural network architecture
Reinforcement learning is not a supervised learning problem
Nuances of value-based deep reinforcement learning methods
Advantage of using advantages
A reinforcement-learning-aware architecture
Building a dueling network
Reconstructing the action-value function
Continuously updating the target network
What does the dueling network bring to the table?
PER: Prioritizing the replay of meaningful experiences
A smarter way to replay experiences
Then, what is a good measure of "important" experiences?
Greedy prioritization by TD error
Sampling prioritized experiences stochastically
Proportional prioritization
Rank-based prioritization
Prioritization bias
Summary
Chapter 11: Policy-gradient and actor-critic methods
REINFORCE: Outcome-based policy learning
Introduction to policy-gradient methods
Advantages of policy-gradient methods
Learning policies directly
Reducing the variance of the policy gradient
VPG: Learning a value function
Further reducing the variance of the policy gradient
Learning a value function
Encouraging exploration
A3C: Parallel policy updates
Using actor-workers
Using n-step estimates
Non-blocking model updates
GAE: Robust advantage estimation
Generalized advantage estimation
A2C: Synchronous policy updates
Weight-sharing model
Restoring order in policy updates
Summary
Chapter 12: Advanced actor-critic methods
DDPG: Approximating a deterministic policy
DDPG uses lots of tricks from DQN
Learning a deterministic policy
Exploration with deterministic policies
TD3: State-of-the-art improvements over DDPG
Double learning in DDPG
Smoothing the targets used for policy updates
Delaying updates
SAC: Maximizing the expected return and entropy
Adding the entropy to the Bellman equations
Learning the action-value function
Learning the policy
Automatically tuning the entropy coefficient
PPO: Restricting optimization steps
Using the same actor-critic architecture as A2C
Batching experiences
Clipping the policy updates
Clipping the value function updates
Summary