Grokking Deep Reinforcement Learning MEAP V11
Copyright
Welcome
Brief contents
Chapter 1: Introduction to deep reinforcement learning
What is deep reinforcement learning?
Deep reinforcement learning is a machine learning approach to artificial intelligence
Deep reinforcement learning is concerned with creating computer programs
Deep reinforcement learning agents can solve problems that require intelligence
Deep reinforcement learning agents improve their behavior through trial-and-error learning
Deep reinforcement learning agents learn from sequential feedback
Deep reinforcement learning agents learn from evaluative feedback
Deep reinforcement learning agents learn from sampled feedback
Deep reinforcement learning agents utilize powerful non-linear function approximation
The past, present, and future of deep reinforcement learning
Recent history of artificial intelligence and deep reinforcement learning
Artificial intelligence winters
The current state of artificial intelligence
Progress in deep reinforcement learning
Opportunities ahead
The suitability of deep reinforcement learning
What are the pros and cons?
Deep reinforcement learning's strengths
Deep reinforcement learning's weaknesses
Setting clear two-way expectations
What to expect from the book?
How to get the most out of the book?
Deep reinforcement learning development environment
Summary
Chapter 2: Mathematical foundations of reinforcement learning
Components of reinforcement learning
Examples of problems, agents, and environments
The agent: The decision-maker
The environment: Everything else
Agent-environment interaction cycle
MDPs: The engine of the environment
States: Specific configurations of the environment
Action: A mechanism to influence the environment
Transition function: Consequences of agent actions
Reward signal: Carrots and sticks
Horizon: Time changes what's optimal
Discount: The future is uncertain, value it less
Extensions to MDPs
Putting it all together
Summary
Chapter 3: Balancing immediate and long-term goals
The objective of a decision-making agent
Policies: Per-state action prescriptions
State-value function: What to expect from here?
Action-value function: What to expect from here if I do this?
Action-advantage function: How much better if I do that?
Optimality
Planning optimal sequences of actions
Policy Evaluation: Rating policies
Policy Improvement: Using ratings to get better
Policy Iteration: Improving upon improved behaviors
Value Iteration: Improving behaviors early
Summary
Chapter 4: Balancing the gathering and utilization of information
The challenge of interpreting evaluative feedback
Bandits: Single state decision problems
Regret: The cost of exploration
Approaches to solving MAB environments
Greedy: Always exploit
Random: Always explore
Epsilon-Greedy: Almost always greedy and sometimes random
Decaying Epsilon-Greedy: First maximize exploration, then exploitation
Optimistic Initialization: Start off believing it's a wonderful world
Strategic exploration
SoftMax: Select actions randomly in proportion to their estimates
UCB: It's not about just optimism; it's about realistic optimism
Thompson Sampling: Balancing reward and risk
Summary
Chapter 5: Evaluating agents' behaviors
Learning to estimate the value of policies
First-visit Monte-Carlo: Improving estimates after each episode
Every-visit Monte-Carlo: A different way of handling state visits
Temporal-Difference Learning: Improving estimates after each step
Learning to estimate from multiple steps
N-step TD Learning: Improving estimates after a couple of steps
Forward-view TD(λ): Improving estimates of all visited states
TD(λ): Improving estimates of all visited states after each step
Summary
Chapter 6: Improving agents' behaviors
The anatomy of reinforcement learning agents
Most agents gather experience samples
Most agents estimate something
Most agents improve a policy
Generalized Policy Iteration
Learning to improve policies of behavior
Monte-Carlo Control: Improving policies after each episode
Sarsa: Improving policies after each step
Decoupling behavior from learning
Q-Learning: Learning to act optimally, even if we choose not to
Double Q-Learning: A max of estimates for an estimate of a max
Summary
Chapter 7: Achieving goals more effectively and efficiently
Learning to improve policies using robust targets
Sarsa(λ): Improving policies after each step based on multi-step estimates
Watkins's Q(λ): Decoupling behavior from learning, again
Agents that interact, learn and plan
Dyna-Q: Learning sample models
Trajectory Sampling: Making plans for the immediate future
Summary
Chapter 8: Introduction to value-based deep reinforcement learning
The kind of feedback deep reinforcement learning agents use
Deep reinforcement learning agents deal with sequential feedback
But, if it is not sequential, what is it?
Deep reinforcement learning agents deal with evaluative feedback
But, if it is not evaluative, what is it?
Deep reinforcement learning agents deal with sampled feedback
But, if it is not sampled, what is it?
Introduction to function approximation for reinforcement learning
Reinforcement learning problems can have high-dimensional state and action spaces
Reinforcement learning problems can have continuous state and action spaces
There are advantages when using function approximation
NFQ: The first attempt at value-based deep reinforcement learning
First decision point: Selecting a value function to approximate
Second decision point: Selecting a neural network architecture
Third decision point: Selecting what to optimize
Fourth decision point: Selecting the targets for policy evaluation
Fifth decision point: Selecting an exploration strategy
Sixth decision point: Selecting a loss function
Seventh decision point: Selecting an optimization method
Things that could (and do) go wrong
Summary
Chapter 9: More stable value-based methods
DQN: Making reinforcement learning more like supervised learning
Common problems in value-based deep reinforcement learning
Using target networks
Using larger networks
Using experience replay
Using other exploration strategies
Double DQN: Mitigating the overestimation of action-value functions
The problem of overestimation, take two
Separating action selection and action evaluation
A solution
A more practical solution
A more forgiving loss function
Things we can still improve on
Summary
Chapter 10: Sample-efficient value-based methods
Dueling DDQN: A reinforcement-learning-aware neural network architecture
Reinforcement learning is not a supervised learning problem
Nuances of value-based deep reinforcement learning methods
Advantage of using advantages
A reinforcement-learning-aware architecture
Building a dueling network
Reconstructing the action-value function
Continuously updating the target network
What does the dueling network bring to the table?
PER: Prioritizing the replay of meaningful experiences
A smarter way to replay experiences
Then, what is a good measure of "important" experiences?
Greedy prioritization by TD error
Sampling prioritized experiences stochastically
Proportional prioritization
Rank-based prioritization
Prioritization bias
Summary
Chapter 11: Policy-gradient and actor-critic methods
REINFORCE: Outcome-based policy learning
Introduction to policy-gradient methods
Advantages of policy-gradient methods
Learning policies directly
Reducing the variance of the policy gradient
VPG: Learning a value function
Further reducing the variance of the policy gradient
Learning a value function
Encouraging exploration
A3C: Parallel policy updates
Using actor-workers
Using n-step estimates
Non-blocking model updates
GAE: Robust advantage estimation
Generalized advantage estimation
A2C: Synchronous policy updates
Weight-sharing model
Restoring order in policy updates
Summary
Chapter 12: Advanced actor-critic methods
DDPG: Approximating a deterministic policy
DDPG uses lots of tricks from DQN
Learning a deterministic policy
Exploration with deterministic policies
TD3: State-of-the-art improvements over DDPG
Double learning in DDPG
Smoothing the targets used for policy updates
Delaying updates
SAC: Maximizing the expected return and entropy
Adding the entropy to the Bellman equations
Learning the action-value function
Learning the policy
Automatically tuning the entropy coefficient
PPO: Restricting optimization steps
Using the same actor-critic architecture as A2C
Batching experiences
Clipping the policy updates
Clipping the value function updates
Summary
MEAP Edition
Manning Early Access Program
Grokking Deep Reinforcement Learning
Version 11

Copyright 2020 Manning Publications

For more information on this and other Manning titles go to manning.com
welcome

Thanks for purchasing the MEAP for Grokking Deep Reinforcement Learning. My vision is that by buying this book, you will not only learn deep reinforcement learning but also become an active contributor to the field.

Deep reinforcement learning has the potential to revolutionize the world as we know it. By removing humans from decision-making processes, we set ourselves up to succeed. Humans can't match the stamina and work ethic of a computer; we also have biases that make us less than perfect. Imagine how many decision-making applications could be improved with the objectivity and optimal decision making of a machine: healthcare, education, finance, defense, robotics, and so on. Think of any process in which a human repeatedly makes decisions; deep reinforcement learning can help in most of them.

Deep reinforcement learning can do great things as it is today, but the field is still not perfect. That should excite you, because it means we need people with the interest and skills to push the boundaries of this field forward. We are lucky to be part of this world at this point, and we should take advantage of it and make history. Are you up for the challenge?

I've been involved in reinforcement learning for a few years now. I first studied the topic in a course at Georgia Tech, Reinforcement Learning and Decision Making, co-taught by Drs. Charles Isbell and Michael Littman. It was inspiring to hear from top researchers in the field, interact with them daily, and listen to their perspectives. The following semester, I became a Teaching Assistant for the course and never looked back. Today, I'm an Instructional Associate at Georgia Tech and continue to help with the class daily. I've been privileged to interact with top researchers in the field and with hundreds of students, and I've been a bridge between the experts and the students for almost two years now. I understand the gaps in knowledge, the topics that are often the source of confusion, the students' interests, the foundational knowledge that is classic yet necessary, the classical papers that can be skipped, and many other things that put me in a position to write this book. In addition to teaching at Georgia Tech, I work full-time for Lockheed Martin, Missiles and Fire Control - Autonomous Systems. We do top autonomy work, part of which involves the use of autonomous decision-making such as deep reinforcement learning.

I felt inspired to take my passion for both teaching and deep reinforcement learning to the next level by making this field available to anyone who is willing to put in the work, so I partnered with Manning to deliver a great book to you. Our goal is to help readers understand how deep learning makes reinforcement learning a more effective approach. In the first part of the book, we dive into the foundational knowledge specific to reinforcement learning; here you'll gain the expertise necessary to solve more complex decision-making problems. In the second part, I teach you to use deep learning techniques to solve massive, complex reinforcement learning problems. We dive into the top deep reinforcement learning algorithms and dissect them one at a time. Finally, in the third part, we look at advanced applications of these techniques, put everything together, and help you see the potential of this technology.

Again, it is an honor to have you with me. I hope I can inspire you to give your best, apply the knowledge you obtain in this book to solve complex decision-making problems, and make this world a better place. Humans may be sub-optimal decision makers, but buying this book was without a doubt the right thing to do. Let's get working.

—Miguel Morales
brief contents

1 Introduction to deep reinforcement learning
2 Mathematical foundations of reinforcement learning
3 Balancing immediate and long-term goals
4 Balancing the gathering and utilization of information
5 Estimating the value of agents' behaviors
6 Improving agents' behaviors
7 Achieving goals more effectively and efficiently
8 Introduction to value-based deep reinforcement learning
9 More stable value-based methods
10 Sample-efficient value-based methods
11 Policy-gradient and actor-critic methods
12 Advanced actor-critic methods
13 Towards artificial general intelligence
1 introduction to deep reinforcement learning

In this chapter
• You learn what deep reinforcement learning is and how it is different from other machine learning approaches.
• You learn about the recent progress in deep reinforcement learning and what it can do for a variety of problems.
• You know what to expect from this book, and how to get the most out of it.

I visualize a time when we will be to robots what dogs are to humans, and I'm rooting for the machines.
— Claude Shannon, Father of the Information Age and contributor to the field of Artificial Intelligence
Humans naturally pursue feelings of happiness. From picking out our meals to advancing our careers, every action we choose is derived from our drive to experience rewarding moments in life. Whether these moments are self-centered pleasures or more generous goals, whether they bring us immediate gratification or long-term success, they reflect our perception of how important and valuable they are. And to some extent, these moments are the reason for our existence.

Our ability to achieve these precious moments seems to be correlated with intelligence; "intelligence" is defined as the ability to acquire and apply knowledge and skills. People who are deemed intelligent by society are capable not only of trading off immediate satisfaction for long-term goals, but also of trading a good, certain future for a possibly better, yet uncertain one. Goals that take longer to materialize and whose long-term value is unknown are usually the hardest to achieve, and those who can withstand the challenges along the way are the exception, the leaders, the intellectuals of society.

In this book, you learn about an approach, known as deep reinforcement learning, involved with creating computer programs that can achieve goals that require intelligence. In this chapter, you are introduced to deep reinforcement learning, and you learn how to get the most out of this book.

What is deep reinforcement learning?

Deep reinforcement learning (DRL) is a machine learning approach to artificial intelligence concerned with creating computer programs that can solve problems requiring intelligence. The distinct property of DRL programs is learning through trial and error from feedback that is simultaneously sequential, evaluative, and sampled, by leveraging powerful non-linear function approximation.

I want to unpack this definition for you one bit at a time. But don't get too caught up in the details, as it will take me the whole book to get you grokking deep reinforcement learning. The following is just an introduction to what you learn about in this book. As such, it's repeated and explained in detail in the chapters ahead. If I succeed with my goal for this book, after you complete it you should be able to come back to this definition and understand it precisely. You should be able to tell why I used the words I used, and why I didn't use more or fewer words. But for this chapter, simply sit back and plow through it.
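To make that definition a little more concrete before we unpack it, here is a minimal sketch of the trial-and-error loop it describes. The toy CoinFlipEnv environment, its reward probabilities, and the exploration rate are made up for illustration; this is not code from the book. The agent sees only a score for each choice (evaluative feedback), its decisions unfold over the steps of an episode (sequential feedback), and each reward is a single noisy sample of an action's true value (sampled feedback).

```python
import random

class CoinFlipEnv:
    """Toy single-state environment (hypothetical, for illustration only).
    Action 1 pays off 80% of the time, action 0 only 20% of the time."""
    def __init__(self):
        self.steps = 0

    def reset(self):
        self.steps = 0
        return 0  # there is only one state

    def step(self, action):
        self.steps += 1
        # evaluative feedback: the agent sees only a score, never the "correct" action
        reward = 1.0 if random.random() < (0.8 if action == 1 else 0.2) else 0.0
        done = self.steps >= 10  # sequential feedback: decisions unfold over an episode
        return 0, reward, done

env = CoinFlipEnv()
q = [0.0, 0.0]      # running estimates of each action's value
counts = [0, 0]
for episode in range(100):
    state, done = env.reset(), False
    while not done:
        # trial and error: mostly exploit current estimates, sometimes explore
        action = random.randrange(2) if random.random() < 0.1 else q.index(max(q))
        state, reward, done = env.step(action)
        counts[action] += 1
        # sampled feedback: average noisy reward samples into the value estimate
        q[action] += (reward - q[action]) / counts[action]

print(q)  # the estimates should end up roughly near [0.2, 0.8]
```

A full deep reinforcement learning agent replaces the tiny q table with a neural network, which is the powerful non-linear function approximation mentioned above, but the interaction loop keeps this same shape.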
Deep reinforcement learning is a machine learning approach to artificial intelligence

Artificial intelligence (AI) is a branch of computer science involved in the creation of computer programs capable of demonstrating intelligence. Traditionally, any piece of software that displays cognitive abilities such as perception, search, planning, and learning is considered part of AI. Some examples of functionality produced by AI software are:

• The pages returned by a search engine.
• The route produced by a GPS app.
• The voice recognition and the synthetic voice of smart-assistant software.
• The recommended products shown on e-commerce sites.
• The follow-me feature in drones.

[Figure: Subfields of Artificial Intelligence (1). Some of the most important areas of study under the field of Artificial Intelligence: perception, machine learning, expert systems, planning, natural language processing, computer vision, robotics, search, and logic.]

All computer programs that display intelligence are considered AI, but not all examples of AI can learn. Machine learning (ML) is the area of AI concerned with creating computer programs that can solve problems requiring intelligence by learning from data. There are three main branches of ML: supervised, unsupervised, and reinforcement learning.