ROBUST ADAPTIVE
DYNAMIC PROGRAMMING
Yu Jiang
The MathWorks, Inc.
Zhong-Ping Jiang
New York University
A JOHN WILEY & SONS, INC., PUBLICATION
To my mother, Misi, and
Xiaofeng (YJ)
To my family (ZPJ)
PREFACE
This book covers the topic of adaptive optimal control (AOC) for continuous-time
systems. An adaptive optimal controller can gradually modify itself to adapt to the
controlled system, and the adaptation is measured by some performance index of
the closed-loop system. The study of AOC can be traced back to the 1970s, when
researchers at the Los Alamos Scientific Laboratory (LASL) started to investigate the
use of adaptive and optimal control techniques in buildings with solar-based temper-
ature control. Compared with conventional adaptive control, AOC has the important
ability to improve energy conservation and system performance. However, even
though there are various ways to compute the optimal controller in AOC, most of
the previously known approaches are model-based, in the sense that a model with a
fixed structure is assumed before designing the controller. In addition, these
approaches do not generalize to nonlinear models.
On the other hand, quite a few model-free, data-driven approaches to AOC have
emerged in recent years. In particular, adaptive/approximate dynamic programming
(ADP) is a powerful methodology that integrates the idea of reinforcement learning
(RL) observed in the mammalian brain with decision theory, so that controllers for
man-made systems can learn to achieve optimal performance in spite of uncertainty
about the environment and the lack of detailed system models. Since the 1960s, RL
has been brought into the computer science and control science literature as a way
to study artificial intelligence, and has been successfully applied to many discrete-
time systems, or Markov Decision Processes (MDPs). However, it has always been
challenging to generalize those results to the controller design of physical systems.
This is mainly because the state space of a physical control system is generally
continuous and unbounded, and the states evolve continuously in time. Therefore, the
convergence and stability properties have to be carefully studied for ADP-based
approaches. The main purpose of this book is to introduce the recently developed
framework, known as robust adaptive dynamic programming (RADP), for data-driven,
non-model-based adaptive optimal control design for both linear and nonlinear
continuous-time systems.
In addition, this book is intended to address in a systematic way the presence of
dynamic uncertainty. Dynamic uncertainty exists ubiquitously in control engineering.
It is primarily caused by dynamics that are part of the physical system but either
are difficult to model mathematically or are ignored for the sake of controller
design and system analysis. Without addressing the dynamic uncertainty, controller
designs based on the simplified model will most likely fail when applied to the
physical system. Most previously developed ADP and other RL methods assume that
full state information is always available, and therefore that the system order is
known. Although this assumption excludes the existence of any dynamic uncertainty,
it is apparently too strong to be realistic. For a physical model of relatively
large scale, knowing the exact number of state variables can be difficult, not to
mention that not all state variables can be measured precisely. For example,
consider a power grid with both a main generator controlled by the utility
company and small distributed generators (DGs) installed by customers. The utility
company should not neglect the dynamics of the DGs but should treat them as
dynamic uncertainties when controlling the grid, so that stability, performance,
and power security can always be maintained as expected.
The book is organized into four parts. First, an overview of RL, ADP, and RADP
is contained in Chapter 1. Second, a few recently developed continuous-time ADP
methods are introduced in Chapters 2, 3, and 4. Chapter 2 covers the topic of ADP
for uncertain linear systems. Chapters 3 and 4 provide neural-network-based and
sum-of-squares (SOS)-based ADP methodologies to achieve semi-global and global
stabilization for uncertain nonlinear continuous-time systems, respectively. Third,
Chapters 5 and 6 focus on RADP for linear and nonlinear systems, with dynamic
uncertainties rigorously addressed. In Chapter 5, different robustification schemes
are introduced to achieve RADP. Chapter 6 further extends the RADP framework
for large-scale systems and illustrates its applicability to industrial power systems.
Finally, Chapter 7 applies ADP and RADP to study the sensorimotor control of
humans, and the results suggest that humans may use very similar approaches to
learn to coordinate movements and to handle uncertainties in their daily lives.
This book makes a major departure from most existing texts covering the same
topics by providing many practical examples such as power systems and human sen-
sorimotor control systems to illustrate the effectiveness of our results. The book uses
MATLAB in each chapter to conduct numerical simulations. MATLAB is used as a
computational tool, a programming tool, and a graphical tool. Simulink, a
graphical programming environment for modeling, simulating, and analyzing multidomain
dynamic systems, is used in Chapter 2. The third-party MATLAB-based toolboxes
SOSTOOLS and CVX are used in Chapters 4 and 5 to solve SOS programs and
semidefinite programs (SDPs). All MATLAB programs and the Simulink model
developed in this book, as well as extensions of these programs, are available at
http://yu-jiang.github.io/radpbook/.
The development of this book would not have been possible without the support
and help of many people. The authors wish to thank Professor Frank Lewis and Dr.
Paul Werbos, whose seminal work on adaptive/approximate dynamic programming
laid the foundation of this book. The first-named author (YJ) would like to
thank his Master's thesis adviser, Prof. Jie Huang, for guiding him into the area
of nonlinear control, and Dr. Yebin Wang for offering him a summer research
internship position at Mitsubishi Electric Research Laboratories, where parts of
the ideas in Chapters 4 and 5 were originally inspired. The second-named author
(ZPJ) would like to acknowledge his colleagues, especially Drs. Alessandro
Astolfi, Lei Guo, Iven Mareels, and Frank Lewis, for many useful comments and
constructive criticism on some of the research summarized in the book. He is
grateful to his students for their boldness in entering the interesting yet still
unpopular field of data-driven adaptive optimal control.
optimal control. The authors wish to thank the editors and editorial staff, in partic-
ular, Mengchu Zhou, Mary Hatcher, Brady Chin, and Divya Narayanan, for their
efforts in publishing the book. We thank Tao Bian and Weinan Gao for
collaboration on generalizations and applications of ADP based on the framework of RADP
presented in this book. Finally, we thank our families for their sacrifice in adapting
to our hard-to-predict working schedules that often involve dynamic uncertainties.
From our family members, we have learned the importance of exploration noise in
achieving the desired trade-off between robustness and optimality. The bulk of this
research was accomplished while the first-named author was working towards his
PhD degree in the Control and Networks Lab at New York University Tandon School
of Engineering. The authors wish to acknowledge research funding support from
the National Science Foundation.
YU JIANG
Wellesley, Massachusetts
July 2016

ZHONG-PING JIANG
Brooklyn, New York
July 2016