An Introduction to Mathematical Optimal Control Theory, by Lawrence C. Evans
1 Introduction
1.1 The basic problem
1.2 Some examples
1.3 A geometric solution
1.4 Overview
2 Controllability, bang-bang principle
2.1 Definitions
2.2 Quick review of linear ODE
2.3 Controllability of linear equations
2.4 Observability
2.5 Bang-bang principle
2.6 References
3 Linear time-optimal control
3.1 Existence of time-optimal controls
3.2 The Maximum Principle for linear time-optimal control
3.3 Examples
4 The Pontryagin Maximum Principle
4.1 Calculus of variations, Hamiltonian dynamics
4.2 Review of Lagrange multipliers
4.3 Statement of Pontryagin Maximum Principle
4.4 Applications and examples
4.5 Maximum Principle with transversality conditions
4.6 More applications
4.7 Maximum Principle with state constraints
4.8 More applications
5 Dynamic programming
5.1 Derivation of Bellman’s PDE
5.2 Examples
5.3 Relationship with Pontryagin Maximum Principle
6 Differential game theory
6.1 Definitions
6.2 Dynamic Programming
6.3 Games and the Pontryagin Maximum Principle
6.4 Application: war of attrition and attack
6.5 References
7 Introduction to stochastic control theory
7.1 Introduction and motivation
7.2 Review of probability theory, Brownian motion
7.3 Stochastic differential equations
7.4 Stochastic calculus, Itô chain rule
7.5 Dynamic programming
7.6 Application: optimal portfolio selection
7.7 References
Appendix: Proofs of the Pontryagin Maximum Principle
A.1 An informal derivation
A.2 Simple control variations
A.3 Free endpoint problem, no running payoff
A.4 Free endpoint problem with running payoffs
A.5 Multiple control variations
A.6 Fixed endpoint problem
A.7 References
References
An Introduction to Mathematical Optimal Control Theory
Version 0.2
By Lawrence C. Evans
Department of Mathematics
University of California, Berkeley

Chapter 1: Introduction
Chapter 2: Controllability, bang-bang principle
Chapter 3: Linear time-optimal control
Chapter 4: The Pontryagin Maximum Principle
Chapter 5: Dynamic programming
Chapter 6: Game theory
Chapter 7: Introduction to stochastic control theory
Appendix: Proofs of the Pontryagin Maximum Principle
Exercises
References
PREFACE

These notes build upon a course I taught at the University of Maryland during the fall of 1983. My great thanks go to Martino Bardi, who took careful notes, saved them all these years and recently mailed them to me. Faye Yeager typed up his notes into a first draft of these lectures as they now appear. Scott Armstrong read over the notes and suggested many improvements: thanks, Scott. Stephen Moye of the American Math Society helped me a lot with AMSTeX versus LaTeX issues. My thanks also to Atilla Yilmaz for spotting lots of typos and errors, which I have corrected.

I have radically modified much of the notation (to be consistent with my other writings), updated the references, added several new examples, and provided a proof of the Pontryagin Maximum Principle. As this is a course for undergraduates, I have dispensed in certain proofs with various measurability and continuity issues, and as compensation have added various critiques as to the lack of total rigor.

This current version of the notes is not yet complete, but meets I think the usual high standards for material posted on the internet. Please email me at evans@math.berkeley.edu with any corrections or comments.
CHAPTER 1: INTRODUCTION

1.1. The basic problem
1.2. Some examples
1.3. A geometric solution
1.4. Overview

1.1 THE BASIC PROBLEM.

DYNAMICS. We open our discussion by considering an ordinary differential equation (ODE) having the form
\[
\tag{1.1}
\begin{cases}
\dot{x}(t) = f(x(t)) & (t > 0)\\
x(0) = x^0.
\end{cases}
\]
We are here given the initial point x0 ∈ Rn and the function f : Rn → Rn. The unknown is the curve x : [0, ∞) → Rn, which we interpret as the dynamical evolution of the state of some “system”.

CONTROLLED DYNAMICS. We generalize a bit and suppose now that f depends also upon some “control” parameters belonging to a set A ⊂ Rm, so that f : Rn × A → Rn. Then if we select some value a ∈ A and consider the corresponding dynamics
\[
\begin{cases}
\dot{x}(t) = f(x(t), a) & (t > 0)\\
x(0) = x^0,
\end{cases}
\]
we obtain the evolution of our system when the parameter is constantly set to the value a.

The next possibility is that we change the value of the parameter as the system evolves. For instance, suppose we define the function α : [0, ∞) → A this way:
\[
\alpha(t) =
\begin{cases}
a_1 & 0 \le t \le t_1\\
a_2 & t_1 < t \le t_2\\
a_3 & t_2 < t \le t_3\\
\text{etc.}
\end{cases}
\]
for times 0 < t1 < t2 < t3 . . . and parameter values a1, a2, a3, · · · ∈ A; and we then solve the dynamical equation
\[
\tag{1.2}
\begin{cases}
\dot{x}(t) = f(x(t), \alpha(t)) & (t > 0)\\
x(0) = x^0.
\end{cases}
\]
The picture illustrates the resulting evolution. The point is that the system may behave quite differently as we change the control parameters.

[Figure: Controlled dynamics. The response x(·) as the control parameter switches among the values a1, a2, a3, . . . at times t1, t2, t3, . . .]
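The effect of switching the control parameter can also be seen numerically. The following is a minimal sketch, not part of the notes: it integrates the controlled dynamics (1.2) by forward Euler for a made-up scalar f and a control that jumps between two parameter values; every constant in it is illustrative only.

```python
import numpy as np

def integrate(f, alpha, x0, T=10.0, dt=1e-3):
    """Forward-Euler approximation of dx/dt = f(x(t), alpha(t)), x(0) = x0."""
    n = int(round(T / dt))
    ts = np.arange(n + 1) * dt
    xs = np.empty(n + 1)
    xs[0] = x0
    for i in range(n):
        xs[i + 1] = xs[i] + dt * f(xs[i], alpha(ts[i]))
    return ts, xs

# Hypothetical dynamics f(x, a) = a - x, with a piecewise-constant control
# that uses a1 = 2 on [0, 5] and a2 = 0 afterwards (both values invented).
f = lambda x, a: a - x
alpha = lambda t: 2.0 if t <= 5.0 else 0.0
ts, xs = integrate(f, alpha, x0=1.0)
print(xs[-1])   # state reached at time T under this control
```

Changing the switching time or the values a1, a2 in this sketch produces visibly different trajectories, which is exactly the point of the remark above.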
More generally, we call a function α : [0, ∞) → A a control. Corresponding to each control, we consider the ODE
\[
\tag{ODE}
\begin{cases}
\dot{x}(t) = f(x(t), \alpha(t)) & (t > 0)\\
x(0) = x^0,
\end{cases}
\]
and regard the trajectory x(·) as the corresponding response of the system.

NOTATION. (i) We will write
\[
f(x, a) =
\begin{pmatrix}
f^1(x, a)\\
\vdots\\
f^n(x, a)
\end{pmatrix}
\]
to display the components of f, and similarly put
\[
x(t) =
\begin{pmatrix}
x^1(t)\\
\vdots\\
x^n(t)
\end{pmatrix}.
\]
We will therefore write vectors as columns in these notes and use boldface for vector-valued functions, the components of which have superscripts.

(ii) We also introduce
\[
\mathcal{A} = \{\alpha : [0, \infty) \to A \mid \alpha(\cdot) \text{ measurable}\}
\]
to denote the collection of all admissible controls, where
\[
\alpha(t) =
\begin{pmatrix}
\alpha^1(t)\\
\vdots\\
\alpha^m(t)
\end{pmatrix}.
\]
Note very carefully that our solution x(·) of (ODE) depends upon α(·) and the initial condition. Consequently our notation would be more precise, but more complicated, if we were to write
\[
x(\cdot) = x(\cdot, \alpha(\cdot), x^0),
\]
displaying the dependence of the response x(·) upon the control and the initial value.

PAYOFFS. Our overall task will be to determine what is the “best” control for our system. For this we need to specify a specific payoff (or reward) criterion. Let us define the payoff functional
\[
\tag{P}
P[\alpha(\cdot)] := \int_0^T r(x(t), \alpha(t))\, dt + g(x(T)),
\]
where x(·) solves (ODE) for the control α(·). Here r : Rn × A → R and g : Rn → R are given, and we call r the running payoff and g the terminal payoff. The terminal time T > 0 is given as well.

THE BASIC PROBLEM. Our aim is to find a control α∗(·) which maximizes the payoff. In other words, we want
\[
P[\alpha^*(\cdot)] \ge P[\alpha(\cdot)] \quad \text{for all controls } \alpha(\cdot) \in \mathcal{A}.
\]
Such a control α∗(·) is called optimal.

This task presents us with these mathematical issues:
(i) Does an optimal control exist?
(ii) How can we characterize an optimal control mathematically?
(iii) How can we construct an optimal control?
These turn out to be sometimes subtle problems, as the following collection of examples illustrates.
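Before turning to the examples, note that the most naive computational step is already available: for any candidate control one can integrate (ODE) and evaluate the payoff (P), and so compare candidates directly. The sketch below illustrates this; the dynamics, payoffs, and the two constant controls are placeholders invented for illustration, not data from the notes.

```python
def payoff(f, r, g, alpha, x0, T=5.0, dt=1e-3):
    """Approximate P[alpha] = integral_0^T r(x, alpha) dt + g(x(T)) by forward Euler."""
    x, running = x0, 0.0
    for i in range(int(round(T / dt))):
        a = alpha(i * dt)
        running += r(x, a) * dt      # accumulate the running payoff
        x += dt * f(x, a)            # advance the state under (ODE)
    return running + g(x)            # add the terminal payoff

# Placeholder problem: f(x, a) = -x + a, running payoff r = -(x - 1)^2,
# terminal payoff g = 0, and two candidate constant controls a = 0.5 and a = 1.
f = lambda x, a: -x + a
r = lambda x, a: -(x - 1.0) ** 2
g = lambda x: 0.0
for value in (0.5, 1.0):
    print(value, payoff(f, r, g, alpha=lambda t, a=value: a, x0=0.0))
```

Of course, scanning a handful of candidates says nothing about existence or characterization of an optimal control; that is what the theory in the later chapters provides.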
1.2 EXAMPLES

EXAMPLE 1: CONTROL OF PRODUCTION AND CONSUMPTION. Suppose we own, say, a factory whose output we can control. Let us begin to construct a mathematical model by setting

x(t) = amount of output produced at time t ≥ 0.

We suppose that we consume some fraction of our output at each time, and likewise can reinvest the remaining fraction. Let us denote

α(t) = fraction of output reinvested at time t ≥ 0.

This will be our control, and is subject to the obvious constraint that

0 ≤ α(t) ≤ 1 for each time t ≥ 0.

Given such a control, the corresponding dynamics are provided by the ODE
\[
\begin{cases}
\dot{x}(t) = k\alpha(t)x(t)\\
x(0) = x^0,
\end{cases}
\]
the constant k > 0 modelling the growth rate of our reinvestment. Let us take as a payoff functional
\[
P[\alpha(\cdot)] = \int_0^T (1 - \alpha(t))\,x(t)\, dt.
\]
The meaning is that we want to maximize our total consumption of the output, our consumption at a given time t being (1 − α(t))x(t). This model fits into our general framework for n = m = 1, once we put

A = [0, 1], f(x, a) = kax, r(x, a) = (1 − a)x, g ≡ 0.

[Figure: A bang-bang control, equal to 1 on [0, t∗] and to 0 on (t∗, T].]

As we will see later in §4.4.2, an optimal control α∗(·) is given by
\[
\alpha^*(t) =
\begin{cases}
1 & \text{if } 0 \le t \le t^*\\
0 & \text{if } t^* < t \le T
\end{cases}
\]
for an appropriate switching time 0 ≤ t∗ ≤ T. In other words, we should reinvest all the output (and therefore consume nothing) up until time t∗, and afterwards, we should consume everything (and therefore reinvest nothing). The switchover time t∗ will have to be determined. We call α∗(·) a bang–bang control.
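As a rough numerical check of this claim (a sketch only, with made-up values k = 1, x0 = 1, T = 2, and not the derivation given in §4.4.2), one can simulate the dynamics for bang-bang controls with different switching times and scan for the best one; for these values the scan lands near t∗ = T − 1/k = 1.

```python
import numpy as np

def total_consumption(t_star, k=1.0, x0=1.0, T=2.0, dt=1e-3):
    """Payoff of the bang-bang control: reinvest everything (alpha = 1) until
    t_star, then consume everything (alpha = 0) until the terminal time T."""
    x, total = x0, 0.0
    for i in range(int(round(T / dt))):
        a = 1.0 if i * dt <= t_star else 0.0
        total += (1.0 - a) * x * dt     # consumption rate (1 - alpha) x
        x += dt * k * a * x             # reinvestment grows the output
    return total

switch_times = np.linspace(0.0, 2.0, 201)
payoffs = [total_consumption(s) for s in switch_times]
best = switch_times[int(np.argmax(payoffs))]
print(best, max(payoffs))   # roughly 1.0 and e ≈ 2.718 for these values
```

The scan only compares controls of the stated bang-bang form; that this form is optimal among all admissible controls is exactly what the Maximum Principle will establish later.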
EXAMPLE 2: REPRODUCTIVE STRATEGIES IN SOCIAL INSECTS

The next example is from Chapter 2 of the book Caste and Ecology in Social Insects, by G. Oster and E. O. Wilson [O-W]. We attempt to model how social insects, say a population of bees, determine the makeup of their society.

Let us write T for the length of the season, and introduce the variables

w(t) = number of workers at time t
q(t) = number of queens
α(t) = fraction of colony effort devoted to increasing the work force.

The control α is constrained by our requiring that 0 ≤ α(t) ≤ 1.

We continue to model by introducing dynamics for the number of workers and the number of queens. The worker population evolves according to
\[
\begin{cases}
\dot{w}(t) = -\mu w(t) + b s(t)\alpha(t)w(t)\\
w(0) = w^0.
\end{cases}
\]
Here µ is a given constant (a death rate), b is another constant, and s(t) is the known rate at which each worker contributes to the bee economy. We suppose also that the population of queens changes according to
\[
\begin{cases}
\dot{q}(t) = -\nu q(t) + c(1 - \alpha(t)) s(t) w(t)\\
q(0) = q^0,
\end{cases}
\]
for constants ν and c.

Our goal, or rather the bees’, is to maximize the number of queens at time T:
\[
P[\alpha(\cdot)] = q(T).
\]
So in terms of our general notation, we have $x(t) = (w(t), q(t))^T$ and $x^0 = (w^0, q^0)^T$. We are taking the running payoff to be r ≡ 0, and the terminal payoff g(w, q) = q. The answer will again turn out to be a bang–bang control, as we will explain later.
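The trade-off in this model is easy to see in a quick simulation: effort devoted to workers early enlarges the colony, while effort diverted to queens late converts that capacity into the terminal payoff q(T). The sketch below integrates the two equations for a one-switch control; all constants, and the switching times tried, are invented for illustration and are not taken from [O-W] or the notes.

```python
def queens_at_T(t_star, T=10.0, dt=1e-3,
                mu=0.1, nu=0.05, b=1.0, c=1.0,
                w0=10.0, q0=1.0, s=lambda t: 1.0):
    """Integrate the worker/queen dynamics for the control
    alpha = 1 (grow the work force) until t_star, alpha = 0 afterwards."""
    w, q = w0, q0
    for i in range(int(round(T / dt))):
        t = i * dt
        a = 1.0 if t <= t_star else 0.0
        dw = -mu * w + b * s(t) * a * w
        dq = -nu * q + c * (1.0 - a) * s(t) * w
        w += dt * dw
        q += dt * dq
    return q   # the payoff is the number of queens at time T

# Switching later grows more workers but leaves less time to raise queens.
for t_star in (2.0, 5.0, 8.0):
    print(t_star, queens_at_T(t_star))
```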
EXAMPLE 3: A PENDULUM

We look next at a hanging pendulum, for which

θ(t) = angle at time t.

If there is no external force, then we have the equation of motion
\[
\begin{cases}
\ddot{\theta}(t) + \lambda\dot{\theta}(t) + \omega^2\theta(t) = 0\\
\theta(0) = \theta_1, \quad \dot{\theta}(0) = \theta_2,
\end{cases}
\]
the solution of which is a damped oscillation, provided λ > 0.

Now let α(·) denote an applied torque, subject to the physical constraint that

|α| ≤ 1.

Our dynamics now become
\[
\begin{cases}
\ddot{\theta}(t) + \lambda\dot{\theta}(t) + \omega^2\theta(t) = \alpha(t)\\
\theta(0) = \theta_1, \quad \dot{\theta}(0) = \theta_2.
\end{cases}
\]
Define $x^1(t) = \theta(t)$, $x^2(t) = \dot{\theta}(t)$, and $x(t) = (x^1(t), x^2(t))$. Then we can write the evolution as the system
\[
\dot{x}(t) =
\begin{pmatrix}
\dot{x}^1\\
\dot{x}^2
\end{pmatrix}
=
\begin{pmatrix}
\dot{\theta}\\
\ddot{\theta}
\end{pmatrix}
=
\begin{pmatrix}
x^2\\
-\lambda x^2 - \omega^2 x^1 + \alpha(t)
\end{pmatrix}
= f(x, \alpha).
\]
We introduce as well
\[
P[\alpha(\cdot)] = -\int_0^\tau 1\, dt = -\tau,
\]
for
\[
\tau = \tau(\alpha(\cdot)) = \text{first time that } x(\tau) = 0 \ \text{(that is, } \theta(\tau) = \dot{\theta}(\tau) = 0).
\]
We want to maximize P[·], meaning that we want to minimize the time it takes to bring the pendulum to rest.

Observe that this problem does not quite fall within the general framework described in §1.1, since the terminal time is not fixed, but rather depends upon the control. This is called a fixed endpoint, free time problem.

EXAMPLE 4: A MOON LANDER

This model asks us to bring a spacecraft to a soft landing on the lunar surface, using the least amount of fuel. We introduce the notation

h(t) = height at time t
v(t) = velocity = $\dot{h}(t)$
m(t) = mass of spacecraft (changing as fuel is burned)
α(t) = thrust at time t.

We assume that

0 ≤ α(t) ≤ 1,

and Newton’s law tells us that
\[
m\ddot{h} = -gm + \alpha,
\]
the right hand side being the difference of the gravitational force and the thrust of the rocket. This system is modeled by the ODE
\[
\begin{cases}
\dot{v}(t) = -g + \dfrac{\alpha(t)}{m(t)}\\[4pt]
\dot{h}(t) = v(t)\\
\dot{m}(t) = -k\alpha(t).
\end{cases}
\]
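To get a feel for these equations, one can integrate them under some fixed (certainly not optimal) thrust rule and watch the touchdown speed and the fuel spent. The sketch below is purely illustrative: the constants, initial data, and the crude "brake when falling too fast" rule are all invented, and the fuel-minimizing thrust law is a different question from this simulation.

```python
def simulate_landing(thrust, h0=1.0, v0=-0.1, m0=1.0, g=0.1, k=0.05, dt=1e-3):
    """Integrate h' = v, v' = -g + alpha/m, m' = -k*alpha until touchdown (h <= 0)."""
    h, v, m, t = h0, v0, m0, 0.0
    while h > 0.0 and t < 100.0:
        a = thrust(t, h, v, m)
        v += dt * (-g + a / m)
        h += dt * v
        m += dt * (-k * a)
        t += dt
    return t, v, m

# Invented rule: full thrust whenever the craft is descending faster than 0.2.
thrust = lambda t, h, v, m: 1.0 if v < -0.2 else 0.0
touchdown_time, touchdown_speed, mass_left = simulate_landing(thrust)
print(touchdown_time, touchdown_speed, mass_left)
```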