Markov Decision Processes in Practice
Foreword
Preface
Part I: General Theory
Part II: Healthcare
Part III: Transportation
Part IV: Production
Part V: Communications
Part VI: Financial Modeling
Summarizing
Acknowledgments
Contents
List of Contributors
Part I General Theory
1 One-Step Improvement Ideas and Computational Aspects
1.1 Introduction
1.2 The Average-Cost Markov Decision Model
1.2.1 The Concept of Relative Values
1.2.2 The Policy-Improvement Step
1.2.3 The Odoni Bounds for Value Iteration
1.3 Tailor-Made Policy-Iteration Algorithm
1.3.1 A Queueing Control Problem with a Variable Service Rate
1.4 One-Step Policy Improvement for Suboptimal Policies
1.4.1 Dynamic Routing of Customers to Parallel Queues
1.5 One-Stage-Look-Ahead Rule in Optimal Stopping
1.5.1 Devil's Penny Problem
1.5.2 A Game of Dropping Balls into Bins
1.5.3 The Chow-Robbins Game
References
2 Value Function Approximation in Complex Queueing Systems
2.1 Introduction
2.2 Difference Calculus for Markovian Birth-Death Systems
2.3 Value Functions for Queueing Systems
2.3.1 The M/Cox(r)/1 Queue
2.3.2 Special Cases of the M/Cox(r)/1 Queue
2.3.3 The M/M/s Queue
2.3.4 The Blocking Costs in an M/M/s/s Queue
2.3.5 Priority Queues
2.4 Application: Routing to Parallel Queues
2.5 Application: Dynamic Routing in Multiskill Call Centers
2.6 Application: A Controlled Polling System
References
3 Approximate Dynamic Programming by Practical Examples
3.1 Introduction
3.2 The Nomadic Trucker Example
3.2.1 Problem Introduction
3.2.2 MDP Model
3.2.2.1 State
3.2.2.2 Decision
3.2.2.3 Costs
3.2.2.4 New Information and Transition Function
3.2.2.5 Solution
3.2.3 Approximate Dynamic Programming
3.2.3.1 Post-decision State
3.2.3.2 Forward Dynamic Programming
3.2.3.3 Value Function Approximation
3.3 A Freight Consolidation Example
3.3.1 Problem Introduction
3.3.2 MDP Model
3.3.2.1 State
3.3.2.2 Decision
3.3.2.3 Costs
3.3.2.4 New Information and Transition Function
3.3.2.5 Solution
3.3.3 Approximate Dynamic Programming
3.3.3.1 Post-decision State
3.3.3.2 Forward Dynamic Programming
3.3.3.3 Value Function Approximation
3.4 A Healthcare Example
3.4.1 Problem Introduction
3.4.2 MDP Model
3.4.2.1 State
3.4.2.2 Decision
3.4.2.3 Costs
3.4.2.4 New Information and Transition Function
3.4.2.5 Solution
3.4.3 Approximate Dynamic Programming
3.4.3.1 Post-decision State
3.4.3.2 Forward Dynamic Programming
3.4.3.3 Value Function Approximation
3.5 What's More
3.5.1 Policies
3.5.2 Value Function Approximations
3.5.3 Exploration vs Exploitation
Appendix
References
4 Server Optimization of Infinite Queueing Systems
4.1 Introduction
4.2 Basic Definition and Notations
4.3 Motivating Examples
4.3.1 Optimization of a Queueing System with Two Different Servers
4.3.2 Optimization of a Computational System with Power Saving Mode
4.3.3 Structural Properties of These Motivating Examples
4.4 Theoretical Background
4.4.1 Subset Measures in Markov Chains
4.4.2 Markov Chain Transformation
4.4.3 Markov Decision Processes with a Set of Uncontrolled States
4.4.3.1 Decisions Only in Subset 1 Without an Effect on the Transitions to Subset 2
4.4.3.2 Decisions Only in Subset 1 with an Effect on the Transitions to Subset 2
4.4.3.3 Decisions Only in Subset 1 with Limited Boundary to the Other Set
4.4.4 Infinite Markov Chains with Regular Structure
4.4.4.1 Birth Death Process
4.5 Solution and Numerical Analysis of the Motivating Examples
4.5.1 Solution to the Queue with Two Different Servers
4.5.2 Solution to the Power-Saving Model
4.6 Further Examples
4.6.1 Optimization of a Queueing System with Two Markov Modulated Servers
4.6.2 Structural Properties of the Example with Markov Modulated Servers
4.7 Infinite MDPs with Quasi Birth Death Structure
4.7.1 Quasi Birth Death Process
4.7.2 Solving MDPs with QBD Structure
4.7.2.1 QBD Measures Associated with Infinite Sets
4.8 Solution and Numerical Analysis of MDPs with QBD Structure
4.8.1 Solution of the Example with Markov Modulated Servers
4.8.2 Markov Modulated Server with Three Background States
4.9 Conclusion
References
5 Structures of Optimal Policies in MDPs with Unbounded Jumps: The State of Our Art
5.1 Introduction
5.2 Discrete Time Model
5.2.1 Discounted Cost
5.2.1.1 Value Iteration
5.2.1.2 Roadmap to Structural Properties
5.2.2 Approximations/Perturbations
5.2.2.1 Type of Perturbations
5.2.2.2 Smoothed Rate Truncation (SRT)
5.2.3 Average Cost
5.2.3.1 Roadmap to Structural Properties
5.2.3.2 Examples Where the DAOE Has No Unique Solution
5.3 Continuous Time Model
5.3.1 Uniformisation
5.3.2 Discounted Cost
5.3.2.1 Roadmap to Structural Properties
5.3.3 Average Cost
5.3.4 Roadmap to Structural Properties
5.3.4.1 Roadmap Summary
5.3.5 Proofs
5.3.6 Tauberian Theorem
Appendix: Notation
References
Part II Healthcare
6 Markov Decision Processes for Screening and Treatment of Chronic Diseases
6.1 Introduction
6.2 Background on Chronic Disease Modeling
6.3 Modeling Framework for Chronic Diseases
6.3.1 MDP and POMDP Model Formulation
6.3.2 Solution Methods and Structural Properties
6.3.3 Model Validation
6.4 MDP Model for Cardiovascular Risk Control in Patients with Type 2 Diabetes
6.4.1 MDP Model Formulation
6.4.2 Results: Comparison of Optimal Policies Versus Published Guidelines
6.5 POMDP for Prostate Cancer Screening
6.5.1 POMDP Model Formulation
6.5.2 Results: Optimal Belief-Based Screening Policy
6.6 Open Challenges in MDPs for Chronic Disease
6.7 Conclusions
References
7 Stratified Breast Cancer Follow-Up Using a Partially Observable MDP
7.1 Introduction
7.2 Model Formulation
7.2.1 Optimality Equations
7.2.2 Alternative Representation of the Optimality Equations
7.2.3 Algorithm
7.3 Model Parameters
7.4 Results
7.4.1 Sensitivity Analyses
7.5 Conclusions and Discussion
Appendix: Notation
References
8 Advance Patient Appointment Scheduling
8.1 Introduction
8.2 Problem Description
8.3 Mathematical Formulation
8.3.1 Decision Epochs
8.3.2 State Space
8.3.3 Action Sets
8.3.4 Transition Probabilities
8.3.5 Immediate Cost
8.3.6 Optimality Equations
8.4 Solution Approach
8.5 Practical Results
8.5.1 Computerized Tomography Scan Appointment Scheduling
8.5.2 Radiation Therapy Treatment Scheduling
8.6 Discussion
8.7 Open Challenges
Appendix: Notation
References
9 Optimal Ambulance Dispatching
9.1 Introduction
9.1.1 Previous Work
9.1.2 Our Contribution
9.2 Problem Formulation
9.3 Solution Method: Markov Decision Process
9.3.1 State Space
9.3.2 Policy Definition
9.3.3 Rewards
9.3.3.1 Fraction of Late Arrivals
9.3.3.2 Average Response Time
9.3.4 Transition Probabilities
9.3.5 Value Iteration
9.4 Solution Method: Dynamic MEXCLP Heuristic for Dispatching
9.4.1 Coverage According to the MEXCLP Model
9.4.2 Applying MEXCLP to the Dispatch Process
9.5 Results: A Motivating Example
9.5.1 Fraction of Late Arrivals
9.5.2 Average Response Time
9.6 Results: Region Flevoland
9.6.1 Analysis of the MDP Solution for Flevoland
9.6.2 Results
9.7 Conclusion and Discussion
9.7.1 Further Research
Appendix: Notation
References
10 Blood Platelet Inventory Management
10.1 Introduction
10.1.1 Practical Motivation
10.1.2 SDP-Simulation Approach
10.1.3 Outline
10.2 Literature
10.3 SDP-Simulation Approach for the Stationary PPP
10.3.1 Steps of SDP-Simulation Approach
10.3.2 Step 1: SDP Model for Stationary PPP
10.3.3 Case Studies
10.4 Extended SDP-Simulation Approach for the Non-Stationary PPP
10.4.1 Problem: Non-Stationary Production Breaks
10.4.2 Extended SDP-Simulation Approach
10.4.3 Extension: Including Non-Stationary Periods
10.5 Case Study: Optimal Policy Around Breaks
10.5.1 Data
10.5.2 Step I: Stationary Problem
10.5.3 Steps II to IV: Christmas and New Year's Day
10.5.4 Steps II to IV: 4-Day Easter Weekend
10.5.5 Conclusions: Extended SDP-Simulation Approach
10.6 Discussion and Conclusions
Appendix: Notation
References
Part III Transportation
11 Stochastic Dynamic Programming for Noise Load Management
11.1 Introduction
11.2 Noise Load Management at Amsterdam Airport Schiphol
11.3 SDP for Noise Load Optimisation
11.4 Numerical Approach
11.4.1 Transition Probabilities
11.4.2 Discretisation
11.5 Numerical Results
11.5.1 Probability of Exceeding the Noise Load Limit
11.5.2 Comparison with the Heuristic
11.5.3 Increasing the Number of Decision Epochs
11.6 Discussion
Appendix
References
12 Allocation in a Vertical Rotary Car Park
12.1 Introduction
12.2 Background
12.2.1 The Car Parking Allocation Problem
12.2.2 Markov Decision Processes
12.3 The Markov Decision Process
12.4 Numerical Results
12.5 Simulation Results
12.6 Conclusion
Appendix
References
13 Dynamic Control of Traffic Lights
13.1 Problem
13.2 Markov Decision Process (MDP)
13.2.1 Examples: Terminology and Notations
13.2.2 MDP Model
13.2.2.1 State
13.2.2.2 Action
13.2.2.3 State Transition Probabilities
13.2.2.4 Contribution: Waiting Costs
13.2.2.5 Bellman Equation
13.2.2.6 Computational Complexity
13.3 Approximation by Policy Iteration
13.3.1 Policy Iteration (PI)
13.3.2 Initial Policy: Fixed Cycle (FC)
13.3.3 Policy Evaluation Step of FC
13.3.4 Single Policy Improvement Step: RV1 Policy
13.3.5 Computational Complexity of RV1
13.3.6 Additional Iterations of PI
13.4 Results
13.4.1 Simulation
13.4.2 Intersection F4C2
13.4.3 Complex Intersection F12C4
13.5 Discussion and Conclusions
Appendix: Notation
References
14 Smart Charging of Electric Vehicles
14.1 Introduction
14.2 Background on DSM and PowerMatcher
14.3 Optimal Charging Strategies
14.3.1 MDP/SDP Problem Formulation
14.3.2 Analytic Solution for i.i.d. Prices
14.3.3 DP-Heuristic Strategy
14.4 Numerical Results
14.5 Conclusion/Future Research
Appendix
References
Part IV Production
15 Analysis of a Stochastic Lot Scheduling Problem with Strict Due-Dates
15.1 Introduction
15.2 Theoretical Background of the CSLSP
15.3 Production System, Admissible Policies, and Objective Function
15.3.1 Production System
15.3.2 Admissible Actions and Policies
15.3.3 Objective Function
15.4 The Markov Decision Process
15.4.1 Format of a State
15.4.2 Actions and Operators
15.4.3 Transition Matrices
15.4.4 Further Aggregation in the Symmetric Case
15.4.5 State Space
15.4.6 A Heuristic Threshold Policy
15.5 Numerical Study
15.5.1 Influence of the Load and the Due-Date Horizon
15.5.2 Visualization of the Structure of the Optimal Policy
15.6 Conclusion
Appendix: Notation
References
16 Optimal Fishery Policies
16.1 Introduction
16.2 Model Description
16.2.1 Biological Dynamics; Growth of Biomass
16.2.2 Economic Dynamics; Harvest and Investment Decisions
16.2.3 Optimization Model
16.2.3.1 Decisions at Level 2
16.2.3.2 Decisions at Level 1
16.3 Model Analysis
16.3.1 Bounds on Decision and State Space
16.3.2 Equilibrium State Values in a Deterministic Setting
16.4 Discretization in the Value Iteration Approach
16.4.1 Deterministic Elaboration
16.4.2 Stochastic Implementation
16.4.3 Analysis of the Stochastic Model
16.5 Conclusions
Appendix: Notation
References
17 Near-Optimal Switching Strategies for a Tandem Queue
17.1 Introduction
17.2 Model Description: Single Service Model
17.3 Structural Properties of an Optimal Switching Curve
17.4 Matrix Geometric Method for Fixed Threshold Policies
17.5 Model Description: Batch Transition Model
17.5.1 Structural Properties of the Batch Service Model
17.5.2 Matrix Geometric Method with Batch Services
17.6 Simulation Experiments
17.7 Conclusion
References
Part V Communications
18 Wireless Channel Selection with Restless Bandits
18.1 Introduction
18.2 Reward-Observing Restless Multi-Armed Bandits
18.3 Index Policies and the Whittle Index
18.4 Numerical Illustration and Evaluation
18.5 Literature Survey
References
19 Flexible Staffing for Call Centers with Non-stationary Arrival Rates
19.1 Introduction
19.2 Problem Formulation
19.3 Solution Approach
19.4 Numerical Experiments
19.4.1 Constant Arrival Rate
19.4.1.1 Value of Flexibility
19.4.2 Time-Dependent Arrival Rate
19.4.2.1 Optimal Permanent Agents
19.4.3 Unknown Arrival Rate
19.5 Conclusion and Discussion
Appendix: Exact Solution
References
20 MDP for Query-Based Wireless Sensor Networks
20.1 Problem Description
20.2 Model Formulation
20.3 Continuous Time Markov Decision Process with a Drift
20.4 Exponentially Uniformized Markov Decision Process
20.5 Discrete Time and Discrete Space Markov Decision Problem
20.6 Standard Markov Decision Process
20.7 Fixed Assignment Policies
20.7.1 Always Assign Queries to the DB
20.7.2 Always Assign Queries to the WSN
20.8 Numerical Results
20.8.1 Performance of Fixed Policies vs. Optimal Policy
20.8.2 Optimal Policy Under Different Values of the Uniformization Parameter
20.9 Conclusion
Appendices
References
Part VI Financial Modeling
21 Optimal Portfolios and Pricing of Financial Derivatives Under Proportional Transaction Costs
21.1 Introduction
21.2 The Financial Model
21.3 The Markov Decision Model
21.4 Martingale Properties of the Optimal Markov Decision Process
21.5 Price Systems and the Numeraire Portfolio
21.6 Conclusive Remarks
Appendices
References
Appendix A: Basic Notation for MDP
Appendix B: Dichotomy and Criteria
Markov Decision Processes in Practice
Richard J. Boucherie and Nico M. van Dijk, Editors

International Series in Operations Research & Management Science, Volume 248
Series Editor: Camille C. Price, Stephen F. Austin State University, TX, USA
Associate Series Editor: Joe Zhu, Worcester Polytechnic Institute, MA, USA
Founding Series Editor: Frederick S. Hillier, Stanford University, CA, USA
More information about this series at http://www.springer.com/series/6161
Editors: Richard J. Boucherie and Nico M. van Dijk, Stochastic Operations Research, University of Twente, Enschede, The Netherlands

International Series in Operations Research & Management Science
ISSN 0884-8289, ISSN 2214-7934 (electronic)
ISBN 978-3-319-47764-0, ISBN 978-3-319-47766-4 (eBook)
DOI 10.1007/978-3-319-47766-4
Library of Congress Control Number: 2017932096

© Springer International Publishing AG 2017

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper.

This Springer imprint is published by Springer Nature. The registered company is Springer International Publishing AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.
To Carla, Fabian, Daphne, Deirdre, and Daniël – thanks for being there in difficult times.
Richard

To P. Dorreboom and his daughter – for coping with my passions.
Nico
Foreword

I had the pleasure of serving as the series editor of this series over its first 20 years (from 1993 through October 2013). One of the special pleasures of this work was the opportunity to become better acquainted with many of the leading researchers in our field and to learn more about their research. This was especially true in the case of Nico M. van Dijk, who became a friend and overnight guest in our home. I then was delighted when Nico and his colleague, Richard J. Boucherie, agreed to be the editors of a handbook, Queueing Networks: A Fundamental Approach, that was published in 2010 as Vol. 154 in this series. This outstanding volume succeeded in defining the current state of the art in this important area.

Because of both its elegance and its great application potential, Markov decision processes have been one of my favorite areas of operations research. A full chapter (Chap. 19 in the current tenth edition) is devoted to this topic in my textbook (coauthored by the late Gerald J. Lieberman), Introduction to Operations Research. However, I have long been frustrated by the sparsity of publications that describe applications of Markov decision processes. This was less true about 30 years ago, when D.J. White published his seminal papers on such real applications in Interfaces (see the November–December 1985 and September–October 1988 issues). Unfortunately, relatively few papers or books since then have delved much into such applications. (One of these few publications is the 2002 book edited by Eugene Feinberg and Adam Shwartz, Handbook of Markov Decision Processes: Methods and Applications, which is Vol. 40 in this series.)

Given the sparse literature in this important area, I was particularly delighted when the outstanding team of Nico M. van Dijk and Richard J. Boucherie accepted my invitation to be the editors of this exciting new book that focuses on Markov decision processes in practice. One of my last acts as the series editor was to work with these coeditors and the publisher in shepherding the book proposal through the process of providing the contract for its publication. I feel that this book may prove to be one of the most important books in the series because it sheds so much light on the great application potential of Markov decision processes. This hopefully will lead to a renaissance in applying this powerful technique to numerous real problems.

Stanford University
July 2016
Frederick S. Hillier
Preface

It has been over 30 years since D.J. White started his series of surveys on practical applications of Markov decision processes (MDP) [1, 2, 3], over 20 years since the publication of the phenomenal book by Martin Puterman on the theory of MDP [4], and over 10 years since Eugene A. Feinberg and Adam Shwartz published their Handbook of Markov Decision Processes: Methods and Applications [5]. In the past decades, the practical development of MDP seemed to have come to a halt with the general perception that MDP is computationally prohibitive. Accordingly, MDP is deemed unrealistic and is out of scope for many operations research practitioners. In addition, MDP is hampered by its notational complications and its conceptual complexity. As a result, MDP is often only briefly covered in introductory operations research textbooks and courses. Recently developed approximation techniques, supported by vastly increased numerical power, have tackled part of the computational problems; see, e.g., Chaps. 2 and 3 of this handbook and the references therein.

This handbook shows that a revival of MDP for practical purposes is justified for several reasons:

1. First and above all, present-day numerical capabilities have enabled MDP to be invoked for real-life applications.
2. MDP allows one to develop and formally support approximate and simple practical decision rules.
3. Last but not least, MDP's probabilistic modeling of practical problems is a skill, if not an art, in itself.

[1] D.J. White. Real applications of Markov decision processes. Interfaces, 15:73–83, 1985.
[2] D.J. White. Further real applications of Markov decision processes. Interfaces, 18:55–61, 1988.
[3] D.J. White. A survey of applications of Markov decision processes. Journal of the Operational Research Society, 44:1073–1096, 1993.
[4] Martin Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, 1994.
[5] Eugene A. Feinberg and Adam Shwartz, editors. Handbook of Markov Decision Processes: Methods and Applications. Kluwer, 2002.