Markov Decision Processes in Practice
Foreword
Preface
Part I: General Theory
Part II: Healthcare
Part III: Transportation
Part IV: Production
Part V: Communications
Part VI: Financial Modeling
Summarizing
Acknowledgments
Contents
List of Contributors
Part I General Theory
1 One-Step Improvement Ideas and Computational Aspects
1.1 Introduction
1.2 The Average-Cost Markov Decision Model
1.2.1 The Concept of Relative Values
1.2.2 The Policy-Improvement Step
1.2.3 The Odoni Bounds for Value Iteration
1.3 Tailor-Made Policy-Iteration Algorithm
1.3.1 A Queueing Control Problem with a Variable Service Rate
1.4 One-Step Policy Improvement for Suboptimal Policies
1.4.1 Dynamic Routing of Customers to Parallel Queues
1.5 One-Stage-Look-Ahead Rule in Optimal Stopping
1.5.1 Devil's Penny Problem
1.5.2 A Game of Dropping Balls into Bins
1.5.3 The Chow-Robbins Game
References
2 Value Function Approximation in Complex Queueing Systems
2.1 Introduction
2.2 Difference Calculus for Markovian Birth-Death Systems
2.3 Value Functions for Queueing Systems
2.3.1 The M/Cox(r)/1 Queue
2.3.2 Special Cases of the M/Cox(r)/1 Queue
2.3.3 The M/M/s Queue
2.3.4 The Blocking Costs in an M/M/s/s Queue
2.3.5 Priority Queues
2.4 Application: Routing to Parallel Queues
2.5 Application: Dynamic Routing in Multiskill Call Centers
2.6 Application: A Controlled Polling System
References
3 Approximate Dynamic Programming by Practical Examples
3.1 Introduction
3.2 The Nomadic Trucker Example
3.2.1 Problem Introduction
3.2.2 MDP Model
3.2.2.1 State
3.2.2.2 Decision
3.2.2.3 Costs
3.2.2.4 New Information and Transition Function
3.2.2.5 Solution
3.2.3 Approximate Dynamic Programming
3.2.3.1 Post-decision State
3.2.3.2 Forward Dynamic Programming
3.2.3.3 Value Function Approximation
3.3 A Freight Consolidation Example
3.3.1 Problem Introduction
3.3.2 MDP Model
3.3.2.1 State
3.3.2.2 Decision
3.3.2.3 Costs
3.3.2.4 New Information and Transition Function
3.3.2.5 Solution
3.3.3 Approximate Dynamic Programming
3.3.3.1 Post-decision State
3.3.3.2 Forward Dynamic Programming
3.3.3.3 Value Function Approximation
3.4 A Healthcare Example
3.4.1 Problem Introduction
3.4.2 MDP Model
3.4.2.1 State
3.4.2.2 Decision
3.4.2.3 Costs
3.4.2.4 New Information and Transition Function
3.4.2.5 Solution
3.4.3 Approximate Dynamic Programming
3.4.3.1 Post-decision State
3.4.3.2 Forward Dynamic Programming
3.4.3.3 Value Function Approximation
3.5 What's More
3.5.1 Policies
3.5.2 Value Function Approximations
3.5.3 Exploration vs Exploitation
Appendix
References
4 Server Optimization of Infinite Queueing Systems
4.1 Introduction
4.2 Basic Definition and Notations
4.3 Motivating Examples
4.3.1 Optimization of a Queueing System with Two Different Servers
4.3.2 Optimization of a Computational System with Power Saving Mode
4.3.3 Structural Properties of These Motivating Examples
4.4 Theoretical Background
4.4.1 Subset Measures in Markov Chains
4.4.2 Markov Chain Transformation
4.4.3 Markov Decision Processes with a Set of Uncontrolled States
4.4.3.1 Decisions Only in Subset 1 Without an Effect on the Transitions to Subset 2
4.4.3.2 Decisions Only in Subset 1 with an Effect on the Transitions to Subset 2
4.4.3.3 Decisions Only in Subset 1 with Limited Boundary to the Other Set
4.4.4 Infinite Markov Chains with Regular Structure
4.4.4.1 Birth Death Process
4.5 Solution and Numerical Analysis of the Motivating Examples
4.5.1 Solution to the Queue with Two Different Servers
4.5.2 Solution to the Power-Saving Model
4.6 Further Examples
4.6.1 Optimization of a Queueing System with Two Markov Modulated Servers
4.6.2 Structural Properties of the Example with Markov Modulated Servers
4.7 Infinite MDPs with Quasi Birth Death Structure
4.7.1 Quasi Birth Death Process
4.7.2 Solving MDPs with QBD Structure
4.7.2.1 QBD Measures Associated with Infinite Sets
4.8 Solution and Numerical Analysis of MDPs with QBD Structure
4.8.1 Solution of the Example with Markov Modulated Servers
4.8.2 Markov Modulated Server with Three Background States
4.9 Conclusion
References
5 Structures of Optimal Policies in MDPs with Unbounded Jumps: The State of Our Art
5.1 Introduction
5.2 Discrete Time Model
5.2.1 Discounted Cost
5.2.1.1 Value Iteration
5.2.1.2 Roadmap to Structural Properties
5.2.2 Approximations/Perturbations
5.2.2.1 Type of Perturbations
5.2.2.2 Smoothed Rate Truncation (SRT)
5.2.3 Average Cost
5.2.3.1 Roadmap to Structural Properties
5.2.3.2 Examples Where the DAOE Has No Unique Solution
5.3 Continuous Time Model
5.3.1 Uniformisation
5.3.2 Discounted Cost
5.3.2.1 Roadmap to Structural Properties
5.3.3 Average Cost
5.3.4 Roadmap to Structural Properties
5.3.4.1 Roadmap Summary
5.3.5 Proofs
5.3.6 Tauberian Theorem
Appendix: Notation
References
Part II Healthcare
6 Markov Decision Processes for Screening and Treatment of Chronic Diseases
6.1 Introduction
6.2 Background on Chronic Disease Modeling
6.3 Modeling Framework for Chronic Diseases
6.3.1 MDP and POMDP Model Formulation
6.3.2 Solution Methods and Structural Properties
6.3.3 Model Validation
6.4 MDP Model for Cardiovascular Risk Control in Patients with Type 2 Diabetes
6.4.1 MDP Model Formulation
6.4.2 Results: Comparison of Optimal Policies Versus Published Guidelines
6.5 POMDP for Prostate Cancer Screening
6.5.1 POMDP Model Formulation
6.5.2 Results: Optimal Belief-Based Screening Policy
6.6 Open Challenges in MDPs for Chronic Disease
6.7 Conclusions
References
7 Stratified Breast Cancer Follow-Up Using a Partially Observable MDP
7.1 Introduction
7.2 Model Formulation
7.2.1 Optimality Equations
7.2.2 Alternative Representation of the Optimality Equations
7.2.3 Algorithm
7.3 Model Parameters
7.4 Results
7.4.1 Sensitivity Analyses
7.5 Conclusions and Discussion
Appendix: Notation
References
8 Advance Patient Appointment Scheduling
8.1 Introduction
8.2 Problem Description
8.3 Mathematical Formulation
8.3.1 Decision Epochs
8.3.2 State Space
8.3.3 Action Sets
8.3.4 Transition Probabilities
8.3.5 Immediate Cost
8.3.6 Optimality Equations
8.4 Solution Approach
8.5 Practical Results
8.5.1 Computerized Tomography Scan Appointment Scheduling
8.5.2 Radiation Therapy Treatment Scheduling
8.6 Discussion
8.7 Open Challenges
Appendix: Notation
References
9 Optimal Ambulance Dispatching
9.1 Introduction
9.1.1 Previous Work
9.1.2 Our Contribution
9.2 Problem Formulation
9.3 Solution Method: Markov Decision Process
9.3.1 State Space
9.3.2 Policy Definition
9.3.3 Rewards
9.3.3.1 Fraction of Late Arrivals
9.3.3.2 Average Response Time
9.3.4 Transition Probabilities
9.3.5 Value Iteration
9.4 Solution Method: Dynamic MEXCLP Heuristic for Dispatching
9.4.1 Coverage According to the MEXCLP Model
9.4.2 Applying MEXCLP to the Dispatch Process
9.5 Results: A Motivating Example
9.5.1 Fraction of Late Arrivals
9.5.2 Average Response Time
9.6 Results: Region Flevoland
9.6.1 Analysis of the MDP Solution for Flevoland
9.6.2 Results
9.7 Conclusion and Discussion
9.7.1 Further Research
Appendix: Notation
References
10 Blood Platelet Inventory Management
10.1 Introduction
10.1.1 Practical Motivation
10.1.2 SDP-Simulation Approach
10.1.3 Outline
10.2 Literature
10.3 SDP-Simulation Approach for the Stationary PPP
10.3.1 Steps of SDP-Simulation Approach
10.3.2 Step 1: SDP Model for Stationary PPP
10.3.3 Case Studies
10.4 Extended SDP-Simulation Approach for the Non-Stationary PPP
10.4.1 Problem: Non-Stationary Production Breaks
10.4.2 Extended SDP-Simulation Approach
10.4.3 Extension: Including Non-Stationary Periods
10.5 Case Study: Optimal Policy Around Breaks
10.5.1 Data
10.5.2 Step I: Stationary Problem
10.5.3 Steps II to IV: Christmas and New Year's Day
10.5.4 Steps II to IV: 4-Day Easter Weekend
10.5.5 Conclusions: Extended SDP-Simulation Approach
10.6 Discussion and Conclusions
Appendix: Notation
References
Part III Transportation
11 Stochastic Dynamic Programming for Noise Load Management
11.1 Introduction
11.2 Noise Load Management at Amsterdam Airport Schiphol
11.3 SDP for Noise Load Optimisation
11.4 Numerical Approach
11.4.1 Transition Probabilities
11.4.2 Discretisation
11.5 Numerical Results
11.5.1 Probability of Exceeding the Noise Load Limit
11.5.2 Comparison with the Heuristic
11.5.3 Increasing the Number of Decision Epochs
11.6 Discussion
Appendix
References
12 Allocation in a Vertical Rotary Car Park
12.1 Introduction
12.2 Background
12.2.1 The Car Parking Allocation Problem
12.2.2 Markov Decision Processes
12.3 The Markov Decision Process
12.4 Numerical Results
12.5 Simulation Results
12.6 Conclusion
Appendix
References
13 Dynamic Control of Traffic Lights
13.1 Problem
13.2 Markov Decision Process (MDP)
13.2.1 Examples: Terminology and Notations
13.2.2 MDP Model
13.2.2.1 State
13.2.2.2 Action
13.2.2.3 State Transition Probabilities
13.2.2.4 Contribution: Waiting Costs
13.2.2.5 Bellman Equation
13.2.2.6 Computational Complexity
13.3 Approximation by Policy Iteration
13.3.1 Policy Iteration (PI)
13.3.2 Initial Policy: Fixed Cycle (FC)
13.3.3 Policy Evaluation Step of FC
13.3.4 Single Policy Improvement Step: RV1 Policy
13.3.5 Computational Complexity of RV1
13.3.6 Additional Iterations of PI
13.4 Results
13.4.1 Simulation
13.4.2 Intersection F4C2
13.4.3 Complex Intersection F12C4
13.5 Discussion and Conclusions
Appendix: Notation
References
14 Smart Charging of Electric Vehicles
14.1 Introduction
14.2 Background on DSM and PowerMatcher
14.3 Optimal Charging Strategies
14.3.1 MDP/SDP Problem Formulation
14.3.2 Analytic Solution for i.i.d. Prices
14.3.3 DP-Heuristic Strategy
14.4 Numerical Results
14.5 Conclusion/Future Research
Appendix
References
Part IV Production
15 Analysis of a Stochastic Lot Scheduling Problem with Strict Due-Dates
15.1 Introduction
15.2 Theoretical Background of the CSLSP
15.3 Production System, Admissible Policies, and Objective Function
15.3.1 Production System
15.3.2 Admissible Actions and Policies
15.3.3 Objective Function
15.4 The Markov Decision Process
15.4.1 Format of a State
15.4.2 Actions and Operators
15.4.3 Transition Matrices
15.4.4 Further Aggregation in the Symmetric Case
15.4.5 State Space
15.4.6 A Heuristic Threshold Policy
15.5 Numerical Study
15.5.1 Influence of the Load and the Due-Date Horizon
15.5.2 Visualization of the Structure of the Optimal Policy
15.6 Conclusion
Appendix: Notation
References
16 Optimal Fishery Policies
16.1 Introduction
16.2 Model Description
16.2.1 Biological Dynamics; Growth of Biomass
16.2.2 Economic Dynamics; Harvest and Investment Decisions
16.2.3 Optimization Model
16.2.3.1 Decisions at Level 2
16.2.3.2 Decisions at Level 1
16.3 Model Analysis
16.3.1 Bounds on Decision and State Space
16.3.2 Equilibrium State Values in a Deterministic Setting
16.4 Discretization in the Value Iteration Approach
16.4.1 Deterministic Elaboration
16.4.2 Stochastic Implementation
16.4.3 Analysis of the Stochastic Model
16.5 Conclusions
Appendix: Notation
References
17 Near-Optimal Switching Strategies for a Tandem Queue
17.1 Introduction
17.2 Model Description: Single Service Model
17.3 Structural Properties of an Optimal Switching Curve
17.4 Matrix Geometric Method for Fixed Threshold Policies
17.5 Model Description: Batch Transition Model
17.5.1 Structural Properties of the Batch Service Model
17.5.2 Matrix Geometric Method with Batch Services
17.6 Simulation Experiments
17.7 Conclusion
References
Part V Communications
18 Wireless Channel Selection with Restless Bandits
18.1 Introduction
18.2 Reward-Observing Restless Multi-Armed Bandits
18.3 Index Policies and the Whittle Index
18.4 Numerical Illustration and Evaluation
18.5 Literature Survey
References
19 Flexible Staffing for Call Centers with Non-stationary Arrival Rates
19.1 Introduction
19.2 Problem Formulation
19.3 Solution Approach
19.4 Numerical Experiments
19.4.1 Constant Arrival Rate
19.4.1.1 Value of Flexibility
19.4.2 Time-Dependent Arrival Rate
19.4.2.1 Optimal Permanent Agents
19.4.3 Unknown Arrival Rate
19.5 Conclusion and Discussion
Appendix: Exact Solution
References
20 MDP for Query-Based Wireless Sensor Networks
20.1 Problem Description
20.2 Model Formulation
20.3 Continuous Time Markov Decision Process with a Drift
20.4 Exponentially Uniformized Markov Decision Process
20.5 Discrete Time and Discrete Space Markov Decision Problem
20.6 Standard Markov Decision Process
20.7 Fixed Assignment Policies
20.7.1 Always Assign Queries to the DB
20.7.2 Always Assign Queries to the WSN
20.8 Numerical Results
20.8.1 Performance of Fixed Policies vs. Optimal Policy
20.8.2 Optimal Policy Under Different Values of the Uniformization Parameter
20.9 Conclusion
Appendices
References
Part VI Financial Modeling
21 Optimal Portfolios and Pricing of Financial Derivatives Under Proportional Transaction Costs
21.1 Introduction
21.2 The Financial Model
21.3 The Markov Decision Model
21.4 Martingale Properties of the Optimal Markov Decision Process
21.5 Price Systems and the Numeraire Portfolio
21.6 Conclusive Remarks
Appendices
References
Appendix A: Basic Notation for MDP
Appendix B: Dichotomy and Criteria
Markov Decision Processes in Practice
Richard J. Boucherie and Nico M. van Dijk, Editors

International Series in Operations Research & Management Science, Volume 248
Series Editor: Camille C. Price, Stephen F. Austin State University, TX, USA
Associate Series Editor: Joe Zhu, Worcester Polytechnic Institute, MA, USA
Founding Series Editor: Frederick S. Hillier, Stanford University, CA, USA
More information about this series at http://www.springer.com/series/6161
Editors: Richard J. Boucherie and Nico M. van Dijk, Stochastic Operations Research, University of Twente, Enschede, The Netherlands

International Series in Operations Research & Management Science
ISSN 0884-8289, ISSN 2214-7934 (electronic)
ISBN 978-3-319-47764-0, ISBN 978-3-319-47766-4 (eBook)
DOI 10.1007/978-3-319-47766-4
Library of Congress Control Number: 2017932096

© Springer International Publishing AG 2017

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper.

This Springer imprint is published by Springer Nature. The registered company is Springer International Publishing AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.
To Carla, Fabian, Daphne, Deirdre, and Daniël – thanks for being there in difficult times.
Richard

To P. Dorreboom and his daughter – for coping with my passions.
Nico
Foreword

I had the pleasure of serving as the series editor of this series over its first 20 years (from 1993 through October 2013). One of the special pleasures of this work was the opportunity to become better acquainted with many of the leading researchers in our field and to learn more about their research. This was especially true in the case of Nico M. van Dijk, who became a friend and overnight guest in our home. I then was delighted when Nico and his colleague, Richard J. Boucherie, agreed to be the editors of a handbook, Queueing Networks: A Fundamental Approach, that was published in 2010 as Vol. 154 in this series. This outstanding volume succeeded in defining the current state of the art in this important area.

Because of both its elegance and its great application potential, Markov decision processes have been one of my favorite areas of operations research. A full chapter (Chap. 19 in the current tenth edition) is devoted to this topic in my textbook (coauthored by the late Gerald J. Lieberman), Introduction to Operations Research. However, I have long been frustrated by the sparsity of publications that describe applications of Markov decision processes. This was less true about 30 years ago, when D.J. White published his seminal papers on such real applications in Interfaces (see the November–December 1985 and September–October 1988 issues). Unfortunately, relatively few papers or books since then have delved much into such applications. (One of these few publications is the 2002 book edited by Eugene Feinberg and Adam Shwartz, Handbook of Markov Decision Processes: Methods and Applications, which is Vol. 40 in this series.)

Given the sparse literature in this important area, I was particularly delighted when the outstanding team of Nico M. van Dijk and Richard J. Boucherie accepted my invitation to be the editors of this exciting new book that focuses on Markov decision processes in practice. One of my last acts as the series editor was to work with these coeditors and the publisher in shepherding the book proposal through the process of providing the contract for its publication. I feel that this book may prove to be one of the most important books in the series because it sheds so much light on the great application potential of Markov decision processes. This hopefully will lead to a renaissance in applying this powerful technique to numerous real problems.

Stanford University
July 2016
Frederick S. Hillier
Preface

It has been over 30 years since D.J. White started his series of surveys on practical applications of Markov decision processes (MDP) [1, 2, 3], over 20 years since the publication of the phenomenal book by Martin Puterman on the theory of MDP [4], and over 10 years since Eugene A. Feinberg and Adam Shwartz published their Handbook of Markov Decision Processes: Methods and Applications [5]. In the past decades, the practical development of MDP seemed to have come to a halt with the general perception that MDP is computationally prohibitive. Accordingly, MDP is deemed unrealistic and is out of scope for many operations research practitioners. In addition, MDP is hampered by its notational complications and its conceptual complexity. As a result, MDP is often only briefly covered in introductory operations research textbooks and courses. Recently developed approximation techniques, supported by vastly increased numerical power, have tackled part of the computational problems; see, e.g., Chaps. 2 and 3 of this handbook and the references therein.

This handbook shows that a revival of MDP for practical purposes is justified for several reasons:

1. First and above all, present-day numerical capabilities have enabled MDP to be invoked for real-life applications.
2. MDP allows one to develop and formally support approximate and simple practical decision rules.
3. Last but not least, MDP's probabilistic modeling of practical problems is a skill, if not an art, in itself.

[1] D.J. White. Real applications of Markov decision processes. Interfaces, 15:73–83, 1985.
[2] D.J. White. Further real applications of Markov decision processes. Interfaces, 18:55–61, 1988.
[3] D.J. White. A survey of applications of Markov decision processes. Journal of the Operational Research Society, 44:1073–1096, 1993.
[4] Martin Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, 1994.
[5] Eugene A. Feinberg and Adam Shwartz, editors. Handbook of Markov Decision Processes: Methods and Applications. Kluwer, 2002.