Foreword
Preface
Part I: General Theory
Part II: Healthcare
Part III: Transportation
Part IV: Production
Part V: Communications
Part VI: Financial Modeling
Summarizing
Acknowledgments
Contents
List of Contributors
Part I General Theory
1 One-Step Improvement Ideas and Computational Aspects
1.1 Introduction
1.2 The Average-Cost Markov Decision Model
1.2.1 The Concept of Relative Values
1.2.2 The Policy-Improvement Step
1.2.3 The Odoni Bounds for Value Iteration
1.3 Tailor-Made Policy-Iteration Algorithm
1.3.1 A Queueing Control Problem with a Variable Service Rate
1.4 One-Step Policy Improvement for Suboptimal Policies
1.4.1 Dynamic Routing of Customers to Parallel Queues
1.5 One-Stage-Look-Ahead Rule in Optimal Stopping
1.5.1 Devil's Penny Problem
1.5.2 A Game of Dropping Balls into Bins
1.5.3 The Chow-Robbins Game
References
2 Value Function Approximation in Complex Queueing Systems
2.1 Introduction
2.2 Difference Calculus for Markovian Birth-Death Systems
2.3 Value Functions for Queueing Systems
2.3.1 The M/Cox(r)/1 Queue
2.3.2 Special Cases of the M/Cox(r)/1 Queue
2.3.3 The M/M/s Queue
2.3.4 The Blocking Costs in an M/M/s/s Queue
2.3.5 Priority Queues
2.4 Application: Routing to Parallel Queues
2.5 Application: Dynamic Routing in Multiskill Call Centers
2.6 Application: A Controlled Polling System
References
3 Approximate Dynamic Programming by Practical Examples
3.1 Introduction
3.2 The Nomadic Trucker Example
3.2.1 Problem Introduction
3.2.2 MDP Model
3.2.2.1 State
3.2.2.2 Decision
3.2.2.3 Costs
3.2.2.4 New Information and Transition Function
3.2.2.5 Solution
3.2.3 Approximate Dynamic Programming
3.2.3.1 Post-decision State
3.2.3.2 Forward Dynamic Programming
3.2.3.3 Value Function Approximation
3.3 A Freight Consolidation Example
3.3.1 Problem Introduction
3.3.2 MDP Model
3.3.2.1 State
3.3.2.2 Decision
3.3.2.3 Costs
3.3.2.4 New Information and Transition Function
3.3.2.5 Solution
3.3.3 Approximate Dynamic Programming
3.3.3.1 Post-decision State
3.3.3.2 Forward Dynamic Programming
3.3.3.3 Value Function Approximation
3.4 A Healthcare Example
3.4.1 Problem Introduction
3.4.2 MDP Model
3.4.2.1 State
3.4.2.2 Decision
3.4.2.3 Costs
3.4.2.4 New Information and Transition Function
3.4.2.5 Solution
3.4.3 Approximate Dynamic Programming
3.4.3.1 Post-decision State
3.4.3.2 Forward Dynamic Programming
3.4.3.3 Value Function Approximation
3.5 What's More
3.5.1 Policies
3.5.2 Value Function Approximations
3.5.3 Exploration vs. Exploitation
Appendix
References
4 Server Optimization of Infinite Queueing Systems
4.1 Introduction
4.2 Basic Definitions and Notation
4.3 Motivating Examples
4.3.1 Optimization of a Queueing System with Two Different Servers
4.3.2 Optimization of a Computational System with Power Saving Mode
4.3.3 Structural Properties of These Motivating Examples
4.4 Theoretical Background
4.4.1 Subset Measures in Markov Chains
4.4.2 Markov Chain Transformation
4.4.3 Markov Decision Processes with a Set of Uncontrolled States
4.4.3.1 Decisions Only in Subset 1 Without an Effect on the Transitions to Subset 2
4.4.3.2 Decisions Only in Subset 1 with an Effect on the Transitions to Subset 2
4.4.3.3 Decisions Only in Subset 1 with Limited Boundary to the Other Set
4.4.4 Infinite Markov Chains with Regular Structure
4.4.4.1 Birth Death Process
4.5 Solution and Numerical Analysis of the Motivating Examples
4.5.1 Solution to the Queue with Two Different Servers
4.5.2 Solution to the Power-Saving Model
4.6 Further Examples
4.6.1 Optimization of a Queueing System with Two Markov Modulated Servers
4.6.2 Structural Properties of the Example with Markov Modulated Servers
4.7 Infinite MDPs with Quasi Birth Death Structure
4.7.1 Quasi Birth Death Process
4.7.2 Solving MDPs with QBD Structure
4.7.2.1 QBD Measures Associated with Infinite Sets
4.8 Solution and Numerical Analysis of MDPs with QBD Structure
4.8.1 Solution of the Example with Markov Modulated Servers
4.8.2 Markov Modulated Server with Three Background States
4.9 Conclusion
References
5 Structures of Optimal Policies in MDPs with Unbounded Jumps: The State of Our Art
5.1 Introduction
5.2 Discrete Time Model
5.2.1 Discounted Cost
5.2.1.1 Value Iteration
5.2.1.2 Roadmap to Structural Properties
5.2.2 Approximations/Perturbations
5.2.2.1 Type of Perturbations
5.2.2.2 Smoothed Rate Truncation (SRT)
5.2.3 Average Cost
5.2.3.1 Roadmap to Structural Properties
5.2.3.2 Examples Where the DAOE Has No Unique Solution
5.3 Continuous Time Model
5.3.1 Uniformisation
5.3.2 Discounted Cost
5.3.2.1 Roadmap to Structural Properties
5.3.3 Average Cost
5.3.4 Roadmap to Structural Properties
5.3.4.1 Roadmap Summary
5.3.5 Proofs
5.3.6 Tauberian Theorem
Appendix: Notation
References
Part II Healthcare
6 Markov Decision Processes for Screening and Treatment of Chronic Diseases
6.1 Introduction
6.2 Background on Chronic Disease Modeling
6.3 Modeling Framework for Chronic Diseases
6.3.1 MDP and POMDP Model Formulation
6.3.2 Solution Methods and Structural Properties
6.3.3 Model Validation
6.4 MDP Model for Cardiovascular Risk Control in Patients with Type 2 Diabetes
6.4.1 MDP Model Formulation
6.4.2 Results: Comparison of Optimal Policies Versus Published Guidelines
6.5 POMDP for Prostate Cancer Screening
6.5.1 POMDP Model Formulation
6.5.2 Results: Optimal Belief-Based Screening Policy
6.6 Open Challenges in MDPs for Chronic Disease
6.7 Conclusions
References
7 Stratified Breast Cancer Follow-Up Using a Partially Observable MDP
7.1 Introduction
7.2 Model Formulation
7.2.1 Optimality Equations
7.2.2 Alternative Representation of the Optimality Equations
7.2.3 Algorithm
7.3 Model Parameters
7.4 Results
7.4.1 Sensitivity Analyses
7.5 Conclusions and Discussion
Appendix: Notation
References
8 Advance Patient Appointment Scheduling
8.1 Introduction
8.2 Problem Description
8.3 Mathematical Formulation
8.3.1 Decision Epochs
8.3.2 State Space
8.3.3 Action Sets
8.3.4 Transition Probabilities
8.3.5 Immediate Cost
8.3.6 Optimality Equations
8.4 Solution Approach
8.5 Practical Results
8.5.1 Computerized Tomography Scan Appointment Scheduling
8.5.2 Radiation Therapy Treatment Scheduling
8.6 Discussion
8.7 Open Challenges
Appendix: Notation
References
9 Optimal Ambulance Dispatching
9.1 Introduction
9.1.1 Previous Work
9.1.2 Our Contribution
9.2 Problem Formulation
9.3 Solution Method: Markov Decision Process
9.3.1 State Space
9.3.2 Policy Definition
9.3.3 Rewards
9.3.3.1 Fraction of Late Arrivals
9.3.3.2 Average Response Time
9.3.4 Transition Probabilities
9.3.5 Value Iteration
9.4 Solution Method: Dynamic MEXCLP Heuristic for Dispatching
9.4.1 Coverage According to the MEXCLP Model
9.4.2 Applying MEXCLP to the Dispatch Process
9.5 Results: A Motivating Example
9.5.1 Fraction of Late Arrivals
9.5.2 Average Response Time
9.6 Results: Region Flevoland
9.6.1 Analysis of the MDP Solution for Flevoland
9.6.2 Results
9.7 Conclusion and Discussion
9.7.1 Further Research
Appendix: Notation
References
10 Blood Platelet Inventory Management
10.1 Introduction
10.1.1 Practical Motivation
10.1.2 SDP-Simulation Approach
10.1.3 Outline
10.2 Literature
10.3 SDP-Simulation Approach for the Stationary PPP
10.3.1 Steps of SDP-Simulation Approach
10.3.2 Step 1: SDP Model for Stationary PPP
10.3.3 Case Studies
10.4 Extended SDP-Simulation Approach for the Non-Stationary PPP
10.4.1 Problem: Non-Stationary Production Breaks
10.4.2 Extended SDP-Simulation Approach
10.4.3 Extension: Including Non-Stationary Periods
10.5 Case Study: Optimal Policy Around Breaks
10.5.1 Data
10.5.2 Step I: Stationary Problem
10.5.3 Steps II to IV: Christmas and New Year's Day
10.5.4 Steps II to IV: 4-Day Easter Weekend
10.5.5 Conclusions: Extended SDP-Simulation Approach
10.6 Discussion and Conclusions
Appendix: Notation
References
Part III Transportation
11 Stochastic Dynamic Programming for Noise Load Management
11.1 Introduction
11.2 Noise Load Management at Amsterdam Airport Schiphol
11.3 SDP for Noise Load Optimisation
11.4 Numerical Approach
11.4.1 Transition Probabilities
11.4.2 Discretisation
11.5 Numerical Results
11.5.1 Probability of Exceeding the Noise Load Limit
11.5.2 Comparison with the Heuristic
11.5.3 Increasing the Number of Decision Epochs
11.6 Discussion
Appendix
References
12 Allocation in a Vertical Rotary Car Park
12.1 Introduction
12.2 Background
12.2.1 The Car Parking Allocation Problem
12.2.2 Markov Decision Processes
12.3 The Markov Decision Process
12.4 Numerical Results
12.5 Simulation Results
12.6 Conclusion
Appendix
References
13 Dynamic Control of Traffic Lights
13.1 Problem
13.2 Markov Decision Process (MDP)
13.2.1 Examples: Terminology and Notations
13.2.2 MDP Model
13.2.2.1 State
13.2.2.2 Action
13.2.2.3 State Transition Probabilities
13.2.2.4 Contribution: Waiting Costs
13.2.2.5 Bellman Equation
13.2.2.6 Computational Complexity
13.3 Approximation by Policy Iteration
13.3.1 Policy Iteration (PI)
13.3.2 Initial Policy: Fixed Cycle (FC)
13.3.3 Policy Evaluation Step of FC
13.3.4 Single Policy Improvement Step: RV1 Policy
13.3.5 Computational Complexity of RV1
13.3.6 Additional Iterations of PI
13.4 Results
13.4.1 Simulation
13.4.2 Intersection F4C2
13.4.3 Complex Intersection F12C4
13.5 Discussion and Conclusions
Appendix: Notation
References
14 Smart Charging of Electric Vehicles
14.1 Introduction
14.2 Background on DSM and PowerMatcher
14.3 Optimal Charging Strategies
14.3.1 MDP/SDP Problem Formulation
14.3.2 Analytic Solution for i.i.d. Prices
14.3.3 DP-Heuristic Strategy
14.4 Numerical Results
14.5 Conclusion/Future Research
Appendix
References
Part IV Production
15 Analysis of a Stochastic Lot Scheduling Problem with Strict Due-Dates
15.1 Introduction
15.2 Theoretical Background of the CSLSP
15.3 Production System, Admissible Policies, and Objective Function
15.3.1 Production System
15.3.2 Admissible Actions and Policies
15.3.3 Objective Function
15.4 The Markov Decision Process
15.4.1 Format of a State
15.4.2 Actions and Operators
15.4.3 Transition Matrices
15.4.4 Further Aggregation in the Symmetric Case
15.4.5 State Space
15.4.6 A Heuristic Threshold Policy
15.5 Numerical Study
15.5.1 Influence of the Load and the Due-Date Horizon
15.5.2 Visualization of the Structure of the Optimal Policy
15.6 Conclusion
Appendix: Notation
References
16 Optimal Fishery Policies
16.1 Introduction
16.2 Model Description
16.2.1 Biological Dynamics; Growth of Biomass
16.2.2 Economic Dynamics; Harvest and Investment Decisions
16.2.3 Optimization Model
16.2.3.1 Decisions at Level 2
16.2.3.2 Decisions at Level 1
16.3 Model Analysis
16.3.1 Bounds on Decision and State Space
16.3.2 Equilibrium State Values in a Deterministic Setting
16.4 Discretization in the Value Iteration Approach
16.4.1 Deterministic Elaboration
16.4.2 Stochastic Implementation
16.4.3 Analysis of the Stochastic Model
16.5 Conclusions
Appendix: Notation
References
17 Near-Optimal Switching Strategies for a Tandem Queue
17.1 Introduction
17.2 Model Description: Single Service Model
17.3 Structural Properties of an Optimal Switching Curve
17.4 Matrix Geometric Method for Fixed Threshold Policies
17.5 Model Description: Batch Transition Model
17.5.1 Structural Properties of the Batch Service Model
17.5.2 Matrix Geometric Method with Batch Services
17.6 Simulation Experiments
17.7 Conclusion
References
Part V Communications
18 Wireless Channel Selection with Restless Bandits
18.1 Introduction
18.2 Reward-Observing Restless Multi-Armed Bandits
18.3 Index Policies and the Whittle Index
18.4 Numerical Illustration and Evaluation
18.5 Literature Survey
References
19 Flexible Staffing for Call Centers with Non-stationary Arrival Rates
19.1 Introduction
19.2 Problem Formulation
19.3 Solution Approach
19.4 Numerical Experiments
19.4.1 Constant Arrival Rate
19.4.1.1 Value of Flexibility
19.4.2 Time-Dependent Arrival Rate
19.4.2.1 Optimal Permanent Agents
19.4.3 Unknown Arrival Rate
19.5 Conclusion and Discussion
Appendix: Exact Solution
References
20 MDP for Query-Based Wireless Sensor Networks
20.1 Problem Description
20.2 Model Formulation
20.3 Continuous Time Markov Decision Process with a Drift
20.4 Exponentially Uniformized Markov Decision Process
20.5 Discrete Time and Discrete Space Markov Decision Problem
20.6 Standard Markov Decision Process
20.7 Fixed Assignment Policies
20.7.1 Always Assign Queries to the DB
20.7.2 Always Assign Queries to the WSN
20.8 Numerical Results
20.8.1 Performance of Fixed Policies vs. Optimal Policy
20.8.2 Optimal Policy Under Different Values of the Uniformization Parameter
20.9 Conclusion
Appendices
References
Part VI Financial Modeling
21 Optimal Portfolios and Pricing of Financial Derivatives Under Proportional Transaction Costs
21.1 Introduction
21.2 The Financial Model
21.3 The Markov Decision Model
21.4 Martingale Properties of the Optimal Markov Decision Process
21.5 Price Systems and the Numeraire Portfolio
21.6 Conclusive Remarks
Appendices
References
Appendix A: Basic Notation for MDP
Appendix B: Dichotomy and Criteria