Abstract/Syllabus:
De Farias, Daniela Pucci, 2.997 Decision Making in Large Scale Systems, Spring 2004. (Massachusetts Institute of Technology: MIT OpenCourseWare), http://ocw.mit.edu (Accessed 07 Jul, 2010). License: Creative Commons BY-NC-SA
Decision Making in Large Scale Systems
Spring 2004

Tetris game strategies can be analyzed using the value function approximation techniques described in this course -- see lecture session 10. (Illustration courtesy of OCW.)
Course Highlights
This course features extensive lecture notes and readings, plus selected examples of student projects.
Course Description
This course is an introduction to the theory and application of large-scale dynamic programming. Topics include Markov decision processes, dynamic programming algorithms, simulation-based algorithms, theory and algorithms for value function approximation, and policy search methods. The course examines games and applications in areas such as dynamic resource allocation, finance and queueing networks.
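As a quick illustration of the dynamic programming machinery the course builds on, the following is a minimal value iteration sketch for a toy two-state, two-action discounted-cost MDP; the transition probabilities, stage costs, and discount factor are invented purely for illustration and are not taken from the course materials.

import numpy as np

# Toy discounted-cost MDP (all numbers invented for illustration).
# P[a][s, s2] = probability of moving from state s to s2 under action a,
# g[s, a] = one-stage cost of action a in state s, alpha = discount factor.
P = [np.array([[0.9, 0.1],
               [0.2, 0.8]]),
     np.array([[0.5, 0.5],
               [0.6, 0.4]])]
g = np.array([[1.0, 2.0],
              [0.5, 3.0]])
alpha = 0.95

J = np.zeros(2)  # cost-to-go estimate
for _ in range(10000):
    # Bellman operator: (TJ)(s) = min_a [ g(s, a) + alpha * E[J(next state)] ]
    Q = np.stack([g[:, a] + alpha * (P[a] @ J) for a in range(2)], axis=1)
    J_next = Q.min(axis=1)
    if np.max(np.abs(J_next - J)) < 1e-10:  # T is a sup-norm contraction, so this converges
        J = J_next
        break
    J = J_next

print("cost-to-go:", J)
print("greedy policy:", Q.argmin(axis=1))

The contraction property that makes this loop converge, and the optimality of the greedy policy it returns, are exactly the results developed in the first few lectures.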
Technical Requirements
Postscript viewer software, such as Ghostscript/Ghostview, can be used to view the .ps files found on this course site.
Syllabus
Topic Coverage
Topic coverage will be adapted according to students' interests. Some or all of the following will be covered:
Markov Decision Processes and Dynamic Programming (2-3 weeks)
- Stochastic Models, Dynamic Programming Theory, Value and Policy Iteration.
Simulation-Based Methods (2 weeks)
- Asynchronous Value and Policy Iteration, Q-Learning, Complexity of Reinforcement Learning. Review of underlying tools such as Lyapunov Function Analysis and the ODE Approach. (A minimal Q-learning sketch follows this outline.)
Value Function Approximation (4 weeks)
- TD-Learning, Approximate Linear Programming, Performance Bounds, Theory of Function Approximation.
Policy Search Methods (2-3 weeks)
- Policy Gradient and Actor-Critic Methods. Complexity of Policy Search.
Online Learning and Games (2 weeks)
- Experts Algorithms, Regret Minimization and Calibration.
We will see applications throughout the course, including dynamic resource allocation, finance, and queueing networks.
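To give a concrete feel for the simulation-based methods listed above, here is a minimal tabular Q-learning sketch; the randomly generated MDP used as a simulator, the exploration rate, and the 1/n step-size schedule are assumptions made purely for illustration.

import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, alpha = 5, 2, 0.95

# Randomly generated toy MDP, used only as a simulator (illustrative assumption).
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] = transition distribution
g = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # one-stage costs

Q = np.zeros((n_states, n_actions))
visits = np.zeros((n_states, n_actions))
s = 0
for t in range(200000):
    # epsilon-greedy exploration around the current greedy (cost-minimizing) action
    a = int(rng.integers(n_actions)) if rng.random() < 0.1 else int(Q[s].argmin())
    s_next = int(rng.choice(n_states, p=P[s, a]))
    visits[s, a] += 1
    step = 1.0 / visits[s, a]  # diminishing step size
    # stochastic approximation step toward the sampled Bellman backup
    Q[s, a] += step * (g[s, a] + alpha * Q[s_next].min() - Q[s, a])
    s = s_next

print("estimated optimal cost-to-go:", Q.min(axis=1))

Lyapunov function analysis and the ODE method mentioned above are the standard tools for showing that stochastic approximation updates of this form converge.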
Textbooks
Bertsekas, Dimitri P. Dynamic Programming and Optimal Control. 2 vols. Belmont, MA: Athena Scientific, 2007. ISBN: 9781886529083.
Bertsekas, Dimitri P., and John N. Tsitsiklis. Neuro-Dynamic Programming. Belmont, MA: Athena Scientific, 1996. ISBN: 9781886529106.
Individual papers are also used for many class sessions, as listed in the Readings section.
Grading Policy
ACTIVITIES | PERCENTAGES
Weekly/Bi-weekly problem sets with 2 or 3 questions each | 40%
Final project | 60%
Term Project
Students will be offered the option of working on theory, algorithms and/or applications. Project proposals will be submitted midway through the term, with the final project due at the end of the term.
A 10-15 page project report and 15-20 minute presentation are required.
Calendar
LEC # | TOPICS | KEY DATES
1 | Markov Decision Processes; Finite-Horizon Problems: Backwards Induction; Discounted-Cost Problems: Cost-to-Go Function, Bellman's Equation |
2 | Value Iteration; Existence and Uniqueness of Bellman's Equation Solution; Gauss-Seidel Value Iteration |
3 | Optimality of Policies Derived from the Cost-to-Go Function; Policy Iteration; Asynchronous Policy Iteration | Problem set 1 out
4 | Average-Cost Problems; Relationship with Discounted-Cost Problems; Bellman's Equation; Blackwell Optimality | Problem set 1 due
5 | Average-Cost Problems: Computational Methods |
6 | Application of Value Iteration to Optimization of Multiclass Queueing Networks; Introduction to Simulation-Based Methods; Real-Time Value Iteration | Problem set 2 out
7 | Q-Learning; Stochastic Approximations |
8 | Stochastic Approximations: Lyapunov Function Analysis; The ODE Method; Convergence of Q-Learning |
9 | Exploration versus Exploitation: The Complexity of Reinforcement Learning |
10 | Introduction to Value Function Approximation; Curse of Dimensionality; Approximation Architectures |
11 | Model Selection and Complexity | Problem set 3 out
12 | Introduction to Value Function Approximation Algorithms; Performance Bounds |
13 | Temporal-Difference Learning with Value Function Approximation |
14 | Temporal-Difference Learning with Value Function Approximation (cont.) |
15 | Temporal-Difference Learning with Value Function Approximation (cont.); Optimal Stopping Problems; General Control Problems |
16 | Approximate Linear Programming | Problem set 4 out
17 | Approximate Linear Programming (cont.) |
18 | Efficient Solutions for Approximate Linear Programming |
19 | Efficient Solutions for Approximate Linear Programming: Factored MDPs |
20 | Policy Search Methods | Problem set 5 out
21 | Policy Search Methods (cont.) |
22 | Policy Search Methods for POMDPs; Application: Call Admission Control; Actor-Critic Methods |
23 | Guest Lecture: Prof. Nick Roy; Approximate POMDP Compression |
24 | Policy Search Methods: PEGASUS; Application: Helicopter Control |
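Lectures 13-15 treat temporal-difference learning with value function approximation. As a rough illustration of that style of algorithm, the sketch below runs TD(0) with a linear architecture to evaluate a fixed policy on a randomly generated chain; the chain, the features, and the step-size schedule are all assumptions made for illustration only.

import numpy as np

rng = np.random.default_rng(1)
n_states, alpha = 50, 0.95

# Markov chain under a fixed policy, plus per-state costs (invented for illustration).
P = rng.dirichlet(np.ones(n_states), size=n_states)   # P[s] = transition distribution from state s
g = rng.uniform(0.0, 1.0, size=n_states)               # one-stage cost at each state

# Linear architecture: J(s) is approximated by phi[s] @ r, with random features as a stand-in.
k = 5
phi = rng.normal(size=(n_states, k))
r = np.zeros(k)

s = 0
for t in range(1, 200001):
    s_next = int(rng.choice(n_states, p=P[s]))
    # temporal difference at the observed transition under the current weights r
    d = g[s] + alpha * (phi[s_next] @ r) - phi[s] @ r
    r += (1.0 / t) * d * phi[s]  # TD(0) update along the feature direction
    s = s_next

print("approximate cost-to-go of state 0:", phi[0] @ r)

Which fixed point such updates converge to, and how far it can lie from the best approximation of the true cost-to-go function, are the performance questions taken up in the readings for those lectures.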
Further Reading:
Readings
Bertsekas = Bertsekas, Dimitri P. Dynamic Programming and Optimal Control. 2 vols. Belmont, MA: Athena Scientific, 2007. ISBN: 9781886529083.
Bertsekas and Tsitsiklis = Bertsekas, Dimitri P., and John N. Tsitsiklis. Neuro-Dynamic Programming. Belmont, MA: Athena Scientific, 1996. ISBN: 9781886529106.
LEC # | TOPICS | READINGS
1 | Markov Decision Processes; Finite-Horizon Problems: Backwards Induction; Discounted-Cost Problems: Cost-to-Go Function, Bellman's Equation | Bertsekas Vol. 1, Chapter 1.
2 | Value Iteration; Existence and Uniqueness of Bellman's Equation Solution; Gauss-Seidel Value Iteration | Bertsekas Vol. 2, Chapter 1. Bertsekas and Tsitsiklis, Chapter 2.
3 | Optimality of Policies Derived from the Cost-to-Go Function; Policy Iteration; Asynchronous Policy Iteration | Bertsekas Vol. 2, Chapter 1. Bertsekas and Tsitsiklis, Chapter 2.
4 | Average-Cost Problems; Relationship with Discounted-Cost Problems; Bellman's Equation; Blackwell Optimality | Bertsekas Vol. 2, Chapter 4.
5 | Average-Cost Problems: Computational Methods | Bertsekas Vol. 2, Chapter 4.
6 | Application of Value Iteration to Optimization of Multiclass Queueing Networks; Introduction to Simulation-Based Methods; Real-Time Value Iteration | Chen, R. R., and S. P. Meyn. "Value Iteration and Optimization of Multiclass Queueing Networks." Queueing Systems 32 (1999): 65-97. Bertsekas and Tsitsiklis, Chapter 5.
7 | Q-Learning; Stochastic Approximations | Bertsekas and Tsitsiklis, Chapters 4 and 5.
8 | Stochastic Approximations: Lyapunov Function Analysis; The ODE Method; Convergence of Q-Learning | Bertsekas and Tsitsiklis, Chapters 4 and 5.
9 | Exploration versus Exploitation: The Complexity of Reinforcement Learning | Kearns, M., and S. Singh. "Near-Optimal Reinforcement Learning in Polynomial Time." Machine Learning 49, no. 2 (Nov 2002): 209-232.
10 | Introduction to Value Function Approximation; Curse of Dimensionality; Approximation Architectures | Bertsekas and Tsitsiklis, Chapter 6.
11 | Model Selection and Complexity | Hastie, Tibshirani, and Friedman. Chapter 7 in The Elements of Statistical Learning. New York: Springer, 2003. ISBN: 9780387952840.
12 | Introduction to Value Function Approximation Algorithms; Performance Bounds | Bertsekas and Tsitsiklis, Chapter 6.
13 | Temporal-Difference Learning with Value Function Approximation | Bertsekas and Tsitsiklis, Chapter 6.
14 | Temporal-Difference Learning with Value Function Approximation (cont.) | Bertsekas and Tsitsiklis, Chapter 6. de Farias, D. P., and B. Van Roy. "On the Existence of Fixed Points for Approximate Value Iteration and Temporal-Difference Learning."
15 | Temporal-Difference Learning with Value Function Approximation (cont.); Optimal Stopping Problems; General Control Problems | Bertsekas and Tsitsiklis, Chapter 6. de Farias, D. P., and B. Van Roy. "On the Existence of Fixed Points for Approximate Value Iteration and Temporal-Difference Learning." Bertsekas, Borkar, and Nedic. "Improved Temporal Difference Methods with Linear Function Approximation."
16 | Approximate Linear Programming | de Farias, D. P., and B. Van Roy. "The Linear Programming Approach to Approximate Dynamic Programming."
17 | Approximate Linear Programming (cont.) | de Farias, D. P., and B. Van Roy. "The Linear Programming Approach to Approximate Dynamic Programming."
18 | Efficient Solutions for Approximate Linear Programming | de Farias, D. P., and B. Van Roy. "On Constraint Sampling in the Linear Programming Approach to Approximate Dynamic Programming." Calafiore, and Campi. "Uncertain Convex Programs: Randomized Solutions and Confidence Levels."
19 | Efficient Solutions for Approximate Linear Programming: Factored MDPs | Guestrin, et al. "Efficient Solution Algorithms for Factored MDPs." Schuurmans, and Patrascu. "Direct Value Approximation for Factored MDPs."
20 | Policy Search Methods | Marbach, and Tsitsiklis. "Simulation-Based Optimization of Markov Reward Processes."
21 | Policy Search Methods (cont.) | Baxter, and Bartlett. "Infinite-Horizon Policy-Gradient Estimation."
22 | Policy Search Methods for POMDPs; Application: Call Admission Control; Actor-Critic Methods | Baxter, and Bartlett. "Infinite-Horizon Policy-Gradient Estimation." Baxter, and Bartlett. "Experiments with Infinite-Horizon Policy-Gradient Estimation." Konda, and Tsitsiklis. "Actor-Critic Algorithms."
23 | Guest Lecture: Prof. Nick Roy; Approximate POMDP Compression | Roy, and Gordon. "Exponential Family PCA for Belief Compression in POMDPs."
24 | Policy Search Methods: PEGASUS; Application: Helicopter Control | Ng, and Jordan. "PEGASUS: A policy search method for large MDPs and POMDPs." Ng, et al. "Autonomous Helicopter Flight via Reinforcement Learning."
Complementary Reading
Even-Dar, and Mansour. "Learning Rates for Q-Learning." Journal of Machine Learning Research 5 (2003): 1-25.
Barron. "Universal approximation bounds for superpositions of a sigmoidal function." IEEE Transactions on Information Theory 39 (1993): 930-944.
Tesauro. "Temporal-Difference Learning and TD-Gammon." Communications of the ACM 38, no. 3 (1995).