Share Course Ware
Engineering > Mechanical Engineering > Decision Making in Large Scale Systems
 Decision Making in Large Scale Systems  posted by  member7_php   on 2/17/2009  Add Courseware to favorites Add To Favorites  
Further Reading
More Options

De Farias, Daniela Pucci, 2.997 Decision Making in Large Scale Systems, Spring 2004. (Massachusetts Institute of Technology: MIT OpenCourseWare), (Accessed 07 Jul, 2010). License: Creative Commons BY-NC-SA

Decision Making in Large Scale Systems

Spring 2004

Tetris game strategies can be analyzed using the value function approximation techniques described in this course -- see lecture session 10.  (Illustration courtesy of OCW.)

Course Highlights

This course features extensive lecture notes and readings, plus selected examples of student projects.

Course Description

This course is an introduction to the theory and application of large-scale dynamic programming. Topics include Markov decision processes, dynamic programming algorithms, simulation-based algorithms, theory and algorithms for value function approximation, and policy search methods. The course examines games and applications in areas such as dynamic resource allocation, finance and queueing networks.

Technical Requirements

Postscript viewer software, such as  Ghostscript/Ghostview, can be used to view the .ps files found on this course site.


Topic Coverage

Topic coverage will be adapted according to students' interests. Some or all of the following will be covered:

Markov Decision Processes and Dynamic Programming (2-3 weeks)

  • Stochastic Models, Dynamic Programming Theory, Value and Policy Iteration.

Simulation-Based Methods (2 weeks)

  • Asynchronous Value and Policy Iteration, Q-Learning, Complexity of Reinforcement Learning. Revision of underlying tools such as Lyapunov Function Analysis and the ODE Approach.

Value Function Approximation (4 weeks)

  • TD-Learning, Approximate Linear Programming, Performance Bounds, Theory of Function Approximation.

Policy Search Methods (2-3 weeks)

  • Policy Gradient and Actor-Critic Methods. Complexity of Policy Search.

Online Learning and Games (2 weeks)

  • Experts Algorithms, Regret Minimization and Calibration.

We will see applications throughout the course, including dynamic resource allocation, finance and queuing networks, among others.



Bertsekas, Dimitri P. Dynamic Programming and Optimal Control. 2 vols. Belmont, MA: Athena Scientific, 2007. ISBN: 9781886529083.

Bertsekas, Dimitri P., and John N. Tsitsiklis. Neuro-Dynamic Programming. Belmont, MA: Athena Scientific, 1996. ISBN: 9781886529106.

Individual Papers are also used for many class sessions, as listed in the readings section.

Grading Policy

Weekly/Bi-weekly Problem Sets with 2 or 3 questions each 40%
Final Project 60%

Term Project

Students will be offered the option of working on theory, algorithms and/or applications. Project proposals will be submitted midway through the term, with the final project due at the end of the term.

A 10-15 page project report and 15-20 minute presentation are required.


1 Markov Decision Processes
Finite-Horizon Problems: Backwards Induction
Discounted-Cost Problems: Cost-to-Go Function, Bellman's Equation
2 Value Iteration
Existence and Uniqueness of Bellman's Equation Solution
Gauss-Seidel Value Iteration
3 Optimality of Policies derived from the Cost-to-go Function
Policy Iteration
Asynchronous Policy Iteration
Problem set 1 out
4 Average-Cost Problems
Relationship with Discounted-Cost Problems
Bellman's Equation
Blackwell Optimality
Problem set 1 due
5 Average-Cost Problems
Computational Methods
6 Application of Value Iteration to Optimization of Multiclass Queueing Networks
Introduction to Simulation-based Methods Real-Time Value Iteration
Problem set 2 out
7 Q-Learning
Stochastic Approximations
8 Stochastic Approximations: Lyapunov Function Analysis
The ODE Method
Convergence of Q-Learning
9 Exploration versus Exploitation: The Complexity of Reinforcement Learning  
10 Introduction to Value Function Approximation
Curse of Dimensionality
Approximation Architectures
11 Model Selection and Complexity Problem set 3 out
12 Introduction to Value Function Approximation Algorithms
Performance Bounds
13 Temporal-Difference Learning with Value Function Approximation  
14 Temporal-Difference Learning with Value Function Approximation (cont.)  
15 Temporal-Difference Learning with Value Function Approximation (cont.)
Optimal Stopping Problems
General Control Problems
16 Approximate Linear Programming Problem set 4 out
17 Approximate Linear Programming (cont.)  
18 Efficient Solutions for Approximate Linear Programming  
19 Efficient Solutions for Approximate Linear Programming: Factored MDPs  
20 Policy Search Methods Problem set 5 out
21 Policy Search Methods (cont.)  
22 Policy Search Methods for POMDPs
Application: Call Admission Control
Actor-Critic Methods
23 Guest Lecture: Prof. Nick Roy
Approximate POMDP Compression
24 Policy Search Methods: PEGASUS
Application: Helicopter Control   Tell A Friend