 
Abstract/Syllabus:

Essentials of Probability and Statistical Inference IV: Algorithmic and Nonparametric Approaches
Spring 2006
Instructor
Rafael Irizarry
Offered By
Biostatistics
Description
Introduces the theory and application of modern, computationally based methods for exploring and drawing inferences from data. Covers resampling methods, nonparametric regression, prediction, dimension reduction, and clustering. Specific topics include Monte Carlo simulation, the bootstrap, cross-validation, splines, locally weighted regression, CART, random forests, neural networks, support vector machines, and hierarchical clustering. De-emphasizes proofs and replaces them with extended discussion of the interpretation of results, using simulation and data analysis for illustration.
Syllabus
Course Objectives
After completing this course, a student will be able to understand the theoretical basis for the current methods used in statistical analysis.
Prerequisites
140.646-648 or 140.611-12 or 140.621-24 or 140.651-54 or 140.671-74; working knowledge of calculus
Readings
 Hastie, T., Tibshirani, R., and Friedman, J. H. (2001) The Elements of Statistical Learning. Springer-Verlag: New York.
 Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S-Plus. Springer-Verlag: New York.
 Ripley, B. D. (1996) Pattern Recognition and Neural Networks. Cambridge University Press.
Course Requirements
Student evaluation is based on homework, quizzes, and a final project.
Schedule
N/A 
Review 
Lecture: Stuff you should know: Basics of probability, the central limit theorem, and inference. 
1 
Introduction to Regression and Prediction 
Lecture: We will describe linear regression in the context of a prediction problem. 
2 
Overview of Supervised Learning 
Lecture: Regression for predicting bivariate data, K nearest neighbors (KNN), bin smoothers, and an introduction to the bias/variance tradeoff. 
3 
Linear Methods for Regression 
Lecture: Subset selection and ridge regression. We will use singular value decomposition (SVD) and principal component analysis (PCA) to understand these methods. 
4 
Linear Methods for Regression 
Lecture: Subset selection and ridge regression. We will use singular value decomposition (SVD) and principal component analysis (PCA) to understand these methods. 
5 
Linear Methods for Classification 
Lecture: Linear regression, linear discriminant analysis (LDA), and logistic regression.
6 
Kernel Methods 
Lecture: Kernel smoothers, including loess. We will briefly describe two-dimensional smoothers. We will also define degrees of freedom in the context of smoothing and learn about density estimators.
7 
Model Assessment and Selection 
Lecture: We revisit the bias-variance tradeoff. We describe how Monte Carlo simulations can be used to assess bias and variance. We then introduce cross-validation, AIC, and BIC.
8 
The Bootstrap 
Lecture: We give a short introduction to the bootstrap and demonstrate its utility in smoothing problems. 
9 
Splines, Wavelets, and Friends 
Lecture: We give intuitive and mathematical descriptions of splines and wavelets. We use the SVD to understand these better and see connections with signal processing methods.
10 
Splines, Wavelets, and Friends 
Lecture: We give intuitive and mathematical descriptions of splines and wavelets. We use the SVD to understand these better and see connections with signal processing methods.
11 
Additive Models, GAM and Neural Networks 
Lecture: We move back to cases with many covariates. We introduce projection pursuit, additive models, and generalized additive models. We briefly describe neural networks and explain the connection to projection pursuit.
12 
Additive Models, GAM and Neural Networks 
Lecture: We move back to cases with many covariates. We introduce projection pursuit, additive models, and generalized additive models. We briefly describe neural networks and explain the connection to projection pursuit.
13 
Model Averaging 
Lecture: Bayesian Statistics, Boosting and Bagging. 
14 
CART, Boosting and Additive Trees 
Lecture: We introduce classification and regression trees (CART) as well as more modern versions such as random forests.
15 
CART, Boosting and Additive Trees 
Lecture: We introduce classification and regression trees (CART) as well as more modern versions such as random forests.
16 
Clustering Algorithms 
Lecture 
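
Illustration (not part of the original course materials): session 7 mentions using Monte Carlo simulation to assess bias and variance. A minimal sketch of that idea follows, applied to the K nearest neighbors smoother from session 2. The true regression function, noise level, evaluation point, and the names knn_smooth and mc_bias_variance are all assumptions chosen for this example; the course itself may have used different settings and software.

```python
import numpy as np


def knn_smooth(x_train, y_train, x0, k):
    """Average the responses of the k training points nearest to x0."""
    nearest = np.argsort(np.abs(x_train - x0))[:k]
    return y_train[nearest].mean()


def mc_bias_variance(k, x0=0.25, n=100, sigma=0.3, n_sim=2000, seed=0):
    """Monte Carlo estimate of squared bias and variance of the KNN fit at x0.

    Assumed setup (hypothetical): y = sin(2*pi*x) + Normal(0, sigma^2) noise,
    observed on a fixed, equally spaced design on [0, 1].
    """
    rng = np.random.default_rng(seed)

    def f(x):
        return np.sin(2 * np.pi * x)  # assumed true regression function

    x_train = np.linspace(0, 1, n)
    fits = np.empty(n_sim)
    for b in range(n_sim):
        # Draw a fresh data set from the assumed model and refit at x0.
        y_train = f(x_train) + rng.normal(0, sigma, n)
        fits[b] = knn_smooth(x_train, y_train, x0, k)

    bias_sq = (fits.mean() - f(x0)) ** 2  # (average fit - truth)^2
    variance = fits.var()                 # spread of the fit across data sets
    return bias_sq, variance


if __name__ == "__main__":
    for k in (1, 5, 25, 75):
        bias_sq, variance = mc_bias_variance(k)
        print(f"k={k:3d}  bias^2={bias_sq:.4f}  variance={variance:.4f}")
```

Under these assumptions, small k should give low bias and high variance, and large k the reverse, which is the tradeoff that sessions 2 and 7 describe.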








