Abstract/Syllabus:
Essentials of Probability and Statistical Inference IV: Algorithmic and Nonparametric Approaches
Spring 2006
Instructor
Rafael Irizarry
Offered By
Biostatistics
Description
Introduces the theory and application of modern, computationally based methods for exploring and drawing inferences from data. Covers re-sampling methods, non-parametric regression, prediction, and dimension reduction and clustering. Specific topics include Monte Carlo simulation, the bootstrap, cross-validation, splines, local weighted regression, CART, random forests, neural networks, support vector machines, and hierarchical clustering. De-emphasizes proofs and replaces them with extended discussion of interpretation of results and simulation and data analysis for illustration.
Syllabus
Course Objectives
After completing this course, a student will be able to understand the theoretical basis for the current methods used in statistical analysis.
Prerequisites
140.646-648 or 140.611-12 or 140.621-24 or 140.651-54 or 140.671-74; working knowledge of calculus
Readings
- Hastie, T., Tibshirani, R., and Friedman, J. H. (2001) The Elements of Statistical Learning. Springer-Verlag: New York.
- Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S-Plus. Springer-Verlag: New York.
- Ripley, B. D. (1996) Pattern Recognition and Neural Networks. Cambridge University Press.
Course Requirements
Student evaluation is based on homework, quizzes, and a final project.
Schedule
N/A | Review | Lecture: Stuff you should know: basics of probability, the central limit theorem, and inference.
1 | Introduction to Regression and Prediction | Lecture: We will describe linear regression in the context of a prediction problem.
2 | Overview of Supervised Learning | Lecture: Regression for predicting bivariate data, K nearest neighbors (KNN), bin smoothers, and an introduction to the bias/variance trade-off.
3 | Linear Methods for Regression | Lecture: Subset selection and ridge regression. We will use singular value decomposition (SVD) and principal component analysis (PCA) to understand these methods.
4 | Linear Methods for Regression | Lecture: Subset selection and ridge regression. We will use singular value decomposition (SVD) and principal component analysis (PCA) to understand these methods.
5 | Linear Methods for Classification | Lecture: Linear regression, linear discriminant analysis (LDA), and logistic regression.
6 | Kernel Methods | Lecture: Kernel smoothers, including loess. We will briefly describe two-dimensional smoothers. We will also define degrees of freedom in the context of smoothing and learn about density estimators.
7 | Model Assessment and Selection | Lecture: We revisit the bias-variance trade-off. We describe how Monte Carlo simulations can be used to assess bias and variance. We then introduce cross-validation, AIC, and BIC.
8 | The Bootstrap | Lecture: We give a short introduction to the bootstrap and demonstrate its utility in smoothing problems.
9 | Splines, Wavelets, and Friends | Lecture: We give intuitive and mathematical descriptions of splines and wavelets. We use the SVD to understand these better and see connections with signal processing methods.
10 | Splines, Wavelets, and Friends | Lecture: We give intuitive and mathematical descriptions of splines and wavelets. We use the SVD to understand these better and see connections with signal processing methods.
11 | Additive Models, GAM and Neural Networks | Lecture: We move back to cases with many covariates. We introduce projection pursuit, additive models, and generalized additive models. We briefly describe neural networks and explain the connection to projection pursuit.
12 | Additive Models, GAM and Neural Networks | Lecture: We move back to cases with many covariates. We introduce projection pursuit, additive models, and generalized additive models. We briefly describe neural networks and explain the connection to projection pursuit.
13 | Model Averaging | Lecture: Bayesian statistics, boosting, and bagging.
14 | CART, Boosting and Additive Trees | Lecture: We introduce classification and regression trees (CART) as well as more modern versions such as random forests.
15 | CART, Boosting and Additive Trees | Lecture: We introduce classification and regression trees (CART) as well as more modern versions such as random forests.
16 | Clustering Algorithms | Lecture
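
The schedule above mentions K nearest neighbors, the bias/variance trade-off, and the use of Monte Carlo simulation to assess bias and variance. The following is a minimal sketch of that idea and is not taken from the course materials. The true curve `sin(2*pi*x)`, the noise level `sigma`, the evaluation point `x0`, and the function names are all illustrative assumptions.

```python
import math
import random


def knn_predict(xs, ys, x0, k):
    """Average the responses of the k training points closest to x0."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k


def simulate_bias_variance(k, x0=0.25, n=50, n_sims=500, sigma=0.3, seed=0):
    """Monte Carlo estimate of the squared bias and variance of KNN at x0.

    Assumed setup (hypothetical, for illustration only):
    x ~ Uniform(0, 1), y = sin(2*pi*x) + Normal(0, sigma) noise.
    """
    rng = random.Random(seed)

    def f(x):
        return math.sin(2 * math.pi * x)

    preds = []
    for _ in range(n_sims):
        # Draw a fresh training set and record the prediction at x0.
        xs = [rng.random() for _ in range(n)]
        ys = [f(x) + rng.gauss(0, sigma) for x in xs]
        preds.append(knn_predict(xs, ys, x0, k))

    mean_pred = sum(preds) / n_sims
    variance = sum((p - mean_pred) ** 2 for p in preds) / (n_sims - 1)
    bias_sq = (mean_pred - f(x0)) ** 2
    return bias_sq, variance


if __name__ == "__main__":
    for k in (1, 5, 25):
        bias_sq, variance = simulate_bias_variance(k)
        print(f"k={k:2d}  bias^2={bias_sq:.4f}  variance={variance:.4f}")
```

Under these assumptions, a small `k` should give predictions with low bias but high variance, while a large `k` averages over a wide neighborhood and trades variance for bias. The exact numbers depend on the seed and the assumed curve.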