Instructor
Brian Caffo
Offered By
Biostatistics
Description
Presents fundamental concepts in applied probability, exploratory data analysis, and statistical inference, focusing on probability and analysis of one and two samples. Topics include discrete and continuous probability models; expectation and variance; central limit theorem; inference, including hypothesis testing and confidence intervals for means, proportions, and counts; maximum likelihood estimation; sample size determinations; elementary nonparametric methods; graphical displays; and data transformations.
Learning Objectives
The goal of this course is to equip biostatisticians and quantitative scientists with core applied statistical concepts and methods:
1) The course will refresh the mathematical, computational, statistical and probability background that students will need to take the course.
2) The course will introduce students to the display and communication of statistical data. This will include graphical and exploratory data analysis using tools like scatterplots, boxplots and the display of multivariate data. In this objective, students will be required to write extensively.
3) Students will learn the distinctions between the fundamental paradigms underlying statistical methodology.
4) Students will learn the basics of maximum likelihood.
5) Students will learn the basics of frequentist methods: hypothesis testing and confidence intervals.
6) Students will learn basic Bayesian techniques, interpretation and prior specification.
7) Students will learn the creation and interpretation of P values.
8) Students will learn estimation, testing and interpretation for single group summaries such as means, medians, variances, correlations and rates.
9) Students will learn estimation, testing and interpretation for two group comparisons such as odds ratios, relative risks and risk differences.
10) Students will learn the basic concepts of ANOVA.
Syllabus
Prerequisites
Calculus, linear algebra and a moderate level of mathematical literacy are prerequisites for this class. Note that simply having the prerequisites for this class does not necessarily mean that it is the correct class for you. For example, a student with a PhD in theoretical mathematics who would like a broad overview of biostatistics and immediately applicable techniques would be better off in the 620 series.
Readings
Mathematical Statistics and Data Analysis, 2nd Edition, by John A. Rice. Duxbury Press.
Schedule

1 
Set Theory Basics and Probability
1. Cover syllabus
2. Abstract the idea of an experiment
3. Develop basic set theory to be used in the development of probability
4. Start discussing probability

Read Rosner Chapter 1

2 
Introduction to Probability
1. Define probability calculus
2. Basic axioms of probability
3. Define random variables
4. Define density and mass functions
5. Define cumulative distribution functions and survivor functions
6. Define quantiles, percentiles, and medians

Read Rosner 3.1-3.5, 4.1-4.3, and 5.1-5.2

3 
Expected Values
1. Define expected values
2. Properties of expected values
3. Unbiasedness of the sample mean
4. Define variances
5. Define the standard deviation
6. Calculate Bernoulli variance

Read Rosner 4.4-4.5 and 4.9

4 
Random Vectors, Independence
1. Define random vectors
2. Independent events and variables
3. IID random variables
4. Covariance and correlation
5. Standard error of the mean
6. Unbiasedness of the sample variance

Read Rosner 3.4

5 
Conditional Probabilities, Bayes' Rule
1. Define conditional probabilities
2. Define conditional mass functions and densities
3. Motivate the conditional density
4. Bayes' rule
5. Applications of Bayes' rule to diagnostic testing

Read Rosner 3.6-3.9
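As an illustration of the diagnostic testing application in this session, here is a minimal sketch in Python. The language choice, the function name, and the sensitivity, specificity, and prevalence values are assumptions made for illustration; they are not taken from the course materials.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Return P(disease | positive test) using Bayes' rule."""
    # P(positive and disease) = P(positive | disease) * P(disease)
    true_positive = sensitivity * prevalence
    # P(positive and no disease) = P(positive | no disease) * P(no disease)
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


# Hypothetical test characteristics, chosen only for illustration.
ppv = positive_predictive_value(sensitivity=0.95, specificity=0.90, prevalence=0.01)
print(round(ppv, 4))
```

With these made-up inputs the probability of disease given a positive test is under 10 percent, even though the test itself is fairly accurate, because the disease is rare.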

6 
Likelihood
1. Define likelihood
2. Interpretations of likelihoods
3. Likelihood plots
4. Maximum likelihood
5. Likelihood ratio benchmarks


7 
Distributions
1. Define the Bernoulli distribution
2. Define Bernoulli likelihoods
3. Define the Binomial distribution
4. Define Binomial likelihoods
5. Define the normal distribution
6. Define normal likelihoods

Read Rosner 4.8, 4.9, and 5.1-5.6

8 
Asymptotics
1. Define convergent series
2. Define the Law of Large Numbers
3. Define the Central Limit Theorem
4. Create Wald confidence intervals using the CLT

Read Rosner 6.1, 6.2, and 6.5
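The Wald interval in item 4 of this session can be sketched as follows. This is an illustrative Python example under stated assumptions: the function name and the data values are hypothetical, and the interval is the usual estimate plus or minus a standard normal quantile times the estimated standard error, justified by the Central Limit Theorem for reasonably large samples.

```python
import math
from statistics import NormalDist, mean, stdev


def wald_interval(sample, level=0.95):
    """CLT-based (Wald) confidence interval for a population mean."""
    n = len(sample)
    # Standard normal quantile, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    # Estimated standard error of the sample mean.
    se = stdev(sample) / math.sqrt(n)
    center = mean(sample)
    return center - z * se, center + z * se


# Made-up measurements, used only to exercise the function.
data = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.2, 4.9]
low, high = wald_interval(data)
print(low, high)
```

With only eight observations the normal approximation is rough; the t intervals covered in the next session are the more usual choice at this sample size.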

9 
Confidence Intervals
1. Define the Chi-squared and t distributions
2. Derive confidence intervals for the variance
3. Illustrate the likelihood for the variance
4. Derive t confidence intervals for the mean
5. Derive the likelihood for the effect size

Read Rosner 6.7 and 6.8

10 
Confidence Intervals
1. Introduce independent group t confidence intervals
2. Define the pooled variance estimate
3. Derive the distribution for the independent-group, common-variance statistic
4. Cover likelihood methods for the change in the group means per standard deviation
5. Discuss remedies for unequal variances

Read Rosner 8.3 and 8.5

11 
Presentation of Data
1. Histograms
2. Stem-and-leaf plots
3. Dot charts and dot plots
4. Boxplots
5. Kernel density estimates
6. Q-Q plots

Read Rosner Chapter 2

12 
Bootstrapping
1. Introduce the bootstrap principle
2. Outline the bootstrap algorithm
3. Example bootstrap calculations
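One way to picture the bootstrap algorithm outlined in this session is the sketch below, written in Python with only the standard library. It is an assumption-laden illustration, not the course's own example: the function name, the choice of the median as the statistic, the percentile method for the interval, and the data values are all hypothetical.

```python
import random
from statistics import median


def bootstrap_percentile_interval(sample, stat=median, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for stat(sample)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    # Resample n observations with replacement and recompute the statistic each time.
    replicates = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_boot))
    # Take the empirical quantiles of the resampled statistics as interval endpoints.
    alpha = 1.0 - level
    lo_idx = max(0, round(alpha / 2.0 * n_boot))
    hi_idx = min(n_boot - 1, round((1.0 - alpha / 2.0) * n_boot) - 1)
    return replicates[lo_idx], replicates[hi_idx]


# Made-up observations, used only to exercise the function.
observations = [2.3, 3.1, 1.8, 4.7, 2.9, 3.6, 2.2, 5.0, 3.3, 2.7]
low, high = bootstrap_percentile_interval(observations)
print(low, high)
```

The key idea is that the observed sample stands in for the population, so the spread of the statistic across resamples approximates its sampling variability.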


13 
Confidence Intervals for Binomial Proportions
1. Confidence intervals for binomial proportions
2. Discuss problems with the Wald interval
3. Introduce Bayesian analysis
4. HPD intervals
5. Confidence interval interpretation

Read Rosner 6.8

14 
Logs and Geometric Means
1. Review of logs
2. Introduce the geometric mean
3. Interpretations of the geometric mean
4. Confidence intervals for the geometric mean
5. Lognormal distribution
6. Lognormal-based intervals
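
To make the geometric mean interval concrete, here is a minimal Python sketch. It assumes the common approach of building an interval for the mean of the logged data and exponentiating the endpoints; it uses a normal quantile for simplicity, where a t quantile may be preferred for small samples. The function name and the data are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def geometric_mean_interval(sample, level=0.95):
    """Geometric mean with an interval built on the log scale."""
    logs = [math.log(x) for x in sample]  # requires strictly positive data
    n = len(logs)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    se = stdev(logs) / math.sqrt(n)
    center = mean(logs)
    # Exponentiate the log-scale estimate and endpoints to return to the original scale.
    return math.exp(center), math.exp(center - z * se), math.exp(center + z * se)


# Made-up values spanning orders of magnitude; their geometric mean is 10.
gm, gm_low, gm_high = geometric_mean_interval([1.0, 10.0, 100.0])
print(gm, gm_low, gm_high)
```

Because the interval is symmetric on the log scale, it is asymmetric around the geometric mean on the original scale.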

