Random Sampling in Statistics: Expected Value and Variance of the Sample Mean

Steve Brunton, Aug 19, 2025

Audio Brief

This episode explores the statistical properties of the sample mean, focusing on its role as an unbiased estimator of the true population mean and the factors determining its precision. There are four key takeaways from this discussion. First, the sample mean is a reliable and unbiased estimator of the true population mean. This means its expected value precisely equals the population mean, so it does not systematically over- or underestimate the actual value on average. Second, the precision of the sample mean as an estimate improves significantly with increased sample size. Its variance decreases proportionally to one over n, meaning that larger samples yield a much tighter and more accurate estimate of the population mean. Third, the Central Limit Theorem provides a powerful approximation: for sufficiently large sample sizes, the distribution of the sample mean approaches a normal distribution. This distribution is centered directly at the true population mean. Finally, when sampling without replacement from a finite population, a finite population correction factor becomes crucial for accurate variance calculations. This factor accounts for the inherent dependency among samples drawn from a limited pool, where strict independence cannot be assumed. These insights are fundamental for understanding the robustness and limitations of statistical inference using sample means.
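The first two takeaways are easy to check numerically. The sketch below (not from the lecture; the population parameters and trial counts are illustrative choices) draws many samples of size n from a known Gaussian population and confirms that the mean of the sample means lands near μ while their variance lands near σ²/n.

```python
import random
import statistics

# Monte Carlo sketch: repeatedly draw samples of size n from a known
# population, then check that x-bar is unbiased (E[x-bar] = mu) and that
# its variance shrinks like sigma^2 / n.
random.seed(0)

mu, sigma = 5.0, 2.0   # true population mean and standard deviation
n = 25                 # sample size
trials = 50_000        # number of repeated samples

xbars = []
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbars.append(sum(sample) / n)

est_mean = statistics.fmean(xbars)   # should be close to mu = 5.0
est_var = statistics.pvariance(xbars)  # should be close to sigma^2/n = 0.16

print(est_mean, est_var)
```

With 50,000 trials the empirical mean and variance typically agree with μ and σ²/n to two or three decimal places.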

Episode Overview

  • The lecture explores the statistical properties of the sample mean when performing random sampling from a large, unknown population.
  • It poses fundamental questions about whether the sample mean is a biased or unbiased estimator of the true population mean and how to quantify its accuracy.
  • The speaker derives the expectation of the sample mean, proving that it is an unbiased estimator of the population mean (E[x̄] = μ).
  • The concept of the variance of the sample mean is introduced, explaining its relationship to the precision of the estimate and connecting it to the Central Limit Theorem.
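The unbiasedness derivation mentioned above is short; assuming each draw satisfies E[xᵢ] = μ, it is just linearity of expectation:

```latex
\begin{aligned}
\mathbb{E}[\bar{x}]
  &= \mathbb{E}\!\left[\frac{1}{n}\sum_{i=1}^{n} x_i\right]
   = \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}[x_i]
   && \text{(linearity of expectation)} \\
  &= \frac{1}{n}\,(n\mu) = \mu .
\end{aligned}
```

Note that this step needs only that each draw has mean μ; independence is not required for unbiasedness, only for the simple σ²/n variance formula.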

Key Concepts

  • Random Sampling: The process of selecting a subset (sample) from a larger set (population) where each potential sample has an equal probability of being chosen.
  • Sample Mean (x̄): The average of the observations in a sample, which acts as a random variable and is used to estimate the true population mean (μ).
  • Unbiased Estimator: An estimator whose expected value is equal to the true value of the parameter being estimated. The lecture proves that the sample mean is an unbiased estimator of the population mean.
  • Variance of the Sample Mean (Var(x̄)): A measure of the spread or dispersion of the distribution of the sample mean. A smaller variance indicates a more precise estimate of the population mean.
  • Central Limit Theorem (CLT) Result: For large populations where the draws can be treated as independent, the sample mean is approximately normally distributed, with variance equal to the population variance divided by the sample size (σ²/n).
  • Finite Population Correction: A correction factor applied to the variance calculation when sampling without replacement from a finite population to account for the dependency between drawn samples.
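For a population small enough to enumerate, the finite population correction can be verified exactly. The sketch below (an illustrative example, not from the lecture) lists every equally likely size-n sample drawn without replacement and compares the exact variance of x̄ against the corrected formula Var(x̄) = (σ²/n)·(N−n)/(N−1).

```python
import itertools
import statistics

# Exhaustive check of the finite population correction on a toy population.
population = [2.0, 4.0, 6.0, 8.0, 10.0]
N, n = len(population), 2

mu = statistics.fmean(population)
sigma2 = statistics.pvariance(population)  # population variance (divide by N)

# Exact distribution of x-bar: each unordered sample is equally likely.
xbars = [statistics.fmean(s) for s in itertools.combinations(population, n)]
exact_var = statistics.pvariance(xbars)

# Finite population correction formula.
fpc_var = (sigma2 / n) * (N - n) / (N - 1)

print(exact_var, fpc_var)  # prints 3.0 3.0 -- the two agree exactly
```

Without the correction, σ²/n = 4.0 here, overstating the true variance of 3.0; the factor (N−n)/(N−1) = 3/4 accounts for the negative dependence between draws.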

Quotes

  • At 01:40 - "What is the variance of x-bar? Meaning, we have a pretty good gut feeling that this sample mean should be Gaussian distributed for reasonably large n... But what's the variance of that distribution? Is it a fat distribution, is it skinny? We want the variance of x-bar to be really, really small because that means x-bar is a really, really tight estimate of mu." - Explaining the importance of understanding the variance to determine the quality and precision of the sample mean as an estimator.
  • At 03:22 - "We want to show that the expectation value of x-bar equals mu, meaning that x-bar is an unbiased estimate of mu." - Stating the primary goal of the first derivation, which is to establish the fundamental property that the sample mean does not systematically over or underestimate the true population mean.
  • At 10:07 - "Technically, there is joint covariance between these variables and so this step is actually not exactly true... This is true if my X_i's are independent, and that's true for very, very large population size N." - Highlighting the critical nuance that sampling without replacement violates the strict independence assumption, which necessitates a correction factor for the variance in finite populations.
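The nuance in the 10:07 quote can be made precise. The variance of a sum always carries cross terms:

```latex
\operatorname{Var}\!\left(\sum_{i=1}^{n} x_i\right)
  = \sum_{i=1}^{n} \operatorname{Var}(x_i)
  \;+\; \sum_{i \neq j} \operatorname{Cov}(x_i, x_j).
```

The familiar result Var(x̄) = σ²/n keeps only the first sum, which is exact when the xᵢ are independent. When sampling without replacement from a population of size N, each pair of draws has Cov(xᵢ, xⱼ) = −σ²/(N−1); carrying those covariance terms through the algebra is what produces the finite population correction (N−n)/(N−1), and the correction vanishes as N → ∞, recovering the independent case.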

Takeaways

  • The sample mean (x̄) is a reliable, unbiased estimator for the true population mean (μ), meaning that on average, it will correctly estimate the true value.
  • The precision of the sample mean as an estimate improves as the sample size (n) increases, because its variance decreases in proportion to 1/n.
  • The Central Limit Theorem provides a powerful and often-used approximation that the distribution of the sample mean is normal, centered at the true population mean.
  • When sampling without replacement from a finite population, a "finite population correction" factor must be applied to the variance calculation to account for the fact that the samples are not perfectly independent.