Sample Variance in Random Population Sampling

Steve Brunton, Aug 21, 2025

Audio Brief

This episode explores the sample variance: it determines its expected value and shows how to derive an unbiased estimator for the population variance. There are three key takeaways. First, the standard sample variance, calculated by dividing by n, is a biased estimator that systematically underestimates the true population variance. Second, an unbiased estimator for the population variance can be constructed by applying a correction factor. Third, the sample variance is essential for estimating the precision of the sample mean, even without knowing the true population variance.

The derivation of the sample variance's expected value reveals that it does not equal the population variance. Because of this bias, the commonly used divide-by-n formula falls short of the actual population spread on average. To correct the bias, the sample variance is multiplied by n over n minus one, together with a finite-population factor of N minus one over N, where N is the population size. The resulting statistic has an expected value that exactly matches the true population variance.

The sample variance is also key to estimating the variance of the sample mean. This makes it possible to quantify the reliability and precision of the sample mean as an approximation of the population mean without knowing the true population variance. Together, these results give a foundation for understanding the sample variance, how to correct it, and its role in statistical inference.

Episode Overview

  • This episode transitions from analyzing the sample mean to a deep dive into the sample variance (σ̂²).
  • The primary goal is to determine what the sample variance estimates by calculating its expected value, E(σ̂²).
  • The derivation reveals that the standard sample variance is a biased estimator of the true population variance (σ²).
  • The video concludes by showing how to construct an unbiased estimator for the population variance and how to use the sample variance to estimate the variance of the sample mean.

Key Concepts

  • Sample Variance (σ̂²): A statistic calculated from a sample, defined as σ̂² = (1/n)Σ(xᵢ - x̄)². The lecture uses an equivalent formula, σ̂² = (1/n)Σ(xᵢ²) - x̄², for the derivation.
  • Expected Value of Sample Variance: The core of the analysis is the derivation of E(σ̂²). The result, E(σ̂²) = ((n-1)/n) · (N/(N-1)) · σ², is not equal to the population variance σ², meaning the sample variance is a biased estimator.
  • Biased Estimator: An estimator whose expected value is not equal to the population parameter it is intended to estimate. The sample variance, as defined with division by n, systematically underestimates the true population variance.
  • Unbiased Estimator of Population Variance: Multiplying the sample variance σ̂² by the correction factor (n/(n-1)) · ((N-1)/N), where n is the sample size and N is the population size, gives a new statistic that is an unbiased estimator, meaning its expected value is exactly σ².
  • Variance of a Random Variable: The key identity Var(X) = E(X²) - [E(X)]² is rearranged to E(X²) = Var(X) + [E(X)]², which is used repeatedly to simplify terms in the derivation.
  • Estimating the Variance of the Sample Mean: The variance of the sample mean, Var(x̄), depends on the unknown population variance σ². The lecture demonstrates that we can create a practical estimate for Var(x̄) by substituting our unbiased estimate of σ² into its formula, using only information available from the sample.
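
The derivation summarized in the concepts above can be sketched in a few lines. This is a reconstruction, not a transcript of the lecture's steps, and it rests on assumptions not stated in these notes: sampling is done without replacement from a population of size N, the population variance is defined with divisor N, and the variance of the sample mean from the earlier sample-mean analysis is Var(x̄) = (σ²/n) · (N-n)/(N-1).

```latex
\begin{align*}
E(\hat{\sigma}^2)
  &= \frac{1}{n}\sum_{i=1}^{n} E(x_i^2) - E(\bar{x}^2) \\
  &= \left(\sigma^2 + \mu^2\right) - \left(\mathrm{Var}(\bar{x}) + \mu^2\right)
     && \text{using } E(X^2) = \mathrm{Var}(X) + [E(X)]^2 \text{ twice} \\
  &= \sigma^2 - \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1} \\
  &= \frac{n-1}{n}\cdot\frac{N}{N-1}\,\sigma^2 .
\end{align*}
```

Inverting the last factor gives the correction factor (n/(n-1)) · ((N-1)/N) quoted above. As N grows large, (N-1)/N approaches 1 and the correction reduces to the familiar n/(n-1).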

Quotes

  • At 00:38 - "But we haven't looked very much at the sample variance, sigma hat squared. We don't really know where this comes into play." - The speaker highlights the shift in focus from the sample mean to the sample variance, which is the central topic of the lecture.
  • At 01:17 - "What is it an unbiased estimate of? What is it, what is it estimating?" - This quote poses the primary question that the rest of the mathematical derivation aims to answer regarding the sample variance.
  • At 07:11 - "The expected value of my sample variance is not my population variance... So, my sample variance is a biased estimate of the population variance." - This is the key conclusion derived from the detailed mathematical proof, establishing the biased nature of the sample variance.

Takeaways

  • The standard formula for sample variance (dividing by n) provides a biased estimate of the true population variance; on average, it is smaller than the actual value, and the gap is largest for small samples.
  • To obtain an unbiased estimate of the population variance, the sample variance must be multiplied by a correction factor that accounts for both the sample size (n) and the population size (N).
  • The sample variance is a critical tool for estimating the variance of the sample mean (Var(x̄)). This allows us to quantify the precision of our sample mean as an estimate of the true population mean, even without knowing the true population variance.
  • The identity E(X²) = Var(X) + [E(X)]² is the workhorse of the derivation: it converts each expected value of a squared quantity into a variance plus a squared mean, which is how the terms in E(σ̂²) are simplified.
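
The takeaways above can be checked exactly on a small example. The following is a minimal sketch, not code from the lecture: the population values and the function names are hypothetical, and it assumes simple random sampling without replacement with the population variance defined using divisor N. It enumerates every possible sample, so the averages are exact expected values rather than simulation estimates.

```python
from fractions import Fraction
from itertools import combinations


def sample_variance(xs):
    """Divide-by-n sample variance: (1/n) * sum((x - xbar)^2)."""
    n = len(xs)
    xbar = Fraction(sum(xs), n)
    return sum((x - xbar) ** 2 for x in xs) / n


def unbiased_variance(xs, N):
    """Sample variance times the correction factor (n/(n-1)) * ((N-1)/N)."""
    n = len(xs)
    return sample_variance(xs) * Fraction(n, n - 1) * Fraction(N - 1, N)


def estimated_var_of_mean(xs, N):
    """Plug the unbiased variance estimate into the assumed formula
    Var(xbar) = (sigma^2 / n) * (N - n) / (N - 1)."""
    n = len(xs)
    return unbiased_variance(xs, N) / n * Fraction(N - n, N - 1)


# Hypothetical toy population, small enough to enumerate every sample.
population = [2, 3, 5, 7, 11, 13]
N = len(population)
n = 3

mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # divisor N (assumption)

# Every size-n subset is equally likely under sampling without replacement.
samples = list(combinations(population, n))


def expect(statistic):
    """Exact expected value of a statistic over all possible samples."""
    return sum(statistic(s) for s in samples) / len(samples)


e_biased = expect(sample_variance)
e_unbiased = expect(lambda s: unbiased_variance(s, N))

# True variance of the sample mean, computed directly from all samples.
true_var_mean = expect(lambda s: (Fraction(sum(s), n) - mu) ** 2)
e_var_mean = expect(lambda s: estimated_var_of_mean(s, N))

print("population variance      :", sigma2)
print("E[divide-by-n variance]  :", e_biased)
print("E[corrected variance]    :", e_unbiased)
print("true Var(sample mean)    :", true_var_mean)
print("E[estimated Var(mean)]   :", e_var_mean)
```

Using `Fraction` keeps the arithmetic exact, so the comparison between the expected values and the population quantities is not blurred by floating-point rounding. For a realistic population, enumeration is infeasible and a Monte Carlo average over random samples would take its place.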