Random Sampling Without Replacement (Finite "n" Correction)

Steve Brunton, Aug 20, 2025

Audio Brief

This episode details the mathematical derivation of the variance of the sample mean when sampling without replacement from a finite population. There are four key takeaways from this discussion. First, finite populations require a variance correction for the sample mean, because samples drawn without replacement are not truly independent. Second, the standard variance formula (sigma squared over n) is exact only for infinite populations or for sampling with replacement; otherwise it is an approximation. Third, the exact formula incorporates a "finite population correction" factor. Fourth, this refined approach is vital for accurate statistical inference in real-world scenarios.

When sampling without replacement, each selection changes the remaining population. This creates a small, negative covariance between individual samples, violating the independence assumption of simpler models. The common formula, sigma squared over n, simplifies the calculation but holds exactly only for infinitely large populations or sampling with replacement. For a finite population sampled without replacement, it overestimates the variance of the sample mean whenever the sample contains more than one individual.

The finite population correction factor adjusts the variance based on the ratio of sample size to population size. Its effect is negligible when the sample is a tiny fraction of the population and becomes significant when the sample constitutes a substantial portion of it. Applying the correction yields more precise estimates of the uncertainty in population parameters, which is crucial in fields like polling or quality control, where samples can represent a significant fraction of the finite total.
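
To make the dependence on the ratio of sample size to population size concrete, here is a minimal Python sketch that evaluates the correction factor (N − n)/(N − 1) for a few sample fractions. The function name `fpc` and the population size N = 1000 are hypothetical choices for illustration, not taken from the episode.

```python
def fpc(n: int, N: int) -> float:
    """Finite population correction factor (N - n) / (N - 1) applied to sigma^2 / n."""
    if not 1 <= n <= N or N < 2:
        raise ValueError("need 1 <= n <= N and N >= 2")
    return (N - n) / (N - 1)


N = 1000  # hypothetical population size, e.g. one production batch
for n in (10, 100, 500, 900):
    factor = fpc(n, N)
    # The variance of x-bar scales by `factor`; the standard error by its square root.
    print(f"n/N = {n / N:.2f}  variance factor = {factor:.3f}  SE factor = {factor ** 0.5:.3f}")
```

When the sample is 1% of the population the factor is close to 1, so sigma squared over n is a good approximation; when it is 90% of the population the factor cuts the variance roughly tenfold.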

Episode Overview

  • This episode provides a detailed mathematical derivation for the variance of the sample mean when sampling without replacement from a finite population of size N.
  • It builds on a previous lecture in which the variance was approximated as σ²/n, a formula that is exact only for infinitely large populations (or for sampling with replacement).
  • The core insight is that sampling without replacement introduces a small, negative covariance between individual samples, which must be accounted for in the exact formula.
  • The derivation results in the inclusion of a "finite population correction" factor, which adjusts the variance based on the ratio of sample size to population size.
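
The derivation can be sketched as follows. This is one standard route, not necessarily the exact sequence of steps used in the lecture. It assumes σ² is the population variance computed with divisor N, that every draw Xᵢ has the population's distribution, and that every pair of distinct draws shares the same covariance c. Because c depends only on the population and not on n, it can be pinned down by taking n = N, where the sample is the whole population and its sum is constant.

```latex
\begin{aligned}
\operatorname{Var}(\bar{x})
  &= \frac{1}{n^2}\operatorname{Var}\Big(\sum_{i=1}^{n} X_i\Big)
   = \frac{1}{n^2}\Big[\sum_{i=1}^{n}\operatorname{Var}(X_i) + \sum_{i \neq j}\operatorname{Cov}(X_i, X_j)\Big]
   = \frac{1}{n^2}\big[n\sigma^2 + n(n-1)\,c\big] \\[4pt]
% With n = N the sum of the sample is the population total, a constant, so its variance is 0:
0 &= N\sigma^2 + N(N-1)\,c
  \quad\Longrightarrow\quad c = -\frac{\sigma^2}{N-1} \\[4pt]
\operatorname{Var}(\bar{x})
  &= \frac{\sigma^2}{n} - \frac{n-1}{n}\cdot\frac{\sigma^2}{N-1}
   = \frac{\sigma^2}{n}\Big[1 - \frac{n-1}{N-1}\Big]
   = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}
\end{aligned}
```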

Key Concepts

  • Random Sampling: The process of selecting a subset of individuals from a population to estimate characteristics of the whole population.
  • Sampling Without Replacement: A sampling method in which an individual, once selected, is not returned to the population and cannot be selected again.
  • Sample Mean (x̄): The average of the observations in a sample. It is a random variable whose properties (like variance) are crucial for statistical inference.
  • Finite Population Correction (FPC): A factor used to adjust the variance of a sample mean when the sample size (n) is a significant fraction of the finite population size (N).
  • Covariance: A measure of the joint variability of two random variables. When sampling without replacement, the individual samples Xᵢ and Xⱼ (i ≠ j) are not independent; their covariance is −σ²/(N − 1), small but non-zero.
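
The covariance claim can be checked exactly on a small example. This Python sketch enumerates every ordered pair of distinct individuals (each pair is equally likely to be the first two draws) for a hypothetical toy population and compares Cov(X₁, X₂) with −σ²/(N − 1). It uses `fractions.Fraction` so the comparison is exact rather than floating point; the population values and function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations


def population_variance(pop):
    """sigma^2 computed with divisor N."""
    N = len(pop)
    mu = Fraction(sum(pop), N)
    return sum((x - mu) ** 2 for x in pop) / N


def first_two_draws_covariance(pop):
    """Exact Cov(X1, X2) for two draws without replacement.

    Every ordered pair of distinct positions (i, j) is equally likely to be
    (first draw, second draw), and E[X1] = E[X2] = mu, so the covariance is
    the average product of deviations over all such pairs.
    """
    N = len(pop)
    mu = Fraction(sum(pop), N)
    pairs = list(permutations(range(N), 2))
    total = sum((pop[i] - mu) * (pop[j] - mu) for i, j in pairs)
    return total / len(pairs)


population = [2, 4, 4, 5, 7, 9]  # hypothetical toy population, N = 6
sigma2 = population_variance(population)
cov = first_two_draws_covariance(population)
print(sigma2, cov, -sigma2 / (len(population) - 1))  # 185/36 -37/36 -37/36
```

With replacement the two draws would be independent and the covariance would be exactly 0; the negative value reflects that removing one individual shifts the remaining pool in the opposite direction.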

Quotes

  • At 00:54 - "Last time I mentioned that this formula for the variance of x bar being sigma squared over little n is an approximation for very, very large populations when big N, the size of my population, is really large." - The speaker sets up the central problem for the lecture: moving from an approximation to an exact formula for finite populations.
  • At 02:11 - "That's only true if each of these Xᵢ's are independent variables. But for a small population... my population actually gets smaller for the next sample... and that introduces a small covariance." - This quote explains the fundamental reason why the simple variance formula is insufficient for finite populations—the samples are not truly independent.
  • At 11:48 - "You can get this nice finite N correction to your variance of x bar... We hope that as little n gets large, the variance of x bar gets small, which is good because that means as our sample gets bigger, x bar, our sample mean, becomes a better and better estimate of the population mean mu." - This connects the complex mathematical derivation back to the practical goal of sampling: achieving a more precise estimate of the population mean by increasing the sample size.

Takeaways

  • When sampling without replacement from a finite population, the variance of the sample mean is not simply σ²/n. A correction factor is required to account for the finite population size.
  • The lack of independence between samples is the key reason for this correction. Each draw from the population affects the probability distribution of subsequent draws.
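  • The exact formula for the variance of the sample mean is Var(x̄) = (σ²/n) * [1 - (n-1)/(N-1)], equivalently (σ²/n) * (N-n)/(N-1), where σ² is the population variance computed with divisor N. This formula shows that as the sample size (n) approaches the population size (N), the variance approaches zero, reaching exactly zero at n = N.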
  • Understanding this correction is crucial for accurate statistical inference in scenarios like polling or quality control where the sample can be a substantial fraction of the total population.
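
As a sanity check on the formula above, this Python sketch enumerates every equally likely size-n subset of a hypothetical toy population, computes the exact variance of x̄, and compares it with (σ²/n)·(N − n)/(N − 1) and with the naive σ²/n. The population and helper names are assumptions for illustration; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction
from itertools import combinations


def population_variance(pop):
    """sigma^2 computed with divisor N, as in the finite-population formula."""
    N = len(pop)
    mu = Fraction(sum(pop), N)
    return sum((x - mu) ** 2 for x in pop) / N


def exact_var_sample_mean(pop, n):
    """Var(x-bar) over all equally likely size-n subsets (no replacement)."""
    means = [Fraction(sum(pop[i] for i in idx), n)
             for idx in combinations(range(len(pop)), n)]
    center = sum(means) / len(means)
    return sum((m - center) ** 2 for m in means) / len(means)


def fpc_var_sample_mean(pop, n):
    """Closed form (sigma^2 / n) * (N - n) / (N - 1)."""
    N = len(pop)
    return population_variance(pop) / n * Fraction(N - n, N - 1)


population = [2, 4, 4, 5, 7, 9]  # hypothetical toy population, N = 6
sigma2 = population_variance(population)
for n in range(1, len(population) + 1):
    exact = exact_var_sample_mean(population, n)
    corrected = fpc_var_sample_mean(population, n)
    naive = sigma2 / n  # infinite-population / with-replacement formula
    print(f"n={n}  exact={exact}  corrected={corrected}  naive={naive}")
```

The exact and corrected columns agree for every n, the naive column is larger for every n > 1, and the variance drops to 0 once the sample is the whole population.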