颜林林的个人网站

如何理解中心极限定理?

2024-03-02 23:41

正态分布(normal distribution)在统计学中的江湖地位,应该是由中心极限定理(CLT,Central Limit Theorem)确立的。该定理的表述是:对于独立同分布的随机变量,无论其分布如何(即使并非正态分布),其样本均值的抽样分布也会趋向于正态分布。

所谓“纸上得来终觉浅,绝知此事要躬行”,下面就用代码来进行模拟抽样:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
# Demonstration of the Central Limit Theorem (CLT)

# Set common parameters
n <- 1000        # Number of samples
sample_size <- 30   # Size of each sample

# Prepare a 2x2 plot layout
par(mfrow=c(2,2))

# 1. Uniform Distribution
sample_means_uniform <- replicate(n, mean(runif(sample_size, min=0, max=1)))
hist(sample_means_uniform, breaks=40, probability=TRUE, main="Uniform Distribution", xlab="Sample Mean")
curve(dnorm(x, mean=mean(sample_means_uniform), sd=sd(sample_means_uniform)), add=TRUE, col="red", lwd=2)

# 2. Exponential Distribution
sample_means_exponential <- replicate(n, mean(rexp(sample_size, rate=1)))
hist(sample_means_exponential, breaks=40, probability=TRUE, main="Exponential Distribution", xlab="Sample Mean")
curve(dnorm(x, mean=mean(sample_means_exponential), sd=sd(sample_means_exponential)), add=TRUE, col="red", lwd=2)

# 3. Poisson Distribution
sample_means_poisson <- replicate(n, mean(rpois(sample_size, lambda=2)))
hist(sample_means_poisson, breaks=40, probability=TRUE, main="Poisson Distribution", xlab="Sample Mean")
curve(dnorm(x, mean=mean(sample_means_poisson), sd=sd(sample_means_poisson)), add=TRUE, col="red", lwd=2)

# 4. Binomial Distribution
sample_means_binomial <- replicate(n, mean(rbinom(sample_size, size=10, prob=0.5)))
hist(sample_means_binomial, breaks=40, probability=TRUE, main="Binomial Distribution", xlab="Sample Mean")
curve(dnorm(x, mean=mean(sample_means_binomial), sd=sd(sample_means_binomial)), add=TRUE, col="red", lwd=2)

# Reset plot layout
par(mfrow=c(1,1))

这里选用了四种分布:均匀分布(Uniform Distribution)、指数分布(Exponential Distribution)、泊松分布(Poisson Distribution)和 二项分布(Binomial Distribution)。分别抽样1000次,每次抽取30个符合该分布的实数,计算这30个数的均值,最后对这1000个均值绘制直方图,以展示它们的分布,同时在图上叠加理想的正态分布曲线。

可以看到直方图形状的确“趋向于正态分布”,如下图结果所示:

模拟结果

感兴趣的读者,不妨自己继续修改代码玩玩,比如更换其他奇形怪状的分布,也会看到结果仍然趋于正态分布,这无疑能更加深体会。

--- END ---

注:本文首发表于“不靠谱颜论”公众号,并同步至本站(还是在自建网站上,代码排版灵活方便许多)。

相关文章