import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm


def plot_xbar_test(mu0, sigma, n, x_bar, alpha=0.05, kind='left', title=''):
    """Plot the density of the sample mean under H0 and shade the rejection region."""
    se = sigma / np.sqrt(n)  # standard error of the sample mean
    x_grid = np.linspace(mu0 - 4 * se, mu0 + 4 * se, 1000)
    y_grid = norm.pdf(x_grid, loc=mu0, scale=se)

    plt.figure(figsize=(8, 4))
    plt.plot(x_grid, y_grid, lw=2, label='Sampling density of $\\bar{X}$ under $H_0$')

    if kind == 'left':
        # Lower alpha-quantile: the rejection region is the left tail.
        z_crit = norm.ppf(alpha)
        k = mu0 + z_crit * se
        mask = x_grid <= k
        plt.fill_between(x_grid[mask], y_grid[mask], alpha=0.35, color='tomato', label='Rejection region')
        plt.axvline(k, color='tomato', ls='--', lw=2, label=f'K = {k:.3f}')
    elif kind == 'right':
        # Upper critical value: the rejection region is the right tail.
        z_crit = norm.ppf(1 - alpha)
        k = mu0 + z_crit * se
        mask = x_grid >= k
        plt.fill_between(x_grid[mask], y_grid[mask], alpha=0.35, color='tomato', label='Rejection region')
        plt.axvline(k, color='tomato', ls='--', lw=2, label=f'K = {k:.3f}')
    elif kind == 'two-sided':
        # alpha is split equally between the two tails.
        z_half = norm.ppf(1 - alpha / 2)
        k_l = mu0 - z_half * se
        k_r = mu0 + z_half * se
        mask_l = x_grid <= k_l
        mask_r = x_grid >= k_r
        plt.fill_between(x_grid[mask_l], y_grid[mask_l], alpha=0.35, color='tomato', label='Rejection regions')
        plt.fill_between(x_grid[mask_r], y_grid[mask_r], alpha=0.35, color='tomato')
        plt.axvline(k_l, color='tomato', ls='--', lw=2, label=f'K_L = {k_l:.3f}')
        plt.axvline(k_r, color='tomato', ls='--', lw=2, label=f'K_R = {k_r:.3f}')
    else:
        raise ValueError("kind must be 'left', 'right', or 'two-sided'")

    plt.axvline(x_bar, color='navy', lw=2, label=f'Observed $\\bar{{x}}$ = {x_bar:.3f}')
    plt.title(title)
    plt.xlabel('$\\bar{X}$')
    plt.ylabel('Density')
    plt.legend(loc='upper left')
    plt.grid(alpha=0.2)
    plt.show()

Introductory hypothesis test for the mean (known variance)
We consider one giraffe-themed example in three variants:

- left-tailed test;
- right-tailed test;
- two-tailed test.
Common assumptions for the \(Z\)-test for a mean:

- observations are independent;
- either the population distribution is normal, or the sample size is sufficiently large;
- the true variance \(\sigma^2\) is known;
- the significance level is fixed at \(\alpha = 0.05\).
In all plots, we show the sampling distribution of the sample mean under the null hypothesis: \[ \bar X \mid H_0 \sim \mathcal N\!\left(\mu_0, \frac{\sigma^2}{n}\right). \]
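The claim about the sampling distribution can be checked by simulation. The sketch below is only an illustration: the values of `mu0`, `sigma`, `n`, the number of replications `n_rep`, and the seed are assumptions chosen for the demo, and the population is assumed to be exactly normal.

```python
import numpy as np

# Illustrative values (assumptions for this demo, not part of the derivation).
mu0, sigma, n = 3.0, 0.4, 20
n_rep = 100_000  # number of simulated samples

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

# Each row is one sample of size n drawn under H0; average across each row.
sample_means = rng.normal(loc=mu0, scale=sigma, size=(n_rep, n)).mean(axis=1)

se_theory = sigma / np.sqrt(n)  # theoretical standard deviation of X-bar
print(f"theoretical SE          = {se_theory:.4f}")
print(f"simulated mean of X-bar = {sample_means.mean():.4f}")
print(f"simulated SD of X-bar   = {sample_means.std(ddof=1):.4f}")
```

The simulated mean and standard deviation should land close to \(\mu_0\) and \(\sigma/\sqrt n\), which is what the formula above predicts.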
1) Left-tailed test
Problem setup:

- \(X\) is giraffe neck length (meters);
- independent sample of size \(n = 20\);
- known standard deviation \(\sigma = 0.4\);
- observed sample mean \(\bar x = 2.8\);
- we test a null mean value \(\mu_0 = 6.0\) at \(\alpha = 0.05\).
Theory:

- hypotheses: \(H_0: \mu = \mu_0\) versus \(H_1: \mu < \mu_0\);
- under \(H_0\): \[ \bar X \sim \mathcal N\!\left(\mu_0, \frac{\sigma^2}{n}\right); \]
- the critical cutoff \(K\) is defined by \[ P_{H_0}(\bar X < K) = \alpha; \]
- standardize, writing \(Z = (\bar X-\mu_0)/(\sigma/\sqrt n) \sim \mathcal N(0,1)\): \[ P\!\left(\frac{\bar X-\mu_0}{\sigma/\sqrt n} < \frac{K-\mu_0}{\sigma/\sqrt n}\right)=\alpha \Rightarrow P(Z<z_\alpha)=\alpha, \] where \(z_\alpha\) here denotes the lower \(\alpha\)-quantile of the standard normal distribution;
- note: for \(\alpha<0.5\) this quantile is negative, so it can be written as \(-\lvert z_\alpha \rvert\); by symmetry of the normal distribution, \(\lvert z_\alpha \rvert\) is the \((1-\alpha)\)-quantile;
- transform back to the original scale: \[ \frac{K-\mu_0}{\sigma/\sqrt n}=-\lvert z_\alpha \rvert \Rightarrow K=\mu_0-\lvert z_\alpha \rvert\frac{\sigma}{\sqrt n}; \]
- decision rule: reject \(H_0\) if \(\bar x < K\).
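Because the cutoff \(K\) is obtained by undoing the standardization, comparing \(\bar x\) with \(K\) should give the same decision as comparing the standardized statistic with the lower quantile. The sketch below illustrates this with the setup values above; the helper name `left_tailed_z_test` and its return format are hypothetical, introduced only for this illustration.

```python
import numpy as np
from scipy.stats import norm


def left_tailed_z_test(x_bar, mu0, sigma, n, alpha=0.05):
    """Hypothetical helper: left-tailed Z-test decided on both scales."""
    se = sigma / np.sqrt(n)
    z_lower = norm.ppf(alpha)   # lower alpha-quantile, negative for alpha < 0.5
    k = mu0 + z_lower * se      # same as mu0 - |z_alpha| * se
    z_obs = (x_bar - mu0) / se  # standardized observed mean
    return {
        "K": k,
        "z_obs": z_obs,
        "z_crit": z_lower,
        "reject_xbar_scale": bool(x_bar < k),
        "reject_z_scale": bool(z_obs < z_lower),
    }


res = left_tailed_z_test(x_bar=2.8, mu0=6.0, sigma=0.4, n=20)
print(res)

# Sanity check: the probability to the left of K under H0 should equal alpha.
print(norm.cdf(res["K"], loc=6.0, scale=0.4 / np.sqrt(20)))
```

Working on the \(\bar X\) scale keeps the cutoff in meters, which is what the plots show; the \(Z\) scale is convenient when only a standard normal table is available.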
# Left-tailed test
mu0 = 6.0
sigma = 0.4
n = 20
x_bar = 2.8
alpha = 0.05
# norm.ppf(1 - alpha) is the positive value |z_alpha| = -norm.ppf(alpha)
z_alpha = norm.ppf(1 - alpha)
K = mu0 - z_alpha * sigma / np.sqrt(n)
print(f"K = {K:.3f}")
print("Decision:", "Reject H0" if x_bar < K else "Do not reject H0")
plot_xbar_test(
    mu0=mu0,
    sigma=sigma,
    n=n,
    x_bar=x_bar,
    alpha=alpha,
    kind='left',
    title='Left-tailed test: distribution of $\\bar{X}$ under $H_0$'
)

2) Right-tailed test
Problem setup:

- \(X\) is giraffe neck length (meters);
- independent sample of size \(n = 20\);
- known standard deviation \(\sigma = 0.1\);
- observed sample mean \(\bar x = 2.8\);
- we test a null mean value \(\mu_0 = 1.8\) at \(\alpha = 0.05\).
Theory:

- hypotheses: \(H_0: \mu = \mu_0\) versus \(H_1: \mu > \mu_0\);
- under \(H_0\): \[ \bar X \sim \mathcal N\!\left(\mu_0, \frac{\sigma^2}{n}\right); \]
- the critical cutoff \(K\) is defined by \[ P_{H_0}(\bar X > K) = \alpha; \]
- standardize, writing \(Z = (\bar X-\mu_0)/(\sigma/\sqrt n) \sim \mathcal N(0,1)\): \[ P\!\left(\frac{\bar X-\mu_0}{\sigma/\sqrt n} > \frac{K-\mu_0}{\sigma/\sqrt n}\right)=\alpha \Rightarrow P(Z>z_\alpha)=\alpha, \] where \(z_\alpha\) here denotes the upper-tail critical value, that is, the \((1-\alpha)\)-quantile of the standard normal distribution;
- note: for \(\alpha<0.5\) this critical value is positive, so there is no sign ambiguity here;
- transform back to the original scale: \[ \frac{K-\mu_0}{\sigma/\sqrt n}=z_\alpha \Rightarrow K=\mu_0+z_\alpha\frac{\sigma}{\sqrt n}; \]
- decision rule: reject \(H_0\) if \(\bar x > K\).
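As a quick numerical check of the defining condition \(P_{H_0}(\bar X > K) = \alpha\), the sketch below computes the cutoff from the setup values above and then evaluates the upper-tail probability at \(K\). It uses `scipy.stats.norm.isf` (the inverse survival function) as an alternative way to obtain the upper critical value; this is an illustrative choice and gives the same number as `norm.ppf(1 - alpha)`.

```python
import numpy as np
from scipy.stats import norm

mu0, sigma, n, alpha = 1.8, 0.1, 20, 0.05
se = sigma / np.sqrt(n)  # standard error of the sample mean

z_upper = norm.isf(alpha)  # upper-tail critical value; equals norm.ppf(1 - alpha)
K = mu0 + z_upper * se     # cutoff on the original (meters) scale

# Probability of landing in the rejection region when H0 is true.
tail_prob = norm.sf(K, loc=mu0, scale=se)

print(f"z_alpha = {z_upper:.4f}")
print(f"K = {K:.4f}")
print(f"P(X-bar > K | H0) = {tail_prob:.4f}")
```

The printed tail probability should match \(\alpha\), confirming that the cutoff was transformed back correctly.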
# Right-tailed test
mu0 = 1.8
sigma = 0.1
n = 20
x_bar = 2.8
alpha = 0.05
z_alpha = norm.ppf(1 - alpha)
K = mu0 + z_alpha * sigma / np.sqrt(n)
print(f"K = {K:.3f}")
print("Decision:", "Reject H0" if x_bar > K else "Do not reject H0")
plot_xbar_test(
    mu0=mu0,
    sigma=sigma,
    n=n,
    x_bar=x_bar,
    alpha=alpha,
    kind='right',
    title='Right-tailed test: distribution of $\\bar{X}$ under $H_0$'
)

3) Two-tailed test
Problem setup (modified giraffe case):

- \(X\) is giraffe neck length (meters);
- independent sample of size \(n = 20\);
- known standard deviation \(\sigma = 0.4\);
- observed sample mean \(\bar x = 2.8\);
- we test a null mean value \(\mu_0 = 3.0\) at \(\alpha = 0.05\).
Theory:

- hypotheses: \(H_0: \mu = \mu_0\) versus \(H_1: \mu \neq \mu_0\);
- under \(H_0\): \[ \bar X \sim \mathcal N\!\left(\mu_0, \frac{\sigma^2}{n}\right); \]
- the critical bounds split \(\alpha\) equally between the two tails: \[ P_{H_0}(\bar X < K_L)=\frac{\alpha}{2},\qquad P_{H_0}(\bar X > K_R)=\frac{\alpha}{2}; \]
- standardize, writing \(Z = (\bar X-\mu_0)/(\sigma/\sqrt n) \sim \mathcal N(0,1)\): \[ P\!\left(\frac{K_L-\mu_0}{\sigma/\sqrt n} < Z < \frac{K_R-\mu_0}{\sigma/\sqrt n}\right)=1-\alpha \Rightarrow P(-z_{\alpha/2}<Z<z_{\alpha/2})=1-\alpha, \] where \(z_{\alpha/2}\) denotes the \((1-\alpha/2)\)-quantile of the standard normal distribution;
- transform back to the original scale: \[ K_L=\mu_0-z_{\alpha/2}\frac{\sigma}{\sqrt n},\qquad K_R=\mu_0+z_{\alpha/2}\frac{\sigma}{\sqrt n}; \]
- decision rule: reject \(H_0\) if \(\bar x < K_L\) or \(\bar x > K_R\).
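The two-sided rule can equivalently be stated as "reject when \(\lvert z \rvert > z_{\alpha/2}\)", since the bounds are symmetric around \(\mu_0\). The sketch below, using the setup values above, checks that the acceptance interval carries probability \(1-\alpha\) under \(H_0\) and that both formulations agree; the helper name `two_sided_bounds` is hypothetical.

```python
import numpy as np
from scipy.stats import norm


def two_sided_bounds(mu0, sigma, n, alpha=0.05):
    """Hypothetical helper: return (K_L, K_R) for the two-tailed Z-test."""
    se = sigma / np.sqrt(n)
    z_half = norm.ppf(1 - alpha / 2)  # upper alpha/2 critical value
    return mu0 - z_half * se, mu0 + z_half * se


mu0, sigma, n, alpha, x_bar = 3.0, 0.4, 20, 0.05, 2.8
se = sigma / np.sqrt(n)
k_l, k_r = two_sided_bounds(mu0, sigma, n, alpha)

# Probability that X-bar falls between the bounds when H0 is true.
inside = norm.cdf(k_r, loc=mu0, scale=se) - norm.cdf(k_l, loc=mu0, scale=se)

# Same decision expressed on the Z scale.
z_obs = (x_bar - mu0) / se
reject_z = bool(abs(z_obs) > norm.ppf(1 - alpha / 2))
reject_xbar = bool(x_bar < k_l or x_bar > k_r)

print(f"K_L = {k_l:.4f}, K_R = {k_r:.4f}, P(K_L < X-bar < K_R | H0) = {inside:.4f}")
print(f"z_obs = {z_obs:.4f}, reject (Z scale) = {reject_z}, reject (x-bar scale) = {reject_xbar}")
```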
# Two-tailed test
mu0 = 3.0
sigma = 0.4
n = 20
x_bar = 2.8
alpha = 0.05
z_half = norm.ppf(1 - alpha / 2)
K_L = mu0 - z_half * sigma / np.sqrt(n)
K_R = mu0 + z_half * sigma / np.sqrt(n)
print(f"K_L = {K_L:.3f}, K_R = {K_R:.3f}")
print("Decision:", "Reject H0" if (x_bar < K_L or x_bar > K_R) else "Do not reject H0")
plot_xbar_test(
    mu0=mu0,
    sigma=sigma,
    n=n,
    x_bar=x_bar,
    alpha=alpha,
    kind='two-sided',
    title='Two-tailed test: distribution of $\\bar{X}$ under $H_0$'
)