Confidence Intervals: Monte Carlo Simulation

import numpy as np
import scipy.stats as stats
import seaborn as sns
from matplotlib import pyplot as plt

Confidence Intervals

Sometimes it is not so important to discover the exact parameter value; it is enough to specify the region where it is likely to be found. Returning to our very first example with estimating daily profit of a future sales business: it is not necessary to claim with absolute precision that “average daily profit will be between \(1800\) and \(2200\)”, but this is much better than mistakenly saying “average daily profit is \(4000\)”.

The simplest and most natural point estimator for the population mean \(\mu\) is the sample mean \(\bar{X}\). However, relying on \(\bar{X}\) as an exact estimate of \(\mu\) can be risky because samples are random. If we are lucky, the sample mean is close to the population mean; if not, it may introduce noticeable error into our inference.

So if we report only a point estimate, we will probably miss the exact population parameter. If we report a range of plausible values — a confidence interval — we have a much better chance of capturing it.

For formulas and derivations, please refer to the lectures and other resources. In this notebook, we focus on the key interpretation of confidence intervals.

When we say, “The distribution parameter \(\theta\) belongs to the interval \((l, u)\) with confidence \((1-\alpha)\),” this does not mean that the probability of \(\theta\) being inside one fixed computed interval is \((1-\alpha)\).

Because \(l\) and \(u\) are computed from a sample, they are random. With each new sample, interval bounds change. A confidence level of \((1-\alpha)\) means that across all possible samples, approximately \((1-\alpha)\cdot 100\%\) of such intervals contain the unknown parameter.

1.a Confidence intervals for the population mean \(\mu\) with known variance

Let us continue the business example. Suppose we have a sample of size \(n\) of daily profits from a similar business on a nearby street. Assume daily profit is normally distributed: \(X \sim \mathcal{N}(2000, 400^2)\).

unknown_var = stats.norm(2000, 400)
mu = unknown_var.mean()
var = unknown_var.var()
std = unknown_var.std()
n_draws = 100
sample_size = 20

alpha = 0.15
z_a = stats.norm(0, 1).ppf(1 - alpha / 2)  # (1-alpha) * 100% confidence interval critical value

result = np.empty(n_draws)
for i in range(n_draws):
    sample_realization = unknown_var.rvs(size=sample_size)
    result[i] = np.mean(sample_realization)

upper_bound = result + z_a * std / np.sqrt(sample_size)
lower_bound = result - z_a * std / np.sqrt(sample_size)

First, we generate a relatively small number of samples to visually show that confidence interval bounds vary from sample to sample.

figure = plt.figure(figsize=(10, 5))
ax = figure.add_subplot()
ax.scatter(np.arange(n_draws), result, label=r'Sample mean $\bar{x}$')
ax.errorbar(np.arange(n_draws), result, yerr=upper_bound - lower_bound, ls='none')
ax.axhline(mu, color='green', label = r'True mean, $\mu$')

plt.legend(loc='best')
figure.suptitle(r'Sample size = %d, distribution parameters: $\mu = %s, \sigma = %s$' % (sample_size, mu, std))
plt.show()

Now let us collect statistics. We generate many samples and compute how often the confidence interval covers the true parameter.

n_draws = 50000
n_correct_captures = 0
error = z_a * std / np.sqrt(sample_size)
for i in range(n_draws):
    sample_realization = unknown_var.rvs(size=sample_size)
    mean_point_estimate = np.mean(sample_realization)
    if (mu < mean_point_estimate + error ) and (mu > mean_point_estimate - error):
        n_correct_captures += 1

print(n_correct_captures / n_draws)

And indeed, the empirical share of intervals that captured the parameter is close to the nominal confidence level.

1.b Confidence intervals for the true proportion \(p\)

Minimal theory (based on lecture 18):

Let \(X_i \sim \text{Bernoulli}(p)\). The point estimator of the true proportion is the sample proportion \[ \hat p = \frac{1}{n}\sum_{i=1}^n X_i. \] For sufficiently large samples (roughly, \(n\hat p \gtrsim 5\), \(n(1-\hat p) \gtrsim 5\), and \(n \geq 30\)), by CLT we use a normal approximation and obtain an approximate \((1-\alpha)\) confidence interval.

Theoretical result: \[ p \in \left(\hat p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}},\; \hat p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right). \]

In practice, the realisation of \((1-\alpha)\) confidence interval is given by:

\[ p \in \left(\tilde{p} - z_{\alpha/2}\sqrt{\frac{\tilde{p}(1-\tilde{p})}{n}},\; \tilde{p} + z_{\alpha/2}\sqrt{\frac{\tilde{p}(1-\tilde{p})}{n}}\right), \]

where \(\tilde{p}\) is the realisation of the sample proportion.

As in part 1.a, we check interval coverage with a Monte Carlo simulation.

# 1.b: simulation parameters
p_true = 0.39
bern = stats.bernoulli(p_true)

n_draws = 100
sample_size = 80

alpha = 0.15
z_a = stats.norm(0, 1).ppf(1 - alpha / 2)  # (1-alpha) * 100% confidence interval

tilde_p = np.empty(n_draws)
lower_bound = np.empty(n_draws)
upper_bound = np.empty(n_draws)

for i in range(n_draws):
    sample_realization = bern.rvs(size=sample_size)
    tilde_p[i] = np.mean(sample_realization)
    error = z_a * np.sqrt(tilde_p[i] * (1 - tilde_p[i]) / sample_size)
    lower_bound[i] = tilde_p[i] - error
    upper_bound[i] = tilde_p[i] + error

First, we generate a small number of samples to see that interval bounds change from sample to sample (now for proportion \(p\)).

figure = plt.figure(figsize=(10, 5))
ax = figure.add_subplot()
ax.scatter(np.arange(n_draws), tilde_p, label=r'realized sample proportion $\tilde{p}$')
ax.errorbar(np.arange(n_draws), tilde_p, yerr=upper_bound - lower_bound, ls='none')
ax.axhline(p_true, color='green', label=r'true proportion, $p$')

plt.legend(loc='best')
figure.suptitle(r'Sample size = %d, Bernoulli parameter: $p = %.2f$' % (sample_size, p_true))
plt.show()

Now let us collect statistics. We generate many samples and compute how often the confidence interval covers the true proportion \(p\).

n_draws = 50000
n_correct_captures = 0

for i in range(n_draws):
    sample_realization = bern.rvs(size=sample_size)
    tilde_p = np.mean(sample_realization)
    error = z_a * np.sqrt(tilde_p * (1 - tilde_p) / sample_size)
    if (p_true < tilde_p + error) and (p_true > tilde_p - error):
        n_correct_captures += 1

print(n_correct_captures / n_draws)

For large enough sample size, empirical coverage is close to the nominal confidence level.

1.c Confidence intervals for the expectation \(\mu \equiv E[X]\) with unknown population variance

Minimal theory (based on lecture 19):

If the population variance is unknown, for a normal sample we use the Student statistic \[ \frac{\bar X - \mu}{S/\sqrt{n}} \sim t_{n-1}, \quad S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i-\bar X)^2. \]

Then the theoretical \((1-\alpha)\) confidence interval for the mean is \[ \mu \in \left(\bar X - t_{n-1,\alpha/2}\frac{S}{\sqrt{n}},\; \bar X + t_{n-1,\alpha/2}\frac{S}{\sqrt{n}}\right). \]

In practice, the realisation of the \((1-\alpha)\) confidence interval is given by: \[ \mu \in \left(\bar x - t_{n-1,\alpha/2}\frac{s}{\sqrt{n}},\; \bar x + t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}\right), \]

where \(\bar{x}\) and \(s\) are realizations of the sample mean and sample standard deviation, respectively.

We now check interval coverage by Monte Carlo simulation, following the same logic as in part 1.a.

# 1.c: simulation parameters
unknown_var = stats.norm(2000, 500)
mu_true = unknown_var.mean()

n_draws = 100
sample_size = 10
alpha = 0.15
t_a = stats.t(df=sample_size - 1).ppf(1 - alpha / 2)

sample_means = np.empty(n_draws)
lower_bound = np.empty(n_draws)
upper_bound = np.empty(n_draws)

for i in range(n_draws):
    sample_realization = unknown_var.rvs(size=sample_size)
    mean_estimate = np.mean(sample_realization)
    s_estimate = np.std(sample_realization, ddof=1)
    error = t_a * s_estimate / np.sqrt(sample_size)

    sample_means[i] = mean_estimate
    lower_bound[i] = mean_estimate - error
    upper_bound[i] = mean_estimate + error

First, we generate a small number of samples to visualize how confidence interval bounds change from sample to sample (now with unknown variance and a \(t\) critical value).

figure = plt.figure(figsize=(10, 5))
ax = figure.add_subplot()
ax.scatter(np.arange(n_draws), sample_means, label=r'realized sample mean $\bar{x}$')
ax.errorbar(np.arange(n_draws), sample_means, yerr=upper_bound - lower_bound, ls='none')
ax.axhline(mu_true, color='green', label=r'true mean, $\mu$')

plt.legend(loc='best')
figure.suptitle(r'Sample size = %d, unknown variance case ($t$-interval)' % sample_size)
plt.show()

Now let us collect statistics. We generate many samples and compute how often the \(t\)-confidence interval covers the true mean \(\mu\).

n_draws = 50000
n_correct_captures = 0

for i in range(n_draws):
    sample_realization = unknown_var.rvs(size=sample_size)
    mean_point_estimate = np.mean(sample_realization)
    s_point_estimate = np.std(sample_realization, ddof=1)
    error = t_a * s_point_estimate / np.sqrt(sample_size)
    if (mu_true < mean_point_estimate + error) and (mu_true > mean_point_estimate - error):
        n_correct_captures += 1

print(n_correct_captures / n_draws)

Here too, empirical coverage is close to the nominal confidence level.