Frequentist probability

import math
import seaborn as sns
import numpy as np
from matplotlib import pyplot as plt
import scipy.stats as stats
from scipy.stats import randint, norm

Monte Carlo method

For some complex scenarios it may be impossible to know the distribution a priori. To gain some insight we can use the frequency approach: perform many experiments and define the probability of an event as the limit of its relative frequency as the total number of experiments grows.
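The idea in its simplest form can be sketched like this (a fair coin is used as an illustrative example; the variable names are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 100 000 flips of a fair coin: 0 = tails, 1 = heads
flips = rng.integers(0, 2, size=100_000)

# The relative frequency of heads approaches the true probability 0.5
for n in (100, 10_000, 100_000):
    print(f"n = {n:>6}: relative frequency of heads = {flips[:n].mean():.4f}")
```

The estimate fluctuates for small `n` and stabilizes near the true probability as `n` grows.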

For example, let us consider a frequency experiment with a 6-sided die, as if we didn’t know the probability of obtaining each side.

Later we will see that for a continuous random variable a histogram can likewise approximate the shape of the density function.

n_draws = 36000
rv = randint(low=1,high=7)
result = np.empty(n_draws)

for i in range(n_draws):
    result[i] = rv.rvs(1)[0]

plt.figure(figsize=(10, 5))
sns.histplot(result)
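Beyond the visual check, we can compare the empirical relative frequencies with the theoretical probability 1/6 directly (a quick numeric sketch, separate from the plot above):

```python
import numpy as np
from scipy.stats import randint

rv = randint(low=1, high=7)
draws = rv.rvs(size=36_000, random_state=0)

# Relative frequency of each face should approach the true probability 1/6
values, counts = np.unique(draws, return_counts=True)
freqs = counts / draws.size
for v, f in zip(values, freqs):
    print(f"P(X = {v}) ≈ {f:.4f}")
```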

In the same way we can analyze the approximate shape of the density function for continuous random variables.

Since a continuous variable takes infinitely many values, we divide the available range into bins and simply count how many times our random variable of interest falls into each bin.
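The binning step itself can be done with `np.histogram`, which returns the per-bin counts and the bin edges (shown here on a small standard-normal sample before we build the full plot):

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.standard_normal(10_000)

# Split the range [-5, 5] into bins and count draws falling into each one
counts, edges = np.histogram(sample, bins=10, range=(-5, 5))
for left, right, c in zip(edges[:-1], edges[1:], counts):
    print(f"[{left:+.0f}, {right:+.0f}): {c}")
```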

b = stats.norm(0, 1)

# Find out global properties of population:
mu = b.mean()
var = b.var()

# Generate draws
n_draws = 100000
result = np.empty(n_draws)
n_bins = 60

for i in range(n_draws):
    result[i] = b.rvs(1)[0]   

plt.figure(figsize=(10, 5))
sns.histplot(result, bins=n_bins).grid()
counts, _, _ = plt.hist(result, bins=n_bins, alpha=0.0)  # just in order to find out the scaling coefficient for PDF
plt.title('Histogram for standard normal variable')

# the normal PDF must be rescaled, because the histogram shows counts, not densities
x_space = np.linspace(-5, 5)
plt.plot(x_space, np.max(counts) * stats.norm.pdf(x_space, 0, 1) * np.sqrt(2 * np.pi), label='Normal density')
plt.legend()
plt.show()

Transition to the Central Limit Theorem: 2 dice

Let us return to the first example with the 6-sided die. But now suppose we have 2 independent dice, so we observe two random variables at once: the discrete bivariate vector \((X,Y)\).

The new random variable \(T = f(X,Y) = X + Y\) is clearly a function of \(X\) and \(Y\) and is also a random variable itself. It is possible to obtain its probability mass function (PMF) theoretically, but let us apply our simulation and see what the histogram looks like.
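For reference, the exact PMF of \(T\) follows from enumerating all 36 equally likely outcomes of two dice:

```python
from itertools import product

# Enumerate all 36 equally likely outcomes of two dice and count each sum
pmf = {}
for x, y in product(range(1, 7), repeat=2):
    pmf[x + y] = pmf.get(x + y, 0) + 1 / 36

for t in sorted(pmf):
    print(f"P(T = {t:2d}) = {pmf[t]:.4f}")
```

The resulting PMF is triangular, with a peak of 6/36 at \(T = 7\) — exactly the shape the simulated histogram should approximate.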

n_draws = 36000
rv = randint(low=1,high=7)
result = np.empty(n_draws)


for i in range(n_draws):
    x = rv.rvs(1)[0]
    y = rv.rvs(1)[0]
    result[i] = x + y

plt.figure(figsize=(10, 5))
sns.histplot(result)

Transition to the Central Limit Theorem: 3 dice

In this example we go even further: now we have 3 independent dice, which means we observe three random variables at once: the discrete vector \((X_1, X_2, X_3)\).

Again we are going to look at their sum, the random variable \(T = X_1 + X_2 + X_3\).

n_draws = 36000
rv = randint(low=1,high=7)
result = np.empty(n_draws)


for i in range(n_draws):
    x_1 = rv.rvs(1)[0]
    x_2 = rv.rvs(1)[0]
    x_3 = rv.rvs(1)[0]
    result[i] = x_1 + x_2 + x_3

plt.figure(figsize=(10, 5))
sns.histplot(result)

Transition to the Central Limit Theorem: 5 dice

In this example we go even further: now we have 5 independent dice, which means we observe five random variables at once: the discrete vector \((X_1, X_2, X_3, X_4, X_5)\).

Again we are going to look at their sum, the random variable \(T = X_1 + X_2 + X_3 + X_4 + X_5\).

n_draws = 100000
rv = randint(low=1,high=7)
result = np.empty(n_draws)


for i in range(n_draws):
    x_1 = rv.rvs(1)[0]
    x_2 = rv.rvs(1)[0]
    x_3 = rv.rvs(1)[0]
    x_4 = rv.rvs(1)[0]
    x_5 = rv.rvs(1)[0]
    result[i] = x_1 + x_2 + x_3 + x_4 + x_5

plt.figure(figsize=(10, 5))
sns.histplot(result)

Central Limit Theorem

‘Philosophical’ form

  • Let \(X_1, \ldots, X_n\) be a sequence of independent random variables taken from the same distribution, i.e. all \(X_i\)’s have the same expectation \(\mu\) and finite variance \(\sigma^2\). Let us construct a new random variable:

\[ S_n = \sum \limits_{i=1}^{n} X_i. \]

  • According to the properties of expectation and variance, properties of the new random variable are:

\[\begin{gather} E[S_n] = E \Bigl[ \sum \limits_{i=1}^{n} X_i \Bigr] = \sum \limits_{i=1}^{n} E[X_i] = n \mu \\ Var[S_n] = Var \Bigl[ \sum \limits_{i=1}^{n} X_i \Bigr] = \sum \limits_{i=1}^{n} Var[X_i] = n \sigma^2 \end{gather}\]

  • Central Limit Theorem: the distribution of the random variable \(S_n\) tends to normal as \(n \rightarrow \infty\): \(S_n \rightarrow Y \sim \mathcal{N}(n \mu, n \sigma^2)\). More formally:

\[ \forall x \in \mathbb{R} \quad P(S_n < x) \underset{n \rightarrow \infty}{\longrightarrow} P(Y < x), \text{ where } Y \sim \mathcal{N}(n \mu, n \sigma^2). \]

  • In simple words, we can just say that \(S_n \sim \mathcal{N}(n \mu, n \sigma^2)\), when \(n\) is sufficiently large.
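We can sanity-check the formulas \(E[S_n] = n\mu\) and \(Var[S_n] = n\sigma^2\) by simulation. Here the summands are \(\mathrm{Uniform}(0,1)\) — an illustrative choice, not the only one:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 30                 # number of summands
mu, var = 0.5, 1 / 12  # mean and variance of Uniform(0, 1)

# 100 000 realizations of S_n = X_1 + ... + X_n with X_i ~ Uniform(0, 1)
s_n = rng.uniform(0, 1, size=(100_000, n)).sum(axis=1)

print(f"E[S_n]:   theory {n * mu:.3f}, simulation {s_n.mean():.3f}")
print(f"Var[S_n]: theory {n * var:.3f}, simulation {s_n.var():.3f}")
```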

Once again: the sum of many random variables of ANY kind gives us… a normal one?

Looks like a cheap scam!

To test this, let’s take a random variable that is faaar from being normal.

How on Earth can the density function of their sum even remotely resemble the normal distribution?

b = stats.beta(0.05, 0.09)

# Find out global properties of population:
mu = b.mean()
var = b.var()

# Generate draws
n_draws = 10000
random_var = np.empty(n_draws)
n_bins = 20

for i in range(n_draws):
    random_var[i] = b.rvs(1)[0]

plt.figure(figsize=(10, 5))
sns.histplot(random_var, bins=n_bins).grid()
counts, _, _ = plt.hist(random_var, bins=n_bins, alpha=0.0)  # just in order to find out the scaling coefficient for PDF
plt.title('Histogram for beta random variable')

# scaling of normal PDF is needed, because histogram has large values on y-axis, and we need to fit them
x_space = np.linspace(-3, 3)
plt.plot(x_space, np.max(counts) * stats.norm.pdf(x_space, 0, 1) * np.sqrt(2 * np.pi), label='Normal density')
plt.legend()
plt.show()
plt.figure(figsize=(10, 5))

plt.title('True density of beta random variable')

# plot the exact density of the beta random variable
x_space = np.linspace(-1, 3, 10001)
plt.plot(x_space, b.pdf(x_space), label='beta density')
plt.grid()
plt.legend()
plt.show()
b = stats.beta(0.05, 0.09)

# Find out global properties of population:
mu = b.mean()
sigma = np.sqrt(b.var())
var = b.var()

print("Parameters of a single random variable: Mean = {}, Var = {}".format(mu, var))

# Generate sample sums
n_draws = 10000
x_totals = np.empty(n_draws)
n_bins = 50
for sample_size in range(1, 50):
    for i in range(n_draws):
        sample = b.rvs(size=sample_size)
        x_totals[i] = np.sum(sample)    
    
    plt.figure(figsize=(12, 3))
    sns.histplot(x_totals, bins=n_bins).grid()
    counts, _, _ = plt.hist(x_totals, bins=n_bins, alpha=0.0)  # just in order to find out the scaling coefficient for PDF
    if sample_size == 1:
        plt.title(r'Approximate density shape of variable $(X_1)$')
    else:
        plt.title(r'Approximate density shape of variable $(X_1 + \ldots + X_{%d})$' % (sample_size))
    
    # scaling of normal PDF is needed, because histogram has large values on y-axis, and we need to fit them
    current_std = np.sqrt(var * sample_size)
    x_space = np.linspace(sample_size * mu - 3 * current_std, sample_size * mu + 3 * current_std, 1000)
    plt.plot(x_space, np.max(counts) * stats.norm.pdf(x_space, sample_size * mu, current_std) * np.sqrt(2 * np.pi) * current_std, label=r'$\mathcal{N} \; \left(%s \cdot %s, \; %s \sigma^2\right)$' % (sample_size, mu, sample_size))
    plt.legend()
    plt.show()
    # plt.savefig('plot_{}.png'.format(sample_size))
    plt.close()

Central Limit Theorem

More widespread formulation

In textbooks we more often meet another formulation of the CLT.

Firstly, recall that if \(X\) is any normal random variable, then the following transformation makes it standard normal:

\[ Z = \frac{X - E[X]}{\sqrt{Var[X]}}, \quad Z \sim \mathcal{N}(0,1) \]
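This transformation is easy to verify numerically. Below we standardize a sample from \(\mathcal{N}(10, 3^2)\) (the parameters are an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)

# Any normal variable, here N(10, 3^2) as an example
x = rng.normal(loc=10, scale=3, size=100_000)

# Standardize: subtract the expectation, divide by the standard deviation
z = (x - 10) / 3

print(f"mean(Z) ≈ {z.mean():.3f}, std(Z) ≈ {z.std():.3f}")
```

Both the empirical mean and standard deviation land close to 0 and 1, as expected for \(\mathcal{N}(0,1)\).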

  • Let \(X_1, \ldots, X_n\) be a sequence of independent random variables taken from the same distribution, i.e. all \(X_i\)’s have the same expectation \(\mu\) and finite variance \(\sigma^2\).

  • As before we construct new random variable: \[ S_n = \sum \limits_{i=1}^{n} X_i, \quad E[S_n] = n \mu, \quad Var[S_n] = n \sigma^2. \]

  • Let us apply the transformation to \(S_n\): \[ Z_n = \frac{S_n - E[S_n]}{\sqrt{Var[S_n]}} = \frac{S_n - n \mu}{\sigma \sqrt{n}} \]

  • Properties of this new random variable: \[\begin{gather} E[Z_n] = E \Bigl[\frac{S_n}{\sigma \sqrt{n}} - \frac{n \mu}{\sigma \sqrt{n}} \Bigr] = E \Bigl[\frac{S_n}{\sigma \sqrt{n}} \Bigl] - E \Bigl[\frac{n \mu}{\sigma \sqrt{n}} \Bigr] = \frac{n \mu}{\sigma \sqrt{n}} - \frac{n \mu}{\sigma \sqrt{n}} = 0 \\ Var[Z_n] = Var \Bigl[\frac{S_n}{\sigma \sqrt{n}} - \frac{n \mu}{\sigma \sqrt{n}} \Bigr] = Var \Bigl[\frac{S_n}{\sigma \sqrt{n}} \Bigl] + Var \Bigl[\frac{n \mu}{\sigma \sqrt{n}} \Bigr] = \frac{Var[S_n]}{n \sigma^2} + 0 = 1. \end{gather}\]

  • Central Limit Theorem: distribution of such random variable \(Z_n\) tends to standard normal as \(n \rightarrow \infty\): \(Z_n \rightarrow Z \sim \mathcal{N}(0, 1)\).

  • In simple words, if \(n\) is large enough, then we can say that \(S_n \sim \mathcal{N}(n \mu, n \sigma^2)\), and simultaneously random variable \(Z_n = \frac{S_n - n\mu}{\sigma \sqrt{n}}\) has standard normal distribution i.e. \(Z_n \sim \mathcal{N}(0, 1)\).
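As a quick check of this formulation, we can standardize sums of a clearly non-normal variable (exponential here, as an example) and measure how close \(Z_n\) is to standard normal with a Kolmogorov–Smirnov statistic:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n = 50            # number of summands
mu = sigma = 1.0  # Exponential(1) has mean 1 and standard deviation 1

# 20 000 realizations of S_n, then standardize to Z_n
s_n = rng.exponential(scale=1.0, size=(20_000, n)).sum(axis=1)
z_n = (s_n - n * mu) / (sigma * np.sqrt(n))

# KS statistic: largest gap between the empirical CDF of Z_n and Phi
ks = stats.kstest(z_n, 'norm')
print(f"KS statistic: {ks.statistic:.4f}")
```

A small KS statistic indicates that the empirical distribution of \(Z_n\) is already very close to \(\mathcal{N}(0,1)\) at this \(n\).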

b = stats.beta(0.05, 0.05)

# Find out global properties of population:
mu = b.mean()
sigma = np.sqrt(b.var())
var = b.var()

# Generate sample sums
n_draws = 10000
x_totals = np.empty(n_draws)
n_bins = 50

for sample_size in range(1, 35):
    for i in range(n_draws):
        sample = b.rvs(size=sample_size)
        x_totals[i] = np.sum(sample)    

    # transformation of all realizations to standard normal
    z = (x_totals - sample_size * mu) / np.sqrt(var * sample_size)
    plt.figure(figsize=(10, 5))
    sns.histplot(z, bins=n_bins).grid()
    counts, _, _ = plt.hist(z, bins=n_bins, alpha=0.0)  # just in order to find out the scaling coefficient for PDF
    plt.title(r'Approximate density shape of variable $Z_n$ (sample size = {})'.format(sample_size))

    # scaling of standard normal PDF, because histogram has large values on y-axis, and we need to fit them
    x_space = np.linspace(-5, 5)
    plt.plot(x_space, np.max(counts) * stats.norm.pdf(x_space, 0, 1) * np.sqrt(2 * np.pi), label='Standard Normal')
    plt.legend()
    plt.show()