Deriving the normal distribution

When I was in high school, I asked several a person, “Why the normal distribution?”. After all, the function {e^{-x^2/2}/\sqrt{2\pi}} looks like a pretty bizarre function to guess, when there are other functions like {1/(1+x^2)} which produce perfectly fine bell-shaped curves. One answer I received was that the normal distribution is in some sense the limit of the binomial distribution. While this answer seems fair enough, I tried my hand at the mathematics and did not succeed, so I was still confounded. None the less I believed the answer, and it satisfied me for the time.

The real answer that I was looking for but did not appreciate until university was the central limit theorem. For me, the central limit theorem is the explanation of the normal distribution. In any case, the calculation that I attempted was basically a verification of the central limit theorem in a simple case, and it is a testement to the force of the central limit theorem that that simple case is difficult to work out by hand.

In this post, I rectify that calculation that I should have accomplished in high school (with the benefit of hind-sight being the correct factor of {\sqrt{n}} rather than {n}). On the way, I will also check both laws of large numbers in this simple case.

Consider a “random walk” on {\bf Z}. Specifically, let {X} be a random variable taking the values {1} and {-1} each with probability {1/2}, and let

\displaystyle  S_n = X_1+X_2+\ldots+X_n,

where each {X_i} is an independent copy of {X}. Note that {(X+1)/2} is Bernoulli, so {S_n/2 + n/2} is binomial, so

\displaystyle  {\bf P}(S_n = k) = {\bf P}(S_n/2 + n/2 = k/2 + n/2) = \left(\begin{array}{c}n\\ k/2+n/2\end{array}\right) 2^{-n},

where we make the convention that the binomial coefficient is zero when it doesn’t make sense. Hence, if {x>0},

\displaystyle  {\bf P}(S_n/n \geq x) = \sum_{k\geq x} {\bf P}(S_n/n = k) \leq \frac{n}{2} \left(\begin{array}{c} n\\\lceil (1+x)n/2\rceil \end{array}\right) 2^{-n}.

By Stirling’s formula, {m! \sim \sqrt{2\pi m} (m/e)^m}, so

\displaystyle  \left(\begin{array}{c}m\\k\end{array}\right) \sim \sqrt{\frac{m}{2\pi k(m-k)}} \frac{m^m}{k^k (m-k)^{m-k}}

as {m,k\rightarrow\infty}. Hence, for constant {x>0},

\displaystyle  \mathbf{P}(S_n/n \geq x) \leq \frac{n}{2} \binom{n}{\lceil (1+x)n/2\rceil} 2^{-n} \sim \frac{\sqrt{n}}{\sqrt{2\pi(1-x^2)}} \left((1+x)^{1+x} (1-x)^{1-x}\right)^{-n/2}.

Let {f(x) = (1+x)^{1+x} (1-x)^{1-x}}. Note that {f(x)>0} for {0<x<1}, that {f(0)=1}, and that

\displaystyle  \frac{f^\prime(x)}{f(x)} = (\log f(x))^\prime = \log(1+x) + 1 - \log(1-x) - 1 = \log\left(\frac{1+x}{1-x}\right) > 0,

so {f(x)>1} for all {0<x<1}. Hence

\displaystyle  {\bf P}(S_n/n\geq x) \rightarrow 0,

and by symmetry,

\displaystyle  {\bf P}(S_n/n\leq -x) \rightarrow 0,

as {n\rightarrow\infty}. Thus we have verified the weak law.

In fact, because the convergence above is geometric,

\displaystyle  \sum_{n=1}^\infty {\bf P}(S_n/n\geq x) < \infty,

whence

\displaystyle  {\bf P}(S_n/n\geq x\text{ infinitely often}) = \lim_{m\rightarrow\infty} {\bf P}(S_n/n\geq x\text{ for some }n\geq m) \leq \lim_{m\rightarrow\infty} \sum_{n=m}^\infty {\bf P}(S_n/n\geq x) =0.

(This is the Borel–Cantelli lemma.) Thus {S_n/n\rightarrow 0} almost surely. Thus we have verified the strong law.

It remains to check the central limit theorem, i.e., to investigate the limiting distribution of {S_n/\sqrt{n}}. Now, for constant {x<y},

\displaystyle  {\bf P}(x\leq S_n/\sqrt{n} \leq y) = \int_x^y\frac{\sqrt{n}}{2} \left(\begin{array}{c} n\\ (1+t/\sqrt{n})n/2\end{array}\right) 2^{-n} \,d\mu_n(t),

where {\mu_n} is the atomic measure assigning a mass {2/\sqrt{n}} to each point of {(n+2{\bf Z})/\sqrt{n}}. Now Stirling’s formula strikes again, giving

\displaystyle  \frac{\sqrt{n}}{2}\left(\begin{array}{c} n\\ (1+t/\sqrt{n})n/2\end{array}\right) 2^{-n} \sim \frac{1}{\sqrt{2\pi(1-t^2/n)}} \left((1+t/\sqrt{n})^{1+ t/\sqrt{n}} (1- t/\sqrt{n})^{1- t/\sqrt{n}}\right)^{-n/2} \longrightarrow \frac{1}{\sqrt{2\pi}} e^{-t^2/2},

as

\displaystyle  \lim_{m\rightarrow\infty} \left((1+t/m)^{1+t/m} (1-t/m)^{1-t/m}\right)^{m^2/t^2} = \lim_{z\rightarrow\pm\infty} (1-1/z^2)^{z^2} (1+1/z)^z (1-1/z)^{-z} = e.

Hence

\displaystyle  {\bf P}(x\leq S_n/\sqrt{n} \leq y) = \frac{1}{\sqrt{2\pi}} \int_x^y e^{-t^2/2}\,d\mu_n(t) + \int_x^y R_n(t)\,d\mu_n(t),

where {R_n(t)\rightarrow 0} as {n\rightarrow\infty}. Moreover this convergence is uniform in {t\in[x,y]}, so

\displaystyle  \left|\int_x^y R_n(t)\,d\mu_n(t)\right| \leq \mu_n([x,y]) \max_{x\leq t\leq y} |R_n(t)| \leq (y-x + 2/\sqrt{n})\max_{x\leq t\leq y} |R_n(t)| \rightarrow 0.

Hence

\displaystyle  {\bf P}(x\leq S_n/\sqrt{n}\leq y) \rightarrow \frac{1}{\sqrt{2\pi}} \lim_{n\rightarrow\infty}\int_x^y e^{-t^2/2}\,d\mu_n(t) = \frac{1}{\sqrt{2\pi}}\int_x^y e^{-t^2/2}\,dt,

where the last equality follows from the theorem that continuous functions are Riemann integrable. Thus we have verified the central limit theorem.

A simpler example of a nonmeasurable set

I don’t claim that I’m the first person to have come up with this example; I only claim that yesterday is the first time I thought of it. This is a rather sorry state of affairs, because it’s an obvious and nice example.

First, some background, the example of a nonmeasurable set that I was given when I first learnt measure theory went basically as follows. Start with {[0,1]}, and call {x,y\in[0,1]} equivalent if {x-y\in{\bf Q}}. This defines an equivalence relation, and thus we can define a set {Z} to consist of exactly one representative from each equivalence class. Then the interval {[0,1]} is just the disjoint union of translates mod {1} of {Z} by rational numbers mod {1} (or something). Because the rationals in {[0,1]} are countably infinite this is incompatible with {Z} being measurable: if {\mu(Z)>0} then then {\mu([0,1])=\infty}, and if {\mu(Z)=0} then {\mu([0,1])=0}.

This example is fine and beautiful and all that, but I find it rather hard to remember and especially difficult to visualize. Thus, yesterday, when I was explaining to my girlfriend in an intuitive way why not all sets are measurable, the example I produced was, by accident, not the above example.

From an intuitive point of view, if we’re talking about {[0,1]} and we’re generally thinking about mod {1} then we’re probably better off just talking about the circle {S^1} and not confusing the issue. Moreover, now translation mod {1} is just rotation, which is much easier to visualize and, importantly, much easier to believe is measure-preserving. Now we just need to decompose {S^1} into a countably infinite collection of copies of some set {Z}. Towards this end, let {T} be any irrational rotation of {S^1}. (When I was talking yesterday, I started with the concrete example of rotation by {1} degree, and then hastily changed this to rotation by {1} radian.) Then for any point {x} we can consider the orbit {\{T^n(x)\}_{n\in\mathbb{Z}}}, and we thus decompose {S^1} into a union of orbits under the action of {T}. Let {Z} be a set consisting of exactly one representative of each of these orbits. Since every orbit is infinite, {S^1} is the disjoint union of

\displaystyle \ldots,T^{-2}(Z), T^{-1}(Z), Z, T(Z), T^2(Z),\ldots.

Since {T} is measure-preserving, we have a contradiction as before.