A pi-pie named Monte Carlo
A lively community of students, researchers and tinkerers from Marseille (France) celebrates the so-called "π-day" on the 3rd month, 14th day of each year. It is a nice occasion for general talks on mathematics and society in a lively atmosphere and, of course, for... a pie contest!
I participated last year (in 2016) with a pie called "Monte Carlo". Here, I give the recipe along with some clues about its design... This page is a notebook, meaning that you can download it and re-run the analysis at home (and, most importantly, comment on it, modify it and correct potential bugs...).
Let's first initialize the notebook:
from __future__ import division, print_function
import numpy as np
np.set_printoptions(precision=6, suppress=True)
import os
%matplotlib inline
#%config InlineBackend.figure_format='retina'
%config InlineBackend.figure_format = 'svg'
import matplotlib.pyplot as plt
fig_width = 10
figsize = (fig_width, fig_width)
%load_ext autoreload
%autoreload 2
The pie looked like this:
from IPython.display import Image
Image('http://www.piday.fr/images/tartes-2016/24.jpg')
the recipe
The overall design of the pie starts from its round shape...
fig, ax = plt.subplots(figsize=figsize)
N_dots = 256
angles = np.linspace(0, 2*np.pi, N_dots, endpoint=True)
ax.plot(np.cos(angles), np.sin(angles), lw=4);
and to trace the biggest square that fits inside it. To do that, you do not need anything special: just draw the two diagonals along the cardinal directions:
fig, ax = plt.subplots(figsize=figsize)
ax.plot(np.cos(angles), np.sin(angles), lw=4);
ax.plot([0, 0], [-1, 1], 'r', lw=2);
ax.plot([-1, 1], [0, 0], 'r', lw=2);
and finally to join their extremities to get this square:
fig, ax = plt.subplots(figsize=figsize)
ax.plot(np.cos(angles), np.sin(angles), lw=4);
ax.plot([0, 0], [-1, 1], 'r', lw=2);
ax.plot([-1, 1], [0, 0], 'r', lw=2);
ax.plot([0, 1], [1, 0], 'g', lw=2);
ax.plot([0, 1], [-1, 0], 'g', lw=2);
ax.plot([0, -1], [1, 0], 'g', lw=2);
ax.plot([0, -1], [-1, 0], 'g', lw=2);
Note something: the disk within the blue circle has a diameter of $2$ (radius $1$), thus its surface is equal to $\pi$. The green square is composed of $4$ isosceles right triangles which can be rearranged into two unit squares: its surface is thus equal to $2$. Equivalently, its edge length is $\sqrt 2$ (it is the hypotenuse of a right triangle with two unit legs, so by Pythagoras' theorem its length is $\sqrt{1^2 + 1^2}$), such that we verify that the surface is $\sqrt{2}^2 = 2$.
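As a quick sanity check, here is a minimal sketch (not part of the recipe; the grid size n_grid is arbitrary) that estimates both surfaces by counting the points of a fine regular grid that fall inside each shape:
n_grid = 1001  # number of grid points per axis (arbitrary, the finer the better)
x_grid, y_grid = np.meshgrid(np.linspace(-1, 1, n_grid), np.linspace(-1, 1, n_grid))
cell_area = (2. / (n_grid - 1))**2  # area represented by each grid point
print(u' >> surface of the disk   ~ ', np.sum(x_grid**2 + y_grid**2 < 1) * cell_area, ' (π = ', np.pi, ')')
print(u' >> surface of the square ~ ', np.sum(np.abs(x_grid) + np.abs(y_grid) < 1) * cell_area, ' (expected: 2)')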
Let's now take a bunch (say $22$) of smarties (©) and throw them all at random on the pie. (If one falls outside, you do not eat it but throw it again until it lands on the pie.)
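As a side note, the parenthesis above describes a rejection procedure; a minimal sketch of it could look like the function below (throw_on_pie is a name made up here and is not used in the rest of this notebook). The next cell instead samples directly in polar coordinates, which yields the same uniform distribution over the disk:
def throw_on_pie(N_smarties, seed=None):
    # draw candidate points uniformly in the bounding square [-1, 1] x [-1, 1]
    # and reject those that fall outside the unit disk ("throw them again")
    rng = np.random.RandomState(seed)
    x, y = [], []
    while len(x) < N_smarties:
        x_try, y_try = 2 * rng.rand(2) - 1
        if x_try**2 + y_try**2 < 1:
            x.append(x_try)
            y.append(y_try)
    return np.array(x), np.array(y)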
N_smarties = 22
np.random.seed(42+1)
angle_smarties = 2*np.pi*np.random.rand(N_smarties)  # uniform angle
radius_smarties = np.sqrt(np.random.rand(N_smarties))  # the square root makes the density uniform over the disk's area
x_smarties = radius_smarties * np.cos(angle_smarties)
y_smarties = radius_smarties * np.sin(angle_smarties)
fig, ax = plt.subplots(figsize=figsize)
ax.plot(np.cos(angles), np.sin(angles), lw=4);
ax.plot([0, 0], [-1, 1], 'r', lw=2);
ax.plot([-1, 1], [0, 0], 'r', lw=2);
ax.plot([0, 1], [1, 0], 'g', lw=2);
ax.plot([0, 1], [-1, 0], 'g', lw=2);
ax.plot([0, -1], [1, 0], 'g', lw=2);
ax.plot([0, -1], [-1, 0], 'g', lw=2);
ax.scatter(x_smarties, y_smarties, s=100, c='y', lw=1);
ax.axis('tight');
It's ready! We have our finished pie with the square and the smarties.
Monte Carlo?
Now, why "Monte Carlo"? Because it is a district of Monaco, on the Mediterranean coast right next to southern France, which is well known for its casinos, where randomness rules the game (most often in favor of the owners of the casino, but that is another story). As a consequence, mathematicians used this term (half jokingly) to designate methods that use randomness to compute results. These are in particular much used to compute areas and, more generally, "integrals" (their symbol being $\int$) in mathematics. They have many practical uses in mathematics and physics, often over very high-dimensional spaces, but for now we will land back in the 2D plane of our pie...
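To make the link with integrals concrete, here is a minimal sketch (not part of the recipe; the seed and the sample size are arbitrary) that estimates $\int_0^1 x^2 \, dx = 1/3$ by averaging over random samples:
np.random.seed(2016)  # arbitrary seed, for reproducibility
x_samples = np.random.rand(100000)  # uniform samples in [0, 1]
print(u' >> Monte Carlo estimate of the integral of x**2 over [0, 1] = ', np.mean(x_samples**2), ' ; exact value = ', 1/3)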
Indeed, what is the relationship with our pie?
First, let's note that it's easy to count the number of smarties that are in the square:
# a point lies inside the green square exactly when |x| + |y| < 1,
# which is equivalent to |x + y| < 1 and |x - y| < 1:
in_square = ((x_smarties+y_smarties)**2 < 1 ) * ((x_smarties-y_smarties)**2 < 1 )
fig, ax = plt.subplots(figsize=figsize)
ax.plot(np.cos(angles), np.sin(angles), lw=4);
ax.plot([0, 0], [-1, 1], 'r', lw=2);
ax.plot([-1, 1], [0, 0], 'r', lw=2);
ax.plot([0, 1], [1, 0], 'g', lw=2);
ax.plot([0, 1], [-1, 0], 'g', lw=2);
ax.plot([0, -1], [1, 0], 'g', lw=2);
ax.plot([0, -1], [-1, 0], 'g', lw=2);
ax.scatter(x_smarties, y_smarties, s=100, c='y', lw=1);
ax.scatter(x_smarties[in_square], y_smarties[in_square], s=100, c='orange', lw=1);
ax.axis('tight');
We have thrown the smarties at random: their probability of landing anywhere on the pie is uniform. As a consequence, the probability of landing inside the square is the ratio between the surface of the square and that of the disk, that is, $\frac{2}{\pi}\approx 64\%$. Let's see what this gives:
print(u' >> Probability of landing on the square ', 2 / np.pi)
print(u' >> our observation ', np.sum(in_square)/N_smarties)
Pretty close! But we also get something fun: the relative number of smarties (hence the name "smarts" [ref. needed]) gives an approximation of $\pi$ as
$$ \pi \approx 2 \cdot \frac{N}{n} $$
where $N$ is the total number of smarties and $n$ the number of smarties inside the square (indeed, since $\frac{n}{N} \approx \frac{2}{\pi}$, inverting the ratio gives $\pi \approx \frac{2N}{n}$). I found the possibility of approximating such a highly respected number as π simply by using randomness and a pie a good enough reason to submit my recipe to the contest...
Let's now see what our particular pie gives (it can be considered as one instance of a Monte Carlo experiment):
print(u' >> Probability of landing on the square ', 2 / np.pi)
print(u' >> our estimate of π = ', 2.*N_smarties/np.sum(in_square), ' ; True value is π = ', np.pi)
print(u' >> error = ', (2.*N_smarties/np.sum(in_square)-np.pi)/np.pi * 100, ' % ')
GREAT! We get an estimate with an error of the order of $0.04 \%$!
a quantitative evaluation
We may summarize our method and evaluate it for arbitrary numbers of smarties:
def monte_carlo(N_smarties=22, seed=2015):
    # throw N_smarties points uniformly at random on the unit disk...
    np.random.seed(seed)
    angle_smarties = 2*np.pi*np.random.rand(N_smarties)
    radius_smarties = np.sqrt(np.random.rand(N_smarties))
    x_smarties = radius_smarties * np.cos(angle_smarties)
    y_smarties = radius_smarties * np.sin(angle_smarties)
    # ... count those inside the inscribed square and return the estimate of π
    in_square = ((x_smarties+y_smarties)**2 < 1 ) * ((x_smarties-y_smarties)**2 < 1 )
    return 2.*N_smarties/np.sum(in_square)
Let's now see what this method gives with $1, 10, 100, \ldots$ up to one million smarties (what a pie!):
for N in 10**np.arange(7):
    print(u' >> with ', N, ' smarties, our estimate of π = ', monte_carlo(N_smarties=N))
We see that our estimate gets better as the number of smarties grows: the bias gets lower. But how variable is the estimate?
When we do not have that many smarties, we can instead repeat the experiment many times. To do this we use an important property of random number generators: they are pseudo-random. Such functions give, by default, seemingly unrelated sequences of numbers, but one can regenerate exactly the same sequence by specifying the "seed", that is, a number which initializes the generator. One therefore says that the generated sequence is pseudo-random (it can be predicted if and only if one knows the seed), or also that it is frozen. Here, we simply use the seed to generate different sequences in a reproducible way:
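For instance (a minimal illustration; the seed 2016 is arbitrary):
np.random.seed(2016)  # freeze the generator with some seed
print(np.random.rand(3))  # three "random" numbers
np.random.seed(2016)  # reset the very same seed...
print(np.random.rand(3))  # ... and we get exactly the same three numbers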
for seed in range(42, 55):
    print(u' >> with seed = ', seed, ' , our estimate of π = ', monte_carlo(seed=seed))
But what? The method is very good for some seeds, but not for others... How come, and what do we learn from that observation?
Let's imagine $N=1000$ and let's do many ($10000$) repetitions of our estimate:
N_trials = 10000
pi_estimate = np.empty(N_trials)
for seed in range(N_trials):
    pi_estimate[seed] = monte_carlo(N_smarties=1000, seed=42 + seed)
N_bins = 32
pi_hist, bins = np.histogram(pi_estimate, bins=np.linspace(2.8, 3.5, N_bins))
We can thus draw a histogram, that is, a diagram counting how many of the values obtained over independent repetitions of our method fall within each small interval:
fig, ax = plt.subplots(figsize=figsize)
ax.bar(bins[:-1], pi_hist, width=(bins.max()-bins.min())/(len(bins)));
ax.set_xlabel('value')
ax.set_ylabel('smarts')
ax.axis('tight');
This situation occurs often in science: imagine you try to find an unknown value (in machine-learning terminology, a hidden or latent variable), here $\pi$, but you only have an approximate measurement of it (our Monte Carlo algorithm, a device that sometimes fails, a telescope which looks at a faint, very far away light source, a neuron that fires in an unpredictable way, ...). Evaluating this variability is therefore essential in the scientific process and (instead of being swept under the carpet) should be assessed explicitly.
Luckily, mathematics provides some useful tools, and we will explore a few of them. First, you may have recognized the shape of the histogram: it is a Gaussian (or normal, or bell-shaped) distribution, characterized by only two parameters (what statisticians call the first two moments):
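For reference, its density writes, for a mean $\mu$ and a standard deviation $\sigma$:
$$ \mathcal{N}(x; \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( - \frac{(x - \mu)^2}{2 \sigma^2} \right) $$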
pi_mean = pi_estimate.mean()
pi_std = pi_estimate.std()
print(u' >> bell-shaped distribution with mean = ', pi_mean, ', and standard deviation = ', pi_std)
fig, ax = plt.subplots(figsize=figsize)
ax.bar(bins[:-1], pi_hist, width=(bins.max()-bins.min())/(len(bins)));
ax.plot(bins[:-1], N_trials * (bins[1] - bins[0]) / (pi_std * np.sqrt(2 * np.pi)) * np.exp(- .5 * (bins[:-1] - pi_mean)**2 / pi_std**2), 'r--', lw=2);  # expected bin counts for a Gaussian with the same mean and standard deviation
ax.set_xlabel('value')
ax.set_ylabel('density')
ax.axis('tight');
This result is the consequence of a ubiquitous theorem: the central limit theorem. It states that whenever you sum (or average) many independent measurements that fluctuate around a mean value with a finite level of variability (given by the standard deviation), the distribution of that sum (or average) tends to a Gaussian.
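A minimal illustration, unrelated to the pie (the sample sizes and the seed are arbitrary): when averaging more and more uniform random numbers, the result concentrates around the true mean $0.5$, with a standard deviation shrinking as $1/\sqrt{n}$ (and its distribution becomes more and more bell-shaped, as one could check with a histogram):
np.random.seed(2016)
for n in [1, 10, 100]:
    # 10000 repetitions of the average of n uniform numbers in [0, 1]
    averages = np.random.rand(10000, n).mean(axis=1)
    print(u' >> averaging ', n, ' numbers: mean = ', averages.mean(), ', standard deviation = ', averages.std())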
The central limit theorem also implies an important result: when you increase the number of samples (in our case, the number of smarties), the mean of the estimates stays close to the true value while their standard deviation shrinks, so the precision gets better:
N_trials = 10000
N_levels = 4
fig, axs = plt.subplots(N_levels, 1, figsize=figsize)
for i_level in range(N_levels):
    N_smarties = 10**(2 + i_level)
    pi_estimate = np.empty(N_trials)
    for seed in range(N_trials):
        pi_estimate[seed] = monte_carlo(N_smarties=N_smarties, seed=42 + seed)
    pi_hist, bins = np.histogram(pi_estimate, bins=np.linspace(2.5, 4., N_bins))
    pi_mean = pi_estimate.mean()
    pi_std = pi_estimate.std()
    print(u'Using ', N_smarties, ' smarties, the bell-shaped distribution has mean = ', pi_mean, ', and standard deviation = ', pi_std)
    print(u' >> error = ', np.abs(pi_mean-np.pi)/np.pi * 100, ' % ')
    print(u' >> scaled squared error = ', ((pi_mean-np.pi)/np.pi)**2 * (N_smarties))
    axs[i_level].bar(bins[:-1], pi_hist, width=(bins.max()-bins.min())/(len(bins)));
    axs[i_level].set_ylabel('density')
    axs[i_level].axis('tight');
    axs[i_level].set_xlabel('value');
In particular, this shows that the standard deviation of the error scales as the inverse of the square root of the number of smarties. As such, an important quantity to consider is the precision, defined as the inverse of the squared standard deviation (that is, of the variance), as it scales linearly with the number of smarties.
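Here is a rough numerical check of that statement, reusing the monte_carlo function defined above (the numbers of smarties and of trials are arbitrary, and kept small to make the check fast):
N_trials_check = 1000
for N in [100, 1000, 10000]:
    estimates = np.array([monte_carlo(N_smarties=N, seed=42 + seed) for seed in range(N_trials_check)])
    # if the precision (1 / variance) scales linearly with N, then std**2 * N should stay roughly constant
    print(u' >> N = ', N, ' : standard deviation = ', estimates.std(), ' ; std**2 * N = ', estimates.std()**2 * N)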
Just remember the hierarchy of our procedure:
- we test our distributions at different levels
- for each level, we make different trials,
- for each trial, we make a pie with many smarties and count the ones inside the square.
Yet, it only takes a few seconds on a computer, hence the popularity of the method nowadays.
As a consequence, scientists use some caution whenever they get a result. Any inferred value is most often given within a confidence interval, for instance by giving:
- the median value of the different repetitions (that is, the value such that half of the other values are below it and half above),
- the confidence interval as the range of values around the median corresponding to, say, 90% of the cases (the code below takes the 5th and 95th percentiles).
N_trials = 10000
pi_estimate = np.empty(N_trials)
for seed in range(N_trials):
    pi_estimate[seed] = monte_carlo(N_smarties=1000, seed=42 + seed)
pi_hist, bins = np.histogram(pi_estimate, bins=np.linspace(2.8, 3.5, 32))
pi_median = np.median(pi_estimate)
pi_low = np.percentile(pi_estimate, 5)
pi_high = np.percentile(pi_estimate, 95)
print(u' >> distribution with median = ', pi_median, ', in interval [ ', pi_low, ',', pi_high, ']')
fig, ax = plt.subplots(figsize=figsize)
pi_hist = pi_hist * 1.
pi_hist /= pi_hist.sum()
ax.bar(bins[:-1], pi_hist, width=(bins.max()-bins.min())/(len(bins)));
ax.plot([pi_median, pi_median], [0, pi_hist.max()], 'r', label='median');
ax.plot([pi_low, pi_low], [0, pi_hist.max()], 'g--', label='low');
ax.plot([pi_high, pi_high], [0, pi_hist.max()], 'c--', label='high');
ax.set_xlabel('value')
ax.set_ylabel('density')
ax.axis('tight');
ax.legend(loc='best');
The journey is almost over: from a pie, we devised an algorithm, in the form of a recipe, to compute the value of π. Then we observed that the result of our recipe varied, but that we could use some mathematical tools to quantify this variability. We now know that π is around $3.15$ and almost surely between $3.05$ and $3.25$. Nice.
the elephant in the room
But wait... This result is at odds with the one we got for our $22$-smarties pie, for which we triumphed with an error of the order of $0.04 \%$! This is far, far better than the typical result we got with $10000$ repetitions of $1000$-smarties pies... What happened?
Above, we saw that the result depends on how we choose to seed the random number generator:
for seed in range(42, 55):
    print(u' >> with seed = ', seed, ' , our estimate of π = ', monte_carlo(seed=seed))
and if we make many of these experiments, this is what we get:
N_trials = 10000
pi_estimate = np.empty(N_trials)
for seed in range(N_trials):
    pi_estimate[seed] = monte_carlo(N_smarties=22, seed=42 + seed)
pi_hist, bins = np.histogram(pi_estimate, bins=np.linspace(2., 5., 128))
Again, we draw the histogram of the values obtained over independent repetitions of our method:
fig, ax = plt.subplots(figsize=figsize)
ax.bar(bins[:-1], pi_hist, width=(bins.max()-bins.min())/(len(bins)));
ax.set_xlabel('value')
ax.set_ylabel('smarts')
ax.axis('tight');
Since we use a small number of smarties, the estimate can only take a small number of values: the distribution still has a Gaussian envelope, but it is non-zero only for certain values. Among these, the most probable one corresponds to a well-known approximation of π:
print(u' >> our estimate of π = ', 22/7, ' ; True value is π = ', np.pi)
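Indeed, with $N = 22$ smarties, the estimate is necessarily of the form $2 \cdot 22 / n$ for some integer number $n$ of smarties inside the square; listing the values around π makes this discreteness explicit:
for n in range(12, 17):
    print(u' >> with ', n, ' smarties inside the square, the estimate is 2*22/', n, ' = ', 2*22/n)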
This approximation of π has been known for a very long time (Archimedes already used $22/7$ as an upper bound for π), and two questions arise:
- did the cook lie?
- why is that approximation so good?
For the first question, you can answer quite easily. Either the cook cheated and placed exactly $8$ of the $22$ smarties outside the square on purpose (so that the estimate is $2 \cdot 22 / 14 = 22/7$), or he did not cheat and simply got lucky. The fraction of random pies that do not fall on that most probable value gives a rough estimate of how unlikely such luck is:
print(u' >> estimate that the cook cheated = ', (1 - pi_hist.max() / pi_hist.sum()) * 100, ' % ')
Pretty high, and I must admit this was the case... It also teaches us to be cautious whenever we hear about some (too) extraordinary scientific result, especially if the experiment has not been carefully designed to avoid such biases. Our bias here is that the cook knew that $22$ smarties may give a good estimate.
This leads to the second question: why $22$ and not $42$? A priori, there is no reason why one denominator should yield a better approximation of π than any other. Let's look at the error as we change the denominator:
$$ \pi \approx \frac{n}{d} \quad \text{with} \quad n = \lceil \pi \cdot d \rceil $$
where $\lceil x \rceil$ denotes the smallest integer greater than or equal to $x$ (this is what np.ceil computes; it always approximates π from above).
for denominator in np.arange(1, 20):
    numerator = np.ceil(np.pi*denominator)
    print(u' >> our estimate of π = ', int(numerator), '/', denominator, '= ', numerator/denominator , ' ; Error around π = ', np.absolute(1.*numerator/denominator - np.pi))
Let's plot the error as a function of the denominator
def int_numerator(denominator, number=np.pi):
    return np.ceil(number*denominator)
N_test = 10000
pi_error = np.empty(N_test)
for i_test in range(N_test):
    denominator = 1. + i_test
    numerator = int_numerator(denominator)
    pi_error[i_test] = np.absolute(numerator/denominator - np.pi)
fig, ax = plt.subplots(figsize=figsize)
ax.loglog(np.arange(1, 1 + N_test), pi_error);
ax.loglog(np.arange(1, 1 + N_test), np.arange(1, 1 + N_test)**-1., 'r--');
ax.set_xlabel('denominator')
ax.set_ylabel('error')
ax.axis('tight');
Interestingly, the envelope of this curve (the error as a function of the denominator) follows a decreasing trend. This was to be expected: the numerator $\lceil \pi \cdot d \rceil$ differs from $\pi \cdot d$ by less than $1$, so as the denominator increases, the error can only shrink, at most as $1/d$. We have shown this bound in the plot above with the $1/d$ line, which appears as a straight line because we used log scales on both axes.
Compared to this baseline, what is more surprising is that the error decreases, suddenly dips well below the $1/d$ line, then goes back to the baseline. The first dip lies at $d = 7$ and reappears at the following multiples of $7$:
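Written out, the bound reads:
$$ \left| \frac{\lceil \pi d \rceil}{d} - \pi \right| = \frac{\lceil \pi d \rceil - \pi d}{d} < \frac{1}{d} $$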
fig, ax = plt.subplots(figsize=figsize)
N_step = 140
ax.semilogy(np.arange(1, 1 + N_step), pi_error[:N_step]);
ax.semilogy(np.arange(1, 1 + N_step), np.arange(1, 1 + N_step)**-1., 'r--');
ax.set_xlabel('denominator')
ax.set_xticks(7*np.arange(17))
ax.set_xticklabels(7*np.arange(17))
ax.set_ylabel('error')
ax.axis('tight');
The fact that we see a repetition of the same pattern is to be expected, as it is the same approximation scaled by an integer:
$$ \frac{22}{7} = \frac{22 \cdot 2}{7 \cdot 2} = \frac{44}{14} = \frac{22 \cdot 3}{7 \cdot 3} = \frac{66}{21} = \ldots $$
In retrospect, the shape before the first dip can be understood as the error smoothly approaching the $22/7$ approximation (green line, the error of $22/d$ as a function of $d$):
fig, ax = plt.subplots(figsize=figsize)
N_step = 8
ax.semilogy(np.arange(1, 1 + N_step), pi_error[:N_step]);
ax.semilogy(np.arange(1, 1 + N_step), np.arange(1, 1 + N_step)**-1., 'r--');
ax.semilogy(np.arange(1, 1 + N_step), np.absolute(22/np.arange(1, 1 + N_step) - np.pi), 'g--');
ax.set_xlabel('denominator')
ax.set_xticks(np.arange(1, 1 + N_step))
ax.set_xticklabels(np.arange(1, 1 + N_step))
ax.set_ylabel('error')
ax.axis('tight');
But suddenly, we have a new approximation of an even higher order:
for denominator in range(110, 116):
    print(u' >> our estimate of π = ', int(int_numerator(denominator)), '/', denominator, ' = ', int_numerator(denominator)/denominator, ' ; True value is π = ', np.pi)
    print(u' >> error = ', (int_numerator(denominator)/denominator-np.pi)/np.pi * 100, ' % ')
This other approximation of π, $355/113$, is also well known and is as accurate as keeping $6$ decimal places (and quite easy to remember: write the digits 113355 and split them in the middle). Going much further is quite tedious, as we approach the precision of the floating-point numbers used on this computer...
At this point, we know the cook cheated and that there is something special in how good approximations of π appear. But is there something special about π itself? Is it the answer to all philosophical questions (or to the ones I do not remember)? Let's try with another great number: $e$
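A quick numerical check of that accuracy:
print(u' >> 355/113 = ', 355/113, ' ; π = ', np.pi, ' ; relative error = ', np.absolute(355/113 - np.pi)/np.pi * 100, ' % ')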
N_test = 10000
e_error = np.empty(N_test)
for i_test in range(N_test):
    denominator = 1. + i_test
    numerator = int_numerator(denominator, number=np.exp(1))
    e_error[i_test] = np.absolute(numerator/denominator - np.exp(1))
fig, ax = plt.subplots(figsize=figsize)
ax.loglog(np.arange(1, 1 + N_test), e_error);
ax.loglog(np.arange(1, 1 + N_test), np.arange(1, 1 + N_test)**-1., 'r--');
ax.set_xlabel('denominator')
ax.set_ylabel('error')
ax.axis('tight');
Or with the golden ratio $\phi = \frac{1}{2} (1 + \sqrt 5)$, for which we expect ratios of consecutive Fibonacci numbers to appear as the good approximations:
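Before the error plot, here is a quick check of that expectation: a minimal sketch (the number of terms is arbitrary) listing ratios of consecutive Fibonacci numbers and their distance to $\phi$:
a, b = 1, 1
for _ in range(12):
    a, b = b, a + b  # next pair of consecutive Fibonacci numbers
    print(u' >> ', b, '/', a, ' = ', b/a, ' ; error = ', np.absolute(b/a - .5*(1 + np.sqrt(5))))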
phi = .5* (1 + np.sqrt(5))
N_test = 10000
phi_error = np.empty(N_test)
for i_test in range(N_test):
    denominator = 1. + i_test
    numerator = int_numerator(denominator, number=phi)
    phi_error[i_test] = np.absolute(numerator/denominator - phi)
fig, ax = plt.subplots(figsize=figsize)
ax.loglog(np.arange(1, 1 + N_test), phi_error);
ax.loglog(np.arange(1, 1 + N_test), np.arange(1, 1 + N_test)**-1., 'r--');
ax.set_xlabel('denominator')
ax.set_ylabel('error')
ax.axis('tight');
Looks similar! We were not fooled this time. But more work needs to be done to understand why such good rational approximations pop up where they do...