The Poisson distribution as an approximation of the binomial distribution
When I was reading up on radioactive decay,
I found out that the probability of decay could be expressed in terms of a
discrete probability distribution. I had to look it up further because I had
just recently learned about probability distributions in math. What I found out
was intriguing; the number of atoms decaying in a specific interval of time is
actually a discrete variable. It also adheres to Poisson’s postulates, discussed
later, so it can be expressed in terms of the Poisson distribution.
However, what I noticed was that
some research papers and online resources would use the Poisson distribution, and
some would use the binomial distribution. This intrigued me because if one
could be used in place of the other, then the two must be closely related. This went against my previous understanding of the two, because I thought that
since they have different formulas, they should be two very different distributions.
After consultation with my physics teacher,
I was told that probability distributions would come up in higher education
engineering courses. Being an aspiring engineer, I had to explore the differences
between the binomial and Poisson distributions and understand why they can be used
in place of one another.
The Poisson distribution was first
introduced in the 19th century by the mathematician Siméon Denis Poisson
as a way to study the number of wrong court decisions over a period of
time. In a more general sense, this distribution gives the probability of a given number of successes occurring
in a fixed interval of time, where each success can be thought of as the outcome of an independent Bernoulli trial. A Bernoulli trial is a statistical
experiment where there are only two possible outcomes, either success or
failure. Graphically, the Poisson distribution provides a discrete probability distribution function
where the height of each point on the y-axis gives the probability of one value of the discrete random
variable X. It is typically denoted $X \sim \mathrm{Po}(\lambda)$, where $\lambda$ is the mean number of successes in the interval.
The Poisson distribution works with
discrete variables, so it is worth discussing what they are first. Discrete variables:
Take a finite or countably infinite set of values
Are obtained by counting (and not by measuring)
Take mutually exclusive values (the variable cannot equal two values at once)
Have a complete range of
Are represented by distinct,
isolated points in a graph
One of the simplest statistical examples of
a discrete variable is the number of heads obtained when a coin
is tossed $n$ times. Suppose we have a fair
coin; the probability of getting $k$ heads when the coin
is tossed 2 times is:
$k = 0$: $P(X = 0) = 1/4$
$k = 1$: $P(X = 1) = 1/2$
$k = 2$: $P(X = 2) = 1/4$
Because of the nature of the discrete
variable, its probability distribution can often just be expressed in tabular form.
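The distribution for two tosses can be checked by brute-force counting. The sketch below is only an illustration: it lists every equally likely sequence of tosses and counts the heads in each. The function name `heads_distribution` is made up for this example.

```python
from fractions import Fraction
from itertools import product


def heads_distribution(tosses):
    """Return {number of heads: probability} for a fair coin tossed `tosses` times."""
    outcomes = list(product("HT", repeat=tosses))  # every equally likely sequence
    weight = Fraction(1, len(outcomes))            # probability of one sequence
    table = {}
    for outcome in outcomes:
        heads = outcome.count("H")
        table[heads] = table.get(heads, Fraction(0)) + weight
    return dict(sorted(table.items()))


if __name__ == "__main__":
    for heads, probability in heads_distribution(2).items():
        print(heads, probability)
```

Using exact fractions rather than decimals keeps the table in the same form it would have when worked out by hand.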
There are several assumptions to be made
when using the Poisson distribution. When these assumptions are met, then the
Poisson discrete probability distribution can be used.
The probability of success is the same throughout the whole experiment
Successes occur independently of one another
The probability of a success happening over a small time period is essentially proportional to the length of that time period
The probability of more than one success in a small time period is essentially 0
The expected number of successes depends only on the length of the interval of time
Each trial in the experiment is a Bernoulli trial
One thing worth adding when comparing the
probabilities of a random variable in real life and in calculations is that if
there is a large discrepancy between the two sets of data, it suggests that at least one of these assumptions did not hold, for example because an external
factor came into play when the statistical experiment was done.
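The assumptions listed above can be illustrated with a small simulation. The sketch below is not part of the original exploration: it splits one time interval into many small sub-intervals, gives each the same small and independent chance of one success, and compares the observed counts with the Poisson formula. The mean of 3, the number of sub-intervals, and all function names are assumptions chosen for illustration.

```python
import math
import random


def simulate_counts(mean, subintervals, runs, seed=0):
    """Count successes per run, with one Bernoulli trial per small sub-interval."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    p = mean / subintervals     # small, identical chance of success per sub-interval
    counts = []
    for _ in range(runs):
        successes = sum(1 for _ in range(subintervals) if rng.random() < p)
        counts.append(successes)
    return counts


def poisson_pmf(k, mean):
    """Poisson probability of exactly k successes."""
    return mean**k * math.exp(-mean) / math.factorial(k)


if __name__ == "__main__":
    mean = 3.0
    counts = simulate_counts(mean, subintervals=1_000, runs=2_000)
    print("k  observed  Poisson")
    for k in range(8):
        observed = counts.count(k) / len(counts)
        print(k, round(observed, 4), round(poisson_pmf(k, mean), 4))
```

If the simulated frequencies drifted far from the Poisson column, that would point to one of the assumptions being broken in the simulation, which mirrors the remark about real-life data.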
Poisson distribution as a derivation from binomial distribution
The binomial distribution has a more specific definition, which is the probability of having $k$ successful outcomes out of $n$ independent Bernoulli trials, each with probability of success $p$. The probability that
a discrete random variable $X$ takes the value $k$ is denoted as:
$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$$
What I found out after my exploration
amazed me; the Poisson distribution is actually just a derivation from the
binomial distribution for when $n \to \infty$ and $p \to 0$. It is a mathematical limit of the binomial distribution. The
Poisson distribution thus has to be derived from:
$$\lim_{n \to \infty} \binom{n}{k} p^k (1 - p)^{n - k}$$
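To come up with a correct derivation, the
calculations must adhere to Poisson’s postulates. Because the probability of
success is identical throughout the whole experiment, this implies that in $n$ trials, the
expected value is $np$, which is also the definition of the mean ($\lambda$).
Substituting $p = \lambda / n$ into the equation:
$$\lim_{n \to \infty} \frac{n!}{k!\,(n - k)!} \left(\frac{\lambda}{n}\right)^{k} \left(1 - \frac{\lambda}{n}\right)^{n - k}$$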
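Now the first two terms can be taken out
and manipulated. With the constant $\lambda^k / k!$ separated, they can be rewritten as:
$$\frac{n!}{k!\,(n - k)!} \left(\frac{\lambda}{n}\right)^{k} = \frac{\lambda^k}{k!} \cdot \frac{n (n - 1) \cdots (n - k + 1)\,(n - k)!}{(n - k)!\; n^k}$$
Because both the numerator and denominator
now have $(n - k)!$, it can be cancelled out,
leaving the following:
$$\frac{\lambda^k}{k!} \cdot \frac{n (n - 1) \cdots (n - k + 1)}{n^k}$$
From here, it can be noted that both the
numerator and denominator of the fraction have $k$ factors. As $n$ approaches infinity, each ratio $(n - i)/n$ approaches 1, so the value
of this whole fraction approaches 1. Therefore, it can be said that the
first two terms reduce to the constant $\lambda^k / k!$.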
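The claim that this fraction approaches 1 can be checked numerically. The sketch below assumes the fraction left after the cancellation is $n(n-1)\cdots(n-k+1)/n^k$, with $n$ the number of trials and $k$ the number of successes; the function name `falling_ratio` and the choice $k = 3$ are illustrative only.

```python
def falling_ratio(n, k):
    """Return n(n-1)...(n-k+1) / n**k as a product of k ratios (n - i) / n."""
    ratio = 1.0
    for i in range(k):
        ratio *= (n - i) / n
    return ratio


if __name__ == "__main__":
    k = 3  # illustrative number of successes
    for n in (10, 100, 1_000, 10_000, 100_000):
        print(n, falling_ratio(n, k))
```

Each of the $k$ factors is slightly below 1 and moves towards 1 as $n$ grows, which is why the product does the same.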
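The last term can be divided into two parts:
$$\left(1 - \frac{\lambda}{n}\right)^{n - k} = \left(1 - \frac{\lambda}{n}\right)^{n} \left(1 - \frac{\lambda}{n}\right)^{-k}$$
For the first part, an expression for Euler’s
number can be used:
$$\lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^{n} = e^{x}, \quad \text{so} \quad \lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^{n} = e^{-\lambda}$$
For the second part, since $n$ is in the denominator and it
is approaching infinity, the fraction $\lambda / n$ approaches 0, so the value of $\left(1 - \frac{\lambda}{n}\right)^{-k}$ will approach 1.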
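Both limits can be observed numerically. The sketch below assumes the last term is $(1 - \lambda/n)^{n-k}$, split into $(1 - \lambda/n)^{n}$ and $(1 - \lambda/n)^{-k}$, where $\lambda$ is the mean, $n$ the number of trials, and $k$ the number of successes. The values $\lambda = 2$ and $k = 3$, and the function names, are chosen only for illustration.

```python
import math


def first_part(n, mean):
    """(1 - mean/n)**n, which should approach e**(-mean) as n grows."""
    return (1 - mean / n) ** n


def second_part(n, mean, k):
    """(1 - mean/n)**(-k), which should approach 1 as n grows."""
    return (1 - mean / n) ** (-k)


if __name__ == "__main__":
    mean, k = 2.0, 3
    print("target for first part:", math.exp(-mean))
    for n in (10, 100, 10_000, 1_000_000):
        print(n, first_part(n, mean), second_part(n, mean, k))
```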
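Putting them all together, including the
constants taken out earlier:
$$\lim_{n \to \infty} P(X = k) = \frac{\lambda^k}{k!} \cdot 1 \cdot e^{-\lambda} \cdot 1$$
This simplifies to:
$$\frac{\lambda^k e^{-\lambda}}{k!}$$
Hence the Poisson distribution is denoted
$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$$
In summary, the Poisson distribution is a
limiting case of the binomial distribution where the number of trials approaches
infinity and the probability of success approaches 0, while the mean $\lambda = np$ stays fixed.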
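The limit can also be seen numerically by holding the mean fixed and letting the number of trials grow. The sketch below is an illustration under assumed values: it takes a mean of 2, sets the probability of success to the mean divided by the number of trials, and reports the largest gap between the binomial and Poisson probabilities over the first few counts. The function names are hypothetical.

```python
import math


def binomial_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, mean):
    """Poisson probability of exactly k successes."""
    return mean**k * math.exp(-mean) / math.factorial(k)


def max_gap(n, mean, k_max=10):
    """Largest absolute difference between the two formulas for k = 0..k_max."""
    p = mean / n  # keeps n * p equal to the chosen mean
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, mean))
        for k in range(k_max + 1)
    )


if __name__ == "__main__":
    mean = 2.0
    for n in (10, 100, 1_000, 10_000):
        print(n, max_gap(n, mean))
```

The gap shrinks as the number of trials increases, which is the numerical counterpart of the derivation.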
Euler’s number in probability
What I find very interesting is the fact
that Euler’s number suddenly popped out when deriving the Poisson
distribution. It is clearly one of the most fascinating, important, and
fundamental constants in mathematics. Unsurprisingly, it has its applications
in probability theory and it relates directly to the Poisson distribution. For a large $n$ and a probability of success $p = 1/n$, the probability of getting no
successful outcomes is approximately $1/e$:
$$\lim_{n \to \infty} \left(1 - \frac{1}{n}\right)^{n} = \frac{1}{e}$$
This expression can actually be checked
by inputting the parameters in both the Poisson and binomial
distributions. Let us take a look at how that could work with an example:
A student is playing with a random number generator, which has a range of numbers from 1 to 100;
the teacher says that if after 100 tries the number 65 does not appear, the
student can go home. What is the probability that the student can go home?
We have to first find out the probability
of success, which in this case is the number 65 appearing. The probability of
it happening is 1 in 100, so $p = 1/100$ and, with $n = 100$ tries, $\lambda = np = 1$.
What we want to find is the probability
when there are no successful outcomes, so $k = 0$:
$$P(X = 0) = \frac{1^{0} e^{-1}}{0!} = \frac{1}{e} \approx 0.3679$$
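This is the same value as the limit shown
above. Even in other cases wherein $p = 1/n$, the probability should still be $1/e$. This is because as $n \to \infty$ and $p = 1/n \to 0$, the value for $\lambda = np$ will be 1.
We can also try this with the binomial
distribution, where there are no limiting assumptions made.
$$P(X = 0) = \binom{100}{0} \left(\frac{1}{100}\right)^{0} \left(1 - \frac{1}{100}\right)^{100}$$
Which is essentially:
$$\left(\frac{99}{100}\right)^{100}$$
However, the actual probability is $0.99^{100} \approx 0.3660$, which is a very close approximation to the reciprocal of Euler’s number, $1/e \approx 0.3679$. For even
larger values of $n$, the probability should get closer and closer to $1/e$.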
The result of this example justifies the
use of the Poisson distribution as a substitute for the
binomial distribution when $n$ reaches a very large number and $p$ is small.
It also shows just how important Euler’s number is in this limit,
and supports its appearance in the Poisson distribution.
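The example can be repeated for larger generators to watch the probability settle. The sketch below assumes the same set-up as the example, generalised so that the generator has as many possible numbers as there are tries: it evaluates the binomial probability of no successes for a growing number of tries and compares it with the reciprocal of Euler’s number. The function name `prob_no_success` is illustrative.

```python
import math


def prob_no_success(tries):
    """Binomial probability of zero successes in `tries` trials, each with chance 1/tries."""
    return (1 - 1 / tries) ** tries


if __name__ == "__main__":
    print("1/e =", round(1 / math.e, 6))
    for tries in (100, 1_000, 10_000, 1_000_000):
        print(tries, round(prob_no_success(tries), 6))
```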
of the Poisson distribution to the Binomial distribution
I also graphed these distributions so that
they can be visualized. However, because I did it in my laptop’s default graphing
software, there were some things that I had to change. I was not able to change
the names of the y and x axes’ variables. Due to this, I had to change some variables so
that the equations could be plotted. Because the probability is the
dependent variable, $P$ was changed to $y$. The independent variable in the
equations is the number of successful Bernoulli trials, so $k$ was changed to $x$.
For the example above:
$$y = \frac{1^{x} e^{-1}}{x!}$$
For the binomial distribution, I had to expand
the first part of the equation because the software did not recognize it.
$$y = \frac{100!}{x!\,(100 - x)!} (0.01)^{x} (0.99)^{100 - x}$$
As seen from the graph above, the two curves
are almost identical. However, on closer inspection:
The x-intercepts are the values that were
It can now be clearly seen that there is
still some discrepancy between the two.
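The size of the discrepancy can be tabulated directly instead of being read off a graph. The sketch below assumes the values from the random number generator example, 100 trials with a probability of success of 0.01 (so a mean of 1), and prints both probabilities and their difference for the first few numbers of successes. The constant and function names are illustrative.

```python
import math

TRIALS = 100           # number of tries in the example
P_SUCCESS = 0.01       # chance of the chosen number appearing on one try
MEAN = TRIALS * P_SUCCESS


def binomial_pmf(k):
    """Binomial probability of exactly k successes for the example's parameters."""
    return math.comb(TRIALS, k) * P_SUCCESS**k * (1 - P_SUCCESS) ** (TRIALS - k)


def poisson_pmf(k):
    """Poisson probability of exactly k successes with the same mean."""
    return MEAN**k * math.exp(-MEAN) / math.factorial(k)


if __name__ == "__main__":
    print(" k  binomial  Poisson   difference")
    for k in range(6):
        b, q = binomial_pmf(k), poisson_pmf(k)
        print(f"{k:2d}  {b:.6f}  {q:.6f}  {b - q:+.6f}")
```

For these parameters the two columns agree to about two decimal places, which matches the impression that the curves only separate when the graph is magnified.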
variance, and mode