The Poisson distribution as an approximation of the binomial distribution

Introduction

When I was reading up on radioactive decay, I found out that the probability of decay could be expressed in terms of a discrete probability distribution.

I had to look it up further because I had just recently learned about probability distributions in math. What I found out was intriguing; the number of atoms decaying in a specific interval of time is actually a discrete variable. It also adheres to Poisson's postulates, discussed later, so it can be expressed in terms of the Poisson distribution. However, what I noticed was that some research papers and online resources would use the Poisson distribution, and some would use the binomial distribution. This intrigued me because if one could be used in place of the other, then the two should be closely related in nature. This went against my previous understanding of the two, because I thought that since they have different formulas, they should be two very different concepts. After consulting my physics teacher, I was told that probability distributions come up in higher-education engineering courses. Being an aspiring engineer, I had to explore the differences between the binomial and Poisson distributions and understand why they can be used side by side.

Poisson distribution

The Poisson distribution was first introduced in the 19th century by mathematician Siméon Denis Poisson as a way to study the number of wrongful court decisions over a period of time. In a more general sense, this distribution tells us the probability of the sum of successful independent Bernoulli trials within a fixed interval of time. A Bernoulli trial is a statistical experiment where there are only two possible outcomes, either success or failure. Graphically, it provides a discrete probability distribution function where each point on the y-axis gives the numerical probability of a discrete random variable X. It is typically denoted as:

\[ P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!} \]

Discrete probability

The Poisson distribution works with discrete variables, so it is worth discussing what they are first.

Discrete variables:
·     Have a finite set of data
·     Are obtained by counting (and are countable)
·     Are mutually exclusive
·     Have a complete range of numbers
·     Are represented by distinct, isolated points on a graph

One of the simplest statistical examples of a discrete variable is the probability of getting $x$ number of heads when a coin is tossed $n$ times. Suppose we have a fair coin; the probability of getting $x$ number of heads when the coin is tossed 2 times is:

x (number of heads)   Explanation   Probability (fraction)   Probability (decimal)
0                     TT            1/4                      0.25
1                     HT, TH        2/4                      0.50
2                     HH            1/4                      0.25

Because of the nature of the discrete variable, its probability distribution can often just be expressed in tabular form.

Poisson's postulates

There are several assumptions to be made when using the Poisson distribution. When these assumptions are met, then the Poisson discrete probability distribution can be used.

1.

The probability of success is the same throughout the whole experiment.
2. The probabilities are independent of one another.
3. The probability of a success happening over a small time period is proportional to the length of that time period.
4. The probability of more than one success in a small time period is essentially 0.
5. The rate of success is only dependent on the length of the interval of time.
6. The experiment at hand is part of a Bernoulli trial.

One thing worth adding when comparing the probabilities of a random variable in real life and in calculations is that if there is a large discrepancy between the two sets of data, there must be an external factor coming into play when the statistical experiment was done.

Poisson distribution as a derivation from the binomial distribution

The binomial distribution has a more specific definition: it gives the probability of having $x$ successful outcomes out of $n$ Bernoulli trials.

For the probability that a discrete random variable happens $x$ times in $n$ trials, it is denoted as:

\[ P(X = x) = \binom{n}{x} p^{x} (1 - p)^{n - x}, \quad \text{where } \binom{n}{x} = \frac{n!}{x!(n - x)!} \]

What I found out after my exploration amazed me; the Poisson distribution is actually just a derivation from the binomial distribution for when $n \to \infty$ and $p \to 0$. It is a mathematical limit of the binomial distribution. The Poisson distribution thus has to be derived from:

\[ P(X = x) = \lim_{n \to \infty} \binom{n}{x} p^{x} (1 - p)^{n - x} \]

To come up with a correct derivation, the calculations must adhere to Poisson's postulates.
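The binomial formula above can be evaluated directly as a sanity check; a minimal sketch in Python (the function name `binomial_pmf` is my own, not part of the original exploration), reproducing the fair-coin table from earlier:

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n Bernoulli trials
    with success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Fair coin tossed twice: matches the earlier table.
print(binomial_pmf(0, 2, 0.5))  # 0.25
print(binomial_pmf(1, 2, 0.5))  # 0.5
print(binomial_pmf(2, 2, 0.5))  # 0.25
```

The three probabilities sum to 1, as every probability distribution over a discrete variable must.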

Because the probability of success is identical throughout the whole experiment, this implies that in $n$ number of trials, the expected value $E(X)$ is $np$, which is also the definition of the mean ($\lambda$):

\[ \lambda = np \]

Rearranging it:

\[ p = \frac{\lambda}{n} \]

Substituting $p = \frac{\lambda}{n}$ into the equation:

\[ P(X = x) = \lim_{n \to \infty} \frac{n!}{x!(n - x)!} \left( \frac{\lambda}{n} \right)^{x} \left( 1 - \frac{\lambda}{n} \right)^{n - x} \]

Now the first two terms can be taken out and manipulated. Taking the constants $\frac{\lambda^{x}}{x!}$ out, what remains can be rewritten as:

\[ \frac{n!}{(n - x)! \, n^{x}} = \frac{n(n-1)(n-2)\cdots(n-x+1)(n-x)!}{(n - x)! \, n^{x}} \]

Because both the numerator and denominator now have $(n - x)!$, they can be cancelled out, leaving the following:

\[ \frac{n(n-1)(n-2)\cdots(n-x+1)}{n^{x}} \]

From here, it can be noted that both the numerator and denominator have $x$ number of terms. However, as $n$ reaches infinity, the value of this whole fraction approaches 1. Therefore, it can be said that the value of the first two terms is 1.
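The claim that this fraction approaches 1 can be checked numerically; a small sketch (the helper name `leading_fraction` is my own):

```python
def leading_fraction(n, x):
    """Compute n(n-1)...(n-x+1) / n^x, the factor left over
    after cancelling (n-x)! in the binomial coefficient."""
    numerator = 1
    for k in range(x):
        numerator *= (n - k)
    return numerator / n**x

# The fraction creeps toward 1 as n grows, for fixed x = 3.
for n in (10, 100, 10000, 1000000):
    print(n, leading_fraction(n, 3))
```

For $n = 10$ the fraction is $\frac{10 \cdot 9 \cdot 8}{10^3} = 0.72$, while for $n = 10^6$ it is within $10^{-5}$ of 1, consistent with the limit argument above.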

The last term can be divided into two parts:

\[ \left( 1 - \frac{\lambda}{n} \right)^{n - x} = \left( 1 - \frac{\lambda}{n} \right)^{n} \left( 1 - \frac{\lambda}{n} \right)^{-x} \]

For the first part, an expression for Euler's number can be used:

\[ \lim_{n \to \infty} \left( 1 - \frac{\lambda}{n} \right)^{n} = e^{-\lambda} \]

For the second part, since $n$ is in the denominator and it is approaching infinity, the fraction $\frac{\lambda}{n}$ approaches 0, so the value of the whole part approaches 1.

Putting them all together, including the constants taken out earlier:

\[ P(X = x) = \frac{\lambda^{x}}{x!} \cdot 1 \cdot e^{-\lambda} \cdot 1 \]

Hence the Poisson distribution is denoted as:

\[ P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!} \]

In summary, the Poisson distribution is a condition of the binomial distribution where the number of trials approaches infinity and the probability of success approaches 0.

Euler's number in probability

What I find very interesting is the fact that Euler's number suddenly popped out when deriving the Poisson distribution. It is clearly one of the most fascinating, important, and fundamental constants in mathematics. Unsurprisingly, it has its applications in probability theory, and it relates directly to the Poisson distribution. For a large $n$, the probability of getting no successful outcomes is approximately $e^{-\lambda}$. This expression can actually be proven to be correct by inputting the parameters in both the Poisson and binomial distributions.
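The limit just derived can be illustrated numerically: holding the mean $\lambda = np$ fixed while $n$ grows, the binomial probability should converge to the Poisson probability. A sketch under that assumption (function names are mine):

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """Binomial probability of x successes in n trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    """Poisson probability of x events with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

# Fix lambda = 2 and let n grow with p = lambda / n.
lam = 2.0
for n in (10, 100, 10000, 100000):
    p = lam / n
    print(n, binomial_pmf(3, n, p), poisson_pmf(3, lam))
```

As $n$ increases, the binomial column settles onto the fixed Poisson value, which is exactly the limiting behaviour the derivation predicts.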

Let us take a look at how that could work with an example: A student is playing with a random number generator, which has a range of numbers from 1 to 100; the teacher says that if after 100 tries the number 65 does not appear, the student can go home. What is the probability that the student can go home?

We have to first find out the probability of success, which in this case is the number 65 appearing. The probability of it happening is 1 in 100. So now,

\[ \lambda = np = 100 \times \frac{1}{100} = 1 \]

What we want to find is the probability when there are no successful outcomes, so:

\[ P(X = 0) = \frac{1^{0} e^{-1}}{0!} = e^{-1} \approx 0.3679 \]

This is the same value as the limit shown above. Even in other cases wherein $np = 1$, the probability should still be $e^{-1}$.

This is because as $n \to \infty$ and $p \to 0$, the value of $\left( 1 - \frac{1}{n} \right)^{n}$ will be $e^{-1}$. We can also try this with the binomial distribution, where there are no limiting assumptions made:

\[ P(X = 0) = \binom{100}{0} \left( \frac{1}{100} \right)^{0} \left( 1 - \frac{1}{100} \right)^{100} \]

Which is essentially:

\[ P(X = 0) = 0.99^{100} \approx 0.3660 \]

This actual probability of 0.3660 is a very close approximation to $e^{-1} \approx 0.3679$. For even larger values of $n$, the probability should get closer and closer to $e^{-1}$. The result of this example justifies the use of the Poisson distribution in substitution for the binomial distribution when $n$ reaches a very large number. It also signifies just how important Euler's number is in the limit theorem, and the credibility of its usage in the Poisson distribution.
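The convergence of $\left(1 - \frac{1}{n}\right)^{n}$ to $e^{-1}$ can be checked for increasing $n$; a short sketch:

```python
from math import exp

target = exp(-1)  # e^-1, approximately 0.367879
# Probability of zero successes in n trials with p = 1/n,
# computed directly from the binomial formula.
for n in (100, 1000, 100000):
    prob_no_success = (1 - 1 / n)**n
    print(n, prob_no_success, abs(prob_no_success - target))
```

Already at $n = 100$ the gap from $e^{-1}$ is under 0.002, and it keeps shrinking as $n$ grows.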

Approximation of the Poisson distribution to the binomial distribution

I also graphed these distributions so that they can be visualized. However, because I did it in my laptop's default graphing software, there were some things that I had to change. I was not able to change the y- and x-axes' variables. Due to this, I had to change some variables so that the equations would be able to be plotted.

Because the probability is the dependent variable, $P$ was changed to $y$. The independent variable in the equations is the number of successful Bernoulli trials, so it was changed to $x$. For the example above:

\[ \text{Poisson: } y = \frac{e^{-1} \cdot 1^{x}}{x!} = \frac{e^{-1}}{x!} \]

[Graph: binomial (red) and Poisson (blue) for the example above]

For the binomial distribution, I had to expand the first part of the equation because the software did not recognize the combination notation.

\[ \text{Binomial: } y = \frac{100!}{x!(100 - x)!} (0.01)^{x} (0.99)^{100 - x} \]

As seen from the graph above, the graphs are almost identical. However, on closer inspection:

[Graph: close-up of the two curves near $x = 0$]

The y-intercepts are the values that were obtained earlier. It can now be clearly seen that there is still some discrepancy between the two.

Mean, variance, and mode