From the Binomial to the Normal Distribution Leslie Wright The purpose of this worksheet is to demonstrate how the binomial distribution becomes the normal distribution when the sample size approaches infinity. restart; This is the standard expression for the r-th binomial probability after the np-th (or middle) term for the expansion of NiMpLCYlInBHIiIiJSJxR0YmJSJuRw==, where NiMvLCYlInBHIiIiJSJxR0YmRiY=: y:=unapply(convert(binomial(n,n*p+r)*p^(n*p+r)*q^(n*q-r),factorial),r); This constitutes the ratio between two adjacent probabilities: rat:=simplify(y(r+1)/y(r)); The following is the numerator of this ratio after dividing through by NiMqKCUibkciIiIlInBHRiUlInFHRiU=: numer(rat)/n/p/q; algsubs(p+q=1,%); top:=expand(%); Ditto for the denominator of that same ratio: bot:=expand(denom(rat)/n/p/q); The following estimates the difference between the logarithms of the numerator and denominator, capitalizing on the Maclaurin series approximation NiMvLSUjbG5HNiMsJiIiIkYoJSJ4R0YoRik= when x is small. Since we will eventually assume n to be big, the approximation is certainly appropriate: logtop:=top-1; logbot:=bot-1; This line of course is based on the relation that the logarithm of a quotient is equal to the logarithm of the dividend less the logarithm of the divisor: logtop-logbot; simplify(%); dlny:=algsubs(p+q=1,%)*dr; Lets do a little scaling of the independent variable axis here--i.e., create a new variable NiMvJSJ4RyomJSJyRyIiIi0lJXNxcnRHNiMlIm5HISIi: dlny:=subs({r=sqrt(n)*x,dr=sqrt(n)*dx}, dlny); Another scale substitution, using the standard deviation of the binomial distribution, NiMvJSZzaWdtYUctJSVzcXJ0RzYjKiglIm5HIiIiJSJwR0YqJSJxR0Yq, scaled down by NiMtJSVzcXJ0RzYjJSJuRw== to reflect the above scaling: algsubs(1/q/p=1/sigma^2,%); Here we begin simplify the element in question, and see what happens when n goes to infinity: simplify(%/dx); expand(%); dlny:=limit(%,n=infinity)*dx; Now take the indefinite integral to get an expression for the logarithm of the ordinate in relation to x. The logarithm of A is the constant of integration here: lny:=int(dlny/dx,x)+ln(A); Now take the exponential of both sides and get an expression for y: y:=expand(exp(lny)); We need to find a value of A so that the area under the curve is unity. We need to tell Maple the NiMlJnNpZ21hRw== is a positive number so it can find the definite integral: assume(sigma>0); int(y,x=-infinity..infinity)=1; A:=solve(%,A); Let's set up an expression for what we now have found to be the probability density function of the normal distribution of mean NiMlI211Rw== and standard deviation NiMlJnNpZ21hRw==. Remember that we have been assuming up to now that our distribution has a mean of zero, hence the substitution to account for any mean: f:=unapply(subs(x=X-mu,y),(X,mu,sigma)); Here's is the standard normal curve, with mean zero and standard deviation one: plot(f(x,0,1),x=-3..3); Now let's set up an expression for the cumulative distribution function--i.e., the probability that a normal variate will take on a value of x or less. Maple uses the error function to express this analytically: Q:=unapply((int(f(t,0,1),t=-infinity..x)),x); Now let's plot it to see what it looks like. Very familiar shape indeed: plot(Q(x),x=-3..3); What about the reverse issue, i.e. what is the value of the variate associated with a particular percentile (a.k.a. the one-tailed probability)? We need to solve an equation to find out: Q(z)=.975; fsolve(%,z); What about a two-tailed probability? We need to solve a slightly different equation: (Q(z)-1/2)*2=.95; fsolve(%,z);