Central limit rule-of-thumb for Binomial approximation

Hi, this concerns the MIT probability OpenCourseWare, reading questions 6b.

First: I think there may be a mistake in the answer given for problem 4: exact: 0.0227, rule-of-thumb: 0.025?

Second: can someone tell me what is wrong in my reasoning, and how to calculate the rule-of-thumb?

Third: why do I get an error if I uncomment the last line below?

forget()
#############################
import matplotlib.pyplot as plt

def makeFig(pdfTable, cdfTable, textFig):
    # Make a figure
    fig = plt.figure()
    # Make room for the legend at the bottom
    fig.subplots_adjust(bottom=0.3)
    # The axes for the two lists
    ax1 = fig.add_subplot(111)
    # Plot the two lines
    line1 = ax1.plot([0] + pdfTable, 'bo-', label='pdf')
    line2 = ax1.plot([0] + cdfTable, 'go-', label='cdf')
    # To get both lines on the same legend, we need to
    # gather all the lines together before calling legend
    lines = line1 + line2
    labels = [l.get_label() for l in lines]
    # giving loc a tuple in axes coords; ncol=2 for 2 columns
    ax1.legend(lines, labels, loc=(0, -0.4), ncol=2)
    ax1.set_xlabel('--------' + textFig)
    # Display the figure
    plt.show()
    return
######################

n = 64
kMax = 40
p = 0.5
kTable = [i for i in range(0, kMax+1)]
pdfTable = []
cdfTable = []
# A fair coin is tossed 64 times. Use the binomial distribution to
# estimate the probability of getting more than 40 heads.
for k in kTable:
    pdfTable.append(binomial(n, k) * p^k * (1-p)^(n-k))
    cdfTable.append(sum(pdfTable))
#pdfTableR = [round(elem, 5) for elem in pdfTable]
#show(pdfTableR)
cdfTableR = [round(elem, 5) for elem in cdfTable]
show(cdfTableR)
makeFig(pdfTable, cdfTable, '                           P more than 40 Heads On 64 toss')
pSupKmax = 1 - cdfTable[-1]
show('P(X > 40 Heads) On 64 tosses  ')
show('Using  Binomial distribution : ', pSupKmax)
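As a cross-check, the exact tail probability can also be computed in plain Python (no Sage needed), since all the binomial coefficients and the power of 2 are exact integers. This is a sketch using only the standard library:

```python
# Exact P(X > 40) for X ~ Binomial(64, 1/2), kept as an exact fraction.
from fractions import Fraction
from math import comb

n, kMax = 64, 40
p_exact = Fraction(sum(comb(n, k) for k in range(kMax + 1, n + 1)), 2**n)
print(p_exact)         # 302210786238405829/18446744073709551616
print(float(p_exact))  # ≈ 0.0163828795494116
```

The exact fraction and its decimal value agree with the Sage output `pSupKmax` shown above.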

# A fair coin is tossed 64 times. Use the central limit theorem to
# estimate the probability of getting more than 40 heads.
################################
# for one toss
mu_1 = p                     # expectation of one Bernoulli variable
sigma_1 = sqrt(p*(1-p))      # standard deviation of one Bernoulli variable
# for 64 tosses, sum of 64 Bernoulli variables
mu_64 = n*mu_1               # expectation = sum of the 64 Bernoulli expectations
sigma_64 = sqrt(n)*sigma_1   # variances add, so the standard deviation is sqrt(n)*sigma_1
# rule-of-thumb
show('Using  1-erf :', 1 - erf((kMax - mu_64)/sigma_64))
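One likely source of trouble in the line above: `1 - erf(z)` is not the normal tail probability, because the standard normal CDF $\Phi$ relates to `erf` as $\Phi(z) = \tfrac12(1 + \operatorname{erf}(z/\sqrt 2))$. A minimal sketch in plain Python showing the difference (the `phi` helper is my own, not from the course):

```python
from math import erf, sqrt

def phi(z):
    # Standard normal CDF expressed through erf.
    return (1 + erf(z / sqrt(2))) / 2

z = (40 - 32) / 4      # (kMax - mu_64) / sigma_64 = 2
print(1 - erf(z))      # ≈ 0.00468 -- NOT the tail probability
print(1 - phi(z))      # ≈ 0.02275 -- the rule-of-thumb value
```

So the missing $\sqrt 2$ (and the factor $\tfrac12$) is what separates the `erf` attempt from the expected 0.0227.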

#############################################################
#using Central Limit Theorem using normal law integration
#############################################################

#from sage.symbolic.integration.integral import indefinite_integral
from sage.symbolic.integration.integral import definite_integral
x = var('x')
assume(x, 'real')

normalPDF = function('normalPDF')(x)
normalCDF = function('normalCDF')(x)
# normal PDF(x,mu_0,sigma_0)
normalPDF=1/(2 *sqrt(pi*sigma_64^2))*e^(-(((x-mu_64)/sigma_64 )^2)/2)
normalCDFnum=definite_integral(normalPDF,x,-infinity,kMax)
show('Using  normal law integration 1-normalCDFnum :',(1-normalCDFnum ))
# error if line below uncommented
#show('Using  normal law integration 1-normalCDFnum :',(1-normalCDFnum ).n())


Oops, the third question is solved! I made a mistake using sigma_0 :-( It must be:

normalPDF = 1/(2*sqrt(pi*sigma_64^2)) * e^(-(((x-mu_64)/sigma_64)^2)/2)

not :

normalPDF = 1/(2*sqrt(pi*sigma_0^2)) * e^(-(((x-mu_64)/sigma_64)^2)/2)

I have now edited the code, sorry.

The first and second questions remain, but they are pure mathematical knowledge now, and maybe this is not the place to post them. Should I delete this post and ask on a math forum instead?

( 2018-01-28 10:45:46 -0600 )


First: The probability for $X>40$, where $X$ is binomially distributed as in the problem, is given by

p = sum( [ binomial(64,k) for k in [41..64] ] ) / 2^64


which is exactly:

sage: p
302210786238405829/18446744073709551616
sage: p.n()
0.0163828795494116


which is also the number from the posted code:

sage: pSupKmax
0.0163828795494116


We can also rule out that "more than $40$ heads" in the text translates as $X\ge 40$, since that "other number" is

sage: ( sum( [ binomial(64,k) for k in [40..64] ] ) / 2^64 ).n()
0.0299705947834996


Second: I do not know which "rule of thumb" is meant, so let me guess. The random variable $X$ is binomially distributed, its mean is $64\cdot \frac 12 = 32$, its variance is $64\cdot \frac 12\cdot\left(1- \frac 12\right) = 16=4^2$, so the variable that can be approximated by a standard normally distributed variable $Y$ is $(X-32)/4$, $$Y\approx \frac 14(X-32)\ .$$

Let $\Phi$ be the cumulative distribution function of $Y$. In statistics one "knows" the following values for $\Phi$, corresponding to the confidence intervals in common use, the 68-95-99 rule: \begin{align} \mathbb P(\ |X-32| \le 1\cdot 4\ )& \approx \mathbb P( |Y| <1) \approx 0.6827\ ,\\ \mathbb P(\ |X-32| \le 2\cdot 4\ )& \approx \mathbb P( |Y| <2 ) \approx 0.9545\ ,\\ \mathbb P(\ |X-32| \le 3\cdot 4\ )& \approx \mathbb P( |Y| <3) \approx 0.9974\ . \end{align}

In our case the problem wants, "by coincidence", the number that we can easily extract from the second line. We quickly get $$\mathbb P(Y>2) = \frac 12(\mathbb P(Y<-2)+\mathbb P(Y>2)) =\frac 12(1-\mathbb P(|Y|<2))\approx \frac 12 \cdot 0.0455 =0.02275\ .$$ This last value corresponds well to the value 1-Phi(2) = 0.0227501319482 in the following sage dialog:
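The 68-95-99 values quoted here can be verified in plain Python (no Sage needed), using the identity $\mathbb P(|Y|<k) = \operatorname{erf}(k/\sqrt 2)$ for a standard normal $Y$ — a small sketch using only the standard library:

```python
from math import erf, sqrt

# P(|Y| < k) for a standard normal Y equals erf(k / sqrt(2))
for k in (1, 2, 3):
    print(k, erf(k / sqrt(2)))
# 1 → ≈ 0.6827, 2 → ≈ 0.9545, 3 → ≈ 0.9973
```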

sage: T = RealDistribution( 'gaussian', 1 )
sage: for k in [0..3]:
....:     print "Phi(%s) = %s" % ( k, T.cum_distribution_function(k) )
....:
Phi(0) = 0.5
Phi(1) = 0.841344746069
Phi(2) = 0.977249868052
Phi(3) = 0.998650101968
sage: for k in [0..3]:
....:     print "1-Phi(%s) = %s" % ( k, 1-T.cum_distribution_function(k) )
....:
1-Phi(0) = 0.5
1-Phi(1) = 0.158655253931
1-Phi(2) = 0.0227501319482
1-Phi(3) = 0.00134989803163


Note: In statistics, there is one more trick used when approximating a "binomial $X$" by a "standard normal $Y$" as in our case. The random variable $X$ is discrete, so "more than $40$" is both $X>40$ and $X\ge 41$. When we translate this in terms of $Y$, it is better to translate using the middle point $40.5$. And we get:

sage: 1 - T.cum_distribution_function( (40.5-32)/4 )
0.016793306448448786


which fits the "reality" of 0.0163828795494116 nicely. This kind of experiment can only be done with something like sage, so they should introduce it in schools, since it is the only free "thing" doing the job (without pain)...!
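The effect of this continuity correction can be reproduced in plain Python with `math.erf` (the `phi` helper below is my own shorthand for the standard normal CDF, not part of Sage):

```python
from math import erf, sqrt

def phi(z):
    # Standard normal CDF via erf.
    return (1 + erf(z / sqrt(2))) / 2

mu, sigma = 32, 4
print(1 - phi((40   - mu) / sigma))  # ≈ 0.02275  (no correction)
print(1 - phi((40.5 - mu) / sigma))  # ≈ 0.016793 (continuity correction)
```

The corrected value 0.016793... is visibly much closer to the exact 0.016382... than the uncorrected 0.02275.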

Third: Sorry, I could not reproduce the error; at any rate, the following works for me...

sage: show( 'Using  normal law integration 1-normalCDFnum :'
....:       , 1 - T.cum_distribution_function( (40.5-32)/4 ) )
Using normal law integration 1-normalCDFnum : 0.0167933064484

sage: show( 'Using  normal law integration 1-normalCDFnum :'
....:       , ( 1 - T.cum_distribution_function( (40.5-32)/4 ) ).n() )
Using normal law integration 1-normalCDFnum : 0.0167933064484488


Thanks a lot (again !) Dan_fulea.

( 2018-01-28 11:42:43 -0600 )

Moreover, there was another mistake in the normal formula I wrote: the 2 was in the wrong place (it was outside the square root!). It should be as below, and then I would have found the value asked for.

from sage.symbolic.integration.integral import definite_integral
x = var('x')
assume(x, 'real')
normalPDF = function('normalPDF')(x)
normalCDF = function('normalCDF')(x)
normalPDF=1/(sqrt(2*pi*sigma_64^2)) * e^(-(((x-mu_64)/sigma_64 )^2)/2)
normalCDFnum=definite_integral(normalPDF,x,-infinity,kMax)
show('Using  normal law integration 1-normalCDFnum :',(1-normalCDFnum ))
show('Using  normal law integration 1-normalCDFnum :',(1-normalCDFnum ).n())
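The corrected integral can also be checked in plain Python without symbolic integration, by integrating the normal density numerically (a crude midpoint rule, truncating the lower limit at $\mu - 10\sigma$, which is far enough into the tail to be negligible):

```python
from math import exp, pi, sqrt

mu, sigma, kMax = 32, 4, 40

def normal_pdf(x):
    # N(mu, sigma^2) density
    return exp(-((x - mu) / sigma) ** 2 / 2) / sqrt(2 * pi * sigma ** 2)

# midpoint rule over [mu - 10*sigma, kMax]
a, b, N = mu - 10 * sigma, kMax, 20000
h = (b - a) / N
cdf = sum(normal_pdf(a + (i + 0.5) * h) for i in range(N)) * h
print(1 - cdf)  # ≈ 0.02275, matching 1 - Phi(2)
```

This agrees with the `definite_integral` result, confirming the corrected density.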

( 2018-01-29 06:44:26 -0600 )

but the calculation expected in this question was, as Dan_fulea said:

We know $P(Z< 1)\approx 0.8413$, $P(Z< 2)\approx 0.9772$ and $P(Z< 3)\approx 0.9987$ for a standard normal $Z$ (i.e. $\sigma=1$). Then $$P(S>k_{max}) = P\left(\frac{S-\mu_{64}}{\sigma_{64}} > \frac{k_{max}-\mu_{64}}{\sigma_{64}}\right) = P\left(Z > \frac{40-32}{4}\right) = P(Z > 2) = 1-0.9772 = 0.0228\ .$$

( 2018-01-29 06:53:09 -0600 )