# Help finding expected value of sum of random variables

I'm very much a Sage newbie, and I'm having trouble solving for the expected value of a discrete summation. I'll admit that I'm well removed from statistics, linear algebra, and econometrics, so it might be that what I'm trying to accomplish is illogical.

Consider the following parameters:

E ~ N(0,1) (i.e., E is a random variable distributed standard normal)

M ~ U(1,m) (i.e., M is a uniformly distributed random variable varying between 1 and m)

A = | Σ E×M | over the interval (1,N) (or the absolute value of the summation of E times M over interval 1,N)

I'd like to find the expected value of A as a function of N (or the limit of A as N goes to infinity, assuming A converges to a real number). Can I use Sage to solve for something like this (assuming it's solvable, which I think it is based on some simulation results)?


What does your summation mean in the definition of A? It is mathematically unclear to me.

( 2016-03-09 09:22:31 -0600 )

There are too (two) many appearances of N, while m is used and forgotten. We can also be delighted to see E and M, the only two letters used to denote either expectation or mean, as the names of two random variables. We may use X and V instead. The two variables should also be independent, else nothing can be computed, and the statement should make this clear. Since I am inside a comment, one further remark is appropriate. Since everything in probability has to be quick and intuitive, we have a lot of "probability theory without probability spaces". Instead, one has a dictionary of concepts (e.g. density) and a phenomenological way to manipulate them without a solid foundation. In my opinion, Sage and similar computer algebra systems help to see and use the probability space.

( 2017-03-03 19:00:53 -0600 )


The question is not well defined. The best way to "do something" is to guess the intended question, or a related one, and answer that. (The original question must belong to the same circle of ideas and should yield to similar tools.)

Restatement:

Let us fix an integer $K>1$. We consider

• $K$ random variables $Z_1,\dots, Z_K$ which follow the standard normal distribution $N(0,1^2)$,

• and $K$ random variables $V_1,\dots, V_K$ which follow the uniform distributions on the intervals $(1,2),\dots,(1,K+1)$, respectively, so that $V_j$ is uniform on $(1,j+1)$.

The family of all these variables should be an independent family of random variables defined on the same probability space. Let $\mathbb{E}$ be the expectation, the mean on this space. We build $X(K)=|Z_1V_1+\dots+Z_KV_K|$ and its expectation $f(K)= \mathbb{E} X(K)=\mathbb{E}\Big[\ |Z_1V_1+\dots+Z_KV_K|\ \Big]$ as a function of $K$. The exercise asks for

• heuristic arguments that may lead to an asymptotic $f(K)=O(K^?)$ in big-O notation, and

• a computer simulation that supports the heuristic.

This was the complicated part of the answer. From this point on things are straightforward. The random variable under the modulus has mean zero, since $\mathbb{E} [Z_jV_j] = \mathbb{E} [Z_j] \mathbb{E} [V_j] = 0\cdot \mathbb{E} [V_j]=0$, and its terms have variance

$\displaystyle\operatorname{Var}[Z_jV_j] = \mathbb{E} [(Z_jV_j)^2] -\big(\mathbb{E} [Z_jV_j]\big)^2 =\mathbb{E} [Z_j^2]\, \mathbb{E} [V_j^2] -\big(\mathbb{E} [Z_j]\,\mathbb{E} [V_j]\big)^2$

$\qquad\displaystyle = 1\cdot \mathbb{E} [V_j^2]-0 =\frac 1{j}\int_1^{j+1}v^2\; dv=\frac 1{3j}((j+1)^3-1^3)$

and so on.

Here we used the independence of $Z_j$ and $V_j$. Using the independence of the whole family once more, the variance of the sum is the sum of the variances, so we compute $\displaystyle\sum_{1\le j\le K}\frac 1{3j}((j+1)^3-1^3)$:

sage: var( 'j,K' );
sage: latex( sum( 1/3/j * ( (j+1)^3-1^3 ), j, 1, K ).factor() )
\frac{1}{9} \, {\left(K^{2} + 6 \, K + 14\right)} K
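As a cross-check of the closed form, here is a minimal sketch in plain Python (not Sage) that compares the sum with $\frac19 K(K^2+6K+14)$ in exact rational arithmetic. The function names `variance_sum` and `closed_form` are made up for this illustration.

```python
from fractions import Fraction

def variance_sum(K):
    """Exact sum of Var[Z_j V_j] = ((j+1)^3 - 1) / (3 j) for j = 1..K."""
    return sum(Fraction((j + 1) ** 3 - 1, 3 * j) for j in range(1, K + 1))

def closed_form(K):
    """Closed form reported by Sage above: K (K^2 + 6 K + 14) / 9."""
    return Fraction(K * (K ** 2 + 6 * K + 14), 9)

# Compare both expressions for a few values of K.
for K in (1, 2, 10, 100):
    print(K, variance_sum(K), closed_form(K), variance_sum(K) == closed_form(K))
```

Since both sides are exact fractions, equality here is a genuine identity check for the tested values of $K$, not a floating-point coincidence.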


Then we expect: $\displaystyle \frac{Z_1V_1+\dots+Z_KV_K}{\displaystyle\sqrt{\frac{1}{9} {\left(K^{2} + 6 \, K + 14\right)} K}} \sim N(0,1^2)$ .

(This is an optimistic appeal to the central limit theorem for independent, but not identically distributed, summands, as it is applied outside mathematics when we do not have time to check the details.)
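The normalization can be probed numerically. The following is a rough sketch in plain Python (not Sage), assuming the setup of the restatement with $V_j$ uniform on $(1,j+1)$; the helper names, the seed, and the choices $K=200$ and 2000 trials are arbitrary picks for this illustration. It only checks that the normalized sum has sample mean near 0 and sample variance near 1, which is weaker than normality.

```python
import math
import random

def normalized_sum(K, rng):
    """One sample of (Z_1 V_1 + ... + Z_K V_K) divided by its standard deviation."""
    sigma = math.sqrt(K * (K ** 2 + 6 * K + 14) / 9.0)
    total = sum(rng.gauss(0.0, 1.0) * rng.uniform(1.0, j + 1.0)
                for j in range(1, K + 1))
    return total / sigma

def sample_moments(K=200, trials=2000, seed=0):
    """Sample mean and (unbiased) sample variance of the normalized sum."""
    rng = random.Random(seed)
    values = [normalized_sum(K, rng) for _ in range(trials)]
    mean = sum(values) / trials
    var = sum((v - mean) ** 2 for v in values) / (trials - 1)
    return mean, var

mean, var = sample_moments()
print("sample mean %.3f, sample variance %.3f" % (mean, var))
```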

For big $K$ we can optimistically approximate this normalized sum by a normally distributed $Y\sim N(0,1^2)$.

Then $\mathbb{E}|Y|$ is twice the integral over $[0,\infty)$ of $\frac1 {\sqrt{2\pi}}y\exp(-y^2/2)$, which equals $\frac 2{\sqrt{2\pi}}=\sqrt{2/\pi}$.
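The value of this integral is easy to confirm numerically. The sketch below is plain Python (not Sage) and uses a simple midpoint rule on $[0,10]$; the truncation point, the step count, and the function name are assumptions made for this illustration.

```python
import math

def expected_abs_standard_normal(steps=100000, upper=10.0):
    """Midpoint-rule approximation of 2 * int_0^upper y * phi(y) dy,
    where phi is the standard normal density."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h          # midpoint of the i-th subinterval
        total += y * math.exp(-y * y / 2.0)
    return 2.0 * total * h / math.sqrt(2.0 * math.pi)

# Numerical value next to the closed form sqrt(2/pi).
print(expected_abs_standard_normal(), math.sqrt(2.0 / math.pi))
```

The tail beyond $y=10$ contributes on the order of $e^{-50}$, so truncating there does not matter at this precision.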

Putting all together we get: $\displaystyle f(K)\sim \frac 2{3\sqrt {2\pi}} \sqrt{K\left(K^{2} + 6 K + 14\right)}$ .

That's the maths.

Now we simulate, and we also compute the values given by the guessed asymptotic.

The simulation:

for pow in [2, 3, 4, 5]:
    K = 10 ** pow
    SAMPLES = []    # absolute values of the simulated sums
    for experiment in [1..99]:
        # uniform(1, k+2) for k = 0..K-1 samples V_j on (1, j+1) for j = 1..K
        SAMPLES.append(abs(sum([gauss(0, 1) * uniform(1, k + 2) for k in range(K)])))
    print("%s -> %s" % (K, mean(SAMPLES)))


We get

100 -> 294.711735785
1000 -> 8714.8222098
10000 -> 249403.620665
100000 -> 8734793.09067


The samples are random, so a rerun will produce other numbers than those above.

And the asymptotic:

for pow in [2, 3, 4, 5]:
    K = 10 ** pow
    print("%s -> %f" % (K, 2/3/sqrt(2*pi) * sqrt(K * (K^2 + 6*K + 14))))

100 -> 274.004912
1000 -> 8435.694028
10000 -> 266041.315371
100000 -> 8410694.055422


I did not check the details, but the simulation strongly suggests $f(K)\in O(K^{3/2})$, and even more, that we have

$\displaystyle f(K)\sim\frac 1{\sqrt{2\pi}}\cdot\frac 23\cdot K^{3/2}$.
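To see how quickly the full expression approaches this leading term, here is a small sketch in plain Python (not Sage). The names `full_asymptotic` and `leading_term` are chosen for this illustration only.

```python
import math

def full_asymptotic(K):
    """2 / (3 sqrt(2 pi)) * sqrt(K (K^2 + 6 K + 14)), the guessed asymptotic."""
    return 2.0 / (3.0 * math.sqrt(2.0 * math.pi)) * math.sqrt(K * (K ** 2 + 6 * K + 14))

def leading_term(K):
    """1 / sqrt(2 pi) * 2/3 * K^(3/2), the leading-order term."""
    return 2.0 / (3.0 * math.sqrt(2.0 * math.pi)) * K ** 1.5

# The ratio equals sqrt(1 + 6/K + 14/K^2) and tends to 1 as K grows.
for K in (10 ** 2, 10 ** 3, 10 ** 4, 10 ** 5):
    print(K, full_asymptotic(K), leading_term(K), full_asymptotic(K) / leading_term(K))
```

The ratio is $\sqrt{1+6/K+14/K^2}$, so the relative difference between the two expressions is roughly $3/K$ for large $K$.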
