ASKSAGE: Sage Q&A Forum - Individual question feedhttp://ask.sagemath.org/questions/Q&A Forum for SageenCopyright Sage, 2010. Some rights reserved under creative commons license.Tue, 09 Aug 2016 16:28:50 -0500Memory leak somewhere?http://ask.sagemath.org/question/34364/memory-leak-somewhere/My code is using massive amounts of memory (eventually), which I don't think should be the case. The memory use slowly increases until my computer runs out of memory. I tried enabling garbage collection, but that didn't help (or didn't help enough). I don't see any reason why this should use more and more memory. It takes somewhere between 5 and 10 hours for this program to use up my 16 GB of memory, but that memory user is increasing is clear quickly.
import numpy
trials = 5800
length = 5000
mean = 0
results = [0]*trials
data = [0]*length
for i in range(trials):
data[0] = numpy.random.normal(0,1)
for j in range(1, length):
data[j] = data[j-1] + numpy.random.normal(0,1)
for k in range(2, length):
if data[k] > 2*sqrt(k*log(log(k+1))):
results[i] += 1
mean += results[i]Sun, 07 Aug 2016 13:04:22 -0500http://ask.sagemath.org/question/34364/memory-leak-somewhere/Answer by slelievre for <p>My code is using massive amounts of memory (eventually), which I don't think should be the case. The memory use slowly increases until my computer runs out of memory. I tried enabling garbage collection, but that didn't help (or didn't help enough). I don't see any reason why this should use more and more memory. It takes somewhere between 5 and 10 hours for this program to use up my 16 GB of memory, but that memory user is increasing is clear quickly. </p>
<pre><code>import numpy
trials = 5800
length = 5000
mean = 0
results = [0]*trials
data = [0]*length
for i in range(trials):
data[0] = numpy.random.normal(0,1)
for j in range(1, length):
data[j] = data[j-1] + numpy.random.normal(0,1)
for k in range(2, length):
if data[k] > 2*sqrt(k*log(log(k+1))):
results[i] += 1
mean += results[i]
</code></pre>
http://ask.sagemath.org/question/34364/memory-leak-somewhere/?answer=34394#post-id-34394As a complement to @tmonteil's answer, some hints about speed.
To gain speed, you can
- avoid Sage's preparser transforming Python ints into Sage Integers
- avoid conversions and coercions that occur when you add, multiply, compare numbers of different types
- avoid creating lists you don't need; use iterators instead
- use methods instead of functions, eg for log and sqrt
- square instead of taking square root
- set apart some cases where you can save yourself some computation and comparison.
Here are four variants of a function inspired by your code, with 'trials' and 'length' as arguments.
import numpy
def compute_a(trials, length):
mean = 0
results = [0]*trials
data = [0]*length
for i in range(trials):
data[0] = numpy.random.normal(0,1)
for j in range(1, length):
data[j] = data[j-1] + numpy.random.normal(0,1)
for k in range(2, length):
if data[k] > 2*sqrt(k*log(log(k+1))):
results[i] += 1
mean += results[i]
return mean
def compute_b(trials, length):
mean = 0
results = [0]*trials
data = [0]*length
for i in range(trials):
data[0] = numpy.random.normal(0,1)
for j in range(1, length):
data[j] = data[j-1] + numpy.random.normal(0,1)
for k in range(2, length):
if data[k] > 2*sqrt(k*log(log(k+1.))):
results[i] += 1
mean += results[i]
return mean
def compute_c(trials, length):
two = RDF(2)
mean = 0
results = [0]*trials
data = [0]*length
for i in xrange(trials):
data[0] = numpy.random.normal(0,1)
for j in range(1, length):
data[j] = data[j-1] + numpy.random.normal(0,1)
for k in range(2, length):
if RDF(data[k]) > two*(RDF(k)*RDF(k+1).log().log()).sqrt():
results[i] += 1
mean += results[i]
return mean
def compute_d(trials, length):
zero = RDF.zero()
four = RDF(4r)
mean = 0r
results = [0r]*trials
data = [0r]*length
for i in xrange(trials):
data[0r] = numpy.random.normal(0r,1r)
for j in range(1r, length):
data[j] = data[j-1r] + numpy.random.normal(0r,1r)
for k in range(2r, length):
a = RDF(data[k])
if a > zero and a*a > four*RDF(k)*RDF(k+1r).log().log():
results[i] += 1r
mean += results[i]
return mean
- `compute_a` is just making your code into a function.
- `compute_b`uses @tmonteil's trick of `k+1.` so that comparisons are done in `RR` rather than `SR`.
- `compute_c` makes sure that all multiplications, logs, sqrts, comparisons are in RDF;
we also use xrange instead of range to avoid creating a list we don't need.
- `compute_d` tries to avoid Sage's preparser, and to avoid square roots;
we also avoid a lot of computation when data[k] < 0, because we don't
even have to compute the right-hand side in that case.
For (trials, length) = (2000, 2000), I get timings of roughly: (a) 4 s, (b) 40 ms, (c) 5 ms, (d) 2 ms.
sage: timeit('compute_a(2000,2000)')
5 loops, best of 3: 3.99 s per loop
sage: timeit('compute_b(2000,2000)')
5 loops, best of 3: 41.4 ms per loop
sage: timeit('compute_c(2000,2000)')
125 loops, best of 3: 4.89 ms per loop
sage: timeit('compute_d(2000,2000)')
125 loops, best of 3: 1.92 ms per loop
Tue, 09 Aug 2016 16:28:50 -0500http://ask.sagemath.org/question/34364/memory-leak-somewhere/?answer=34394#post-id-34394Answer by tmonteil for <p>My code is using massive amounts of memory (eventually), which I don't think should be the case. The memory use slowly increases until my computer runs out of memory. I tried enabling garbage collection, but that didn't help (or didn't help enough). I don't see any reason why this should use more and more memory. It takes somewhere between 5 and 10 hours for this program to use up my 16 GB of memory, but that memory user is increasing is clear quickly. </p>
<pre><code>import numpy
trials = 5800
length = 5000
mean = 0
results = [0]*trials
data = [0]*length
for i in range(trials):
data[0] = numpy.random.normal(0,1)
for j in range(1, length):
data[j] = data[j-1] + numpy.random.normal(0,1)
for k in range(2, length):
if data[k] > 2*sqrt(k*log(log(k+1))):
results[i] += 1
mean += results[i]
</code></pre>
http://ask.sagemath.org/question/34364/memory-leak-somewhere/?answer=34365#post-id-34365We should inspect further, but there is indeed an issue. Thanks for reporting ! It has to do with the symbolic computations. Indeed, when you write `2*sqrt(k*log(log(k+1)))`, since `k+1` is an integer, the computation is done symbolically, and not numerically, which takes more time. The quick way to avoid this is to add a dot `.` after the `1` to specify that it is a floating-point number and not an integer (anyway, you will compare with a float at the end):
So, just replace :
if data[k] > 2*sqrt(k*log(log(k+1))):
with
if data[k] > 2*sqrt(k*log(log(k+1.))):
What i do not understant though, is that this solution, while much faster than the initial code (and not leaking anymore), is still much slower than running the code from ipython (and adding the line `from math import sqrt, log` at the beginning), while the only remaining difference is about coercion.
Sun, 07 Aug 2016 15:52:44 -0500http://ask.sagemath.org/question/34364/memory-leak-somewhere/?answer=34365#post-id-34365Comment by nbruin for <p>We should inspect further, but there is indeed an issue. Thanks for reporting ! It has to do with the symbolic computations. Indeed, when you write <code>2*sqrt(k*log(log(k+1)))</code>, since <code>k+1</code> is an integer, the computation is done symbolically, and not numerically, which takes more time. The quick way to avoid this is to add a dot <code>.</code> after the <code>1</code> to specify that it is a floating-point number and not an integer (anyway, you will compare with a float at the end):</p>
<p>So, just replace :</p>
<pre><code> if data[k] > 2*sqrt(k*log(log(k+1))):
</code></pre>
<p>with </p>
<pre><code> if data[k] > 2*sqrt(k*log(log(k+1.))):
</code></pre>
<p>What i do not understant though, is that this solution, while much faster than the initial code (and not leaking anymore), is still much slower than running the code from ipython (and adding the line <code>from math import sqrt, log</code> at the beginning), while the only remaining difference is about coercion.</p>
http://ask.sagemath.org/question/34364/memory-leak-somewhere/?comment=34366#post-id-34366sqrt and log in sage are wrappers that need to investigate their arguments to decide what to do. So with numerical arguments they are slower that the math sqrt and log, just for that reason. Furthermore, for numerical input they probably end up using mpfr rather than the math library, so that bit is probably slower too (but perhaps numerically a little more robust, and at least capable of dealing with multiprecision)Sun, 07 Aug 2016 16:17:38 -0500http://ask.sagemath.org/question/34364/memory-leak-somewhere/?comment=34366#post-id-34366