Revision history [back]

As a complement to @tmonteil's answer, some hints about speed.

To gain speed, you can

• avoid Sage's preparser transforming Python ints into Sage Integers
• avoid conversions and coercions that occur when you add, multiply, compare numbers of different types
• avoid creating lists you don't need; use iterators instead
• use methods instead of functions, eg for log and sqrt
• square instead of taking square root
• set apart some cases where you can save yourself some computation and comparison.

Here are four variants of a function inspired by your code, with 'trials' and 'length' as arguments.

import numpy

def compute_a(trials, length):
mean = 0
results = [0]*trials
data = [0]*length
for i in range(trials):
data[0] = numpy.random.normal(0,1)
for j in range(1, length):
data[j] = data[j-1] + numpy.random.normal(0,1)
for k in range(2, length):
if data[k] > 2*sqrt(k*log(log(k+1))):
results[i] += 1
mean += results[i]
return mean

def compute_b(trials, length):
mean = 0
results = [0]*trials
data = [0]*length
for i in range(trials):
data[0] = numpy.random.normal(0,1)
for j in range(1, length):
data[j] = data[j-1] + numpy.random.normal(0,1)
for k in range(2, length):
if data[k] > 2*sqrt(k*log(log(k+1.))):
results[i] += 1
mean += results[i]
return mean

def compute_c(trials, length):
two = RDF(2)
mean = 0
results = [0]*trials
data = [0]*length
for i in xrange(trials):
data[0] = numpy.random.normal(0,1)
for j in range(1, length):
data[j] = data[j-1] + numpy.random.normal(0,1)
for k in range(2, length):
if RDF(data[k]) > two*(RDF(k)*RDF(k+1).log().log()).sqrt():
results[i] += 1
mean += results[i]
return mean

def compute_d(trials, length):
zero = RDF.zero()
four = RDF(4r)
mean = 0r
results = [0r]*trials
data = [0r]*length
for i in xrange(trials):
data[0r] = numpy.random.normal(0r,1r)
for j in range(1r, length):
data[j] = data[j-1r] + numpy.random.normal(0r,1r)
for k in range(2r, length):
a = RDF(data[k])
if a > zero and a*a > four*RDF(k)*RDF(k+1r).log().log():
results[i] += 1r
mean += results[i]
return mean

• compute_a is just making your code into a function.
• compute_buses @tmonteil's trick of k+1. so that comparisons are done in RR rather than SR.
• compute_c makes sure that all multiplications, logs, sqrts, comparisons are in RDF; we also use xrange instead of range to avoid creating a list we don't need.
• compute_d tries to avoid Sage's preparser, and to avoid square roots; we also avoid a lot of computation when data[k] < 0, because we don't even have to compute the right-hand side in that case.

For (trials, length) = (2000, 2000), I get timings of roughly: (a) 4 s, (b) 40 ms, (c) 5 ms, (d) 2 ms.

sage: timeit('compute_a(2000,2000)')
5 loops, best of 3: 3.99 s per loop
sage: timeit('compute_b(2000,2000)')
5 loops, best of 3: 41.4 ms per loop
sage: timeit('compute_c(2000,2000)')
125 loops, best of 3: 4.89 ms per loop
sage: timeit('compute_d(2000,2000)')
125 loops, best of 3: 1.92 ms per loop