Ask Your Question

Revision history [back]

As a complement to @tmonteil's answer, some hints about speed.

To gain speed, you can

  • avoid Sage's preparser transforming Python ints into Sage Integers
  • avoid conversions and coercions that occur when you add, multiply, compare numbers of different types
  • avoid creating lists you don't need; use iterators instead
  • use methods instead of functions, eg for log and sqrt
  • square instead of taking square root
  • set apart some cases where you can save yourself some computation and comparison.

Here are four variants of a function inspired by your code, with 'trials' and 'length' as arguments.

import numpy

def compute_a(trials, length):
    mean = 0
    results = [0]*trials
    data = [0]*length
    for i in range(trials):
        data[0] = numpy.random.normal(0,1)
        for j in range(1, length):
            data[j] = data[j-1] + numpy.random.normal(0,1)
        for k in range(2, length):
            if data[k] > 2*sqrt(k*log(log(k+1))):
                results[i] += 1
        mean += results[i]
        return mean

def compute_b(trials, length):
    mean = 0
    results = [0]*trials
    data = [0]*length
    for i in range(trials):
        data[0] = numpy.random.normal(0,1)
        for j in range(1, length):
            data[j] = data[j-1] + numpy.random.normal(0,1)
        for k in range(2, length):
            if data[k] > 2*sqrt(k*log(log(k+1.))):
                results[i] += 1
        mean += results[i]
        return mean

def compute_c(trials, length):
    two = RDF(2)
    mean = 0
    results = [0]*trials
    data = [0]*length
    for i in xrange(trials):
        data[0] = numpy.random.normal(0,1)
        for j in range(1, length):
            data[j] = data[j-1] + numpy.random.normal(0,1)
        for k in range(2, length):
            if RDF(data[k]) > two*(RDF(k)*RDF(k+1).log().log()).sqrt():
                results[i] += 1
        mean += results[i]
        return mean

def compute_d(trials, length):
    zero = RDF.zero()
    four = RDF(4r)
    mean = 0r
    results = [0r]*trials
    data = [0r]*length
    for i in xrange(trials):
        data[0r] = numpy.random.normal(0r,1r)
        for j in range(1r, length):
            data[j] = data[j-1r] + numpy.random.normal(0r,1r)
        for k in range(2r, length):
            a = RDF(data[k])
            if a > zero and a*a > four*RDF(k)*RDF(k+1r).log().log():
                results[i] += 1r
        mean += results[i]
        return mean
  • compute_a is just making your code into a function.
  • compute_buses @tmonteil's trick of k+1. so that comparisons are done in RR rather than SR.
  • compute_c makes sure that all multiplications, logs, sqrts, comparisons are in RDF; we also use xrange instead of range to avoid creating a list we don't need.
  • compute_d tries to avoid Sage's preparser, and to avoid square roots; we also avoid a lot of computation when data[k] < 0, because we don't even have to compute the right-hand side in that case.

For (trials, length) = (2000, 2000), I get timings of roughly: (a) 4 s, (b) 40 ms, (c) 5 ms, (d) 2 ms.

sage: timeit('compute_a(2000,2000)')
5 loops, best of 3: 3.99 s per loop
sage: timeit('compute_b(2000,2000)')
5 loops, best of 3: 41.4 ms per loop
sage: timeit('compute_c(2000,2000)')
125 loops, best of 3: 4.89 ms per loop
sage: timeit('compute_d(2000,2000)')
125 loops, best of 3: 1.92 ms per loop