Ask Your Question

Revision history [back]

click to hide/show revision 1
initial version

calling a parallel decorated function on an iterator

Say I've decorated a function f with @parallel and then, instead of evaluating f at a list of inputs, I evaluate f at a generator (whose list of yields is meant to be interpreted as the list of inputs at which I want to evaluate f). Silly example:

def my_iterator(n):
    for i in xrange(n):
        yield i

@parallel(verbose=True)
def f(x):
    return x^2

for x in f(my_iterator(2^5)):
    print x

This works---Sage interprets the input "my_iterator(2^5)" to f correctly. However, replacing 2^5 with 2^30 shows that the way Sage goes about trying to compute this is by first building a list of the yields of my_iterator(2^30) and then trying to distribute the elements of that list to forked subprocesses. That is,

for x in f(my_iterator(2^30)):
    print x

is functionally identical to

for x in f([x for x in my_iterator(2^30)]):
    print x

which is horrible. Instead of starting to yield outputs immediately and using virtually no memory, Sage consumes all available memory (as it tries to build a list of length 2^30) and then the computation just dies. Even worse, when it dies it stops silently with no output, despite the "verbose=True" option.

When setting up lengthy parallelized computations, sometimes it makes sense to create the inputs to a parallelized function using a generator (either algorithmically or by, say, reading an input file), and it would be nice if we could start sending those inputs out via the @parallel mechanism for processing as soon as they're generated, instead of trying to create and store every input that will be evaluated in memory before the first forked subprocess begins. I guess I would want the interior generator ("my_iterator(30)" in the above example) to yield its next result to be sent to a forked subprocess whenever a core is available for that forked subprocess, and the parallel computation will run until the interior generator is out of things to yield and all subprocesses have returned.

So my question. Is there a simple workaround, modification of the source code, or alternative method for achieving this?