# when/how to use parallel?

So I've scouted around some of the questions posed here that involve the terms parallel or the decorator @parallel and I want to know: is there any documentation out there that introduces novices to parallel computing (in sage)?

At present (i.e. with the parallel? documentation being what it is), I don't really know how (or even when) to take advantage of @parallel.

If I have a dual-core CPU, should I be trying to decorate every function with @parallel? Can someone educate this fool?

Thanks,

Steven

## EDIT:

I suppose the best answer here would give examples of situations where @parallel can be of use, and where it cannot. I'm going to throw two examples out there; someone let me know if this is correct. Feel free to add your own if my examples are too superficial.

1. Computing arc-length of a (smooth) curve parametrized by $t \in [0,2\pi]$

    var('t')
    def calcArcLength(gamma):
        speed_gamma = sqrt( sum( [ component.diff(t)^2 for component in gamma ]))
        return numerical_integral(speed_gamma, 0, 2*pi)[0]

2. Calculating the arc-lengths of a list of smooth curves

    @parallel
    def makeList(length=1):
        li = []
        for i in range(length-1):
            li.append(generateRandomCurve())
        return li

    @parallel
    def processList(list_of_curves):
        arclength_of_curve = []
        for curve in list_of_curves:
            arclength_of_curve.append(calcArcLength(curve))
        return arclength_of_curve


Is it fair to say that @parallel should speed up the time to make and process list_of_curves, whereas it won't speed up the function numerical_integral?


I think you might be misunderstanding what the parallel decorator does to a function (or at least not understanding it in the limited way I do!). As I understand it, the basic functionality of @parallel is to convert a function which takes a single input to one which takes a list of inputs, and runs the original function on each item in the list. The matter is slightly confused because the new "decorated" function has the same name as the original one, so from a user's point of view, it's hard to tell the difference between the two (and of course, that's the point of decorators :). The "parallel" functionality is really just doing each of those separate computations on separate processors.

@parallel doesn't really analyze or optimize the actual workings of your function. Thus, I don't think you need to write the extra "list processing" functions, but just decorate calcArcLength if you want to compute the arc lengths of a bunch of curves in parallel. And I think you are right that it won't speed up something like numerical_integral on just a single computation.

So, in answer to the question in the title, I would say use @parallel when you want to apply the same function to a long list of inputs -- write the function, and let @parallel distribute the separate function calls to separate processes.
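To make the description above concrete, here is a minimal plain-Python sketch of the same idea. It is not Sage's implementation: `toy_parallel`, `_run_all`, and `slow_square` are hypothetical names, and it uses threads from `concurrent.futures` rather than the separate processes that @parallel uses, purely to keep the sketch short. It only illustrates the interface: a single input behaves as before, while a list of inputs gives back a generator of `((args, kwargs), result)` items in the order the calls finish.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from functools import wraps


def toy_parallel(func):
    """Hypothetical stand-in for @parallel (illustration only)."""
    @wraps(func)
    def wrapper(arg, *args, **kwargs):
        if not isinstance(arg, list):
            # A single input: behave exactly like the undecorated function.
            return func(arg, *args, **kwargs)
        # A list of inputs: hand back a lazy generator of results.
        return _run_all(func, arg)
    return wrapper


def _run_all(func, inputs):
    # Generator body: no call is submitted until the caller starts iterating.
    with ThreadPoolExecutor(max_workers=len(inputs) or 1) as pool:
        futures = {pool.submit(func, x): x for x in inputs}
        for done in as_completed(futures):
            # Mimic the item shape ((args, kwargs), result).
            yield (((futures[done],), {}), done.result())


@toy_parallel
def slow_square(n):
    time.sleep(0.05 * n)  # simulate a long computation
    return n * n


single = slow_square(3)  # plain call, plain return value
by_input = {item[0][0][0]: item[1] for item in slow_square([3, 1, 2])}
```

The body of `slow_square` is untouched by the decorator; only the way it is called over many inputs changes, which is why decorating every function would gain nothing.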

I'm not sure if an example is necessary at this point, but I don't think there are enough @parallel examples, so I'll extend the one given by @benjaminfjones:

    @parallel
    def hard_computation(n,d=1,show_out=False):
        for j in range(n):
            sleep(.1)
        if show_out:
            print "factoring",n,"/",d,".."
        return str(factor(floor(n/d)))


The function hard_computation just simulates some long calculation. The following is just its basic functionality, and would work with or without the @parallel decorator:

    sage: hard_computation(16,2,True)
    factoring 16 / 2 ..
    '2^3'

    sage: %time hard_computation(12)
    '2^2 * 3'
    CPU time: 0.00 s,  Wall time: 1.20 s


Note that, even though the for loop could be parallelized, @parallel doesn't speed this up. Here's what @parallel makes possible:

    sage: r = hard_computation([2*n for n in range(3,10)]) #this is instantaneous
    sage: r
    <generator object __call__ at 0x10d559e10>


So @parallel just sets up a generator object. None of the hard_computation code is run until you start getting items from r. Here's what I do:

    for x in r:
        print x


Which returns:

    (((6,), {}), '2 * 3')
    (((8,), {}), '2^3')
    (((10,), {}), '2 * 5')
    (((12,), {}), '2^2 * 3')
    (((14,), {}), '2 * 7')
    (((16,), {}), '2^4')
    (((18,), {}), '2 * 3^2')


In each line of output, x[0] holds the arguments for the computation, and x[1] holds the output. This is important because the order of the output is just the order that the processes finish in, not the order that they're called. For

r = hard_computation([2*n for n in range(3,10)])
for x ...
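Since results arrive in completion order, one way to get them back in input order is to key each result by its argument. The following is only a sketch in plain Python: the list `finished` is hand-written in the item shape shown above (in a deliberately shuffled order) rather than produced by a real @parallel run, and the names `finished`, `by_input`, and `ordered` are made up for illustration.

```python
# Items in the ((args, kwargs), result) shape, in a shuffled "completion" order.
finished = [
    (((10,), {}), '2 * 5'),
    (((6,), {}), '2 * 3'),
    (((8,), {}), '2^3'),
]

# Key each result by the first positional argument of its call.
by_input = {args[0]: out for (args, kwargs), out in finished}

# Read the results back in the order the inputs were given.
inputs = [6, 8, 10]
ordered = [by_input[n] for n in inputs]
```

This relies on each call having a distinct first argument; with repeated inputs, the later result would overwrite the earlier one in `by_input`.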

That's great. This should certainly be added to the manual.

( 2011-08-14 17:26:26 -0500 )

I see you started #11462 to improve the parallel documentation (in case anyone else here wants to look at the Ticket and contribute).

( 2011-08-14 17:44:13 -0500 )

Ha, yeah I forgot to mention that; thanks :)

( 2011-08-15 01:52:06 -0500 )

Well, I guess parallel is more of an optimization tool than something for regular use in every function. Not every function requires parallelism; what's more, linear algorithms cannot be executed in parallel. So if the calculation speed is already satisfactory for you, you should not use parallel (unless just for fun).


Right. I'm not sure I know what a "linear algorithm" is. Let me edit my question, so I can be more specific.

( 2011-08-14 02:37:33 -0500 )