# Running a for cycle in parallel

I have an algorithm that I need to speed up. I would like to parallelize computations

the main part I need to speed up is the following

  for r2 in Subsets(V2,k-4):
check2(fuse(v,r2),k)


this gives as output three values b1, b2, b3 that are global and for any input check2 can either leave them unchanged or increase them by 1. I tried looking into @parallel but it doens't seem to fit my requirements.

How can I parallelize that (I can use up to 16 cores)?

This is the whole code (minus all the parts that do, I can't publish it all):

P2=[]
def P_2(q,n,S):     %creates the set P2

I2R=[]
I2S=[]
I3=[]
I3R=[]
I6=[]
I8=[]
def IG(k):      %creates a set of indeces vectors

def coll(x,y,z):    %checks if 3 given tuples are collinear
A=[x,y,z]
M = matrix(A)
if M.determinant()==0:
return 'true'

def cone(v):     %defines a function to be used in conic()

def conic(x1,x2,x3,x4,x5,x6):    %checks if 6 given tuples are on a conic
A=[cone(x1),cone(x2),cone(x3),cone(x4),cone(x5),cone(x6)]
M = matrix(A)
if M.determinant()==0:
return 'true'

def cube(v):     %defines a function to be used in SIcubic()

def dxcube(v):     %defines a function to be used in SIcubic()

def dycube(v):         %defines a function to be used in SIcubic()

def dzcube(v):       %defines a function to be used in SIcubic()

def SIcubic(x1,x2,x3,x4,x5,x6,x7,x8):      %checks if 8 given tuples are on a special cubic
v1='false'
v=[x1,x2,x3,x4,x5,x6,x7,x8]
for a in range(8):
x=v[a]
A=[]
A.append(dxcube(x))
A.append(dycube(x))
A.append(dzcube(x))
for t in range(8):
if t!=a:
y=v[t]
A.append(cube(y))
M = matrix(A)
if M.determinant()==0:
v1='true'
return v1

b1=0
b2=0
b3=0
def check(s,k):       %checks a vector s of lenght k
global b1
global b2
global b3
IG(k)
q1=0
q2=0
q3=0
for i3 in I3:
a0=i3
a1=i3
a2=i3
if coll(s[a0],s[a1],s[a2])=='true':
q1=1
q2=1
q3=1
break
if q2==0:
if k>5:
if q==0:
for i6 in I6:
a0=i6
a1=i6
a2=i6
a3=i6
a4=i6
a5=i6
if conic(s[a0],s[a1],s[a2],s[a3],s[a4],s[a5])=='true':
q2=1
q3=1
break
if q3==0:
if k>7:
for i8 in I8:
a0=i8
a1=i8
a2=i8
a3=i8
a4=i8
a5=i8
a6=i8
a7=i8
if SIcubic(s[a0],s[a1],s[a2],s[a3],s[a4],s[a5],s[a6],s[a7])=='true':
q3=1
break
b1=b1+q1
b2=b2+q2
b3=b3+q3
return b1
return b2
return b3

b1=0
def check1(s,a):     %checks a vector s and an element a
global b1
q1=0
for i2 in I2S:
a0=i2
a1=i2
if coll(s[a0],s[a1],a)=='true':
q1=1
break
b1=b1+q1
return b1

b1=0
b2=0
b3=0
def check2(s,k):    %checks a vector s of lenght k
global b1
global b2
global b3
q1=0
q2=0
q3=0
for i3 in I3R:
a0=i3
a1=i3
a2=i3
if coll(s[a0],s[a1],s[a2])=='true':
q1=1
q2=1
q3=1
break
if q2==0:
if k>5:
for i6 in I6:
a0=i6
a1=i6
a2=i6
a3=i6
a4=i6
a5=i6
if conic(s[a0],s[a1],s[a2],s[a3],s[a4],s[a5])=='true':
q2=1
q3=1
break
if q3==0:
if k>7:
for i8 in I8:
a0=i8
a1=i8
a2=i8
a3=i8
a4=i8
a5=i8
a6=i8
a7=i8
if SIcubic(s[a0],s[a1],s[a2],s[a3],s[a4],s[a5],s[a6],s[a7])=='true':
q3=1
break
b1=b1+q1
b2=b2+q2
b3=b3+q3
return b1
return b2
return b3

t=1
def TOT(P,q,n,k):      %computes a polynomial depending on q,n,k,P

def fuse(s,t):        %fuses two vectors into 1 (see repsonse to answere)

sn=[]
def Count(S,q,n,k):     %main function
IG(k)
global b1
global b2
global b3
l1=0
l2=0
l3=0
y=q^n
s=len(S)
v=[(0,0,1),(0,1,0),(1,0,0),(1,1,1)]
V1=copy(S)
V1.remove((1,0,0))
V1.remove((0,1,0))
V1.remove((0,0,1))
V1.remove((1,1,1))
if k<4:
b1=0
for s in Subsets(S,k):
check(s,k)
l1=b1*factorial(k)
if k>4:
V2=copy(V1)
for r1 in V1:
b1=0
b2=0
b3=0
check1(v,r1)
if b1==1:
V2.remove(r1)
s2=len(V2)
l1=l1+binomial(s-4,k-4)-binomial(s2,k-4)
l2=l2+binomial(s-4,k-4)-binomial(s2,k-4)
l3=l3+binomial(s-4,k-4)-binomial(s2,k-4)
b1=0
b2=0
b3=0
for r2 in Subsets(V2,k-4):
check2(fuse(v,r2),k)
l1=l1+b1
l2=l2+b2
l3=l3+b3
l1=l1*(y^2+y+1)*(y^3-y)*(y^3-y^2)*factorial(k-4)
l2=l2*(y^2+y+1)*(y^3-y)*(y^3-y^2)*factorial(k-4)
l3=l3*(y^2+y+1)*(y^3-y)*(y^3-y^2)*factorial(k-4)
g1=t-l1
g2=t-l2
g3=t-l3
print('There are ', g1/factorial(k) , ' unordered', k,'-tuples in general linear position')
print('There are ', g1 , ' ordered', k ,'-tuples in general linear position')
print('There are ', g2/factorial(k) , ' unordered', k,'-tuples in general linear and conic position')
print('There are ', g2 , ' ordered', k ,'-tuples in general linear and conic position')
print('There are ', g3/factorial(k) , ' unordered', k,'-tuples in general position')
print('There are ', g3 , ' ordered', k ,'-tuples in general position')

def Active(q,n,k):        %every fucntion I need to run reduced into 1
P2=[]
P_2(q,n,P2)
TOT(P2,q,n,k)
Count(P2,q,n,k)

edit retag close merge delete

1

Why do you think that parallelization will provide a speed up? Writing parallel code is extremy delicate unless it is embarassingly parallel (that is independent programs being run in parallel). If you want help to speed up your code, provide the full code of your program.

Because all the operations in this key passage are independent (I need to check if any given element has a property), therefore I can do several of them at the same time

Sort by » oldest newest most voted

It would have been better if you specified how you used @parallel and what do you exactly mean when you say " it doesn't seem to fit my requirements", but, in any case, I will try to give you a few hints that maybe will help you with your problem. The first thing that I need to be sure that you understand is that, when you use @parallel, your code does not just "run in parallel". Indeed, what happens is that you are executing a new process (with a new python interpreter), you told this process "hey, please run this function with v, r2 and k as arguments" and then, at the end, you retrieve the results. This means, for example, that this new process can not modify the global objects of the previous one. They simply do not share their variables. Therefore, my best advice would be to return three flags from the method check2: b1_must_be_increased, b2_must_be_increased and b3_must_be_increased. Then, in the original process, you proceed to increase the global variables accordingly to the flags you received.

Moreover, you must pay attention to what function you are parallelizing. Let me explain: if you parallelize only check2, for example, then fuse will be executed on the original process and maybe (I don't know if this is that case) it takes more time to execute fuse than to execute check2. In this case, your parallelization will not work (or, actually, will work very very poorly).

So, let me show you a possible parallelization of your code:

# I have no idea of what fuse and check2 do, so I use
# these two placeholders
def fuse(a, b):
return b

def check2(a, b):
b1_must_be_increased = True
b2_must_be_increased = False
b3_must_be_increased = True

flags = (
b1_must_be_increased,
b2_must_be_increased,
b3_must_be_increased
)
return flags

# This is the function that I want to parallelize
@parallel()
def my_parallel_function(v, r2, k):
return check2(fuse(v, r2), k)

# Now I prepare in advance the list of all the arguments
# that I want to pass to my_parallel_function
my_parallel_function_args = [
(v, r2, k) for r2 in Subsets(V2, k - 4)
]

# And here you will find the results of your computation executed
# in parallel.
results = my_parallel_function(my_parallel_function_args)

# Results is a list of couples; the first element of the couple is
# the input of the my_parallel_function, the second element is the
# output. Now I want to take only the output (so, the three flags)
flag_list = [r for r in results]

for flags in flag_list:
if flags:
b1 += 1
if flags:
b2 += 1
if flags:
b3 += 1

print((b1, b2, b3))


Finally, there is one last issue that I want to point out. The function "check2" must be "enough complicated" if you want all this to make sense. Let me explain with an example. Let us suppose that you have 1 million sets and you want to check, for each set, if 1 is inside the set or not. It may seems like a good idea to run this code in parallel. Therefore, what you need to do is to create a few new processes and describe the sets to all the different processes and then retrieve all the results. Unfortunately, this could turn out to be more complicated than actually check if 1 is inside each set or not. So, in principle, your code is parallel but the amount of work that the original process has to perform is increased (and, therefore, your code will become slower than before).

more

I can give you both a quick explanation of the two functions and the whole code:

QUICK: -let v=(1,2,3) and w=(a,b), then fuse(v,w)=(1,2,3,a,b) -check2(tuple) takes a tuple and checks the determinanat of several matrices computed on this vector.

LONG: whole code in modified question

I said that the function is not fit for my case becasue the set I need to check is very big, like 100 billion vectors of lenght 12.

My idea was to look into something like "break the set in 8 and then put together the results of the 8 subprocesses"