How to implement a parallelization?
Consider code that explores a tree and collects fruits: when the procedure arrives at a new node, if the node is a leaf a possible fruit is collected; otherwise a computation is done to determine the children of the node. I would like to parallelize it as follows: the children are distributed among all the available CPUs, each CPU has its own queue, and a given child is assigned to the CPU with the shortest queue.
It seems to be a generic way to parallelize such a tree exploration.
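To make the allocation rule concrete, here is a minimal sketch in plain Python of the "assign each child to the shortest queue" step only. The function name `assign_to_shortest_queue` and the use of `deque` objects as per-CPU queues are assumptions made for illustration; nothing here runs in parallel yet.

```python
from collections import deque

def assign_to_shortest_queue(children, queues):
    """Append each child to the currently shortest queue.

    Returns the list of queue indices chosen, in order.
    Ties are broken in favour of the lowest index.
    """
    chosen = []
    for child in children:
        # Index of the queue with the fewest pending items.
        k = min(range(len(queues)), key=lambda j: len(queues[j]))
        queues[k].append(child)
        chosen.append(k)
    return chosen

# Three hypothetical per-CPU queues holding 2, 0 and 1 pending nodes.
queues = [deque(["a", "b"]), deque(), deque(["c"])]
print(assign_to_shortest_queue(["x", "y", "z"], queues))  # [1, 1, 2]
```

With real worker processes the queue lengths change while children are being assigned, so the lengths would have to be read from shared state rather than from local lists as in this sketch.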
Question: How to implement such a parallelization?
In addition, how can a GPU be used (for HPC)?
The code has the following form:
cpdef function(list L1, list L2):
    cdef int i, n  #...
    cdef list LL1, LL2  #...
    #...
    # core of the code
    #...
    n =  #...
    for i in range(n):
        LL1 =  #...
        LL2 =  #...
        function(LL1, LL2)
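For comparison, here is a hedged sketch in plain Python (not Cython) of one possible way to parallelize a recursion of this shape with the standard library. It does not implement per-CPU queues. Instead, the top of the tree is expanded serially and the resulting subtrees are handed to an executor, whose single shared task queue gives the next subtree to whichever worker becomes idle. The toy tree, the fruit test, and the names `children`, `explore`, `explore_parallel` and `split_depth` are all assumptions made for illustration. The sketch also assumes that no leaf occurs above `split_depth`.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy tree: a node is a tuple of bits, leaves have length DEPTH.
DEPTH = 10

def children(node):
    """Stand-in for the computation that determines the children of a node."""
    return [node + (0,), node + (1,)]

def explore(node):
    """Serial exploration of the subtree under `node`; returns the fruits found."""
    if len(node) == DEPTH:  # leaf
        # Hypothetical fruit test: keep leaves with exactly three 1-bits.
        return [node] if sum(node) == 3 else []
    fruits = []
    for child in children(node):
        fruits.extend(explore(child))
    return fruits

def explore_parallel(root, workers=4, split_depth=3):
    """Expand `split_depth` levels serially, then explore the subtrees in a pool."""
    frontier = [root]
    for _ in range(split_depth):
        frontier = [c for node in frontier for c in children(node)]
    # The executor keeps one shared task queue; an idle worker takes the next subtree.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(explore, frontier)
        return [fruit for part in results for fruit in part]

print(len(explore_parallel(())))  # 120 leaves with exactly three 1-bits
```

Under the default CPython build with the GIL, threads do not execute pure-Python code on several cores at once, so this version shows the structure rather than a speedup. `ProcessPoolExecutor` offers the same interface with separate processes, provided the function and its arguments can be pickled. Choosing `split_depth` so that there are many more subtrees than workers helps balance the load when subtrees differ in size.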