ASKSAGE: Sage Q&A Forum - RSS feedhttps://ask.sagemath.org/questions/Q&A Forum for SageenCopyright Sage, 2010. Some rights reserved under creative commons license.Tue, 09 May 2017 14:12:14 +0200Caching with @parallelhttps://ask.sagemath.org/question/37497/caching-with-parallel/So I have a function which I want to run in a parallel manner as I'll need to run the function a few million times. The thing is, I know that quite a few times, I'll have duplicate information. So I want to speed up the process even more by skipping the ones I don't need. Here's what I have so far:
@parallel(ncpus=7)
def parallel_function(initial,allElts,e1,e2):
untested = set([initial])
verified = set([])
potentials = getPotentials(e1,e2)
while untested:
curTest = untested.pop()
verified.add(curTest)
for e in allElts:
newElt = someProp(potentials,curTest,e)
if newElt not in verified:
untested.add(newElt)
return verified
def testElts(allElts, potentials):
initial = allElts[0]
returnInfo = {}
for tup in list(parallel_function([(initial,allElts,e1,e2) for e1 in allElts for e2 in allElts)):
returnInfo[tup[0][0][2]] = tup[1]
***NOTE*** The above is an exmple. I've tried to dumb it down to try and give a better feel of the functions I'm working with. So consider the above more "pseudo" code than real code in case I missed something
We're assuming that allElts has at least 1 million elements, so this function will take a while. Notice that the parallel_function is a fairly intense function itself, but given any two element, the "getPotentials" function might return similar results. So basically I want to skip my while loop if I've already tested these "potentials" before.
I've tried looking into how caching works in parallel environments, but there doesn't seem to be too much. Would caching in this case be possible with @cache_function? If not, can I do a dictionary that stores the values and access the information that way? Since parallel processing creates forks I assume that means the dictionary information is not necessarily accessible by everyone? Or is that not accurate?
Any help on this would be beneficial. Thanks.Wed, 03 May 2017 16:02:30 +0200https://ask.sagemath.org/question/37497/caching-with-parallel/Comment by aram.dermenjian for <p>So I have a function which I want to run in a parallel manner as I'll need to run the function a few million times. The thing is, I know that quite a few times, I'll have duplicate information. So I want to speed up the process even more by skipping the ones I don't need. Here's what I have so far:</p>
<pre><code>@parallel(ncpus=7)
def parallel_function(initial,allElts,e1,e2):
untested = set([initial])
verified = set([])
potentials = getPotentials(e1,e2)
while untested:
curTest = untested.pop()
verified.add(curTest)
for e in allElts:
newElt = someProp(potentials,curTest,e)
if newElt not in verified:
untested.add(newElt)
return verified
def testElts(allElts, potentials):
initial = allElts[0]
returnInfo = {}
for tup in list(parallel_function([(initial,allElts,e1,e2) for e1 in allElts for e2 in allElts)):
returnInfo[tup[0][0][2]] = tup[1]
</code></pre>
<p><strong><em>NOTE</em></strong> The above is an exmple. I've tried to dumb it down to try and give a better feel of the functions I'm working with. So consider the above more "pseudo" code than real code in case I missed something</p>
<p>We're assuming that allElts has at least 1 million elements, so this function will take a while. Notice that the parallel_function is a fairly intense function itself, but given any two element, the "getPotentials" function might return similar results. So basically I want to skip my while loop if I've already tested these "potentials" before.</p>
<p>I've tried looking into how caching works in parallel environments, but there doesn't seem to be too much. Would caching in this case be possible with @cache_function? If not, can I do a dictionary that stores the values and access the information that way? Since parallel processing creates forks I assume that means the dictionary information is not necessarily accessible by everyone? Or is that not accurate?</p>
<p>Any help on this would be beneficial. Thanks.</p>
https://ask.sagemath.org/question/37497/caching-with-parallel/?comment=37519#post-id-37519Is there somewhere that shows how to do task queueing in python/sage? It might be something that could potentially help although, as you mentioned, I'd need to look at the overhead and bottlenecks to see if there's a way around it. (Do you know anywhere that mentions the lag/timing that would be requiring for doing a more complex forking system that would use multiple cores?)
Thanks btw.Fri, 05 May 2017 15:19:10 +0200https://ask.sagemath.org/question/37497/caching-with-parallel/?comment=37519#post-id-37519Comment by nbruin for <p>So I have a function which I want to run in a parallel manner as I'll need to run the function a few million times. The thing is, I know that quite a few times, I'll have duplicate information. So I want to speed up the process even more by skipping the ones I don't need. Here's what I have so far:</p>
<pre><code>@parallel(ncpus=7)
def parallel_function(initial,allElts,e1,e2):
untested = set([initial])
verified = set([])
potentials = getPotentials(e1,e2)
while untested:
curTest = untested.pop()
verified.add(curTest)
for e in allElts:
newElt = someProp(potentials,curTest,e)
if newElt not in verified:
untested.add(newElt)
return verified
def testElts(allElts, potentials):
initial = allElts[0]
returnInfo = {}
for tup in list(parallel_function([(initial,allElts,e1,e2) for e1 in allElts for e2 in allElts)):
returnInfo[tup[0][0][2]] = tup[1]
</code></pre>
<p><strong><em>NOTE</em></strong> The above is an exmple. I've tried to dumb it down to try and give a better feel of the functions I'm working with. So consider the above more "pseudo" code than real code in case I missed something</p>
<p>We're assuming that allElts has at least 1 million elements, so this function will take a while. Notice that the parallel_function is a fairly intense function itself, but given any two element, the "getPotentials" function might return similar results. So basically I want to skip my while loop if I've already tested these "potentials" before.</p>
<p>I've tried looking into how caching works in parallel environments, but there doesn't seem to be too much. Would caching in this case be possible with @cache_function? If not, can I do a dictionary that stores the values and access the information that way? Since parallel processing creates forks I assume that means the dictionary information is not necessarily accessible by everyone? Or is that not accurate?</p>
<p>Any help on this would be beneficial. Thanks.</p>
https://ask.sagemath.org/question/37497/caching-with-parallel/?comment=37521#post-id-37521https://docs.python.org/2/library/multiprocessing.htmlFri, 05 May 2017 18:16:47 +0200https://ask.sagemath.org/question/37497/caching-with-parallel/?comment=37521#post-id-37521Comment by aram.dermenjian for <p>So I have a function which I want to run in a parallel manner as I'll need to run the function a few million times. The thing is, I know that quite a few times, I'll have duplicate information. So I want to speed up the process even more by skipping the ones I don't need. Here's what I have so far:</p>
<pre><code>@parallel(ncpus=7)
def parallel_function(initial,allElts,e1,e2):
untested = set([initial])
verified = set([])
potentials = getPotentials(e1,e2)
while untested:
curTest = untested.pop()
verified.add(curTest)
for e in allElts:
newElt = someProp(potentials,curTest,e)
if newElt not in verified:
untested.add(newElt)
return verified
def testElts(allElts, potentials):
initial = allElts[0]
returnInfo = {}
for tup in list(parallel_function([(initial,allElts,e1,e2) for e1 in allElts for e2 in allElts)):
returnInfo[tup[0][0][2]] = tup[1]
</code></pre>
<p><strong><em>NOTE</em></strong> The above is an exmple. I've tried to dumb it down to try and give a better feel of the functions I'm working with. So consider the above more "pseudo" code than real code in case I missed something</p>
<p>We're assuming that allElts has at least 1 million elements, so this function will take a while. Notice that the parallel_function is a fairly intense function itself, but given any two element, the "getPotentials" function might return similar results. So basically I want to skip my while loop if I've already tested these "potentials" before.</p>
<p>I've tried looking into how caching works in parallel environments, but there doesn't seem to be too much. Would caching in this case be possible with @cache_function? If not, can I do a dictionary that stores the values and access the information that way? Since parallel processing creates forks I assume that means the dictionary information is not necessarily accessible by everyone? Or is that not accurate?</p>
<p>Any help on this would be beneficial. Thanks.</p>
https://ask.sagemath.org/question/37497/caching-with-parallel/?comment=37529#post-id-37529Ok cool. Thanks =) So then we should leave parallel processing for more trivial tasks and if it's more complex use forking (when appropriate). In both cases cacheing is a little wonky so be careful yeah? Thanks for all your help on this.Sat, 06 May 2017 13:22:01 +0200https://ask.sagemath.org/question/37497/caching-with-parallel/?comment=37529#post-id-37529Comment by nbruin for <p>So I have a function which I want to run in a parallel manner as I'll need to run the function a few million times. The thing is, I know that quite a few times, I'll have duplicate information. So I want to speed up the process even more by skipping the ones I don't need. Here's what I have so far:</p>
<pre><code>@parallel(ncpus=7)
def parallel_function(initial,allElts,e1,e2):
untested = set([initial])
verified = set([])
potentials = getPotentials(e1,e2)
while untested:
curTest = untested.pop()
verified.add(curTest)
for e in allElts:
newElt = someProp(potentials,curTest,e)
if newElt not in verified:
untested.add(newElt)
return verified
def testElts(allElts, potentials):
initial = allElts[0]
returnInfo = {}
for tup in list(parallel_function([(initial,allElts,e1,e2) for e1 in allElts for e2 in allElts)):
returnInfo[tup[0][0][2]] = tup[1]
</code></pre>
<p><strong><em>NOTE</em></strong> The above is an exmple. I've tried to dumb it down to try and give a better feel of the functions I'm working with. So consider the above more "pseudo" code than real code in case I missed something</p>
<p>We're assuming that allElts has at least 1 million elements, so this function will take a while. Notice that the parallel_function is a fairly intense function itself, but given any two element, the "getPotentials" function might return similar results. So basically I want to skip my while loop if I've already tested these "potentials" before.</p>
<p>I've tried looking into how caching works in parallel environments, but there doesn't seem to be too much. Would caching in this case be possible with @cache_function? If not, can I do a dictionary that stores the values and access the information that way? Since parallel processing creates forks I assume that means the dictionary information is not necessarily accessible by everyone? Or is that not accurate?</p>
<p>Any help on this would be beneficial. Thanks.</p>
https://ask.sagemath.org/question/37497/caching-with-parallel/?comment=37508#post-id-37508The "parallel" decorator is for trivially parallel tasks: the different workers get their data at the start and report back their results, without depending on results of others. As soon as tasks interact (as in your case: some workers might discover someone else already computed what they need) you need shared memory (with locking) or inter process communication. Your description seems to fit a task queue servicing multiple workers, where workers submit additional tasks. However, the overhead and bottlenecks of that might be worse than your problem. It depends.Thu, 04 May 2017 19:38:09 +0200https://ask.sagemath.org/question/37497/caching-with-parallel/?comment=37508#post-id-37508Answer by Sébastien for <p>So I have a function which I want to run in a parallel manner as I'll need to run the function a few million times. The thing is, I know that quite a few times, I'll have duplicate information. So I want to speed up the process even more by skipping the ones I don't need. Here's what I have so far:</p>
<pre><code>@parallel(ncpus=7)
def parallel_function(initial,allElts,e1,e2):
untested = set([initial])
verified = set([])
potentials = getPotentials(e1,e2)
while untested:
curTest = untested.pop()
verified.add(curTest)
for e in allElts:
newElt = someProp(potentials,curTest,e)
if newElt not in verified:
untested.add(newElt)
return verified
def testElts(allElts, potentials):
initial = allElts[0]
returnInfo = {}
for tup in list(parallel_function([(initial,allElts,e1,e2) for e1 in allElts for e2 in allElts)):
returnInfo[tup[0][0][2]] = tup[1]
</code></pre>
<p><strong><em>NOTE</em></strong> The above is an exmple. I've tried to dumb it down to try and give a better feel of the functions I'm working with. So consider the above more "pseudo" code than real code in case I missed something</p>
<p>We're assuming that allElts has at least 1 million elements, so this function will take a while. Notice that the parallel_function is a fairly intense function itself, but given any two element, the "getPotentials" function might return similar results. So basically I want to skip my while loop if I've already tested these "potentials" before.</p>
<p>I've tried looking into how caching works in parallel environments, but there doesn't seem to be too much. Would caching in this case be possible with @cache_function? If not, can I do a dictionary that stores the values and access the information that way? Since parallel processing creates forks I assume that means the dictionary information is not necessarily accessible by everyone? Or is that not accurate?</p>
<p>Any help on this would be beneficial. Thanks.</p>
https://ask.sagemath.org/question/37497/caching-with-parallel/?answer=37551#post-id-37551If you want to enumerate recursively the elements of a given set, you may look at: [sage/sets/recursively_enumerated_set.html](http://doc.sagemath.org/html/en/reference/structure/sage/sets/recursively_enumerated_set.html)
If the underlying graph have some structure (being graded, symmetric or a forest), then some enumeration algorithms can be more efficient (time wise or memory wise). If the graph is a forest, then enumeration and other operation like mapreduce can be done in parallel. See [sage/parallel/map_reduce.html](http://doc.sagemath.org/html/en/reference/parallel/sage/parallel/map_reduce.html)Tue, 09 May 2017 14:12:14 +0200https://ask.sagemath.org/question/37497/caching-with-parallel/?answer=37551#post-id-37551