ASKSAGE: Sage Q&A Forum - Individual question feedhttps://ask.sagemath.org/questions/Q&A Forum for SageenCopyright Sage, 2010. Some rights reserved under creative commons license.Sun, 03 Mar 2019 06:53:31 -0600Divide Combinations(n,k) into multiple parts (n > 1110)https://ask.sagemath.org/question/45606/divide-combinationsnk-into-multiple-parts-n-1110/I asked a similar question here: https://ask.sagemath.org/question/45347/sagemath-heap-size-limit/. I closed it, because the answer matched the question there. However, now I have an extra question related to it:
C = Combinations(2110, 3)
C_cardinality = binomial(2110, 3) # 1.563.429.820
N = C_cardinality / 95 # Number of parts = 95 (chunk size)
R = range(0, C_cardinality , N)
Because C_cardinality is too big, I would like to divide C into multiple parts, which are then used on multiple computers to process separately. In order to be sure that these multiple parts are totally different but covering the whole C, I use C's list:
c0 = C.list()[R[0]:R[0]+N]
c1 = C.list()[R[1]:R[1]+N]
c2 = C.list()[R[2]:R[2]+N]
...
However, my computer's RAM (memory 128GB) is not enough to store the `C`'s list. Therefore, I iterate `C` and store in a CSV instead:
def storeCSV(data, fo, wtype='a'):
with open(fo, wtype) as wFile:
write_file = csv.writer(wFile)
write_file.writerow([data])
index = 0
count = 0
for c in C:
storeCSV(c, "part_%s.csv" % index)
count += 1
if count == N:
count = 0
index += 1
print index
Unfortunately, this CSV write is too slow, i.e ~6000 write/min, which means I have to wait about 3800 hours for the whole `C` to finish.
You might wonder why I need to divide `C` into multiple parts. After having these parts, I will run the following code on different computers with a purpose to collect information among set of 3 numbers:
v = MatrixSpace(GF(2), 3,9)
g3 = (c for c in c0 if block_matrix(3, 1, [v[c[0]], v[c[1]], v[c[2]]]).rank() >= 6) # or c1, c2, ..., cN
rel = dict()
for g in list(map(tuple,g3)):
# (1, 2, 3)
rel.setdefault(g[::2], []).append(g[1]) # '[1, 3]': 2
rel.setdefault(g[:2], []).append(g[-1]) # '[1, 2]': 3
rel.setdefault(g[1:], []).append(g[0]) # '[2, 3]': 1
Specifically, I got all Combinations of 3 numbers within `c0, c1,...,cN`. Then, I filter what I need and store in `g3`. For all `g3` of format (A, B, C), I collect (A,B), (B,C), (A,C) as keys and C,A,B as values respectively. The most important reason I must collect this in a Python dictionary is that I need to retrieve any `rel[(X, Y)]` later on.
I wish the problem is described clearly and someone can support. Thank a lot !!!Sat, 02 Mar 2019 18:25:24 -0600https://ask.sagemath.org/question/45606/divide-combinationsnk-into-multiple-parts-n-1110/Comment by vdelecroix for <p>I asked a similar question here: <a href="https://ask.sagemath.org/question/45347/sagemath-heap-size-limit/">https://ask.sagemath.org/question/453...</a>. I closed it, because the answer matched the question there. However, now I have an extra question related to it: </p>
<pre><code>C = Combinations(2110, 3)
C_cardinality = binomial(2110, 3) # 1.563.429.820
N = C_cardinality / 95 # Number of parts = 95 (chunk size)
R = range(0, C_cardinality , N)
</code></pre>
<p>Because C_cardinality is too big, I would like to divide C into multiple parts, which are then used on multiple computers to process separately. In order to be sure that these multiple parts are totally different but covering the whole C, I use C's list:</p>
<pre><code>c0 = C.list()[R[0]:R[0]+N]
c1 = C.list()[R[1]:R[1]+N]
c2 = C.list()[R[2]:R[2]+N]
...
</code></pre>
<p>However, my computer's RAM (memory 128GB) is not enough to store the <code>C</code>'s list. Therefore, I iterate <code>C</code> and store in a CSV instead:</p>
<pre><code>def storeCSV(data, fo, wtype='a'):
with open(fo, wtype) as wFile:
write_file = csv.writer(wFile)
write_file.writerow([data])
index = 0
count = 0
for c in C:
storeCSV(c, "part_%s.csv" % index)
count += 1
if count == N:
count = 0
index += 1
print index
</code></pre>
<p>Unfortunately, this CSV write is too slow, i.e ~6000 write/min, which means I have to wait about 3800 hours for the whole <code>C</code> to finish.</p>
<p>You might wonder why I need to divide <code>C</code> into multiple parts. After having these parts, I will run the following code on different computers with a purpose to collect information among set of 3 numbers:</p>
<pre><code>v = MatrixSpace(GF(2), 3,9)
g3 = (c for c in c0 if block_matrix(3, 1, [v[c[0]], v[c[1]], v[c[2]]]).rank() >= 6) # or c1, c2, ..., cN
rel = dict()
for g in list(map(tuple,g3)):
# (1, 2, 3)
rel.setdefault(g[::2], []).append(g[1]) # '[1, 3]': 2
rel.setdefault(g[:2], []).append(g[-1]) # '[1, 2]': 3
rel.setdefault(g[1:], []).append(g[0]) # '[2, 3]': 1
</code></pre>
<p>Specifically, I got all Combinations of 3 numbers within <code>c0, c1,...,cN</code>. Then, I filter what I need and store in <code>g3</code>. For all <code>g3</code> of format (A, B, C), I collect (A,B), (B,C), (A,C) as keys and C,A,B as values respectively. The most important reason I must collect this in a Python dictionary is that I need to retrieve any <code>rel[(X, Y)]</code> later on.</p>
<p>I wish the problem is described clearly and someone can support. Thank a lot !!!</p>
https://ask.sagemath.org/question/45606/divide-combinationsnk-into-multiple-parts-n-1110/?comment=45616#post-id-45616If you really want to perform such huge computation, I strongly suggest that you first find a more efficient way to do it. Instead of describing your programming issues, could you open a new question where you state what you are trying to achieve?Sun, 03 Mar 2019 06:53:31 -0600https://ask.sagemath.org/question/45606/divide-combinationsnk-into-multiple-parts-n-1110/?comment=45616#post-id-45616