Divide Combinations(n,k) into multiple parts (n > 1110)
I asked a similar question here: https://ask.sagemath.org/question/453.... I closed it, because the answer matched the question there. However, now I have an extra question related to it:
C = Combinations(2110, 3)
C_cardinality = binomial(2110, 3) # 1.563.429.820
N = C_cardinality / 95 # Number of parts = 95 (chunk size)
R = range(0, C_cardinality , N)
Because C_cardinality is too big, I would like to divide C into multiple parts, which are then used on multiple computers to process separately. In order to be sure that these multiple parts are totally different but covering the whole C, I use C's list:
c0 = C.list()[R[0]:R[0]+N]
c1 = C.list()[R[1]:R[1]+N]
c2 = C.list()[R[2]:R[2]+N]
...
However, my computer's RAM (memory 128GB) is not enough to store the C
's list. Therefore, I iterate C
and store in a CSV instead:
def storeCSV(data, fo, wtype='a'):
with open(fo, wtype) as wFile:
write_file = csv.writer(wFile)
write_file.writerow([data])
index = 0
count = 0
for c in C:
storeCSV(c, "part_%s.csv" % index)
count += 1
if count == N:
count = 0
index += 1
print index
Unfortunately, this CSV write is too slow, i.e ~6000 write/min, which means I have to wait about 3800 hours for the whole C
to finish.
You might wonder why I need to divide C
into multiple parts. After having these parts, I will run the following code on different computers with a purpose to collect information among set of 3 numbers:
v = MatrixSpace(GF(2), 3,9)
g3 = (c for c in c0 if block_matrix(3, 1, [v[c[0]], v[c[1]], v[c[2]]]).rank() >= 6) # or c1, c2, ..., cN
rel = dict()
for g in list(map(tuple,g3)):
# (1, 2, 3)
rel.setdefault(g[::2], []).append(g[1]) # '[1, 3]': 2
rel.setdefault(g[:2], []).append(g[-1]) # '[1, 2]': 3
rel.setdefault(g[1:], []).append(g[0]) # '[2, 3]': 1
Specifically, I got all Combinations of 3 numbers within c0, c1,...,cN
. Then, I filter what I need and store in g3
. For all g3
of format (A, B, C), I collect (A,B), (B,C), (A,C) as keys and C,A,B as values respectively. The most important reason I must collect this in a Python dictionary is that I need to retrieve any rel[(X, Y)]
later on.
I wish the problem is described clearly and someone can support. Thank a lot !!!
If you really want to perform such huge computation, I strongly suggest that you first find a more efficient way to do it. Instead of describing your programming issues, could you open a new question where you state what you are trying to achieve?