Ask Your Question

Revision history [back]

click to hide/show revision 1
initial version

Divide Combinations(n,k) into multiple parts (n > 1110)

I asked a similar question here: https://ask.sagemath.org/question/45347/sagemath-heap-size-limit/. I closed it, because the answer matched the question there. However, now I have an extra question related to it:

C = Combinations(2110, 3)
C_cardinality = binomial(2110, 3) # 1.563.429.820
N = C_cardinality / 95 # Number of parts = 95 (chunk size)
R = range(0, L, N)

Because C_cardinality is too big, I would like to divide C into multiple parts, which are then used on multiple computers to process separately. In order to be sure that these multiple parts are totally different but covering the whole C, I use C's list:

c0 = C.list()[R[0]:R[0]+N]
c1 = C.list()[R[1]:R[1]+N]
c2 = C.list()[R[2]:R[2]+N]
...

However, my computer's RAM (memory 128GB) is not enough to store the C's list. Therefore, I iterate C and store in a CSV instead:

def storeCSV(data, fo, wtype='a'):
    with open(fo, wtype) as wFile:
        write_file = csv.writer(wFile)
        write_file.writerow([data])
index = 0
count = 0
for c in C:
    storeCSV(c, "part_%s.csv" % index)
    count += 1
    if count == N:
        count = 0
        index += 1
        print index

Unfortunately, this CSV write is too slow, i.e ~6000 write/min, which means I have to wait about 3800 hours for the whole C to finish.

You might wonder why I need to divide C into multiple parts. After having these parts, I will run the following code on different computers with a purpose to collect information among set of 3 numbers:

v = MatrixSpace(GF(2), 3,9)
g3 = (c for c in c0 if block_matrix(3, 1, [v[c[0]], v[c[1]], v[c[2]]]).rank() >= 4) # or c1, c2, ..., cN
rel = dict()
for g in list(map(tuple,g3)):
    # (1, 2, 3)
    rel.setdefault(g[::2], []).append(g[1]) # '[1, 3]': 2
    rel.setdefault(g[:2], []).append(g[-1]) # '[1, 2]': 3
    rel.setdefault(g[1:], []).append(g[0])  # '[2, 3]': 1

Specifically, I got all Combinations of 3 numbers within c0, c1,...,cN. Then, I filter what I need and store in g3. For all g3 of format (A, B, C), I collect (A,B), (B,C), (A,C) as keys and C,A,B as values respectively. The most important reason I must collect this in a Python dictionary is that I need to retrieve any rel[(X, Y)] later on.

I wish the problem is described clearly and someone can support. Thank a lot !!!

Divide Combinations(n,k) into multiple parts (n > 1110)

I asked a similar question here: https://ask.sagemath.org/question/45347/sagemath-heap-size-limit/. I closed it, because the answer matched the question there. However, now I have an extra question related to it:

C = Combinations(2110, 3)
C_cardinality = binomial(2110, 3) # 1.563.429.820
N = C_cardinality / 95 # Number of parts = 95 (chunk size)
R = range(0, L, N)

Because C_cardinality is too big, I would like to divide C into multiple parts, which are then used on multiple computers to process separately. In order to be sure that these multiple parts are totally different but covering the whole C, I use C's list:

c0 = C.list()[R[0]:R[0]+N]
c1 = C.list()[R[1]:R[1]+N]
c2 = C.list()[R[2]:R[2]+N]
...

However, my computer's RAM (memory 128GB) is not enough to store the C's list. Therefore, I iterate C and store in a CSV instead:

def storeCSV(data, fo, wtype='a'):
    with open(fo, wtype) as wFile:
        write_file = csv.writer(wFile)
        write_file.writerow([data])
index = 0
count = 0
for c in C:
    storeCSV(c, "part_%s.csv" % index)
    count += 1
    if count == N:
        count = 0
        index += 1
        print index

Unfortunately, this CSV write is too slow, i.e ~6000 write/min, which means I have to wait about 3800 hours for the whole C to finish.

You might wonder why I need to divide C into multiple parts. After having these parts, I will run the following code on different computers with a purpose to collect information among set of 3 numbers:

v = MatrixSpace(GF(2), 3,9)
g3 = (c for c in c0 if block_matrix(3, 1, [v[c[0]], v[c[1]], v[c[2]]]).rank() >= 4) 6) # or c1, c2, ..., cN
rel = dict()
for g in list(map(tuple,g3)):
    # (1, 2, 3)
    rel.setdefault(g[::2], []).append(g[1]) # '[1, 3]': 2
    rel.setdefault(g[:2], []).append(g[-1]) # '[1, 2]': 3
    rel.setdefault(g[1:], []).append(g[0])  # '[2, 3]': 1

Specifically, I got all Combinations of 3 numbers within c0, c1,...,cN. Then, I filter what I need and store in g3. For all g3 of format (A, B, C), I collect (A,B), (B,C), (A,C) as keys and C,A,B as values respectively. The most important reason I must collect this in a Python dictionary is that I need to retrieve any rel[(X, Y)] later on.

I wish the problem is described clearly and someone can support. Thank a lot !!!

Divide Combinations(n,k) into multiple parts (n > 1110)

I asked a similar question here: https://ask.sagemath.org/question/45347/sagemath-heap-size-limit/. I closed it, because the answer matched the question there. However, now I have an extra question related to it:

C = Combinations(2110, 3)
C_cardinality = binomial(2110, 3) # 1.563.429.820
N = C_cardinality / 95 # Number of parts = 95 (chunk size)
R = range(0, L, C_cardinality , N)

Because C_cardinality is too big, I would like to divide C into multiple parts, which are then used on multiple computers to process separately. In order to be sure that these multiple parts are totally different but covering the whole C, I use C's list:

c0 = C.list()[R[0]:R[0]+N]
c1 = C.list()[R[1]:R[1]+N]
c2 = C.list()[R[2]:R[2]+N]
...

However, my computer's RAM (memory 128GB) is not enough to store the C's list. Therefore, I iterate C and store in a CSV instead:

def storeCSV(data, fo, wtype='a'):
    with open(fo, wtype) as wFile:
        write_file = csv.writer(wFile)
        write_file.writerow([data])
index = 0
count = 0
for c in C:
    storeCSV(c, "part_%s.csv" % index)
    count += 1
    if count == N:
        count = 0
        index += 1
        print index

Unfortunately, this CSV write is too slow, i.e ~6000 write/min, which means I have to wait about 3800 hours for the whole C to finish.

You might wonder why I need to divide C into multiple parts. After having these parts, I will run the following code on different computers with a purpose to collect information among set of 3 numbers:

v = MatrixSpace(GF(2), 3,9)
g3 = (c for c in c0 if block_matrix(3, 1, [v[c[0]], v[c[1]], v[c[2]]]).rank() >= 6) # or c1, c2, ..., cN
rel = dict()
for g in list(map(tuple,g3)):
    # (1, 2, 3)
    rel.setdefault(g[::2], []).append(g[1]) # '[1, 3]': 2
    rel.setdefault(g[:2], []).append(g[-1]) # '[1, 2]': 3
    rel.setdefault(g[1:], []).append(g[0])  # '[2, 3]': 1

Specifically, I got all Combinations of 3 numbers within c0, c1,...,cN. Then, I filter what I need and store in g3. For all g3 of format (A, B, C), I collect (A,B), (B,C), (A,C) as keys and C,A,B as values respectively. The most important reason I must collect this in a Python dictionary is that I need to retrieve any rel[(X, Y)] later on.

I wish the problem is described clearly and someone can support. Thank a lot !!!