Making code faster by splitting it and using a supercomputer

Please refer to my question at the following link:

https://ask.sagemath.org/question/63305/writing-a-matrix-as-a-sum-of-matrices/

There, John Palmieri wrote very good code. Because the output is huge, the computation is slow (no output for 4th order matrices with 4 summands even after many days), so I tried to split the work into parts, following the logic given at this link:

https://ask.sagemath.org/question/45606/divide-combinationsnk-into-multiple-parts-n-1110/

This is the code I ended up with:

def sum_list(length, total):
    """
    INPUT:

    - length -- how many terms
    - total -- desired total

    Return list of lists, each of which has ``length`` entries chosen
    from {0, 1, -1} and adds up to ``total``.
    """
    C = IntegerListsLex(length=length, n=length+total, min_part=0, max_part=2)
    return [[a-1 for a in L] for L in C]
def mat_list_NEW(mat, length):
    """
    INPUT:

    - mat -- matrix, assumed to have entries in {1, -1}
    - length -- how many terms

    Return list of lists, each of which has ``length`` entries and
    adds up to ``mat``, and each entry is either a `(0, 1)` or a `(0,
    -1)` matrix.
    """
    sum_plus = sum_list(length, 1)
    sum_minus = sum_list(length, -1)
    old_attempts = []
    for x in mat.list():
        if x == 1:
            sums = sum_plus
        elif x == -1:
            sums = sum_minus
        else:
            raise ValueError('each entry of matrix should be 1 or -1')
        if not old_attempts:  # initial step
            new_attempts = set(tuple([(s,) for s in S]) for S in sums)
        else:
            new_attempts = set()
            for mats in old_attempts:
                for S in sums:
                    new_M = tuple(sorted(M + (s,) for (M, s) in zip(mats, S)))
                    # keep only entries that correspond to (0,1) or (0,-1) matrices
                    if all(max(entries) - min(entries) < 2 for entries in new_M):
                        new_attempts.add(new_M)
        old_attempts = new_attempts
    LOA = list(old_attempts)

    P = len(LOA)
    # chunk size: one fifth of the list (at least 1, so the range step is never 0)
    N = max(1, P // 5)
    R = range(0, P, N)

    # process one chunk per run: uncomment the chunk you want
    # (for P >= 5, c5 holds the leftover items when P is not a multiple of 5)
    c0 = LOA[R[0]:R[0]+N]
    #c1 = LOA[R[1]:R[1]+N]
    #c2 = LOA[R[2]:R[2]+N]
    #c3 = LOA[R[3]:R[3]+N]
    #c4 = LOA[R[4]:R[4]+N]
    #c5 = LOA[R[5]:R[5]+N]

    # turn each tuple of entries back into a matrix
    matrices = [[matrix(mat.base_ring(), mat.nrows(), mat.ncols(), entries) for entries in L] for L in c0]
    return matrices

mat = matrix(3, 3, [1, 1, 1, -1, 1, 1, -1, -1, 1])  # enter your matrix
mat_list_NEW(mat, 4)  # specify into how many matrices to split it
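
For readers without Sage at hand, here is a minimal plain-Python sketch of what `sum_list` enumerates. The name `sum_list_py` is made up for this illustration, it uses `itertools.product` instead of `IntegerListsLex`, and the order of the results may differ from the Sage version.

```python
from itertools import product

def sum_list_py(length, total):
    """Return all tuples of `length` entries from {-1, 0, 1} that sum to `total`.

    Hypothetical plain-Python stand-in for sum_list; result order may differ.
    """
    return [t for t in product((-1, 0, 1), repeat=length) if sum(t) == total]

# choices available for a single matrix entry equal to 1, split over 4 summands
options = sum_list_py(4, 1)
print(len(options))  # 16
```

So each entry of the matrix has 16 candidate splits when there are 4 summands, and `mat_list_NEW` combines and filters these entry by entry.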

The answer at the above link mentions storing c0, c1, c2, ... in a CSV file because of memory limitations. I have no experience with CSV files, so I thought of writing out each of the ci's individually (as in the code above) and running the code separately for each one. Please advise how to use CSV files to help in running this code.
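
As a rough sketch of the CSV idea, assuming each item of `LOA` is a tuple of entry tuples (one per summand), a chunk could be written with one decomposition per row and read back one row at a time using Python's standard `csv` module. The helper names `write_chunk` and `read_chunk`, the file name, and the toy data are all made up for this illustration; the toy data is not real output of the code above.

```python
import csv
import os
import tempfile

def write_chunk(path, decompositions):
    """Write one decomposition per CSV row, flattening its entry tuples."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for dec in decompositions:
            writer.writerow([e for entries in dec for e in entries])

def read_chunk(path, n_entries):
    """Yield decompositions one row at a time as tuples of entry tuples.

    n_entries is the number of entries per matrix (nrows * ncols).
    """
    with open(path, newline="") as f:
        for row in csv.reader(f):
            vals = [int(v) for v in row]
            yield tuple(tuple(vals[i:i + n_entries])
                        for i in range(0, len(vals), n_entries))

# toy data: two "decompositions", each with two summands of four entries
toy = [((1, 0, 0, -1), (0, 1, 0, 0)), ((0, 0, 1, 0), (1, 0, -1, 0))]
path = os.path.join(tempfile.gettempdir(), "chunk0.csv")
write_chunk(path, toy)
restored = list(read_chunk(path, n_entries=4))
print(restored == toy)
```

Because `read_chunk` is a generator, a later run can process a stored chunk row by row without holding the whole file in memory.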

Note that in the above code, for a 3rd order matrix, P is 73261 with 4 summands and 11740316 with 5 summands. We need this for higher order matrices (up to, say, 30) and more summands (up to, say, 10), for which P will be far larger. Somebody told me that the code will run faster if it is written to run in parallel.
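
One way to prepare this kind of work for parallel runs is to split the index range into exactly the number of pieces wanted, so that each piece can be handled by a separate process or job. The sketch below is only an illustration in plain Python; `chunk_bounds` is a made-up helper, not part of Sage, and it uses the value P = 73261 quoted above.

```python
def chunk_bounds(total, n_chunks):
    """Return (start, stop) pairs splitting range(total) into n_chunks
    contiguous pieces whose sizes differ by at most 1."""
    base, extra = divmod(total, n_chunks)
    bounds = []
    start = 0
    for i in range(n_chunks):
        # the first `extra` pieces get one additional item each
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds

bounds = chunk_bounds(73261, 5)
print(bounds)
# [(0, 14653), (14653, 29305), (29305, 43957), (43957, 58609), (58609, 73261)]
```

Unlike the `floor(P/5)` approach, this gives exactly five pieces with no leftover sixth piece, and piece i would then be `LOA[start:stop]` for the i-th pair.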

Also, I will be using a supercomputing facility with the following details:

    OS: CentOS 7

    Hardware specifications

    Type I: total no. of compute nodes: 420
        CPU-only nodes: 259
            CPU architecture: Haswell
            2x Intel Xeon E5-2680 v3, 12-core (24 cores per node)
            RAM: 64 GB
        GPU-accelerated nodes: 161
            (CPU config same as above + GPU cards)
            GPU architecture: 2x NVIDIA K40 (each)
            (12 GB, 2880 CUDA cores)
        High-memory nodes: 512 GB RAM
            12 CPU, 8 GPU nodes

    Type II: total no. of compute nodes: 184
        CPU-only nodes: 144
            CPU architecture: Skylake
            2x Intel Xeon G-6148, 20 cores (40 cores per node)
            RAM: 96 GB
        GPU-accelerated nodes: 40 (CPU config same)
            GPU architecture: NVIDIA V100
            (32 GB, 5120 CUDA cores)
            GPU nodes with 1 NVIDIA V100 card: 17
            GPU nodes with 2 NVIDIA V100 cards: 23
        High-memory nodes: 192 GB RAM
            8 CPU, 40 GPU nodes

Resources are requested as in the following examples:

  1. To request 2 chunks with 20 CPUs and 2 GPU cards for 2 hours:
                     " -lselect=2: ncpus=20: ngpus=2 : -lwalltime=02:00:00"
  2. To request 2 chunks, i.e. full Skylake nodes, with 1 GPU card for 3 hours:
                     " -lselect=2: ncpus=40: ngpus=1: centos=skylake: -l walltime=03:00:00"

Please advise how to choose the resources, viz. lselect, ncpus and ngpus, for my code.
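
To make the question concrete, here is a hedged sketch of how one copy of a script could pick its own share of the work on a cluster. It assumes the scheduler is PBS Professional and that the job is submitted as an array job, in which case the environment variable `PBS_ARRAY_INDEX` is set for each sub-job; whether this facility supports array jobs is an assumption to check in its documentation. The helper `my_share` and the stand-in list `work` are made up for illustration.

```python
import os

def my_share(items, n_jobs, index):
    """Return every n_jobs-th item starting at position `index`, so that
    jobs 0 .. n_jobs-1 together cover all items exactly once."""
    return items[index::n_jobs]

# Assumption: PBS_ARRAY_INDEX is set by the scheduler for array jobs;
# fall back to 0 when running outside the scheduler.
index = int(os.environ.get("PBS_ARRAY_INDEX", "0"))

work = list(range(11))  # stand-in for the list LOA in the code above
mine = my_share(work, 5, index)
print(index, mine)
```

This sketch uses CPUs only. Each sub-job would still need to build or load the list it slices, for example from files written by an earlier run.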

Any guidance on the above points would be appreciated. Thanks.

Please note that the time limit is 168 hours per job.
