Ask Your Question

Making code faster by splitting and use of supercomputer

asked 2022-09-01 17:15:41 +0100

sgmth gravatar image

updated 2022-09-01 18:22:57 +0100

Please refer to my question at the following link:

There John Palmieri has written a very good code. As the output is huge and hence time-taking(No output even for 4th order matrices with 4 summands even after many days), so I tried to split the code in parts according to the logic given in the following link:

and hence have this code:

def sum_list(length, total):

    - length -- how many terms
    - total -- desired total

    Return list of lists, each of which has ``length`` entries chosen
    from {0, 1, -1} and adds up to ``total``.
    C = IntegerListsLex(length=length, n=length+total, min_part=0, max_part=2)
    return [[a-1 for a in L] for L in C]
def mat_list_NEW(mat, length):

    - mat -- matrix, assumed to have entries in {1, -1}
    - length -- how many terms

    Return list of lists, each of which has ``length`` entries and
    adds up to ``mat``, and each entry is either a `(0, 1)` or a `(0,
    -1)` matrix.
    sum_plus = sum_list(length, 1)
    sum_minus = sum_list(length, -1)
    old_attempts = []
    for x in mat.list():
        if x == 1:
             sums = sum_plus
        elif x == -1:
             sums = sum_minus
             raise ValueError('each entry of matrix should be 1 or -1')
        if not old_attempts: # initial step
             new_attempts = set(tuple([(s,) for s in S]) for S in sums)
            new_attempts = set()
            for mats in old_attempts:
                    NEW = set()
                    for S in sums:
                            new_M = tuple(sorted(M + (s,) for (M, s) in zip(mats, S)))
                             # keep only entries that correspond to (0,1) or (0,-1) matrices
                            if all(max(entries) - min(entries) < 2 for entries in new_M):

        old_attempts = new_attempts



    R = range(0,P , N)

    c0 = LOA[R[0]:R[0]+N]
    #c1 = LOA[R[1]:R[1]+N]
    #c2 = LOA[R[2]:R[2]+N]
    #c3 = LOA[R[3]:R[3]+N]
    #c4 = LOA[R[4]:R[4]+N]
    #c5 = LOA[R[5]:R[5]+N]

    matrices = [[matrix(mat.base_ring(), mat.nrows(), mat.ncols(), entries) for entries in L] for L in c0]
    return (matrices)    

mat=matrix(3,3, [1, 1, 1, -1, 1, 1, -1, -1, 1])#Enter your matrix
mat_list_NEW(mat,4)#Specify in how many matrices you want to break it up

In the above link it mentions storing of c0,c1,c2,….in csv file because of memory limitations. I have no idea of csv files. So thought of writing each of ci’s individually (as I have written above in the code) and run codes separately. Kindly guide me how to use csv files to further help in running this code.

Note that in the above code for 3rd order matrix with 4 summands P is 73261 and for 5 summands P is 11740316. And we need for higher order matrices(upto say 30) and summands(upto say10) for which P will be very very huge. Somebody told me that if code is written in parallel, then it will run faster.

Also, I will be using supercomputing facility with the following details:

              OS: CentOS 7 Hardware Specifications
              Type I: Total no. of compute nodes: 420
                         CPU only nodes: 259
                         CPU Architecture : Haswell
                         2x Intel Xeon E5-2680 v3/12-Core(24 cores each)
                         RAM: 64 GB
                         GPU accelerated nodes: 161
                         (CPU config same as above + GPU cards)
                         GPU Architecture : 2x NVIDIA K40 (each)
                         (12GB, 2880 CUDA cores)
                         High Memory Nodes : 512 GB RAM 
                         12 CPU,  8 GPU nodes

             TYPE II:Total no. of compute nodes: 184
                         CPU only nodes: 144
                         CPU Architecture : Skylake
                         2x Intel Xeon G-6148/20 cores(40 cores per node)
                         RAM: 96 GB
                         GPU accelerated nodes: 40 (CPU config same )
                         GPU Architecture : NVIDIA V100
                         (32GB,5120 CUDA cores)
                         1 Nvidia V100 card each GPU node: 17 
                         2 Nvidia V100 cards each GPU node:23
                         High Memory Nodes :  192 GB RAM
                         8 CPU, 40 GPU nodes

Requesting for resources will be clear with following examples:

 1.Request for 2 chunks with 20 cpus, 2 gpu cards for 2 hrs by mentioning as follows:
                    " -lselect=2: ncpus=20: ngpus=2 : -lwalltime=02:00:00"
 2. Request for 2 chunks  i.e. full skylake nodes, 1gpu card for 3 hours by mentioning as follows:
                    " -lselect=2: ncpus=40: ngpus=1: centos=skylake: -l walltime=03:00:00"

Kindly guide me how to choose the resources viz. lselect, ncpus, ngpus for my code(Please note that the time limit is 168 hours per code)

So kindly guide with respect to the above mentioned points. Thanks.

edit retag flag offensive close merge delete



I am not an expert in it, but Sage provides a framework for parallel computing:

John Palmieri gravatar imageJohn Palmieri ( 2022-09-01 18:09:29 +0100 )edit

It is very well possible that higher sizes will render the problem of generating all such solutions impractical even at a supercomputer. On the other hand, the structure of solutions here is relatively simple, and it may be possible to generate (some) solutions on demand or analyze them without generating all solutions. This brings the question - what is the context of this problem and what the solutions are used for? Do you really need them all generated at once?

Max Alekseyev gravatar imageMax Alekseyev ( 2022-09-01 19:01:50 +0100 )edit

Basically , I require those combinations which in addition to the conditions mentioned in link also satisfy that each matrix product AiAj is a linear combination of Ai's. Broken the code into two-1st part is the one in my link(now will be using the above code) whose output be used in 2nd part of the code which checks this last condition. As to generating some solutions on demand let me mention that in the link you may refer to my code in which I have specified the no. of non-negative/ non-positive summands , in which we can further specify no. of symmetric/non-symmetric matrices. No idea how to do in above code. As to your suggestion of analyzing without generating, please throw some light. No, I don't need to generate all at once, hence the splitting done in above code.

sgmth gravatar imagesgmth ( 2022-09-01 21:03:00 +0100 )edit

As I understand, the condition "that each matrix product AiAj is a linear combination of Ai's" is essential here and initially ignoring it results in combinatorial explosion. Rather than ignoring it and checking at later stages, it'd be more efficient to find a way generating only those solutions that satisfy this additional condition right away. I will take a look into this problem. Do you have any restriction on the coefficients of the linear combination - e.g., do they have to be integer or rational?

Max Alekseyev gravatar imageMax Alekseyev ( 2022-09-01 23:16:01 +0100 )edit

Yes, checking the last condition alongside should really help. Well,regarding the coefficients, they have to be integers. One more thing I want to say that in my question in the link I had not made it clear that the summand matrices have to be unique and that the matrices with all same entries (that is all zeroes, all ones and all minus ones matrices)are also not allowed.Actually in my code(in the link) these were satisfied so perhaps it slipped from my mind. Later when I pointed out, John Palmieri suggested some changes but it actually didn't work out(Please see the posts in the link). If such combinations are also removed then obviously it will save some more time. Thanks for looking into my problem.

sgmth gravatar imagesgmth ( 2022-09-02 03:00:56 +0100 )edit

1 Answer

Sort by » oldest newest most voted

answered 2022-09-02 06:38:28 +0100

Max Alekseyev gravatar image

As I explained in the comments, it's unwise to ignore the condition that "each matrix product $A_iA_j$ is a linear combination of $A_i$'s with integer coefficients". If this condition is employed at earlier stages, a supercomputer may not be needed, and if it's ignored until later stages, a supercomputer may not be able to help.

In particular, this condition means that the lattice spanned by matrices $A_i$ must contain all powers of the given matrix $M$. It also follows that the number of matrices should be at least the degree of the minimal polynomial of $M$. In the given example, we have

M = matrix(3,3, [1, 1, 1, -1, 1, 1, -1, -1, 1])
k = M.minpoly().degree()

which gives $k=3$. So, the powers of $M$ span a subspace $V:=\langle M^0, M^1, M^2\rangle$. Since the span $\langle A_1, A_2, A_3, A_4\rangle$ contains $V$, we have all $A_i\in \langle M^0, M^1, M^2, A_1\rangle$. So, one can try to go over all 01-matrices $A_1$, and then find all 01-matrices in $\langle M^0, M^1, M^2, A_1\rangle$ to restrict the set of candidates for $A_2, A_3, A_4$.

edit flag offensive delete link more


Okay....I was not aware of the concepts you have mentioned. Thanks. I need to understand those concepts and proceed as you have suggested. Thanks again.

sgmth gravatar imagesgmth ( 2022-09-03 02:26:55 +0100 )edit

Your Answer

Please start posting anonymously - your entry will be published after you log in or create a new account.

Add Answer

Question Tools

1 follower


Asked: 2022-09-01 17:15:41 +0100

Seen: 259 times

Last updated: Sep 02 '22