Ask Your Question
1

More efficient methods for finding subsets of Subsets()

asked 2023-02-10 23:48:29 +0100

Eric Snyder gravatar image

updated 2023-02-10 23:48:55 +0100

Calling SS = Subsets([1..100000], 4) takes 50 ms to generate a list with about $4 \times 10^{18}$ elements. %%prun suggests the majority of that time is used by misc.py:556(_stable_uniq)

(Note: trying this with $> \sim 10^5$ elements causes an overflow.)

But getting a subset of that list... hoo boy.

NList = [i for i in Subsets([1..100], 4) if sum(i) == 100] takes 45 s to generate a list with 5952 elements, out of $3.9 \times 10^6$ possibilities. About half of that time is used by set.py:465(__init__)

Note that the size of the main set has been decreased by a factor of 1000. That's because this list comprehension seems to run at about $O(n^4)$, so going from 100 to 1000 multiplies the time by around $10^4$; that increases the time from ~45 s to 5... days. (Creating the list, then filtering it, doesn't seem to change the time much.)

So my question is, is there any faster way to filter through such a long list that I don't know of? I'd like to be working in the [1..2000] range, but I don't think my kernel will hold out for 80 days to make that happen. (Probably not even 5 days).

Any suggestions welcome!

edit retag flag offensive close merge delete

Comments

Note that :

sage: binomial(10000,4)
416416712497500

i. e.

sage: binomial(10000,4).n()
4.16416712497500e14

about 10000 times less that your result. R agrees :

sage: r.choose(10000,4).sage()
416416712497500

So does combinatorics :

sage: Subsets(10000,4).cardinality()
416416712497500

Something may be amiss in your computation...

Emmanuel Charpentier gravatar imageEmmanuel Charpentier ( 2023-02-11 04:40:37 +0100 )edit

Can you formulate more clearly what problem you are solving?

Max Alekseyev gravatar imageMax Alekseyev ( 2023-02-11 04:53:56 +0100 )edit

@Emmanuel Charpentier: Your calculations are correct, but the initial calculation up there uses $100000$, not $10000$.

@Max Alekseyev: The context is attempting to make this algorithm work. The initial step requires finding all subsets of [1..n] with a given sum, then finding sets of three distinct subsets whose squares have the same sum (which takes a fraction of the time of the first step).

Eric Snyder gravatar imageEric Snyder ( 2023-02-11 09:10:47 +0100 )edit

When using your algorithm, you postulate a=100. Do you know (or postulate) b ?

You do not need to build your sets : building iterators in them should suffice to exhibit a solution. Finding all of them is another question... I doubt that it can be done realistically by brute force...

Emmanuel Charpentier gravatar imageEmmanuel Charpentier ( 2023-02-11 09:50:29 +0100 )edit

2 Answers

Sort by » oldest newest most voted
2

answered 2023-02-11 15:22:03 +0100

Max Alekseyev gravatar image

updated 2023-02-11 15:24:53 +0100

If you need partitions of an integerk into 4 distinct parts, you can rely on Sage Partitions() generator:

for a,b,c,d in Partitions(k, length=4, max_slope=-1):
    ...

If you want additionally to bound the parts, then add max_part=... parameter. See this documentation for all available options.

edit flag offensive delete link more

Comments

I hadn't even thought about just doing it as a partition. Perfect, thanks!

Eric Snyder gravatar imageEric Snyder ( 2023-02-12 08:38:31 +0100 )edit
0

answered 2023-02-11 15:05:30 +0100

Here is an iterator over 4-tuples (a, b, c, d) such that a + b + c + d = k and a < b < c < d. You can also ensure that d <= nmax, which seems a requirement of your problem.

def get_series(k, nmax=None):
    """
    Iterator over all series (a, b, c, d) such that:
    - a + b + c + d = k
    - a < b < c < d

    When nmax is specified, we furthermore ensure that a,b,c,d <= nmax.
    """
    if nmax is None:
        for a in range(1, (k - 6) // 4 + 1):
            for b in range(a + 1, (k - a - 3) // 3 + 1):
                for c in range(b + 1, (k - a - b - 1) // 2 + 1):
                    yield (a, b, c, k - a - b - c)
    else:
        for a in range(1, min(nmax, (k - 6) // 4) + 1):
            for b in range(a + 1, min(nmax, (k - a - 3) // 3) + 1):
                for c in range(b + 1, min(nmax, (k - a - b - 1) // 2) + 1):
                    d = k - a - b - c
                    if d <= nmax:
                        yield (a, b, c, d)

you can check that

This is way faster than your brute force attempt

sage: %time NList = [i for i in Subsets([1..100], 4) if sum(i) == 100]
CPU times: user 31 s, sys: 83.3 ms, total: 31.1 s
Wall time: 31.1 s
sage: %time L = list(get_series(100))
CPU times: user 7.23 ms, sys: 136 µs, total: 7.36 ms
Wall time: 7.35 ms
sage: len(L) == len(NList)
True
sage: L == [tuple(sorted(i)) for i in NList]
True

This is of course not sufficient for solving efficiently your problem since you have plenty of 4-tuples to consider and then triples of 4-tuples...

sage: %time L = list(get_series(1000))
CPU times: user 2.82 s, sys: 99.1 ms, total: 2.92 s
Wall time: 2.92 s
sage: len(L)
6840777

Another useful idea is to store the tuples in a dictionary with key a^2 + b^2 + c^2 + d^2 and value the list of corresponding tuples (so with same sum and same sum of squares).

edit flag offensive delete link more

Comments

This is reinventing of Partitions() functionality. See my answer below.

Max Alekseyev gravatar imageMax Alekseyev ( 2023-02-11 15:22:39 +0100 )edit

Right, thanks.

David Coudert gravatar imageDavid Coudert ( 2023-02-12 11:12:41 +0100 )edit

Your Answer

Please start posting anonymously - your entry will be published after you log in or create a new account.

Add Answer

Question Tools

1 follower

Stats

Asked: 2023-02-10 23:48:29 +0100

Seen: 298 times

Last updated: Feb 11 '23