Ask Your Question
1

More efficient methods for finding subsets of Subsets()

asked 2 years ago

Eric Snyder gravatar image

updated 2 years ago

Calling SS = Subsets([1..100000], 4) takes 50 ms to generate a list with about 4×1018 elements. %%prun suggests the majority of that time is used by misc.py:556(_stable_uniq)

(Note: trying this with >∼105 elements causes an overflow.)

But getting a subset of that list... hoo boy.

NList = [i for i in Subsets([1..100], 4) if sum(i) == 100] takes 45 s to generate a list with 5952 elements, out of 3.9×106 possibilities. About half of that time is used by set.py:465(__init__)

Note that the size of the main set has been decreased by a factor of 1000. That's because this list comprehension seems to run at about O(n4), so going from 100 to 1000 multiplies the time by around 104; that increases the time from ~45 s to 5... days. (Creating the list, then filtering it, doesn't seem to change the time much.)

So my question is, is there any faster way to filter through such a long list that I don't know of? I'd like to be working in the [1..2000] range, but I don't think my kernel will hold out for 80 days to make that happen. (Probably not even 5 days).

Any suggestions welcome!

Preview: (hide)

Comments

Note that :

sage: binomial(10000,4)
416416712497500

i. e.

sage: binomial(10000,4).n()
4.16416712497500e14

about 10000 times less that your result. R agrees :

sage: r.choose(10000,4).sage()
416416712497500

So does combinatorics :

sage: Subsets(10000,4).cardinality()
416416712497500

Something may be amiss in your computation...

Emmanuel Charpentier gravatar imageEmmanuel Charpentier ( 2 years ago )

Can you formulate more clearly what problem you are solving?

Max Alekseyev gravatar imageMax Alekseyev ( 2 years ago )

@Emmanuel Charpentier: Your calculations are correct, but the initial calculation up there uses 100000, not 10000.

@Max Alekseyev: The context is attempting to make this algorithm work. The initial step requires finding all subsets of [1..n] with a given sum, then finding sets of three distinct subsets whose squares have the same sum (which takes a fraction of the time of the first step).

Eric Snyder gravatar imageEric Snyder ( 2 years ago )

When using your algorithm, you postulate a=100. Do you know (or postulate) b ?

You do not need to build your sets : building iterators in them should suffice to exhibit a solution. Finding all of them is another question... I doubt that it can be done realistically by brute force...

Emmanuel Charpentier gravatar imageEmmanuel Charpentier ( 2 years ago )

2 Answers

Sort by » oldest newest most voted
2

answered 2 years ago

Max Alekseyev gravatar image

updated 2 years ago

If you need partitions of an integerk into 4 distinct parts, you can rely on Sage Partitions() generator:

for a,b,c,d in Partitions(k, length=4, max_slope=-1):
    ...

If you want additionally to bound the parts, then add max_part=... parameter. See this documentation for all available options.

Preview: (hide)
link

Comments

I hadn't even thought about just doing it as a partition. Perfect, thanks!

Eric Snyder gravatar imageEric Snyder ( 2 years ago )
0

answered 2 years ago

Here is an iterator over 4-tuples (a, b, c, d) such that a + b + c + d = k and a < b < c < d. You can also ensure that d <= nmax, which seems a requirement of your problem.

def get_series(k, nmax=None):
    """
    Iterator over all series (a, b, c, d) such that:
    - a + b + c + d = k
    - a < b < c < d

    When nmax is specified, we furthermore ensure that a,b,c,d <= nmax.
    """
    if nmax is None:
        for a in range(1, (k - 6) // 4 + 1):
            for b in range(a + 1, (k - a - 3) // 3 + 1):
                for c in range(b + 1, (k - a - b - 1) // 2 + 1):
                    yield (a, b, c, k - a - b - c)
    else:
        for a in range(1, min(nmax, (k - 6) // 4) + 1):
            for b in range(a + 1, min(nmax, (k - a - 3) // 3) + 1):
                for c in range(b + 1, min(nmax, (k - a - b - 1) // 2) + 1):
                    d = k - a - b - c
                    if d <= nmax:
                        yield (a, b, c, d)

you can check that

This is way faster than your brute force attempt

sage: %time NList = [i for i in Subsets([1..100], 4) if sum(i) == 100]
CPU times: user 31 s, sys: 83.3 ms, total: 31.1 s
Wall time: 31.1 s
sage: %time L = list(get_series(100))
CPU times: user 7.23 ms, sys: 136 µs, total: 7.36 ms
Wall time: 7.35 ms
sage: len(L) == len(NList)
True
sage: L == [tuple(sorted(i)) for i in NList]
True

This is of course not sufficient for solving efficiently your problem since you have plenty of 4-tuples to consider and then triples of 4-tuples...

sage: %time L = list(get_series(1000))
CPU times: user 2.82 s, sys: 99.1 ms, total: 2.92 s
Wall time: 2.92 s
sage: len(L)
6840777

Another useful idea is to store the tuples in a dictionary with key a^2 + b^2 + c^2 + d^2 and value the list of corresponding tuples (so with same sum and same sum of squares).

Preview: (hide)
link

Comments

This is reinventing of Partitions() functionality. See my answer below.

Max Alekseyev gravatar imageMax Alekseyev ( 2 years ago )

Right, thanks.

David Coudert gravatar imageDavid Coudert ( 2 years ago )

Your Answer

Please start posting anonymously - your entry will be published after you log in or create a new account.

Add Answer

Question Tools

1 follower

Stats

Asked: 2 years ago

Seen: 499 times

Last updated: Feb 11 '23