Ask Your Question

Revision history [back]

More efficient methods for finding subsets of Subsets()

Calling Subsets([1..100000], 4) takes 50 ms to generate a list with about $4 \times 10^{18}$ elements. %%prun suggests the majority of that time is used by misc.py:556(_stable_uniq)

(Note: trying this with $> \sim 10^5$ elements causes an overflow.)

But getting a subset of that list... hoo boy.

NList = [i for i in Subsets([1..100], 4) if sum(i) == 100] takes 45 s to generate a list with 5952 elements, out of $3.9 \times 10^6$ possibilities. About half of that time is used by set.py:465(__init__)

Note that the size of the main set has been decreased by a factor of 1000. That's because this list comprehension seems to run at about $O(n^4)$, so going from 100 to 1000 multiplies the time by around $10^4$; that increases the time from ~45 s to 5... days. (Creating the list, then filtering it, doesn't seem to change the time much.)

So my question is, is there any faster way to filter through such a long list that I don't know of? I'd like to be working in the [1..2000] range, but I don't think my kernel will hold out for 80 days to make that happen. (Probably not even 5 days).

Any suggestions welcome!

More efficient methods for finding subsets of Subsets()

Calling SS = Subsets([1..100000], 4) takes 50 ms to generate a list with about $4 \times 10^{18}$ elements. %%prun suggests the majority of that time is used by misc.py:556(_stable_uniq)

(Note: trying this with $> \sim 10^5$ elements causes an overflow.)

But getting a subset of that list... hoo boy.

NList = [i for i in Subsets([1..100], 4) if sum(i) == 100] takes 45 s to generate a list with 5952 elements, out of $3.9 \times 10^6$ possibilities. About half of that time is used by set.py:465(__init__)

Note that the size of the main set has been decreased by a factor of 1000. That's because this list comprehension seems to run at about $O(n^4)$, so going from 100 to 1000 multiplies the time by around $10^4$; that increases the time from ~45 s to 5... days. (Creating the list, then filtering it, doesn't seem to change the time much.)

So my question is, is there any faster way to filter through such a long list that I don't know of? I'd like to be working in the [1..2000] range, but I don't think my kernel will hold out for 80 days to make that happen. (Probably not even 5 days).

Any suggestions welcome!