Ask Your Question

Revision history [back]

Turning integers into booleans for rapid comparison of subsets

I am writing a program which, among many tasks, needs to compare subsets of a given finite set $S$ to each other a lot of times, and decide if a given subset is contained in another one.

The moment $|S|$ grows, the individual comparison becomes costly. Together with the fact that I need to compare hundreds of subsets to each other, it becomes undoable. It also takes a lot of memory. However, it has been pointed out to me that since $S$ doesn't change, I can represent each subset as an array of booleans. Then two arrays of booleans can be compared, e.g. if $a\&b=a$ (where & is 'and'), then $S_a\subseteq S_b$. Since one would expect comparison of booleans arrays to be fast (at least in C and/or assembler, it is really fast for the types int/longint), this should be much faster than what I do currently (which is using the 'set' class).

Now, how to implement this? I need a choice of boolean tuple class that can be 'efficient' and can manage an arbitrarily long tuple (in one application $|S|=84$ but if this becomes as efficient as I hope for, I am likely to think of applications where this number is larger). Of course I could just make it A colleague who doesn't use sage has suggested https://pypi.org/project/bitarray/ Is this a reasonable choice? Can I use bitarray in Sage without installing anything? If not, does Sage have a similar thing or one that comes by default? Or is there a way to tell Sage to install bitsize when the program is execute? I intend to turn my programme into a package. If I need to include bitsize in it, is there a 'canonical' way of doing this?

The second thing I need is to have a good way of matching the arrays of booleans to the specific subsets they represent. Since for the comparison I don't need to know what each boolean represents, it is something I can do it when I create the boolean/subset and after I carry out all the comparisons necessary to translate it back into the represented objects. I guess the way to go is to choose an arbitrary ordering of $S$ and then have a function that given $s\in S$ it gives an array of booleans and given an array of booleans gives back an $s\in S$. I may implement this with a dictionary. Does this sound too cumbersome?