# Best practice to save results of a function as a database

I have a function defined inductively that taxes my computer quite a lot. It will help a lot if I can save its results somewhere, preventing it from unnecessary calculations.

To simplify, assume that it's a function $f: \mathbb{N} \to \mathbb{N}$, where $f(0)=1$ and $f(i)$ depends on the value of $f$ at lower $i$'s. Imagine that I want to calculate $f(n)$ where $n \sim 10^{1000}$.

Of course, I can do it by

for i in range(1000):
f(i).save("./f_value_{}".format(i))


But I feel like this can be improved. Any idea?

edit retag close merge delete

Sort by » oldest newest most voted

One option is to use the @cached_function decorator which saves the result of already computed values:

sage: @cached_function
....: def fib(n):
....:     print('computation performed')
....:     if n == 0 or n == 1:
....:         return 1
....:     else:
....:         return fib(n-1) + fib(n-2)
....:
sage: fib(5)
computation performed
computation performed
computation performed
computation performed
computation performed
computation performed
8
sage: fib(5)   # no computation is performed
8


Internally, results are saved in a dictionary where keys are of the form (*args,**kwds):

sage: fib.cache
{((1,), ()): 1, ((0,), ()): 1, ((2,), ()): 2, ((3,), ()): 3, ((4,), ()): 5, ((5,), ()): 8}


See the documentation of

sage: cached_function?


Then, you can save the fib function locally:

sage: save(fib, 'fib')
sage: ls fib*
fib.sobj


Then, in the same or in another Sage session, you can reload that function and its previous cache is still there:

sage: gib = load('fib.sobj')
sage: gib.cache
{((1,), ()): 1, ((0,), ()): 1, ((2,), ()): 2, ((3,), ()): 3, ((4,), ()): 5, ((5,), ()): 8}
sage: gib(5)
8


Which means that the default behavior is to include the cache in the pickle (documentation says one should write @cached_function(do_pickle=True) which does not seem necessary). Behind the scene, save is using dumps to create the file fib.sobj which contains binary code looking something like :

sage: dumps(fib)
b'x\x9cm\x8c\xbb\x12@0\x10\x00\xe39&\x7f\xa2\xc9W\xa8\x15*\xddM\x9c\xe0\x06\xc7M\xe8u~\xdb\xab\xd5\xed\x16\xbbG\x88\xde\xf6\xce\xcc\xe4\xd1\xa0\xc5\xc1u;\xa3\x86\x17[xd\xa3\x85a\xe7\x95p\x9c\x9c\x16UgJ)\x80\xd9\x12\x03HPG\xb7v\xd4\xc8\xff\xa9\\\xbe\x92\xb8/\x087-Q^I|JRIj.\xbe\xa8)\xf2'


sage: hib = loads(dumps(fib))
sage: hib.cache
{((1,), ()): 1, ((0,), ()): 1, ((2,), ()): 2, ((3,), ()): 3, ((4,), ()): 5, ((5,), ()): 8}


If the cache becomes very heavy in memory, then saving and loading the function will take long time. I think I already heard of a decorator that query automatically a file if result is already computed or saves automatically to the file (or database) if it is a new computation, but I can't remember what/where I saw that, whether it was a ticket or just a discussion. I am sure people involved in LMFDB project (which computes stuff on the fly if not already in a database) have an idea about that.

more

One needs to decide on some balance between the cost of storing and the cost of computing. Find out what values are worth storing and will make other calculations easier. This might depend on $f$.

Would storing the values at primes or prime powers make more sense? Values at multiple of $2^{30}$?

In any case, $10^{1000}$ is a lot, there is not much hope to compute $f(n)$ for all values of $n$ up to that bound.

more