# why is symbolic comparison so slow?

I tried to write a little differentiatior:

from operator import add, mul, pow
def mydiff(s,x=x):
if x not in SR(s).variables():
return 0
elif s == x:
return 1
elif s == log(x):
return 1/x
elif s == sin(x):
return cos(x)
elif s == cos(x):
return -sin(x)
elif s == tan(x):
return 1+tan(x)^2
elif s == arcsin(x):
return 1/sqrt(1-x^2)
elif s == arctan(x):
return 1/(1+x^2)
op = s.operator()
ops = s.operands()
if op == pow and ops[0]==x and ops[1]._is_numeric():
return ops[1]*ops[0]^(ops[1]-1)
return mydiff(ops[0]) + mydiff(reduce(op,ops[1:]))
elif op == mul:
f = ops[0]; g = reduce(mul,ops[1:])
return g*mydiff(f) + f*mydiff(g)
elif len(ops)==1:
return mydiff(ops[0],x)*mydiff(op(x),x).subs(x=ops[0]))


one my one year old macbook i need to wait more than 17 second to get the following answer

%time mydiff(sum(k*x^k,k,0,10))
CPU times: user 12.39 s, sys: 0.78 s, total: 13.17 s
Wall time: 17.49 s
100*x^9 + 81*x^8 + 64*x^7 + 49*x^6 + 36*x^5 + 25*x^4 + 16*x^3 + 9*x^2 + 4*x + 1


why does it _that_ long?

edit retag close merge delete

Sort by » oldest newest most voted

The previous answers (with the exception of Nils Bruin's) are incorrect in their assessment of the problem -- the slowdown is indeed coming from trying to decide whether "x == tan(x)" is True or False. In particular, the slowdown comes a lot from our overhead in talking to Maxima. Sage performs the following things when trying to test whether an equation is zero or not.

1. Check to see if both sides are constants. If they are, compare the constants.

2. Ask GiNaC / Pynac to see if the expressions are structurally equal (i.e., they have are the same expression trees.) If they are, then we can immediately say things are equal.

3. Check to make sure that there are no assumptions on any of the variables. If there are, then we have to immediately send things to Maxima.

4. If there are no assumptions, then we can try evaluating at a number of points to try to verify that they are provably different using interval arithmetic. This is done in the test_relation method of Expression objects.

5. If that doesn't work, we send it to Maxima. Since the Maxima output gives False if they are not equal as well as if it can't prove they are equal, we can only rely on a True answer from Maxima. Thus, a number of simplifications are done to try to "prove" equality. This is all done in sage.symbolic.relation.test_relation_maxima

6. If all of the above things don't work, then we have to return False.

There are a number of areas for improvement:

1. Most importantly there are quite a few cases in which the test_relation method fails. For example, there some code to check to see if using CIF for the values isn't working and if so switch over to RIF. However, that code is broken. Fixing that would allow something like

sage: eq = x == tan(x)
sage: eq.test_relation()
False


instead of giving NotImplemented like it does now.

2. We can add a syntactically_equal method which just exposes, GiNaC's relational_to_bool test. If we just do that and modify mydiff appropriately, then we get

sage: k = var('k')
sage: s = sum(k*x^k,k,0,10)
sage: %time mydiff(s, x)
CPU times: user 0.02 s, sys: 0.00 s, total: 0.02 s
Wall time: 0.02 s
100*x^9 + 81*x^8 + 64*x^7 + 49*x^6 + 36*x^5 + 25*x^4 + 16*x^3 + 9*x^2 + 4*x + 1


I will make tickets for these and add the links to them here.

more

thank you!

( 2010-08-27 16:05:17 +0200 )edit

the problem is neither Python, nor recursion, but Sage expressions. If I replace those with lists and strings the computation time drops to 0:

def mydiff(s,x='x'):
if not x in str(s):
return 0
if s == x:
return 1
elif s == ['log',x]:
return ['pow',x,-1]
elif s == ['sin',x]:
return ['cos',x]
elif s == ['cos',x]:
return ['mul',-1,['sin',x]]
elif s == ['tan',x]:
elif s == ['arcsin',x]:
elif s == ['arctan',x]:
op = s[0]
ops = s[1:]
if op == 'pow' and ops[0]==x and ops[1] in Rationals():
return ['mul',ops[1],['pow',ops[0],(ops[1]-1)]]
return ['sum',map(mydiff,ops)]
elif op == 'mul':
if len(ops) == 2:
if ops[0] in Rationals():
return ['mul',ops[0],mydiff(ops[1])]
else:
f = ops[0]; g = ops[1]


Now:

sage: var('k')
sage: s = ['add']+[['mul',k,['pow','x',k]] for k in [1..10]]
sage: %time mydiff(s)
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.00 s
['sum', [['mul', 1, ['mul', 1, ['pow', 'x', 0]]], ['mul', 2, ['mul', 2, ['pow', 'x', 1]]], ['mul', 3, ['mul', 3, ['pow', 'x', 2]]], ['mul', 4, ['mul', 4, ['pow', 'x', 3]]], ['mul', 5, ['mul', 5, ['pow', 'x', 4]]], ['mul', 6, ['mul', 6, ['pow', 'x', 5]]], ['mul', 7, ['mul', 7, ['pow', 'x', 6]]], ['mul', 8, ['mul', 8, ['pow', 'x', 7]]], ['mul', 9, ['mul', 9, ['pow', 'x', 8]]], ['mul', 10, ['mul', 10, ['pow', 'x', 9]]]]]

more

This is more like a comment to Mike's answer, but it's rather long and I don't see how to add code to comments, so I'll put it here.

A workaround to check if an expression is trivially equal to another is to use match().

sage: tan(x).match(tan(x)) is not None
True
sage: tan(x).match(log(x)) is not None
False


This is very fast:

sage: sage: %timeit bool(sin(x) == sin(x))
625 loops, best of 3: 19.8 µs per loop
sage: sage: %timeit bool(sin(x) == log(x))
5 loops, best of 3: 639 ms per loop
sage: sage: %timeit bool(sin(x).match(log(x)) is not None)
625 loops, best of 3: 8.3 µs per loop
sage: sage: %timeit bool(sin(x).match(sin(x)) is not None)
625 loops, best of 3: 10.9 µs per loop


Here is the mydiff() function using this hack (and removing the reduce() call for sums):

from operator import add, mul, pow
def mydiff2(s,x=x):
if not s.has(x):
return 0
op = s.operator()
ops = s.operands()
if s.match(x) is not None:
return 1
elif op is log and ops[0].match(x) is not None:
return 1/x
elif op is sin and ops[0].match(x) is not None:
return cos(x)
elif op is cos and ops[0].match(x) is not None:
return -sin(x)
elif op is tan and ops[0].match(x) is not None:
return 1+tan(x)**2
elif op is arcsin and ops[0].match(x) is not None:
return 1/sqrt(1-x**2)
elif op is arctan and ops[0].match(x) is not None:
return 1/(1+x**2)
elif op is pow and (ops[0].match(x) is not None) and ops[1]._is_numeric():
return ops[1]*ops[0]**(ops[1]-1)
return sum(map(mydiff2, ops))
elif op is mul:
f = ops[0]; g = reduce(mul,ops[1:])
return g*mydiff2(f) + f*mydiff2(g)


Now I get:

sage: var('k')
k
sage: inp = sum(k*x^k,k,0,10)
sage: inp
10*x^10 + 9*x^9 + 8*x^8 + 7*x^7 + 6*x^6 + 5*x^5 + 4*x^4 + 3*x^3 + 2*x^2 + x
sage: %time res2 = mydiff2(inp)
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.00 s
sage: res2
100*x^9 + 81*x^8 + 64*x^7 + 49*x^6 + 36*x^5 + 25*x^4 + 16*x^3 + 9*x^2 + 4*x + 1


Note that speeding up symbolic comparison is #6799 on trac.

more

One obvious thing is that the wall time is considerably longer than the cpu-time. That can mean your process didn't get 100% of the CPU time or that Sage called out to a different process (maxima?) to get some results. Indeed, if you run:

import cProfile
S=sum(k*x^k,k,0,10)
cProfile.run("mydiff(S)")


you'll find 4995 calls to

maxima.py:722(_eval_line)


indicating that there were 4995 calls to another process. There are also various calls to various "simplify" routines, so either in construction or in comparison of symbolic expressions, Sage did feel the need to simplify things.

In short: If you want to do pattern matching on symbolic expressions, do not use "==". It is apparently already reserved for a fairly expensive equality test.

I am surprised that this program still triggers calls out to maxima. I don't think it's "reduce", since that only receives 18 calls and takes negligible time in the profile. All the other operations, including "==" should really map to straight pynac operations.

more

technically, apart from mhamptons comment, there are two problems: (I've just briefly looked at the code and maybe I didn't understand it)

1. your code is recursive and that is not a good idea for python.
2. you rip apart the expression and before each recursion and then you partially rebuild it with reduce. this creates a bunch of calculations for a new expression which you don't need if your differentiator would take a list of operands and a operation directly.
more

Because pure python is slow. No offense but why would you want to do this? Symbolic calculations are usually done with pynac (C++ wrapped into Sage) or Maxima (40 year old optimized Lisp wrapped into Sage). If you want something about symbolic differentiation improved you should contribute to those projects.

What do you need that isn't already there? For your example on my computer:

var('x,k')
time diff(sum(k*x^k,k,0,10),x)


gives

100*x^9 + 81*x^8 + 64*x^7 + 49*x^6 + 36*x^5 + 25*x^4 + 16*x^3 + 9*x^2 + 4*x + 1
Time: CPU 0.02 s, Wall: 0.04 s

more

@mhampton i don't wont to write the next best differentiator in the world. I'm just trying to give an example on how to program with symbolic expressions in sage.

@harald sorry, i should change the line

return mydiff(ops[0]) + mydiff(reduce(op,ops[1:]))


into

return sum(map(mydiff,ops))


then no ripping apart and reconstruction occurs. (I was trying to mimic pattern matching.)

Still, this doesn't help much. Does anybody know how Sage tests equality of symbolic expressions? Does it try to simplify those each time before it compares them?

more