You may be able to use Cython to speed up this kind of computation. To get the most out of it, you'd want to declare C types with cdef for the key variables (for instance, the outputs of dy_fc?) and also cdef the function you are using.
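As a very rough sketch of what that might look like (everything here is hypothetical: rhs just stands in for whatever dy_fc computes, and euler is an illustrative loop I made up, not your actual code):

```cython
# Hypothetical sketch: `rhs` stands in for whatever dy_fc computes.
# Neither the names nor the formula come from the original question.

cdef double rhs(double t, double y):
    # A cdef function is called at C level, so the loop below
    # avoids Python call overhead on every step.
    return -2.0 * y + t

def euler(double y0, double t0, double t1, int n):
    # Typed locals let Cython compile the loop to plain C arithmetic.
    cdef double h = (t1 - t0) / n
    cdef double t = t0
    cdef double y = y0
    cdef int i
    for i in range(n):
        y += h * rhs(t, y)
        t += h
    return y
```

In Sage, one way to try this is to pass the source as a string to cython(...), after which euler can be called like any other Python function. As far as I know, calling a fast_callable object from inside such a loop is still a Python-level call, so the gain depends on how much of the work can actually be moved into typed code.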
I'm not at all a Cython expert, though. How much you gain really depends on the specifics of your use case, and of course fast_callable is already pretty nice. Here are some assorted links which could conceivably help (or not):