Finding the extrema of the density in a sample

asked 2016-06-10 13:26:24 +0100

fagui

updated 2017-01-08 12:25:05 +0100

FrédéricC

I am facing the following problem: I have a (large) sample of unevenly distributed points $(X_i,Y_i)$ in a 2D space. I would like to determine the local extrema of the density of the distribution.

I have tried the KernelDensity class from sklearn.

Does KernelDensity allow estimating the density at a point that is not part of the sample?

If yes, I cannot find the right syntax.

Here is an example:

import numpy as np
import pandas as pd

# Two Gaussian clusters of 5000 points each
mean0 = [0, 0]
cov0 = [[1, 0], [0, 1]]
mean1 = [3, 3]
cov1 = [[1, 0.2], [0.2, 1]]

# Stack both clusters into a single 10000 x 2 sample
A = pd.DataFrame(np.vstack((
    np.random.multivariate_normal(mean0, cov0, 5000),
    np.random.multivariate_normal(mean1, cov1, 5000),
)))
A.columns = ['X', 'Y']
A.describe()

from sklearn.neighbors import KernelDensity
kde = KernelDensity(bandwidth=0.04, metric='euclidean',
                        kernel='gaussian', algorithm='ball_tree')
kde.fit(A)

If I make this query:

kde.score_samples([(0,0)])

I get a negative number, which is clearly not a density:

array([-2.88134574])
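
A note on the negative value: according to the scikit-learn documentation, `score_samples` returns the logarithm of the estimated density, not the density itself, so a negative result is expected whenever the density is below 1. The sketch below is only an illustration under assumptions: it uses a smaller sample, a fixed seed and a bandwidth of 0.3 (none of which come from the original post), and it evaluates the estimate at points that are not in the sample.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Assumed setup: smaller sample, fixed seed, bandwidth 0.3 (illustrative choices)
rng = np.random.default_rng(0)
sample = np.vstack((
    rng.multivariate_normal([0, 0], [[1, 0], [0, 1]], 2000),
    rng.multivariate_normal([3, 3], [[1, 0.2], [0.2, 1]], 2000),
))
kde = KernelDensity(bandwidth=0.3, kernel='gaussian').fit(sample)

# Query points need not belong to the sample
query = [[0, 0], [1.5, 1.5], [10, 10]]
log_density = kde.score_samples(query)  # log of the estimated density
density = np.exp(log_density)           # back to a non-negative density
print(density)
```

Exponentiating the score is what turns the value from the query above into a density estimate.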

I don't know if this is the right approach. I would then like to pass that function to an optimizer to find the local extrema. Which library or function would you recommend?
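
One possible way to do the optimization step, offered as a sketch rather than a recommendation: minimize the negative log-density with `scipy.optimize.minimize`, starting from several points, so that each run climbs to a nearby local maximum of the estimated density. The bandwidth of 0.5, the sample size, the seed, the starting points and the derivative-free Nelder-Mead method are all assumptions chosen for illustration. A very small bandwidth makes the estimate bumpy and produces many spurious local maxima.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.neighbors import KernelDensity

# Assumed setup: smaller sample, fixed seed, bandwidth 0.5 (illustrative choices)
rng = np.random.default_rng(0)
sample = np.vstack((
    rng.multivariate_normal([0, 0], [[1, 0], [0, 1]], 2000),
    rng.multivariate_normal([3, 3], [[1, 0.2], [0.2, 1]], 2000),
))
kde = KernelDensity(bandwidth=0.5, kernel='gaussian').fit(sample)

def neg_log_density(point):
    # score_samples expects a 2D array with one row per query point
    return -kde.score_samples(np.asarray(point).reshape(1, -1))[0]

# Hypothetical starting points, one near each cluster
starts = [(0.5, 0.5), (2.5, 2.5)]
modes = [minimize(neg_log_density, x0=s, method='Nelder-Mead').x
         for s in starts]
print(modes)
```

Each entry of `modes` is a local maximum of the estimated density reached from the corresponding starting point. Covering the plane with a grid of starting points and merging nearby results would be one way to look for all of them.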

Is there an easy way to do that in SageMath?

Thank you very much for your help.
