I'm going to assume you are working in the Sage Notebook. There you can upload a CSV file to the worksheet and then read it like this:

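import csv
# DATA is the Notebook's path to files uploaded to this worksheet.
# Python 3.11 removed the 'U' mode flag; the csv docs recommend newline=''.
with open(DATA + 'datasample.csv', newline='') as f:
    data = list(csv.reader(f))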

csv.reader returns every field as a string, so you'll likely need to clean the resulting list (for example, drop a header row or incomplete rows) and convert the entries to float. Once the data are split into one list per column (I'll call these d1, d2, etc.), you can compute some simple statistics with R commands through Sage's R interface.
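
The cleaning step depends on your file. Here is a minimal plain-Python sketch, assuming the file has one header row and that rows with a blank or non-numeric field should simply be dropped. The sample text and the column names `pcb153` and `pcb180` are hypothetical stand-ins for `datasample.csv`, and `clean_rows` is an illustrative helper, not a Sage function.

```python
import csv
import io

# Hypothetical stand-in for datasample.csv: a header row, a blank line,
# and one row with a missing value. Replace with your real file.
SAMPLE = """id,pcb138,pcb153,pcb180
1,0.52,0.61,0.33

2,0.48,,0.29
3,0.71,0.80,0.45
"""

def clean_rows(rows, has_header=True):
    """Return rows of floats, skipping the header, blank lines, and rows
    with a blank or non-numeric field."""
    if has_header:
        rows = rows[1:]
    cleaned = []
    for row in rows:
        if not row:  # csv.reader yields [] for a blank line
            continue
        try:
            cleaned.append([float(field) for field in row])
        except ValueError:  # float('') or float('n/a') raises ValueError
            continue
    return cleaned

data = clean_rows(list(csv.reader(io.StringIO(SAMPLE))))
print(data)  # [[1.0, 0.52, 0.61, 0.33], [3.0, 0.71, 0.8, 0.45]]
```

Dropping incomplete rows keeps the columns aligned, which matters for paired statistics such as correlation. If you would rather keep partial rows, you would need to handle missing values per column instead.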

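To pull the data into separate lists, you can use Python list comprehensions like this:

# One list per column; float() converts the strings csv.reader returns
d0 = [float(x[0]) for x in data]
d1 = [float(x[1]) for x in data]
d2 = [float(x[2]) for x in data]
d3 = [float(x[3]) for x in data]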
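
If there are many columns, an alternative to writing one comprehension per column is to transpose the rows with `zip(*data)`. This is a plain-Python sketch on a small hypothetical `data` list that has already been cleaned and converted to floats.

```python
# A hypothetical cleaned data list: one inner list per CSV row.
data = [
    [1.0, 0.52, 0.61],
    [2.0, 0.48, 0.57],
    [3.0, 0.71, 0.80],
]

# zip(*data) regroups the rows into columns; list() turns each tuple into a list.
columns = [list(col) for col in zip(*data)]
d0, d1, d2 = columns

print(d1)  # [0.52, 0.48, 0.71]
```

One caveat: `zip` stops at the shortest row, so ragged rows would silently lose values. That is another reason to clean the data first.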

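For the five-number summary (R also reports the mean), use r.summary(d1). To check correlation, use r.cor(d1, d2), which computes Pearson's r by default. For a two-sample t-test, use r.t_test(d1, d2). Sage maps the underscore to the dot in R's t.test, which runs Welch's test (unequal variances) by default.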

You can make a histogram with matplotlib, which in my experience works better in Sage than R's plots.

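import matplotlib.pyplot as plt
plt.figure()
# 10 equal-width bins spanning the data; hp is (counts, bin edges, patches)
hp = plt.hist(d1, bins=10, range=[min(d1), max(d1)])
plt.title('pcb138')
plt.xlabel("x-axis")   # replace with the variable's name and units
plt.ylabel("Frequency")
plt.savefig('pcb138.png')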
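
`plt.hist` returns a tuple `(n, bins, patches)`, so `hp[0]` holds the bin counts. To see what those counts mean, here is a plain-Python sketch of the binning for equal-width bins over `[min, max]`. `hist_counts` is a hypothetical helper that mirrors NumPy's convention, which `plt.hist` relies on. It is not matplotlib's code.

```python
def hist_counts(xs, bins=10):
    """Count values in `bins` equal-width bins spanning [min(xs), max(xs)].

    As in numpy.histogram, each bin is half-open [left, right) except the
    last, which also includes max(xs). Results may differ from NumPy's for
    values lying exactly on a bin edge, because of float rounding.
    """
    lo, hi = min(xs), max(xs)
    if hi == lo:
        raise ValueError("need at least two distinct values")
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in xs:
        i = int((x - lo) / width)      # index of the bin x falls into
        counts[min(i, bins - 1)] += 1  # x == max(xs) goes in the last bin
    return counts

print(hist_counts(list(range(11)), bins=5))  # [2, 2, 2, 2, 3]
```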

You can fit a straight line (linear regression) with Sage's find_fit command.

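# linear model y = a*x + b
d12 = list(zip(d1, d2))  # list(): in Python 3, zip() returns a one-shot iterator
var('a b x')
f(x) = a*x + b
ans = find_fit(d12, f)   # [a == ..., b == ...]
a1 = ans[0].rhs()
b1 = ans[1].rhs()
# scatter plot plus the fitted line over the range of the data
points(d12) + plot(a1*x + b1, (x, min(d1), max(d1)), color='red')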
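
For a straight line, the least-squares coefficients also have a closed form, slope = Sxy/Sxx and intercept = mean(y) - slope*mean(x). That gives a quick cross-check on the `find_fit` result, which is computed numerically. The helper `least_squares_line` and the toy lists are hypothetical, written in plain Python.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) minimizing sum((slope*x + intercept - y)**2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical toy data
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
slope, intercept = least_squares_line(xs, ys)
print(slope, intercept)  # close to 0.6 and 2.2 for this toy data
```

On your own columns, `least_squares_line(d1, d2)` should agree with `a1` and `b1` up to numerical tolerance.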

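You can also draw a boxplot.

import matplotlib.pyplot as plt
plt.figure()
# box: quartiles and median; whiskers: up to 1.5*IQR; points beyond: outliers
bp = plt.boxplot(d1)
plt.title("pcb138")
plt.ylabel("y scale")  # replace with the variable's units
plt.savefig('box.png')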
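
`plt.boxplot` returns a dict of the drawn artists (`'boxes'`, `'whiskers'`, `'fliers'`, and so on), not the numbers behind them. The plain-Python sketch below computes those numbers under matplotlib's default `whis=1.5`. The quartiles use linear interpolation, and the whiskers stop at the most extreme data points within 1.5 IQR of the box. `box_stats` is a hypothetical helper, not part of matplotlib.

```python
import statistics

def box_stats(xs, whis=1.5):
    """Quartiles, whisker ends, and outliers for a Tukey-style boxplot."""
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [x for x in xs if lo_fence <= x <= hi_fence]
    outliers = [x for x in xs if x < lo_fence or x > hi_fence]
    # Whiskers end at the most extreme data points inside the fences.
    return {"q1": q1, "median": med, "q3": q3,
            "whiskers": (min(inside), max(inside)), "outliers": outliers}

print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 50]))
```

For that sample the box runs from 3.25 to 7.75, the whiskers reach 1 and 9, and 50 is drawn as an outlier.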