# Revision history

I'm going to assume that you are running in the Notebook environment. In that case, you can easily load a CSV file into the notebook. Then the data can be read with Python's csv module:

    import csv


You'll likely need to clean up the resulting list and then convert all of the entries to float. Then, if you pull the data into separate lists for each column (I'll call these d1, d2, etc.), you can do some simple statistics using R commands.
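As a rough sketch of the reading and cleaning steps in plain Python: the sample text, the column names other than pcb138, and the helper name `read_numeric_rows` are all made up for illustration, and an in-memory string stands in for the uploaded file.

```python
import csv
import io

# Hypothetical sample standing in for the uploaded csv file.
SAMPLE = """id,pcb138,pcb153,pcb180
1,1.5,2.0,0.7
2,3.2,,1.1
3,2.8,4.1,0.9
"""

def read_numeric_rows(handle):
    """Return rows of floats, skipping the header and any row with an empty or non-numeric cell."""
    reader = csv.reader(handle)
    next(reader)  # drop the header row
    rows = []
    for row in reader:
        try:
            rows.append([float(cell) for cell in row])
        except ValueError:
            # an empty or non-numeric cell: skip the whole row
            continue
    return rows

data = read_numeric_rows(io.StringIO(SAMPLE))
print(data)
```

With a real file you would pass an open file handle instead of the `io.StringIO` object. Dropping incomplete rows is only one possible cleaning policy; you may prefer to fill in or flag missing values instead.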

To pull the data into separate lists, you can use Python list comprehensions like this:

    d0=[x[0] for x in data]
    d1=[x[1] for x in data]
    d2=[x[2] for x in data]
    d3=[x[3] for x in data]
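If there are many columns, one alternative is to transpose the rows in a single step with `zip`. This is just a sketch; the `data` values below are invented and assume every row has already been cleaned to the same length.

```python
# Hypothetical cleaned rows: each inner list is one row of floats.
data = [
    [1.0, 1.5, 2.0, 0.7],
    [2.0, 3.2, 3.9, 1.1],
    [3.0, 2.8, 4.1, 0.9],
]

# zip(*data) groups the i-th entry of every row, so it yields the columns.
columns = [list(col) for col in zip(*data)]
d0, d1, d2, d3 = columns

print(d1)
```

Note that `zip` stops at the shortest row, so a ragged row would silently truncate every column.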


For the five-number summary, use r.summary(d1), which also reports the mean. To check correlations, use r.cor(d1,d2). You can do a t-test with r.t_test(d1,d2).
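If you want to sanity-check the R output, the same quantities can be computed in plain Python. This is a cross-check under my own assumptions, not what the R interface does internally: the helper names and sample values are hypothetical, and the "inclusive" quantile method is used because it should correspond to R's default interpolation rule.

```python
import math
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) with linear interpolation between data points."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, q2, q3, max(values))

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up sample columns.
d1 = [1.0, 2.0, 3.0, 4.0, 5.0]
d2 = [2.1, 3.9, 6.2, 8.0, 9.8]

print(five_number_summary(d1))
print(pearson_r(d1, d2))
```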

You can do a histogram using matplotlib, which I think works better in Sage than the R plots.

    import matplotlib.pyplot as plt
    plt.figure()
    hp=plt.hist(d1,bins=10,range=[min(d1),max(d1)])
    plt.title('pcb138')
    plt.xlabel("x-axis")
    plt.ylabel("Frequency")
    plt.savefig('pcb138.png')


You can do some linear regression using the find_fit command.

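    #linear model
    d12=list(zip(d1,d2))  # find_fit and points need a list of (x,y) pairs
    var('a b x')
    f(x) = a*x+b
    ans=find_fit(d12,f)   # returns a list of equations: [a == ..., b == ...]
    a1=ans[0].rhs()       # fitted slope
    b1=ans[1].rhs()       # fitted intercept
    points(d12)+plot(a1*x+b1,(x,0,30),color='red')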
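For a straight line, the least-squares fit has a closed form, so the a and b values from find_fit can be checked against it; they should agree up to numerical tolerance. A minimal plain-Python sketch follows (the function name `least_squares_line` and the sample data are my own, chosen to lie exactly on y = 2x + 1):

```python
def least_squares_line(points):
    """Return (slope, intercept) of the ordinary least-squares line through (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up data lying exactly on y = 2x + 1.
d1 = [0.0, 1.0, 2.0, 3.0, 4.0]
d2 = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = least_squares_line(list(zip(d1, d2)))
print(slope, intercept)
```

The formula divides by the spread of the x values, so it fails if every x is identical.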


You can also do a boxplot.

    import matplotlib.pyplot as plt
    plt.figure()
    bp=plt.boxplot(d1)
    plt.title("pcb138")
    plt.ylabel("y scale")
    plt.savefig('box.png')