I'm going to assume that you are running in the Sage Notebook environment. In that case, you can easily upload a CSV file to your worksheet and refer to it through the `DATA` path prefix. The data can then be read as follows:
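```
import csv

# DATA is the notebook's path prefix for files uploaded to the worksheet.
# newline='' is the csv-recommended way to open the file ('rU' is gone in Python 3.11).
with open(DATA + 'datasample.csv', newline='') as f:
    data = list(csv.reader(f))
```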
You'll likely need to clean the resulting list somewhat (for example, by dropping a header row) and then convert all of the entries to `float`. Then, if you pull the data into separate lists for each column (I'll call these `d1`, `d2`, etc.), you can do some simple statistics through Sage's R interface.
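The cleaning step depends on your file, but a common case is a header row, a few blank lines, and numbers stored as text. Here is a minimal sketch under exactly those assumptions; the sample contents and the helper name `load_numeric_rows` are hypothetical, and an in-memory string stands in for the uploaded file.

```python
import csv
import io

# Hypothetical sample standing in for datasample.csv:
# a header row, a blank line, and rows of numeric text.
SAMPLE = """id,col1,col2
1,2.5,3.1

2,4.0,5.2
3,1.5,2.9
"""

def load_numeric_rows(f):
    """Read CSV rows from file object f, skip the header and blank rows,
    and convert every remaining field to float."""
    reader = csv.reader(f)
    next(reader)  # drop the header row
    rows = []
    for row in reader:
        if not row or all(cell.strip() == "" for cell in row):
            continue  # skip empty lines (csv.reader yields [] for them)
        rows.append([float(cell) for cell in row])
    return rows

data = load_numeric_rows(io.StringIO(SAMPLE))
print(data)  # [[1.0, 2.5, 3.1], [2.0, 4.0, 5.2], [3.0, 1.5, 2.9]]
```

In the notebook you would pass the real file instead, e.g. `with open(DATA + 'datasample.csv', newline='') as f: data = load_numeric_rows(f)`.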
To pull the data into separate lists, you can use Python list comprehensions like this:
```
# one list per column; assumes every row has at least four fields
d0 = [x[0] for x in data]
d1 = [x[1] for x in data]
d2 = [x[2] for x in data]
d3 = [x[3] for x in data]
```
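An equivalent idiom transposes all rows into columns in one step with `zip(*data)`. This is a sketch on a small hypothetical, already-cleaned `data` list. Keep in mind that `zip` stops at the shortest row, so ragged rows would silently lose values; that is one more reason to clean the data first.

```python
# Hypothetical cleaned data: one inner list per CSV row.
data = [[1.0, 2.5, 3.1],
        [2.0, 4.0, 5.2],
        [3.0, 1.5, 2.9]]

# zip(*data) regroups the rows by position, yielding one tuple per column;
# list() turns each tuple into a list, like the comprehensions produce.
d0, d1, d2 = (list(col) for col in zip(*data))
print(d1)  # [2.5, 4.0, 1.5]
```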
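For the five-number summary, use `r.summary(d1)` (R's `summary` also reports the mean). To compute the correlation between two columns, use `r.cor(d1, d2)`, and for a two-sample t-test use `r.t_test(d1, d2)` (Sage translates `t_test` into R's `t.test`).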
You can draw a histogram with matplotlib, which I find works better in Sage than the R plots:
```
import matplotlib.pyplot as plt

plt.figure()
# 10 equal-width bins spanning the data range
hp = plt.hist(d1, bins=10, range=[min(d1), max(d1)])
plt.title('pcb138')
plt.xlabel('x-axis')
plt.ylabel('Frequency')
plt.savefig('pcb138.png')  # write the plot to a PNG file
```
You can do a simple linear regression with Sage's `find_fit` command:
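```
# linear model y = a*x + b
d12 = list(zip(d1, d2))  # list() so find_fit and points() can both reuse the pairs
var('a b x')
f(x) = a*x + b
ans = find_fit(d12, f)  # list of equations, one per parameter: [a == ..., b == ...]
a1 = ans[0].rhs()
b1 = ans[1].rhs()
points(d12) + plot(a1*x + b1, (x, min(d1), max(d1)), color='red')
```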
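`find_fit` solves the fit numerically by least squares. For a straight-line model, the same coefficients also have a closed form, so you can cross-check `a1` and `b1` without Sage. This sketch uses hypothetical points lying exactly on y = 2x + 1; the helper name `fit_line` is illustrative only.

```python
# Hypothetical points lying exactly on y = 2x + 1.
d1 = [0.0, 1.0, 2.0, 3.0, 4.0]
d2 = [1.0, 3.0, 5.0, 7.0, 9.0]

def fit_line(xs, ys):
    """Ordinary least-squares line y = a*x + b in closed form:
    a = sum((x-mx)*(y-my)) / sum((x-mx)**2), b = my - a*mx."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx
    b = my - a * mx
    return a, b

a1, b1 = fit_line(d1, d2)
print(a1, b1)  # 2.0 1.0
```

On Python 3.10+, `statistics.linear_regression(d1, d2)` returns the same slope and intercept.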
You can also draw a box plot:
```
import matplotlib.pyplot as plt

plt.figure()
bp = plt.boxplot(d1)  # box spans the quartiles; whiskers use whis=1.5 by default
plt.title('pcb138')
plt.ylabel('y scale')
plt.savefig('box.png')
```
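The numbers behind a box plot can be computed directly. This sketch assumes the Tukey-style rule that matplotlib's `boxplot` applies with its default `whis=1.5`: whiskers reach the most extreme data points within 1.5 times the interquartile range of the box, and anything beyond is drawn as an outlier. That it reproduces matplotlib's quartiles exactly is an assumption worth checking; the sample and the helper name `box_stats` are hypothetical.

```python
import statistics

# Hypothetical sample with one obvious outlier.
d1 = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 20.0]

def box_stats(xs, whis=1.5):
    """Quartiles, whisker ends and outliers for a Tukey-style box plot."""
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence = q1 - whis * iqr
    hi_fence = q3 + whis * iqr
    inside = [x for x in xs if lo_fence <= x <= hi_fence]
    outliers = [x for x in xs if x < lo_fence or x > hi_fence]
    return {"q1": q1, "median": med, "q3": q3,
            "whiskers": (min(inside), max(inside)), "outliers": outliers}

print(box_stats(d1))
```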