
Numerical and graphical summaries of data

asked 2013-04-16 16:43:21 +0100

anonymous user

updated 2014-10-28 21:14:48 +0100

kcrisman

I have been given the following data:

  pcb138 pcb180 pcb52 pcb118      pcb
1    1.46  0.738 0.532  0.720  19.9959
2    0.64  0.664 0.030  0.236   6.0996
3    3.29  1.150 0.134  1.540  24.9655
4    3.94  1.330 0.466  1.940  37.4436
5    3.18  2.140 0.243  1.470  30.1830
6    2.43  1.300 0.137  1.310  20.8036
7    3.94  3.490 0.208  0.876  41.3818
8    3.38  1.040 0.477  2.460  29.4780
9    2.21  0.966 0.457  1.140  24.2387
10   2.49  1.590 0.298  1.180  26.3198
11   0.86  0.395 0.020  0.406   8.5910
12   3.38  1.850 0.539  1.500  36.4229
13   7.39  4.420 0.707  3.550  66.4108
14   2.74  0.595 0.893  1.980  30.5757
15   2.58  1.780 0.112  1.520  25.4771
16   7.28  3.490 1.440  4.000  68.5567
17   2.29  2.100 0.124  0.981  23.1381
18   5.35  5.370 0.154  0.737  43.0451
19   4.62  2.690 0.319  1.490  39.5300
20   3.54  1.140 0.536  1.890  36.5013
21   1.98  1.040 0.718  0.889  26.5255
22   2.01  1.040 0.173  1.500  22.1370
23   2.22  0.897 0.228  1.070  19.1992
24   3.50  2.330 0.456  1.520  32.9518
25   0.86  0.474 0.152  0.393   9.0893
26   4.92  3.650 0.181  1.790  42.0037
27   2.76  0.868 1.780  2.140  48.7727
28   5.18  3.610 0.843  2.390  55.8940
29   2.60  1.240 0.482  1.600  31.8021
30   4.95  2.740 1.290  2.350  60.1485
31  10.80  8.820 0.067  3.550  97.2793
32   2.02  1.390 0.311  1.310  18.3945
33   3.24  2.600 0.117  1.740  27.5003
34   8.22  7.070 0.531  2.560  79.0347
35   9.50  9.470 0.752  2.420  97.8119
36   4.88  2.690 0.304  2.600  44.8870
37   5.75  3.100 0.595  1.850  58.2125
38   5.48  5.460 0.352  1.540  57.4186
39   8.08  3.370 0.065  3.580  57.4938
40   3.29  2.370 0.340  1.350  33.5817
41   3.73  1.020 5.860  3.890 115.7361
42   1.36  0.624 0.269  0.508  14.8479
43   9.92  2.810 1.260  4.770  91.6305
44   8.65  6.210 0.428  3.680  92.1625
45   4.56  1 ...

1 Answer


answered 2013-04-16 22:49:37 +0100

calc314

updated 2013-04-20 22:29:23 +0100

I'll assume you are working in the Sage Notebook. In that case, you can upload a CSV file to the worksheet and then read the data like this:

import csv
# DATA is the notebook variable holding the worksheet's data directory
data = list(csv.reader(open(DATA + 'datasample.csv', 'rU')))

You'll likely need to clean the resulting list a bit (for example, drop any header row and row-index column) and then convert all of the entries to float. Then, if you pull the data into a separate list for each column (I'll call these d1, d2, etc.), you can compute some simple statistics using R commands.
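As a rough sketch of that cleaning step in plain Python: the example below assumes the CSV file has a header row and a leading row-index column, like the table in the question. The real file layout is not shown, so that layout, the `SAMPLE` text, and the helper name `load_columns` are all assumptions made for illustration.

```python
import csv
import io

# First three rows of the table from the question, written as CSV text.
# ASSUMPTION: the real file has a header row and a leading row-index column.
SAMPLE = """,pcb138,pcb180,pcb52,pcb118,pcb
1,1.46,0.738,0.532,0.720,19.9959
2,0.64,0.664,0.030,0.236,6.0996
3,3.29,1.150,0.134,1.540,24.9655
"""

def load_columns(handle):
    """Return {column name: list of floats}, skipping the index column."""
    rows = [row for row in csv.reader(handle) if row]  # drop blank lines
    header, body = rows[0], rows[1:]
    columns = {}
    for position, name in enumerate(header):
        if position == 0:
            continue  # row-index column, not data
        columns[name.strip()] = [float(row[position]) for row in body]
    return columns

# io.StringIO stands in for an opened file so the sketch is self-contained.
columns = load_columns(io.StringIO(SAMPLE))
d1 = columns["pcb138"]
d2 = columns["pcb180"]
print(d1)
print(d2)
```

With a real file, the `io.StringIO(SAMPLE)` argument would be replaced by the opened file object.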

To pull the data into separate lists, you can use python list comprehensions like this:

d0=[x[0] for x in data]
d1=[x[1] for x in data]
d2=[x[2] for x in data]
d3=[x[3] for x in data]

For the five-number summary, use r.summary(d1) (R's summary also reports the mean). To check correlations, use r.cor(d1,d2). You can do a two-sample t-test with r.t_test(d1,d2).
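If you want to see what these summaries compute, or cross-check the R output, here is a minimal plain-Python sketch. The function names are made up for illustration. It assumes R's defaults: quartiles by linear interpolation (which Python's "inclusive" method corresponds to) and Welch's unequal-variance form of the t statistic. Only the t statistic is computed, not the p-value. The sample values are the first ten rows of pcb138 and pcb180 from the question.

```python
import math
import statistics

def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile, maximum."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def welch_t(xs, ys):
    """Welch two-sample t statistic (unequal variances, unpaired)."""
    standard_error = math.sqrt(
        statistics.variance(xs) / len(xs) + statistics.variance(ys) / len(ys)
    )
    return (statistics.fmean(xs) - statistics.fmean(ys)) / standard_error

# First ten rows of pcb138 and pcb180 from the table in the question.
pcb138 = [1.46, 0.64, 3.29, 3.94, 3.18, 2.43, 3.94, 3.38, 2.21, 2.49]
pcb180 = [0.738, 0.664, 1.150, 1.330, 2.140, 1.300, 3.490, 1.040, 0.966, 1.590]

print(five_number_summary(pcb138))
print(pearson(pcb138, pcb180))
print(welch_t(pcb138, pcb180))
```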

You can do a histogram using matplotlib, which I think works better in Sage than the R plots.

import matplotlib.pyplot as plt
plt.figure()
hp=plt.hist(d1,bins=10,range=[min(d1),max(d1)])
plt.title('pcb138')
plt.xlabel("x-axis")
plt.ylabel("Frequency")
plt.savefig('pcb138.png')
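To see what bins=10 with range=[min(d1),max(d1)] asks for, here is a minimal plain-Python sketch of equal-width binning. The helper name `bin_counts` is made up for illustration, and matplotlib's own handling of bin edges may differ in floating-point corner cases.

```python
def bin_counts(values, bins=10):
    """Count values in `bins` equal-width bins spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        raise ValueError("all values are equal; cannot form bins")
    counts = [0] * bins
    for value in values:
        index = int((value - lo) / width)
        # The maximum lands exactly on the right edge; put it in the last bin.
        counts[min(index, bins - 1)] += 1
    return counts

# First ten rows of pcb138 from the table in the question.
pcb138 = [1.46, 0.64, 3.29, 3.94, 3.18, 2.43, 3.94, 3.38, 2.21, 2.49]
print(bin_counts(pcb138, bins=10))
```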

You can do some linear regression using the find_fit command.

#linear model
d12=list(zip(d1,d2))  # list() so the pairs can be reused under Python 3
var('a b x x1')
f(x) = a*x+b
ans=find_fit(d12,f)
a1=ans[0].rhs()
b1=ans[1].rhs()
points(d12)+plot(a1*x+b1,(x,0,30),color='red')
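For a straight-line model, the least-squares slope and intercept also have a closed form, which can serve as a cross-check on the fitted values. The sketch below is plain Python, not Sage's find_fit; the function name `least_squares_line` is made up for illustration, and the sample pairs are the first ten rows of pcb138 and pcb180 from the question.

```python
def least_squares_line(points):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# First ten rows of pcb138 (x) and pcb180 (y) from the table in the question.
pcb138 = [1.46, 0.64, 3.29, 3.94, 3.18, 2.43, 3.94, 3.38, 2.21, 2.49]
pcb180 = [0.738, 0.664, 1.150, 1.330, 2.140, 1.300, 3.490, 1.040, 0.966, 1.590]

slope, intercept = least_squares_line(list(zip(pcb138, pcb180)))
print(slope, intercept)
```

A numerical fitter should give values close to these for a linear model, up to its convergence tolerance.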

You can also do a boxplot.

import matplotlib.pyplot as plt
plt.figure()
bp=plt.boxplot(d1)
plt.title("pcb138")
plt.ylabel("y scale")
plt.savefig('box.png')

Comments

Sorry for my beginner question, but please answer: how do I create d1, d2, etc.? I tried d1 = data[1:] and d2 = data[2:], but that doesn't work.

marciorsoliveira (2013-04-20 02:17:10 +0100)

See my edits above.

calc314 (2013-04-20 22:27:03 +0100)

Sorry, but I cannot help pointing this out: the 2nd block of code, "d0=[x[0] for x in data]; d1=[...]", can be written simply as "d0,d1,d2,d3 = zip(*data)", or "zip(*data)[:4]" if "data" has more than 4 fields. I love Python :)

Jesustc (2013-04-21 08:17:46 +0100)

Cool!

calc314 (2013-04-21 17:55:09 +0100)

