
Turning a .txt file into a list in SageMath/Python

asked 2015-08-06 15:13:12 -0600

uipap

Hi,

This seems like a very basic question but I have not been able to find a satisfying answer. I'm sorry if it's been answered before.

I have a file on my computer called, say, text.txt that is made up of a lot of numbers, one per row, like so:

2.14
3.15
7.8
etc.

Questions:

  1. How can I open and read this file in Sage? I am using the online notebook (https://cloud.sagemath.com/). I've been trying basic Python commands like open("file path"), and also uploading the text file as a document in the notebook, but I can't get it to work. I should say that my programming knowledge is limited.

  2. Once I have opened this file so I can read and write to it, what would be the best code to turn it into a list with all the numbers in it (i.e. L = [2.14, 3.15, 7.8, ...])?

Thanks a lot!


3 answers

2

answered 2015-08-07 03:53:53 -0600

tmonteil

updated 2015-08-08 11:42:15 -0600

First, you can put all your lines into a Python list:

sage: with open('/path/to/your/file.txt', 'r') as f:
....:     L = f.readlines()

So, L is a list of the lines of the file:

sage: L
['2.14\n', '3.15\n', '7.8 \n']

As you can see, the entries are strings, with maybe spaces and newlines at the end. You can clean such a string with the strip() method:

sage: [l.strip() for l in L]
['2.14', '3.15', '7.8']

But you want Sage floating point numbers, not strings representing them, so you can convert each string into an element of RDF, the real double field:

sage: [RDF(l.strip()) for l in L]
[2.14, 3.15, 7.8]

To combine these steps, you can directly do:

sage: with open('/path/to/your/file.txt', 'r') as f:
....:     L = [RDF(l.strip()) for l in f.readlines()]

sage: L
[2.14, 3.15, 7.8]

UPDATE: we can combine the best of this answer and @nbruin's as follows:

sage: with open('/path/to/your/file.txt', 'r') as f:
....:     L = [RDF(l.strip()) for l in f]
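One thing the snippets above do not cover is a blank line in the file (for instance a trailing empty line), which would fail to convert. Here is a minimal sketch of a helper that skips such lines. It is plain Python, so it uses float where the Sage session above uses RDF; the name read_numbers and the temporary demo file are only illustrative assumptions, not part of the original answer.

```python
import os
import tempfile

def read_numbers(path):
    """Return the numbers in a one-per-line text file as a list of floats."""
    numbers = []
    with open(path, 'r') as f:
        for line in f:
            text = line.strip()
            if not text:
                # Skip blank lines, e.g. a trailing empty line.
                continue
            # In a Sage session, RDF(text) could be used here instead of float(text).
            numbers.append(float(text))
    return numbers

# Demo on a throwaway file holding the sample values from the question,
# with a trailing space and a blank line thrown in.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('2.14\n3.15\n7.8 \n\n')
    demo_path = tmp.name

L = read_numbers(demo_path)
os.remove(demo_path)
print(L)  # [2.14, 3.15, 7.8]
```

With a real file, only the call read_numbers('/path/to/your/file.txt') is needed; the temporary file is there just so the sketch runs on its own.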
1

answered 2015-08-07 10:33:20 -0600

nbruin

updated 2015-08-08 02:43:22 -0600

In general, the open.readlines() pattern suggested in another answer works quite well and is easier to work with while you are developing your file processing, but it does lead to the entire text content of the file being held in main memory. If you want to process the lines as they are read from the file, you can use the fact that the file object itself already knows how to "iterate" over its lines:

sage: L = [ RDF(l.strip()) for l in open('/path/to/your/file.txt', 'r')]

While this works, it's better to use a with clause to ensure the file is closed:

sage: with open('/path/to/your/file.txt', 'r') as f:
....:     L = [RDF(l.strip()) for l in f]
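If the numbers only need to be processed once, the list itself can be avoided too. The following is a rough sketch of that idea in plain Python, using a generator so that only one line is held at a time. It uses float rather than RDF so that it runs outside Sage, and the name iter_numbers and the temporary demo file are hypothetical.

```python
import os
import tempfile

def iter_numbers(path):
    """Yield one float per non-blank line, reading the file lazily."""
    with open(path, 'r') as f:
        for line in f:
            text = line.strip()
            if text:
                yield float(text)

# Throwaway file with the sample values from the question.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('2.14\n3.15\n7.8\n')
    demo_path = tmp.name

# Accumulate a count and a sum without ever building a list.
count = 0
total = 0.0
for x in iter_numbers(demo_path):
    count += 1
    total += x

os.remove(demo_path)
print(count, total)
```

The file is closed once the loop has run through all the lines. If the loop is abandoned early, closing is deferred until the generator is closed or collected, so the plain with block above remains the simpler choice when a list is wanted anyway.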

Comments

I am never sure: does this construction close the file cleanly if something goes wrong during processing (e.g. if some element cannot be turned into a real number)?

tmonteil (2015-08-07 14:19:01 -0600)

In practice, with CPython, yes, because the garbage collection is quite eager and files are closed upon deallocation of their object. In theory, no, because Python makes no guarantees about its garbage collection. So wrapping it in a with clause is definitely recommended.

nbruin (2015-08-08 02:41:25 -0600)

I see, I updated my answer accordingly, thanks for the clarification.

tmonteil (2015-08-08 11:39:42 -0600)
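The behaviour discussed in these comments can be checked with a small experiment. This is only a sketch in plain Python (float stands in for RDF, and the demo file with a deliberately bad line is made up for the test): it keeps a reference to the file object, lets the conversion fail inside the with block, and then looks at whether the file was closed.

```python
import os
import tempfile

# Throwaway file whose second line cannot be converted to a number.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('2.14\nnot a number\n7.8\n')
    demo_path = tmp.name

handle = None
error_seen = False
try:
    with open(demo_path, 'r') as f:
        handle = f  # keep a reference so the file can be inspected afterwards
        L = [float(line.strip()) for line in f]
except ValueError:
    # float('not a number') raises ValueError partway through the file.
    error_seen = True

closed_after_error = handle.closed
os.remove(demo_path)
print(error_seen, closed_after_error)  # True True
```

So with the with clause the file is closed even though the comprehension raised, independently of when the garbage collector runs.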
0

answered 2015-08-08 15:18:40 -0600

calc314

I typically use the following:

import csv
data = list(csv.reader(open('myfile.csv', 'r')))

The data is read into a list of rows, each row being a list of strings. Then, you can convert to another data type. For example, with one value per row, you can convert to integers as follows:

data = [int(row[0]) for row in data]

Comments

Same concerns as in the previous story: the file will remain open at least for the lifetime of the csv.reader object. You might prefer:

with open('myfile.csv', 'r') as f:
    R = csv.reader(f)
    data = [int(row[0]) for row in R]

which has two theoretical advantages (on CPython it doesn't really make a difference currently) over your snippet:

  • it's guaranteed to close the file upon exit of the "with" clause (open files are a scarce resource, so it's good to not let open files linger)

  • the file-content doesn't end up as strings in memory all at once (good for bigger files, and faster because memory allocation is limited)

This code does leave a defunct csv.reader object lingering, but that shouldn't really hurt.

nbruin (2015-08-09 14:14:49 -0600)
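For the decimal values in the original question, a csv-based version might look like the sketch below. It assumes Python 3 (where the csv documentation recommends opening files with newline=''), one value per row, and float in place of a Sage type. The temporary file stands in for myfile.csv.

```python
import csv
import os
import tempfile

# Throwaway CSV file with one value per row, as in the question.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write('2.14\r\n3.15\r\n7.8\r\n')
    demo_path = tmp.name

with open(demo_path, 'r', newline='') as f:
    # csv.reader yields each row as a list of strings; take the first field.
    # Empty rows come back as [] and are skipped by the "if row" filter.
    data = [float(row[0]) for row in csv.reader(f) if row]

os.remove(demo_path)
print(data)  # [2.14, 3.15, 7.8]
```

For a file with a single column, csv adds little over iterating the lines directly, but the same pattern extends to several columns by indexing other fields of row.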

Stats

Asked: 2015-08-06 15:13:12 -0600

Seen: 1,519 times

Last updated: Aug 08 '15