# Turning a .txt file into a list in SageMath/Python

Hi,

This seems like a very basic question, but I have not been able to find a satisfactory answer. I'm sorry if it's been answered before.

I have a file on my computer called, say, text.txt that consists of a lot of numbers, one per line, like so:

2.14
3.15
7.8
etc.

Questions:

1. How can I open and read this file in Sage? I am using the online notebook (https://cloud.sagemath.com/). I've been trying to use basic Python commands like open("file path"), and I have also tried uploading the text file as a document in the notebook, but I can't seem to get it to work. I should say that my programming knowledge is limited.

2. Once I have opened this file so I can read and write to it, what would be the best code to turn it into a list with all the numbers in it (i.e. L = [2.14, 3.15, 7.8, ...])?

Thanks a lot!



First, you can put all your lines into a Python list:

sage: with open('/path/to/your/file.txt', 'r') as f:
....:     L = f.readlines()


So, L is a list of the lines of the file:

sage: L
['2.14\n', '3.15\n', '7.8 \n']


As you can see, the entries are strings, possibly with trailing spaces and newlines. You can clean such a string with the strip() method:

sage: [l.strip() for l in L]
['2.14', '3.15', '7.8']


But you want Sage floating point numbers, not strings representing them, so you can convert each string into an element of RDF, the real double field:

sage: [RDF(l.strip()) for l in L]
[2.14, 3.15, 7.8]
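The strip-then-convert steps above can be sketched in plain Python as follows. This is only an illustration under some assumptions: Python's built-in float stands in for Sage's RDF so that the snippet also runs outside Sage, and the trailing blank line in raw_lines is a hypothetical extra (files often end with one) that would otherwise make the conversion fail.

```python
# Lines as they might come back from readlines(). The final '\n' entry is a
# hypothetical blank line, added here to show why filtering can be useful.
raw_lines = ['2.14\n', '3.15\n', '7.8 \n', '\n']

# Remove surrounding whitespace (spaces, newlines) from every line.
stripped = [s.strip() for s in raw_lines]

# Drop the lines that are now empty, then convert the rest to floats.
# (float is used in place of RDF so this runs in plain Python.)
numbers = [float(s) for s in stripped if s]

print(numbers)  # [2.14, 3.15, 7.8]
```

Without the `if s` filter, the empty string left over from the blank line would raise a ValueError during conversion.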


If you want to summarize this, you can directly do:

sage: with open('/path/to/your/file.txt', 'r') as f:
....:     L = [RDF(l.strip()) for l in f.readlines()]

sage: L
[2.14, 3.15, 7.8]


You can also iterate over the file object directly instead of calling readlines():

sage: with open('/path/to/your/file.txt', 'r') as f:
....:     L = [RDF(l.strip()) for l in f]
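For reference, here is a minimal self-contained sketch of the same idea in plain Python. It makes a few assumptions that are not part of the answer: the helper name read_numbers is hypothetical, float is used instead of RDF so the code runs outside Sage, and a temporary file is created only so that the example does not depend on a real path.

```python
import os
import tempfile


def read_numbers(path):
    """Return the numbers stored in `path` (one per line) as a list of floats."""
    with open(path, 'r') as f:
        # Iterate over the file line by line, skipping blank lines.
        return [float(line.strip()) for line in f if line.strip()]


# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('2.14\n3.15\n7.8 \n')
    path = tmp.name

L = read_numbers(path)
os.remove(path)  # clean up the temporary file

print(L)  # [2.14, 3.15, 7.8]
```

In Sage itself, replacing float with RDF in the helper should give the RDF elements shown above.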


In general, the open.readlines() pattern suggested in another answer works quite well and is easier to work with while you're developing your file processing, but it does load the entire text content of the file into main memory at once. If you want to process the lines as they are read from the file, you can use the fact that the file object itself already knows how to iterate over its lines:

sage: L = [ RDF(l.strip()) for l in open('/path/to/your/file.txt', 'r')]


While this works, it's better to use a with clause to ensure the file is closed:

sage: with open('/path/to/your/file.txt', 'r') as f:
....:     L = [RDF(l.strip()) for l in f]


I am never sure: does this construction close the file cleanly if something goes wrong during the processing (e.g. if some element cannot be converted to a real number)?

( 2015-08-07 14:19:01 -0500 )

In practice, with CPython, yes, because the garbage collection is quite eager and files are closed upon deallocation of their object. In theory, no, because Python makes no guarantees about its garbage collection. So wrapping it in a with clause is definitely recommended.

( 2015-08-08 02:41:25 -0500 )
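The guarantee given by the with clause can be checked directly. The following is a small sketch in plain Python, with some assumptions: float replaces RDF, the temporary file and its deliberately invalid second line are made up for the test, and the names handle, failed and closed_after_error are purely illustrative.

```python
import os
import tempfile

# Throwaway file whose second line is deliberately not a number.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('2.14\nnot a number\n')
    path = tmp.name

handle = None
try:
    with open(path, 'r') as f:
        handle = f  # keep a reference so the file object can be inspected later
        L = [float(line.strip()) for line in f]
except ValueError:
    # The conversion of 'not a number' fails inside the with block.
    failed = True
else:
    failed = False

# The with clause has closed the file even though an exception was raised.
closed_after_error = handle.closed
os.remove(path)

print(failed, closed_after_error)  # True True
```

Here the file is closed by the with clause itself, not by garbage collection, since the variable handle still refers to the file object when its closed attribute is read.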

I see. I updated my answer accordingly; thanks for the clarification.

( 2015-08-08 11:39:42 -0500 )

I typically use the following:

import csv


The data is read into a list as strings. Then, you can convert to another data type. For example, you can convert to integers as follows:

data = list(map(int, data))


The same concern applies here as with the other answer: the file will remain open at least for the lifetime of the csv.reader object. You might prefer:

with open('myfile.csv', 'r') as f:
    R = csv.reader(f)  # each row read from R is a list of strings
    data = [int(row[0]) for row in R]


which has two theoretical advantages (on CPython it doesn't really make a difference currently) over your snippet:

• it's guaranteed to close the file upon exit of the "with" clause (open files are a scarce resource, so it's good to not let open files linger)

• the file content doesn't end up as strings in memory all at once (good for bigger files, and faster because memory allocation is limited)

This code does leave a defunct csv.reader object lingering, but that shouldn't really hurt.

( 2015-08-09 14:14:49 -0500 )
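A self-contained version of the csv approach might look like the sketch below. It rests on assumptions not stated in the thread: the file holds one integer per row, the temporary file and its contents are made up for the example, and empty rows are skipped. Note that csv.reader yields each row as a list of strings, so the first field is selected before converting.

```python
import csv
import os
import tempfile

# Throwaway CSV file with one integer per row (hypothetical data).
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False,
                                 newline='') as tmp:
    tmp.write('3\n1\n4\n')
    path = tmp.name

# newline='' is what the csv module documentation recommends when opening files.
with open(path, 'r', newline='') as f:
    reader = csv.reader(f)
    # Each row is a list of strings; skip empty rows and convert the first field.
    data = [int(row[0]) for row in reader if row]

os.remove(path)  # clean up the temporary file

print(data)  # [3, 1, 4]
```

For a file with a single number per line, the csv module adds little over iterating over the file directly; it becomes more useful when rows contain several delimited fields.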