
Read a very large file

asked 2022-04-23 15:26:01 +0100

lijr07

I have a very large file and I read it as follows:

with open('/Users/jianrongli/Downloads/SmallRank4ModulesGr416.txt', 'r') as fp:
    L = [sage_eval(line) for line in fp.readlines() if line.strip()]

The file is very large (more than 3 GB). Each element of L is a 2D array such as [[1,2,3,5],[2,3,4,9]]. Now I want to select the elements of L whose largest entry is at most 12.

r1 = []
for j in b2:  # b2: the list of 2D arrays (presumably the list L read above)
    t1 = []
    for i in j:
        t1.extend(i)  # flatten the rows of j into a single list
    if max(t1) <= 12:
        r1.append(j)
len(r1)

Since the file is very large, it takes a few hours and has not finished. Is there some way to make it faster? Thank you very much.


Comments

Your condition can be terser: ...if max(flatten(j))<=12 should do the job.

But your main problem is that you read the whole file into memory (and build the whole of b2) before even starting to process it. Can you somehow (sed comes to mind...) restructure your input file so that each line holds exactly one element of b2? If so, your program could simply do something along the lines of:

L = []
with open('Your/file/name.txt') as fp:
    for line in fp:            # read one line at a time
        if not line.strip():   # skip blank lines
            continue
        j = eval(line)
        if max(flatten(j)) <= 12:
            L.append(j)

HTH,

Emmanuel Charpentier (2022-04-23 17:04:47 +0100)

@Emmanuel, thank you very much!

lijr07 (2022-04-23 18:04:19 +0100)

Does it work for you?

Emmanuel Charpentier (2022-04-23 19:49:20 +0100)

@Emmanuel, yes, the speed is increased. Thank you very much.

lijr07 (2022-04-23 21:49:34 +0100)

OK. I'll transcribe that as an answer, for the sake of future users with a similar question. Feel free to accept it.

Emmanuel Charpentier (2022-04-24 08:21:11 +0100)

1 Answer


answered 2022-04-24 08:21:48 +0100

Emmanuel Charpentier

Your condition can be terser: ...if max(flatten(j))<=12 should do the job.

But your main problem is that you read the whole file into memory (and build the whole of b2) before even starting to process it. Can you somehow (sed comes to mind...) restructure your input file so that each line holds exactly one element of b2? If so, your program could simply do something along the lines of:

L = []
with open('Your/file/name.txt') as fp:
    for line in fp:            # read one line at a time
        if not line.strip():   # skip blank lines
            continue
        j = eval(line)
        if max(flatten(j)) <= 12:
            L.append(j)

HTH,
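For readers working outside Sage, here is a minimal sketch of the same line-by-line idea in plain Python. It assumes each non-blank line is a plain Python literal such as [[1,2,3,5],[2,3,4,9]]; if the file uses Sage-specific syntax, sage_eval or eval would still be needed. The names max_entry and filter_lines and the small demonstration file are hypothetical, introduced only for illustration.

```python
import ast
import os
import tempfile


def max_entry(array_2d):
    """Largest entry of a non-empty list of non-empty rows."""
    return max(max(row) for row in array_2d)


def filter_lines(path, bound=12):
    """Yield each 2D array in the file whose largest entry is <= bound."""
    with open(path) as fp:
        for line in fp:  # the file object yields one line at a time
            text = line.strip()
            if not text:  # skip blank lines
                continue
            # literal_eval only accepts Python literals (assumption about the data)
            array_2d = ast.literal_eval(text)
            if max_entry(array_2d) <= bound:
                yield array_2d


# Tiny demonstration file standing in for the real data
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('[[1,2,3,5],[2,3,4,9]]\n\n[[1,2,3,13],[2,3,4,9]]\n[[12,1],[3,4]]\n')
    demo_path = tmp.name

selected = list(filter_lines(demo_path))
os.remove(demo_path)
print(len(selected))
```

Because filter_lines is a generator, only the arrays that pass the test are ever kept in memory. If only the count is needed, sum(1 for _ in filter_lines(path)) avoids building the list at all.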

