There are various ways to deal with this question.
Use triple quotes (triple single-quotes or triple double-quotes) for multi-line strings that can include newlines.
In your case, I think you want to remove the newlines, so use the replace method.
seq = Word("""
TCAATAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGA
TCCCTCAGACCCTTTTAGTCAGTGTGGAAAATCTCTAGCAGTGGCGCCCGAACAGGGACTTGAAAGCGAA
AGGGAAACCAGAGGAGCTCTCTCGACGCAGGACTCGGCTTGCTGAAGCGCGCACGGCAAGAGGCGAGGGG
AGGCGACTGGTGAGTACGCCAAAAATTTTGACTAGCGGAGGCTAGAAGGAGAGAGATGGGTGCGAGAGCG
""".replace('\n', '')
)
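If the pasted sequence might also contain spaces, tabs, or Windows line endings, a slightly more forgiving cleanup is to split on whitespace and rejoin. Here is a minimal sketch in plain Python, using a shortened sequence; the names `raw` and `clean` are only illustrative, and in Sage the cleaned string would then be passed to `Word`:

```python
raw = """
TCAATAAAGC
TTGCCTTGAG
"""

# str.split() with no arguments splits on any run of whitespace
# (newlines, spaces, tabs, carriage returns) and drops empty pieces.
clean = ''.join(raw.split())

# When the only whitespace is '\n', this matches replace('\n', '').
assert clean == raw.replace('\n', '')
print(clean)
```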
Use automatic concatenation of adjacent string literals:
seq = Word(
"TCAATAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGA"
"TCCCTCAGACCCTTTTAGTCAGTGTGGAAAATCTCTAGCAGTGGCGCCCGAACAGGGACTTGAAAGCGAA"
"AGGGAAACCAGAGGAGCTCTCTCGACGCAGGACTCGGCTTGCTGAAGCGCGCACGGCAAGAGGCGAGGGG"
"AGGCGACTGGTGAGTACGCCAAAAATTTTGACTAGCGGAGGCTAGAAGGAGAGAGATGGGTGCGAGAGCG"
)
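Adjacent string literals are joined by the Python parser itself, so no newline ever enters the string and nothing needs to be removed afterwards. A short check in plain Python, with a truncated sequence for brevity (the variable name `joined` is illustrative):

```python
joined = (
    "TCAATAAAGC"
    "TTGCCTTGAG"
)

# The literals are merged at compile time, exactly as if written with '+'.
assert joined == "TCAATAAAGC" + "TTGCCTTGAG"
assert '\n' not in joined
print(len(joined))  # 20
```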
It could work better to store just the DNA sequence in the file.
Say the file is dna.txt and contains:
TCAATAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGA
TCCCTCAGACCCTTTTAGTCAGTGTGGAAAATCTCTAGCAGTGGCGCCCGAACAGGGACTTGAAAGCGAA
AGGGAAACCAGAGGAGCTCTCTCGACGCAGGACTCGGCTTGCTGAAGCGCGCACGGCAAGAGGCGAGGGG
AGGCGACTGGTGAGTACGCCAAAAATTTTGACTAGCGGAGGCTAGAAGGAGAGAGATGGGTGCGAGAGCG
Then you can do one of two things.
Read the whole file:
with open('dna.txt', 'r') as f:
    s = f.read()
seq = Word(s.replace('\n', ''))
The advantage here is that the only thing we do while the file is open is read it. The file is then closed, and we work with the resulting multi-line string: remove the newlines and make a word.
Read line by line:
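with open('dna.txt', 'r') as f:
    seq = Word(''.join(line.strip() for line in f))
Here, iterating over f yields the lines of the file one at a time, each still ending with its newline character. Calling line.strip() removes that newline, and ''.join(...) then joins the stripped lines with an empty string as separator, thus reconstructing the DNA string without line breaks.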
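Both file-based approaches can be tried outside Sage with plain strings. The sketch below writes a small stand-in for dna.txt into a temporary directory (the shortened contents and the names `whole` and `by_line` are assumptions for illustration) and checks that the two approaches agree. Note that lines obtained by iterating over a file keep their trailing newline, so the line-by-line version strips each one before joining:

```python
import os
import tempfile

# Write a small stand-in for dna.txt into a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'dna.txt')
    with open(path, 'w') as f:
        f.write("TCAATAAAGC\nTTGCCTTGAG\n")

    # Whole-file approach: read everything, then drop the newlines.
    with open(path, 'r') as f:
        whole = f.read().replace('\n', '')

    # Line-by-line approach: each line still ends in '\n', so strip it.
    with open(path, 'r') as f:
        by_line = ''.join(line.strip() for line in f)

assert whole == by_line
print(whole)
```

In Sage, either string could then be passed to `Word` as in the snippets above.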