There are several ways to handle this:
- use multi-line strings
- use string concatenation
- store just the DNA sequence in a file and read it from there
Multi-line strings
Use triple quotes (triple single-quotes or triple double-quotes)
for multi-line strings that can include newlines.
In your case you probably want to remove the newlines,
so call the string's replace method before passing it to Word.
seq = Word("""
TCAATAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGA
TCCCTCAGACCCTTTTAGTCAGTGTGGAAAATCTCTAGCAGTGGCGCCCGAACAGGGACTTGAAAGCGAA
AGGGAAACCAGAGGAGCTCTCTCGACGCAGGACTCGGCTTGCTGAAGCGCGCACGGCAAGAGGCGAGGGG
AGGCGACTGGTGAGTACGCCAAAAATTTTGACTAGCGGAGGCTAGAAGGAGAGAGATGGGTGCGAGAGCG
""".replace('\n', '')
)
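As a minimal sketch of the newline removal in plain Python (no Sage needed, so Word is left out), the following uses a shortened fragment of the sequence above. It also shows an alternative, ''.join(raw.split()), which is not part of the answer above but removes stray spaces or tabs as well as newlines; the variable names raw and dna are only for illustration.

```python
# A shortened fragment of the sequence, written as a triple-quoted string.
# In Sage, the cleaned-up result could be passed to Word(...).
raw = """
TCAATAAAGC
TTGCCTTGAG
"""

# Option 1: remove only newline characters.
dna = raw.replace("\n", "")

# Option 2: split on any whitespace and glue the pieces back together.
# This also drops spaces or tabs that may have crept in.
dna_no_whitespace = "".join(raw.split())

print(dna)
print(dna == dna_no_whitespace)
```

For input that contains nothing but the sequence and newlines, both options give the same string.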
Splitting a string into several lines
Python automatically concatenates adjacent string literals,
so the sequence can be split over several lines inside the parentheses:
seq = Word(
"TCAATAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGA"
"TCCCTCAGACCCTTTTAGTCAGTGTGGAAAATCTCTAGCAGTGGCGCCCGAACAGGGACTTGAAAGCGAA"
"AGGGAAACCAGAGGAGCTCTCTCGACGCAGGACTCGGCTTGCTGAAGCGCGCACGGCAAGAGGCGAGGGG"
"AGGCGACTGGTGAGTACGCCAAAAATTTTGACTAGCGGAGGCTAGAAGGAGAGAGATGGGTGCGAGAGCG"
)
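To check that adjacent literals really end up as one string, here is a small sketch in plain Python, again with a shortened fragment of the sequence and without Word:

```python
# Adjacent string literals are merged into a single string,
# so no "+" operator or join call is needed inside the parentheses.
dna = (
    "TCAATAAAGC"
    "TTGCCTTGAG"
)

print(dna)
print(len(dna))  # 20
```

Since the pieces never contain a newline in the first place, nothing has to be removed afterwards.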
Read string from file
It may work better to store just the DNA sequence in a file.
Say the file is dna.txt
and contains:
TCAATAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGA
TCCCTCAGACCCTTTTAGTCAGTGTGGAAAATCTCTAGCAGTGGCGCCCGAACAGGGACTTGAAAGCGAA
AGGGAAACCAGAGGAGCTCTCTCGACGCAGGACTCGGCTTGCTGAAGCGCGCACGGCAAGAGGCGAGGGG
AGGCGACTGGTGAGTACGCCAAAAATTTTGACTAGCGGAGGCTAGAAGGAGAGAGATGGGTGCGAGAGCG
Then you can do one of two things.
Read the whole file:
with open('dna.txt', 'r') as f:
    s = f.read()
seq = Word(s.replace('\n', ''))
The advantage here is that the only thing we do
while the file is open is read it.
The file is then closed again, and we use the
resulting multi-line string for our
purposes: remove the newlines and make a word.
Read line by line:
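with open('dna.txt', 'r') as f:
    seq = Word(''.join(line.strip() for line in f))
Here, iterating over f yields the lines one at a time,
each still ending with its newline character.
Calling line.strip() removes that newline, and ''.join(...)
joins the stripped lines with an empty string as separator,
thus reconstructing the DNA string without line breaks.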
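The two file-based approaches can be compared in a self-contained sketch. It is plain Python, so Word is omitted, and it writes a short two-line stand-in for dna.txt into a temporary directory; the names path, whole and by_line are only for illustration. Each line is stripped explicitly, because lines read from a file keep their trailing newline.

```python
import os
import tempfile

# Write a small two-line stand-in for dna.txt into a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dna.txt")
    with open(path, "w") as f:
        f.write("TCAATAAAGC\nTTGCCTTGAG\n")

    # Approach 1: read the whole file, close it, then remove the newlines.
    with open(path, "r") as f:
        s = f.read()
    whole = s.replace("\n", "")

    # Approach 2: iterate over the lines, stripping each line ending.
    with open(path, "r") as f:
        by_line = "".join(line.strip() for line in f)

print(whole == by_line)
```

Reading the whole file is the simpler choice for short sequences; iterating line by line avoids holding a second copy that still contains the newlines.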