

There are several ways to handle this:

  • use multi-line strings
  • use string concatenation
  • store just the DNA sequence in a file and read it from there

Multi-line strings

Use triple quotes (triple single-quotes or triple double-quotes) for multi-line strings that can include newlines.

In your case you want a single string without line breaks, so remove the newlines with the replace method. Because the lines inside the string are indented, remove the spaces as well; otherwise the indentation ends up in the sequence.

seq = Word("""
    TCAATAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGA
    TCCCTCAGACCCTTTTAGTCAGTGTGGAAAATCTCTAGCAGTGGCGCCCGAACAGGGACTTGAAAGCGAA
    AGGGAAACCAGAGGAGCTCTCTCGACGCAGGACTCGGCTTGCTGAAGCGCGCACGGCAAGAGGCGAGGGG
    AGGCGACTGGTGAGTACGCCAAAAATTTTGACTAGCGGAGGCTAGAAGGAGAGAGATGGGTGCGAGAGCG
    """.replace('\n', '').replace(' ', '')
)
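To see why the indentation matters, here is a plain-Python check (no Sage `Word` needed; the sequence is shortened for illustration). `replace('\n', '')` removes only the newlines and keeps the leading spaces of each line, while `''.join(s.split())` drops every kind of whitespace in one step.

```python
# Shortened sample: two indented lines inside a triple-quoted string.
raw = """
    TCAATAAAGC
    TCCCTCAGAC
    """

# Removing only newlines keeps the 4-space indentation of each line.
only_newlines = raw.replace('\n', '')

# str.split() with no argument splits on any run of whitespace
# (spaces, tabs, newlines) and discards empty pieces, so joining
# the pieces with '' removes all whitespace.
no_whitespace = ''.join(raw.split())

print(repr(only_newlines))  # '    TCAATAAAGC    TCCCTCAGAC    '
print(no_whitespace)        # TCAATAAAGCTCCCTCAGAC
```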

Splitting a string into several lines

Use Python's automatic concatenation of adjacent string literals:

seq = Word(
    "TCAATAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGA"
    "TCCCTCAGACCCTTTTAGTCAGTGTGGAAAATCTCTAGCAGTGGCGCCCGAACAGGGACTTGAAAGCGAA"
    "AGGGAAACCAGAGGAGCTCTCTCGACGCAGGACTCGGCTTGCTGAAGCGCGCACGGCAAGAGGCGAGGGG"
    "AGGCGACTGGTGAGTACGCCAAAAATTTTGACTAGCGGAGGCTAGAAGGAGAGAGATGGGTGCGAGAGCG"
)
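Adjacent string literals are joined by the Python compiler, so the parenthesized form above yields exactly one string with no separator between the pieces. A small check with shortened sample sequences, plus the usual pitfall of this feature (a forgotten comma in a list silently merges two items):

```python
# Adjacent literals inside parentheses are concatenated at compile time.
joined = (
    "TCAATAAAGC"
    "TCCCTCAGAC"
)

# Same result as explicit concatenation with +.
assert joined == "TCAATAAAGC" + "TCCCTCAGAC"

# Pitfall: inside a list, a missing comma merges two literals into one item.
items = [
    "AAA"
    "CCC",
    "GGG",
]

print(joined)  # TCAATAAAGCTCCCTCAGAC
print(items)   # ['AAACCC', 'GGG']
```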

Read string from file

It may be simpler to keep just the DNA sequence in a separate file.

Say the file is dna.txt and contains:

TCAATAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGA
TCCCTCAGACCCTTTTAGTCAGTGTGGAAAATCTCTAGCAGTGGCGCCCGAACAGGGACTTGAAAGCGAA
AGGGAAACCAGAGGAGCTCTCTCGACGCAGGACTCGGCTTGCTGAAGCGCGCACGGCAAGAGGCGAGGGG
AGGCGACTGGTGAGTACGCCAAAAATTTTGACTAGCGGAGGCTAGAAGGAGAGAGATGGGTGCGAGAGCG

Then you can do one of two things.

Read the whole file:

with open('dna.txt', 'r') as f:
    s = f.read()

seq = Word(s.replace('\n', ''))

The advantage here is that the file stays open only while it is being read. Once it is closed, we process the resulting multi-line string: remove the newlines and build a word from it.
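To try the whole-file approach without Sage, the sketch below writes a short sample to a temporary file and returns a plain string. The helper name `read_sequence`, the temporary file and the shortened sequence are illustrative assumptions; in Sage you would pass the returned string to `Word`.

```python
import os
import tempfile


def read_sequence(path):
    """Read the whole file, then remove newlines after the file is closed."""
    with open(path, 'r') as f:
        s = f.read()
    # The file is already closed here; the rest is plain string processing.
    return s.replace('\n', '')


# Write a small hypothetical sample to a temporary file.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("TCAATAAAGC\nTCCCTCAGAC\n")
    path = tmp.name

try:
    seq_str = read_sequence(path)
    print(seq_str)  # TCAATAAAGCTCCCTCAGAC
finally:
    # Clean up the temporary file.
    os.remove(path)
```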

Read line by line:

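with open('dna.txt', 'r') as f:
    seq = Word(''.join(line.strip() for line in f))

Here, iterating over f yields one line at a time, and each line still ends with a newline. line.strip() removes that newline (and any surrounding whitespace), and ''.join(...) glues the stripped lines together with an empty separator, reconstructing the DNA string without line breaks. Joining the lines directly with ''.join(f) would keep the newlines.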
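The difference between joining raw lines and joining stripped lines can be checked without a real file. This sketch uses `io.StringIO` as an in-memory stand-in for `dna.txt` (an assumption for illustration), since it iterates line by line the same way a text file does.

```python
import io

sample = "TCAATAAAGC\nTCCCTCAGAC\n"

# Joining the raw lines keeps the newline at the end of each one.
with io.StringIO(sample) as f:
    kept = ''.join(f)

# Stripping each line first removes the trailing newline.
with io.StringIO(sample) as f:
    stripped = ''.join(line.strip() for line in f)

print(repr(kept))  # 'TCAATAAAGC\nTCCCTCAGAC\n'
print(stripped)    # TCAATAAAGCTCCCTCAGAC
```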