Ask Your Question
2

How to find the longest consecutive repeated part of a string?

asked 2020-06-08 17:09:54 +0100

mathhobbyist gravatar image

updated 2024-02-21 11:23:36 +0100

FrédéricC gravatar image

I found a programming exercise from https://www.ohjelmointiputka.net/post... . The problem goes as follows:

A composer has made a new song but forgot to denote the repetition parts. Help the composer to find the longest consecutive repetition part. For example, if the song has notes hfhfggccaggccagccafff, it has several parts that is repeated twice and consecutively, like hf, ggcca, gccag and gcca. The task is to output the case, how many notes the longest repetition part contains, and where the repetition part begins. For example above, the output should be 1 5 5. The rest cases are from https://www.ohjelmointiputka.net/tied... .

I tried to solve the problem by Sage by donwloadin the zip file, unzip it and by runnig the following code, that uses Ukkonen's suffix tree I guess:

from sage.combinat.words.suffix_trees import DecoratedSuffixTree
for case in range(1,11):
    with open(str(case)+'.txt','r') as file:
        s = file.read().replace('\n', '')
    start, run2 = max((x[1], x) for x in DecoratedSuffixTree(Word(s)).square_vocabulary())[1]
    print ("{} {} {}".format(case, run2/2, start))

The output was

1 5 5
2 8 387
3 14 3947
4 10 18374
5 19 285340
6 157 312
7 3663 12673
8 115 0
Traceback (most recent call last):
  File "tree.sage.py", line 11, in <module>
    start, run2 = max((x[_sage_const_1 ], x) for x in DecoratedSuffixTree(Word(s)).square_vocabulary())[_sage_const_1 ]
  File "/home/jaakko/SageMath/local/lib/python3.7/site-packages/sage/combinat/words/suffix_trees.py", line 1627, in __init__
    self.labeling = self._complete_labeling()
  File "/home/jaakko/SageMath/local/lib/python3.7/site-packages/sage/combinat/words/suffix_trees.py", line 1810, in _complete_labeling
    treat_node(0, 0, 0)
  File "/home/jaakko/SageMath/local/lib/python3.7/site-packages/sage/combinat/words/suffix_trees.py", line 1801, in treat_node
    treat_node(child, edge_label[0]-(j-i), edge_label[1])
  File "/home/jaakko/SageMath/local/lib/python3.7/site-packages/sage/combinat/words/suffix_trees.py", line 1801, in treat_node
    treat_node(child, edge_label[0]-(j-i), edge_label[1])
  File "/home/jaakko/SageMath/local/lib/python3.7/site-packages/sage/combinat/words/suffix_trees.py", line 1801, in treat_node
    treat_node(child, edge_label[0]-(j-i), edge_label[1])
  [Previous line repeated 77 more times]
  File "/home/jaakko/SageMath/local/lib/python3.7/site-packages/sage/combinat/words/suffix_trees.py", line 1805, in treat_node
    walk_chain(current_node, child, l, square_start)
  File "/home/jaakko/SageMath/local/lib/python3.7/site-packages/sage/combinat/words/suffix_trees.py", line 1782, in walk_chain
    walk_chain(parent, child, depth, start+1)
  File "/home/jaakko/SageMath/local/lib/python3.7/site-packages/sage/combinat/words/suffix_trees.py", line 1782, in walk_chain
    walk_chain(parent, child, depth, start+1)
  File "/home/jaakko/SageMath/local/lib/python3.7/site-packages/sage/combinat/words/suffix_trees.py", line 1782, in walk_chain
    walk_chain(parent, child, depth, start+1)
  [Previous line repeated 907 more times]
  File "/home/jaakko/SageMath/local/lib/python3.7/site-packages/sage/combinat/words/suffix_trees.py", line 1759, in walk_chain
    final_state = self.suffix_walk((u, v), l)
  File "/home/jaakko/SageMath/local/lib/python3.7/site-packages/sage/combinat/words/suffix_trees.py", line 1418, in suffix_walk
    return self._count_and_skip(parent, i, i+l)
  File "/home/jaakko/SageMath/local/lib/python3.7/site-packages/sage/combinat/words/suffix_trees.py", line 1372, in _count_and_skip
    trans = self._find_transition(node, self._letters[i])
  File "/home/jaakko/SageMath/local/lib/python3.7/site-packages/sage/combinat/words/suffix_trees.py", line 759, in _find_transition
    for ((k,p),s) in iteritems(self._transition_function[state]):
  File "/home/jaakko/SageMath/local/lib/python3.7/site-packages/six.py", line 587, in iteritems
    return iter(d.items(**kw))
RecursionError: maximum recursion depth exceeded while calling a Python object

Is there a way to fix this code or use some other approach?

edit retag flag offensive close merge delete

1 Answer

Sort by » oldest newest most voted
0

answered 2020-06-12 15:29:31 +0100

dan_fulea gravatar image

The following worked for me, after a little dance around to understand what happens:

from sage.combinat.words.suffix_trees import DecoratedSuffixTree as DST

for case in range(1, 11):
    with open(f'/home/dan/Downloads/mpera/{case}.txt', 'r') as f:
        s = f.read().replace('\n', '')

    sq_vocabulary = DST(Word(s)).square_vocabulary()
    sq_vocabulary.sort(key=lambda x: (x[1], x[0]), reverse=True)
    start, square_string_len = sq_vocabulary[0]
    print(case, square_string_len/2, start)

(The zip file was downloaded, then unpacked under /home/dan/Downloads/mpera.)

Results:

1 5 5
2 8 387
3 14 3947
4 10 18374
5 19 285340
6 157 312
7 3663 12673
8 115 0
9 102971 80
10 355765 19558
edit flag offensive delete link more

Your Answer

Please start posting anonymously - your entry will be published after you log in or create a new account.

Add Answer

Question Tools

Stats

Asked: 2020-06-08 17:09:54 +0100

Seen: 302 times

Last updated: Jun 12 '20