# Why is Words('ab',50).random_element().count('aa') always incorrect?

On the other hand, Words('ab',50).random_element().count('a')
or Words('ab',50).random_element().count('b') are always correct, and so are


This query was motivated by implementing a counter for occurrences of words on a given or random symbolic sequence e.g. counting the occurrences of ACTs on a fixed or random DNA sequence. It is now implemented as

def fcount(factor, word):
    return Word(word).parent()(Word(factor)).nb_factor_occurrences_in(Word(word))


so that

fcount('ACT','ACTTCATTTCCCTTCTTTACTTTCT') ## =2


def syse(sym, pos):
    return [w.string_rep() for w in FiniteWords(sym).iterate_by_length(pos)]


lists all words of a given length over the given symbols, so the counts for that whole class can be tabulated, for example

table([(x,fcount(x,'yyyyyyuuuy')) for x in syse('yu',1)])


returns

y   7
u   3


and

table([(x,fcount(x,'yyyyyyuuuy')) for x in syse('yu',2)])


returns

yy  5
yu  1
uy  1
uu  2
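As a cross-check outside Sage, the two tables above can be reproduced with a plain-Python sketch. The helper names `count_overlapping`, `all_words` and `factor_table` are made up for this illustration and are not part of Sage; the sketch assumes that factor occurrences are counted with overlaps, which is what the `yy  5` row suggests.

```python
from itertools import product


def count_overlapping(factor, text):
    """Count occurrences of `factor` in `text`, allowing overlaps."""
    if not factor:
        raise ValueError("factor must be non-empty")
    last_start = len(text) - len(factor)
    return sum(text.startswith(factor, i) for i in range(last_start + 1))


def all_words(symbols, length):
    """All strings of the given length over `symbols`, in product order."""
    return ["".join(letters) for letters in product(symbols, repeat=length)]


def factor_table(symbols, length, text):
    """Pairs (word, overlapping count of word in text)."""
    return [(x, count_overlapping(x, text)) for x in all_words(symbols, length)]


for word, count in factor_table("yu", 2, "yyyyyyuuuy"):
    print(word, count)
```

Note that Python's built-in `str.count` counts non-overlapping occurrences only, so `'yyyyyyuuuy'.count('yy')` gives 3 rather than 5.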



## Correct or incorrect?

Surprisingly, the answer is actually correct. As often, it is a matter of definition!

Let us walk through this.

## What does count count?

Define a random word and give it a name:

sage: w = Words('ab', 20).random_element()
sage: w
baabaaaaaaabbabbbbba


Count the occurrences of 'aa' in w:

sage: w.count('aa')
0


Since we may be surprised, let us read the documentation for the count method:

sage: w.count?


or its source code:

sage: w.count??


Oh, so count counts the occurrences of letters.

Here, w is a word on the alphabet {'a', 'b'}, and 'aa' is not a letter in that alphabet.

So the count of how many times 'aa' appears in w as a letter must be zero.

Compare:

sage: v = Words(['a', 'b', 'aa'], 10).random_element()
sage: v
word: aa,aa,a,aa,a,a,a,b,a,aa
sage: v.count('aa')
4
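To make the distinction concrete without Sage, here is a small plain-Python model in which a word is simply a list of letters. This is only an analogy for the behaviour described above, not Sage's implementation, and the variable names are invented for the illustration.

```python
# A word over the alphabet {'a', 'b'}: every letter is a single character.
letters_ab = list("baabaaaaaaabbabbbbba")

# A word over the alphabet {'a', 'b', 'aa'}: here 'aa' is itself one letter.
letters_ab_aa = ["aa", "aa", "a", "aa", "a", "a", "a", "b", "a", "aa"]

# list.count compares whole items, i.e. whole letters.
print(letters_ab.count("aa"))     # 'aa' is not a letter of this word
print(letters_ab.count("a"))      # number of letters equal to 'a'
print(letters_ab_aa.count("aa"))  # number of letters equal to 'aa'
```

In the first word no item equals `'aa'`, so the count is zero; in the second, `'aa'` is a letter and is counted like any other.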


## Factors and subwords

So how do we count factors, or subwords?

Get hold of the set of words.

sage: W = w.parent()
sage: W
Finite words over {'a', 'b'}


Define the factor we are looking for:

sage: f = W('aa')


Count its occurrences as a factor or a subword in w.

sage: f.nb_factor_occurrences_in(w)
7
sage: f.nb_subword_occurrences_in(w)
55
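The two numbers can be checked by hand with a plain-Python sketch, assuming that a factor is a block of consecutive letters (overlaps allowed) and a subword is a not necessarily contiguous subsequence. The function names `count_factor` and `count_subword` are hypothetical helpers for this illustration, not Sage methods.

```python
def count_factor(f, w):
    """Occurrences of f as a contiguous block of w, overlaps allowed."""
    if not f:
        raise ValueError("f must be non-empty")
    return sum(w.startswith(f, i) for i in range(len(w) - len(f) + 1))


def count_subword(f, w):
    """Occurrences of f as a (possibly scattered) subsequence of w."""
    # ways[j] = number of ways to pick the first j letters of f, in order,
    # from the letters of w read so far.
    ways = [1] + [0] * len(f)
    for letter in w:
        # Go right to left so each letter of w is used at most once per match.
        for j in range(len(f), 0, -1):
            if f[j - 1] == letter:
                ways[j] += ways[j - 1]
    return ways[len(f)]


w = "baabaaaaaaabbabbbbba"
print(count_factor("aa", w))
print(count_subword("aa", w))
```

For this w the subword count is simply the number of ways to choose two of its eleven letters 'a', namely 11 * 10 / 2 = 55.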


## Improve the documentation?

The question raises a valid point! The documentation for count should at least point to nb_factor_occurrences_in and nb_subword_occurrences_in, maybe with an example such as the one here.

This is now tracked at


What prompted my query is the fact that Word('abbaabababbbb').count('ab') works as I expected, including when the word is an output of the form Words('ab', 20).random_element(). I only wanted to avoid the copy/paste of long random words into the former.

That example was maybe meant to be part of the question, which seems to suffer an editing error: it ends mid-sentence with "and so are". To edit the question, click the "Edit" button below it.