Why is Words('ab',50).random_element().count('aa') always incorrect ???

words

magviana

On the other hand, Words('ab',50).random_element().count('a')
or Words('ab',50).random_element().count('b') are always correct, and so are

Comments

This query was motivated by implementing a counter for occurrences of a word in a given or random symbolic sequence, e.g. counting the occurrences of ACT in a fixed or random DNA sequence. It is now implemented as

def fcount(factor, word):
    # Number of occurrences (overlaps included) of `factor` as a factor of `word`.
    return Word(word).parent()(Word(factor)).nb_factor_occurrences_in(Word(word))

so that

fcount('ACT','ACTTCATTTCCCTTCTTTACTTTCT') ## =2

which with the added function

def syse(sym,pos):
    return [w.string_rep() for w in FiniteWords(sym).iterate_by_length(pos)]

can tabulate the counts of all factors of a given length, for example

table([(x,fcount(x,'yyyyyyuuuy')) for x in syse('yu',1)])

returns

y   7
u   3

and

table([(x,fcount(x,'yyyyyyuuuy')) for x in syse('yu',2)])

returns

yy  5
yu  1
uy  1
uu  2
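For readers without Sage at hand, the two tables above can be cross-checked in plain Python. This is only a sketch: the helper names `overlapping_count` and `all_words` are made up for illustration and are not part of Sage, and the regular-expression lookahead is just one way to count overlapping occurrences.

```python
import re
from itertools import product


def overlapping_count(factor, text):
    """Count occurrences of factor in text, overlaps included."""
    # A zero-width lookahead matches at every position where factor starts.
    return len(re.findall('(?=' + re.escape(factor) + ')', text))


def all_words(symbols, length):
    """All strings of the given length over symbols, in product order."""
    return [''.join(p) for p in product(symbols, repeat=length)]


counts = {x: overlapping_count(x, 'yyyyyyuuuy') for x in all_words('yu', 2)}
print(counts)
```

The resulting dictionary should agree with the length-2 table, and `overlapping_count('ACT', 'ACTTCATTTCCCTTCTTTACTTTCT')` with the count given for `fcount`.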

magviana ( 4 years ago )


answered 4 years ago

slelievre

updated 4 years ago

Correct or incorrect?

Surprisingly, the answer is actually correct. As is often the case, it is a matter of definition!

Let us walk through this.

What does `count` count?

Define a random word and give it a name:

sage: w = Words('ab', 20).random_element()
sage: w
baabaaaaaaabbabbbbba

Count 'aa' in w:

sage: w.count('aa')
0

Since we may be surprised, let us read the documentation for the count method:

sage: w.count?

or its source code:

sage: w.count??

Oh, so count counts the occurrences of letters.

Here, w is a word on the alphabet {'a', 'b'}, and 'aa' is not a letter in that alphabet.

So the count of how many times 'aa' appears in w as a letter must be zero.

Compare:

sage: v = Words(['a', 'b', 'aa'], 10).random_element()
sage: v
word: aa,aa,a,aa,a,a,a,b,a,aa
sage: v.count('aa')
4
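The same distinction can be mimicked in plain Python by modelling a word as a list of letters. This is only an analogy for how the alphabet determines what counts as a letter, not Sage's actual implementation, and the variable names are illustrative.

```python
# A word over the alphabet {'a', 'b'}: every letter is a single character.
w1 = list('baabaaaaaaabbabbbbba')

# A word over the alphabet {'a', 'b', 'aa'}: here 'aa' is itself one letter.
w2 = ['aa', 'aa', 'a', 'aa', 'a', 'a', 'a', 'b', 'a', 'aa']

# list.count compares whole items, so it counts letters, not factors.
count_in_w1 = w1.count('aa')   # 'aa' is never a letter of w1
count_in_w2 = w2.count('aa')   # 'aa' occurs as a letter of w2

print(count_in_w1, count_in_w2)
```

In the first list no single item equals 'aa', so the count is zero, exactly as in the Sage session above.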

Factors and subwords

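So how do we count factors, or subwords? A factor is a block of consecutive letters of the word; a subword keeps the letters in order but may skip some in between.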

Get hold of the set of words.

sage: W = w.parent()
sage: W
Finite words over {'a', 'b'}

Define the factor we are looking for:

sage: f = W('aa')

Count its occurrences as a factor or a subword in w.

sage: f.nb_factor_occurrences_in(w)
7
sage: f.nb_subword_occurrences_in(w)
55
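To see where 7 and 55 come from, here is a minimal plain-Python sketch of both counts on the 20-letter word from the first example. The function names `factor_occurrences` and `subword_occurrences` are hypothetical and chosen for this illustration; they are not Sage methods, and Sage's own implementation may differ.

```python
def factor_occurrences(f, w):
    """Count occurrences of f as a contiguous block of w, overlaps included."""
    n, m = len(w), len(f)
    return sum(1 for i in range(n - m + 1) if w[i:i + m] == f)


def subword_occurrences(f, w):
    """Count the ways to obtain f from w by deleting letters."""
    # ways[j] = number of ways to match the first j letters of f so far
    ways = [1] + [0] * len(f)
    for letter in w:
        # go right to left so each letter of w is used at most once per match
        for j in range(len(f), 0, -1):
            if f[j - 1] == letter:
                ways[j] += ways[j - 1]
    return ways[len(f)]


w = 'baabaaaaaaabbabbbbba'
print(factor_occurrences('aa', w))   # 7
print(subword_occurrences('aa', w))  # 55
print(w.count('aa'))                 # 4: str.count skips overlapping matches
```

The word contains 11 letters 'a', so there are 11 * 10 / 2 = 55 ways to pick two of them in order, which is the subword count. Note that Python's built-in `str.count` is yet another convention: it counts non-overlapping occurrences only.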

Improve the documentation?

The question raises a valid point! The documentation for count should at least point to nb_factor_occurrences_in, maybe with an example such as the one here.

This is now tracked at

Sage Trac ticket 30143


Comments

What prompted my query is the fact that Word('abbaabababbbb').count('ab') works as I expected, including when the word is an output of the form Words('ab', 20).random_element(). I only wanted to avoid the copy/paste of long random words into the former.

magviana ( 4 years ago )

That inconsistency is more of a bug, thanks for pointing it out. This is now tracked at

Sage Trac ticket 30187: make count more consistent across alphabets

slelievre ( 4 years ago )

That example was maybe meant to be part of the question, which seems to suffer an editing error: it ends mid-sentence with "and so are". To edit the question, click the "Edit" button below it.

slelievre ( 4 years ago )

