Why is Words('ab',50).random_element().count('aa') always incorrect ???

asked 2020-07-14 18:24:31 +0100

magviana

On the other hand, Words('ab',50).random_element().count('a')
or Words('ab',50).random_element().count('b') are always correct, and so are


Comments

This query was motivated by implementing a counter for occurrences of words on a given or random symbolic sequence e.g. counting the occurrences of ACTs on a fixed or random DNA sequence. It is now implemented as

def fcount(factor, word):
    # Count occurrences of `factor` as a factor (contiguous block) of `word`.
    return Word(word).parent()(Word(factor)).nb_factor_occurrences_in(Word(word))

so that

fcount('ACT','ACTTCATTTCCCTTCTTTACTTTCT') ## =2

which with the added function

def syse(sym, pos):
    # All words of length `pos` over the alphabet `sym`, as strings.
    return [w.string_rep() for w in FiniteWords(sym).iterate_by_length(pos)]

can generate all counts for that class, for example

table([(x,fcount(x,'yyyyyyuuuy')) for x in syse('yu',1)])

returns

y   7
u   3

and

table([(x,fcount(x,'yyyyyyuuuy')) for x in syse('yu',2)])

returns

yy  5
yu  1
uy  1
uu  2
magviana ( 2020-07-23 00:29:40 +0100 )

1 Answer


answered 2020-07-14 19:33:24 +0100

slelievre

updated 2020-07-14 19:43:28 +0100

Correct or incorrect?

Surprisingly, the answer is actually correct. As is often the case, it is a matter of definition!

Let us walk through this.

What does count count?

Define a random word and give it a name:

sage: w = Words('ab', 20).random_element()
sage: w
baabaaaaaaabbabbbbba

Count 'aa' in w:

sage: w.count('aa')
0

Since we may be surprised, let us read the documentation for the count method:

sage: w.count?

or its source code:

sage: w.count??

Oh, so count counts the occurrences of letters.

Here, w is a word on the alphabet {'a', 'b'}, and 'aa' is not a letter in that alphabet.

So the count of how many times 'aa' appears in w as a letter must be zero.

Compare:

sage: w = Words(['a', 'b', 'aa'], 10).random_element()
sage: w
word: aa,aa,a,aa,a,a,a,b,a,aa
sage: w.count('aa')
4
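
The same distinction can be reproduced in plain Python, outside Sage. This is only a sketch: it models a word as a tuple of letters, which is an assumption made for illustration and not how Sage implements words.

```python
# Model a word as a tuple of letters (illustrative assumption, not Sage's internals).
w1 = tuple('baabaaaaaaabbabbbbba')  # letters are 'a' and 'b'
w2 = ('aa', 'aa', 'a', 'aa', 'a', 'a', 'a', 'b', 'a', 'aa')  # 'aa' is a letter here

# tuple.count compares whole elements, that is, letters.
print(w1.count('aa'))  # 0: no single letter of w1 equals 'aa'
print(w2.count('aa'))  # 4
print(w1.count('a'))   # 11
```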

Factors and subwords

So how do we count factors, or subwords?

Get hold of the set of words. Here w is again the first word, the one over {'a', 'b'}, not the word from the comparison above.

sage: W = w.parent()
sage: W
Finite words over {'a', 'b'}

Define the factor we are looking for:

sage: f = W('aa')

Count its occurrences as a factor or a subword in w.

sage: f.nb_factor_occurrences_in(w)
7
sage: f.nb_subword_occurrences_in(w)
55
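
To make the two notions concrete, here is a plain Python sketch of both counts. The helper names count_factor and count_subword are hypothetical, chosen for this illustration; they are not Sage functions. A factor is a block of consecutive letters (overlapping occurrences are counted separately), while a subword is obtained by picking letters in order, not necessarily adjacent.

```python
def count_factor(f, w):
    """Count occurrences of f as a contiguous block in w, overlaps included."""
    n, m = len(w), len(f)
    return sum(1 for i in range(n - m + 1) if w[i:i + m] == f)


def count_subword(f, w):
    """Count occurrences of f as a (possibly scattered) subsequence of w."""
    # ways[j] = number of ways to match the first j letters of f so far
    ways = [1] + [0] * len(f)
    for letter in w:
        # Go right to left so each letter of w is used at most once per match.
        for j in range(len(f), 0, -1):
            if f[j - 1] == letter:
                ways[j] += ways[j - 1]
    return ways[len(f)]


w = 'baabaaaaaaabbabbbbba'
print(count_factor('aa', w))   # 7
print(count_subword('aa', w))  # 55
```

The value 55 is the number of ways to choose two of the eleven a's in w, that is, 11*10/2.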

Improve the documentation?

The question raises a valid point! The documentation for count should at least point to nb_factor_occurrences_in, maybe with an example such as the one here.

This is now tracked at


Comments

What prompted my query is the fact that Word('abbaabababbbb').count('ab') works as I expected, including when the word is an output of the form Words('ab', 20).random_element(). I only wanted to avoid the copy/paste of long random words into the former.

magviana ( 2020-07-20 23:25:46 +0100 )

That inconsistency is more of a bug, thanks for pointing it out. This is now tracked at

slelievre ( 2020-07-21 09:52:24 +0100 )

That example was maybe meant to be part of the question, which seems to suffer an editing error: it ends mid-sentence with "and so are". To edit the question, click the "Edit" button below it.

slelievre ( 2020-07-21 09:58:37 +0100 )
