Count consecutive characters

How would I count consecutive characters in Python to see the number of times each unique digit repeats before the next unique digit?

At first, I thought I could do something like:

word = '1000'

counter = 0
print range(len(word))

for i in range(len(word) - 1):
    while word[i] == word[i + 1]:
        counter += 1
        print counter * "0"
    else:
        counter = 1
        print counter * "1"

So that in this manner I could see the number of times each unique digit repeats. But this, of course, falls out of range when i reaches the last value.

In the example above, I would want Python to tell me that 1 repeats once and that 0 repeats 3 times. The code above fails, however, because of my while statement.

How could I do this with just built-in functions?

Answers:


Method 1

Consecutive counts:

You can use itertools.groupby:

from itertools import groupby

s = "111000222334455555"

groups = groupby(s)
result = [(label, sum(1 for _ in group)) for label, group in groups]

After which, result looks like:

[("1", 3), ("0", 3), ("2", 3), ("3", 2), ("4", 2), ("5", 5)]

And you could format with something like:

", ".join("{}x{}".format(label, count) for label, count in result)
# "1x3, 0x3, 2x3, 3x2, 4x2, 5x5"
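To tie this back to the question's own example, here is a minimal sketch wrapping the comprehension above in a helper (the name `run_lengths` is hypothetical). Note that `groupby` only groups *adjacent* equal items, so a digit that reappears later starts a new group rather than adding to the earlier one.

```python
from itertools import groupby


def run_lengths(s):
    """Return (character, run length) pairs for each run of equal adjacent characters."""
    # groupby yields (key, iterator) pairs; consuming each iterator gives the run length
    return [(label, sum(1 for _ in group)) for label, group in groupby(s)]


print(run_lengths("1000"))  # [('1', 1), ('0', 3)]
print(run_lengths("1001"))  # [('1', 1), ('0', 2), ('1', 1)]
```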

Total counts:

Someone in the comments is concerned that you might want a total count of each character, so that "11100111" -> {"1":6, "0":2}. In that case you want to use a collections.Counter:

from collections import Counter

s = "11100111"
result = Counter(s)
# Counter({'1': 6, '0': 2})
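A short sketch contrasting the two readings of the question on the same input: `Counter` ignores position, while `groupby` reports each consecutive run separately.

```python
from collections import Counter
from itertools import groupby

s = "11100111"

# Total occurrences of each character, regardless of position
totals = Counter(s)

# Length of each consecutive run, in order of appearance
runs = [(label, len(list(group))) for label, group in groupby(s)]

print(totals)  # Counter({'1': 6, '0': 2})
print(runs)    # [('1', 3), ('0', 2), ('1', 3)]
```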

Your method:

As many have pointed out, looping through range(len(s)) while addressing s[i+1] leads to an off-by-one error: when i is pointing at the last index of s, i+1 raises an IndexError. Looping through range(len(s)-1) fixes that, but the while loop is a second problem: it never advances i, so once word[i] == word[i+1] is true it runs forever, and an if is what you need there. Rather than indexing at all, though, it's more pythonic to generate something to iterate over.
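For comparison, a minimal sketch of the question's own index-based loop with both problems fixed, using built-ins only: an `if` replaces the `while`, the loop stops one short of the end, and the final run is recorded after the loop. The function name `count_runs` is hypothetical.

```python
def count_runs(word):
    """Return (character, run length) pairs using plain indexing and built-ins only."""
    if not word:
        return []
    runs = []
    counter = 1
    # Stop at len(word) - 1 so that word[i + 1] is always a valid index
    for i in range(len(word) - 1):
        if word[i] == word[i + 1]:  # 'if', not 'while': i advances every iteration
            counter += 1
        else:
            runs.append((word[i], counter))
            counter = 1
    # The last run never reaches the else branch, so record it here
    runs.append((word[-1], counter))
    return runs


for char, n in count_runs("1000"):
    print(char, "repeats", n)
# 1 repeats 1
# 0 repeats 3
```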

For a string that's not absolutely huge, zip(s, s[1:]) isn't a performance issue, so you could do:

counts = []
count = 1
for a, b in zip(s, s[1:]):
    if a==b:
        count += 1
    else:
        counts.append((a, count))
        count = 1

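The only problem is that the last run is never appended, because the else branch only fires when the character changes, so you'd have to special-case it after the loop. That can be fixed with itertools.zip_longest, which pairs the last character with None: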

import itertools

counts = []
count = 1
for a, b in itertools.zip_longest(s, s[1:], fillvalue=None):
    if a==b:
        count += 1
    else:
        counts.append((a, count))
        count = 1
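To see the difference concretely, here is a sketch that wraps the two loops above in functions (the names `runs_with_zip` and `runs_with_zip_longest` are hypothetical) and runs them on the same input.

```python
import itertools


def runs_with_zip(s):
    counts = []
    count = 1
    for a, b in zip(s, s[1:]):
        if a == b:
            count += 1
        else:
            counts.append((a, count))
            count = 1
    return counts  # the final run is never appended


def runs_with_zip_longest(s):
    counts = []
    count = 1
    # The last pair is (s[-1], None); None never equals a character,
    # so the else branch fires and the final run is appended
    for a, b in itertools.zip_longest(s, s[1:], fillvalue=None):
        if a == b:
            count += 1
        else:
            counts.append((a, count))
            count = 1
    return counts


print(runs_with_zip("aabccc"))          # [('a', 2), ('b', 1)]
print(runs_with_zip_longest("aabccc"))  # [('a', 2), ('b', 1), ('c', 3)]
```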

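If you do have a truly huge string and can't stand to hold two copies of it in memory at once, you can use a variant of the pairwise recipe from the itertools documentation (this version uses zip_longest so the last character is paired with None):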

def pairwise(iterable):
    """iterates pairwise without holding an extra copy of iterable in memory"""
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.zip_longest(a, b, fillvalue=None)

counts = []
count = 1
for a, b in pairwise(s):
    ...
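A different approach, swapped in here rather than taken from the pairwise recipe, avoids pairing altogether: keep only the current character and its count while consuming any iterable one item at a time. This is a sketch; the generator name `stream_runs` is hypothetical.

```python
def stream_runs(chars):
    """Yield (character, run length) pairs from any iterable, one item at a time."""
    iterator = iter(chars)
    sentinel = object()  # a unique marker that can never equal a real character
    current = next(iterator, sentinel)
    if current is sentinel:
        return  # empty input: nothing to yield
    count = 1
    for c in iterator:
        if c == current:
            count += 1
        else:
            yield current, count
            current, count = c, 1
    yield current, count  # flush the final run


print(list(stream_runs("111000222334455555")))
# [('1', 3), ('0', 3), ('2', 3), ('3', 2), ('4', 2), ('5', 5)]
```

Because it never slices or copies the input, the same generator works on an iterator that produces characters lazily.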

Method 2

A solution “that way”, with only basic statements:

word = "100011010"  # or word = "1"
count = 1
length = ""
if len(word) > 1:
    for i in range(1, len(word)):
        if word[i-1] == word[i]:
            count += 1
        else:
            length += word[i-1] + " repeats " + str(count) + ", "
            count = 1
    length += "and " + word[i] + " repeats " + str(count)
else:
    i = 0
    length += word[i] + " repeats " + str(count)
print(length)

Output :

'1 repeats 1, 0 repeats 3, 1 repeats 2, 0 repeats 1, 1 repeats 1, and 0 repeats 1'
#'1 repeats 1'

Method 3

Totals (without sub-groupings)

#!/usr/bin/python3 -B

charseq = 'abbcccdddd'
distros = {c: 1 for c in charseq}

for c in range(len(charseq)-1):
    if charseq[c] == charseq[c+1]:
        distros[charseq[c]] += 1

print(distros)

I’ll provide a brief explanation for the interesting lines.

distros = {c: 1 for c in charseq}

The line above is a dictionary comprehension: it iterates over the characters in charseq and creates a key/value pair for each one, where the key is the character and the value starts at 1 (every character occurs at least once).

Then comes the loop:

for c in range(len(charseq)-1):

The loop runs c from 0 to len(charseq) - 2, because range stops before len(charseq) - 1; this keeps the c+1 index in the loop's body within bounds.

if charseq[c] == charseq[c+1]:
    distros[charseq[c]] += 1

Every match found here is between adjacent characters, so it belongs to a consecutive run, and we simply add 1 to that character's count. For example, a snapshot of one iteration could look like this (using direct values instead of variables, for illustrative purposes):

# replacing variables with their values
if charseq[1] == charseq[1+1]:
    distros[charseq[1]] += 1

# this is a snapshot of a single comparison here and what happens later
if 'b' == 'b':
    distros['b'] += 1

You can see the program output below with the correct counts:

➜  /tmp  ./counter.py
{'b': 2, 'a': 1, 'c': 3, 'd': 4}
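One caveat worth checking: the loop above adds 1 per equal adjacent pair on top of the initial 1, so it matches the true totals only when each character appears in a single run, as in 'abbcccdddd'. The sketch below reproduces that loop in a function (the names `method3_counts` and `longest_runs` are hypothetical) and compares it with `Counter` and with the longest run per character on an input where a character has two runs.

```python
from collections import Counter
from itertools import groupby


def method3_counts(charseq):
    """Reproduce the distros loop above: 1 per character plus 1 per equal adjacent pair."""
    distros = {c: 1 for c in charseq}
    for c in range(len(charseq) - 1):
        if charseq[c] == charseq[c + 1]:
            distros[charseq[c]] += 1
    return distros


def longest_runs(charseq):
    """Longest consecutive run of each character."""
    best = {}
    for label, group in groupby(charseq):
        best[label] = max(best.get(label, 0), sum(1 for _ in group))
    return best


s = "aabaa"
print(method3_counts(s))  # {'a': 3, 'b': 1}
print(Counter(s))         # Counter({'a': 4, 'b': 1})
print(longest_runs(s))    # {'a': 2, 'b': 1}
```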

Method 4

You only need to change len(word) to len(word) - 1. That said, you could also use sum together with the fact that False has the value 0 and True has the value 1:

sum(word[i] == word[i+1] for i in range(len(word)-1))

For word = '1000' this is the sum of (False, True, True), where False is 0 and True is 1, giving 2: the number of adjacent positions that hold the same character.

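For an empty word, len(word) - 1 is -1 and range(-1) is simply empty, so the expression above already returns 0 without indexing anything. If you prefer to make that explicit, clamp the range: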

sum(word[i] == word[i+1] for i in range(max(0, len(word)-1)))

And this can be improved with zip:

sum(c1 == c2 for c1, c2 in zip(word[:-1], word[1:]))
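The same sum also gives the number of runs directly: every equal adjacent pair merges two characters into one run, so for a non-empty word the run count is len(word) minus the number of equal pairs. A sketch (the function name `number_of_runs` is hypothetical):

```python
def number_of_runs(word):
    """Number of runs of equal adjacent characters (e.g. '1000' has 2 runs)."""
    # Each equal adjacent pair merges two characters into the same run,
    # so runs = len(word) - (number of equal adjacent pairs); this is 0 for ''.
    equal_pairs = sum(c1 == c2 for c1, c2 in zip(word[:-1], word[1:]))
    return len(word) - equal_pairs


print(number_of_runs("1000"))                # 2
print(number_of_runs("111000222334455555"))  # 6
```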

Method 5

If we want to count consecutive characters without writing an explicit Python loop over them, we can make use of pandas:

In [1]: import pandas as pd

In [2]: sample = 'abbcccddddaaaaffaaa'
In [3]: d = pd.Series(list(sample))

In [4]: [(cat[1], grp.shape[0]) for cat, grp in d.groupby([d.ne(d.shift()).cumsum(), d])]
Out[4]: [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('a', 4), ('f', 2), ('a', 3)]

The key is to mark every element that differs from the previous one (the start of a new run) and take a cumulative sum of those marks, which gives each run its own id to group by in pandas:

In [5]: sample = 'abba'
In [6]: d = pd.Series(list(sample))

In [7]: d.ne(d.shift())
Out[7]:
0     True
1     True
2    False
3     True
dtype: bool

In [8]: d.ne(d.shift()).cumsum()
Out[8]:
0    1
1    2
2    2
3    3
dtype: int32
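The same run-id trick can be sketched without pandas: mark where a character differs from its predecessor (what d.ne(d.shift()) does) and take a running total with itertools.accumulate in place of cumsum(). The function name `run_ids` is hypothetical.

```python
from itertools import accumulate


def run_ids(s):
    """Assign each character the id of the run it belongs to, starting at 1."""
    # True where a character differs from its predecessor; the first always does,
    # just as the first comparison against NaN in d.ne(d.shift()) is True
    starts = [i == 0 or s[i] != s[i - 1] for i in range(len(s))]
    return list(accumulate(int(flag) for flag in starts))


print(run_ids("abba"))  # [1, 2, 2, 3]
```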

Method 6

This is my simple code for finding the maximum number of consecutive 1s in a binary string in Python 3:

count = 0
maxcount = 0
for i in bin(13):  # bin() already returns a string, here '0b1101'
    if i == '1':
        count += 1
    elif count > maxcount:
        maxcount = count
        count = 0
    else:
        count = 0
if count > maxcount:  # the string may end in the longest run
    maxcount = count
print(maxcount)  # 2
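The same result can be sketched with itertools.groupby, taking the longest of the runs whose key is '1'. This assumes a non-negative integer input (bin() of a negative number starts with '-0b'); the function name `max_consecutive_ones` is hypothetical.

```python
from itertools import groupby


def max_consecutive_ones(n):
    """Length of the longest run of 1 bits in the binary form of a non-negative n."""
    digits = bin(n)[2:]  # strip the '0b' prefix
    # groupby splits the digits into runs; keep only the runs of '1'
    return max((sum(1 for _ in group) for bit, group in groupby(digits) if bit == "1"), default=0)


print(max_consecutive_ones(13))  # 13 is 0b1101, so this prints 2
```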

Method 7

There is no need to count or use groupby. Just note the indices where a change occurs and subtract consecutive indices.

w = "111000222334455555"
iw = [0] + [i+1 for i in range(len(w)-1) if w[i] != w[i+1]] + [len(w)]
dw = [w[i] for i in range(len(w)-1) if w[i] != w[i+1]] + [w[-1]]
cw = [ iw[j] - iw[j-1] for j in range(1, len(iw) ) ]

print(dw)  # digits
['1', '0', '2', '3', '4', '5']
print(cw)  # counts
[3, 3, 3, 2, 2, 5]

w = 'XXYXYYYXYXXzzzzzYYY'
iw = [0] + [i+1 for i in range(len(w)-1) if w[i] != w[i+1]] + [len(w)]
dw = [w[i] for i in range(len(w)-1) if w[i] != w[i+1]] + [w[-1]]
cw = [ iw[j] - iw[j-1] for j in range(1, len(iw) ) ]
print(dw)  # characters
print(cw)  # counts

['X', 'Y', 'X', 'Y', 'X', 'Y', 'X', 'z', 'Y']
[2, 1, 1, 3, 1, 1, 2, 5, 3]
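Packaged as a function, the same index-difference idea can take each run's character from its starting index, which also covers the final run without the separate [w[-1]] term, and can guard the empty string (where w[-1] would fail). A sketch; the name `runs_by_change_index` is hypothetical.

```python
def runs_by_change_index(w):
    """Return (characters, counts) using the positions where the character changes."""
    if not w:
        return [], []
    # Boundaries: 0, every index i+1 where w[i] != w[i+1], and len(w)
    iw = [0] + [i + 1 for i in range(len(w) - 1) if w[i] != w[i + 1]] + [len(w)]
    # The character of each run is the one at that run's starting index
    dw = [w[start] for start in iw[:-1]]
    # Run lengths are the differences between consecutive boundaries
    cw = [iw[j] - iw[j - 1] for j in range(1, len(iw))]
    return dw, cw


print(runs_by_change_index("111000222334455555"))
# (['1', '0', '2', '3', '4', '5'], [3, 3, 3, 2, 2, 5])
```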

Method 8

A one-liner that returns the number of consecutive characters, with no imports:

def f(x):s=" "+x+" ";t=[x[1] for x in zip(s[0:],s[1:],s[2:]) if (x[1]==x[0])or(x[1]==x[2])];return {h: t.count(h) for h in set(t)}

It returns, for each character that is part of a run of two or more identical characters, the total number of its occurrences in such runs.

Alternatively, this accomplishes the same thing, albeit much more slowly:

def A(m):t=[thing for x,thing in enumerate(m) if thing in [(m[x+1] if x+1<len(m) else None),(m[x-1] if x>0 else None)]];return {h: t.count(h) for h in set(t)}

To compare performance, I ran them with:

import copy
import timeit
import urllib.request

site = 'https://web.njit.edu/~cm395/theBeeMovieScript/'
s = urllib.request.urlopen(site).read(100_000)
s = str(copy.deepcopy(s))
print(timeit.timeit('A(s)',globals=locals(),number=100))
print(timeit.timeit('f(s)',globals=locals(),number=100))

which resulted in:

12.528256356999918
5.351301653001428

This method can definitely be improved, but without using any external libraries, this was the best I could come up with.
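Without the one-line and no-imports constraints, the same idea (total, per character, the occurrences that sit in runs of length two or more) reads more clearly with itertools.groupby. This is a swapped-in sketch rather than the author's code, and the name `repeated_run_counts` is hypothetical.

```python
from itertools import groupby


def repeated_run_counts(s):
    """For each character, total how many of its occurrences sit in runs of length >= 2."""
    totals = {}
    for char, group in groupby(s):
        run = sum(1 for _ in group)
        if run >= 2:
            totals[char] = totals.get(char, 0) + run
    return totals


print(repeated_run_counts("aabcccdaa"))  # {'a': 4, 'c': 3}
```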

Method 9

In python

your_string = "wwwwweaaaawwbbbbn"
current = ''
count = 0
for index, loop in enumerate(your_string):
    current = loop
    count = count + 1
    if index == len(your_string)-1:
        print(f"{count}{current}", end='')
        break

    if your_string[index+1] != current:
        print(f"{count}{current}", end='')
        count = 0
        continue

This will output

5w1e4a2w4b1n
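If you need the encoded string as a value rather than printed output, a sketch with itertools.groupby builds the same count-then-character format in one expression (the function name `run_length_encode` is hypothetical).

```python
from itertools import groupby


def run_length_encode(s):
    """Build a count-then-character string such as '5w1e'."""
    return "".join(f"{sum(1 for _ in group)}{char}" for char, group in groupby(s))


print(run_length_encode("wwwwweaaaawwbbbbn"))  # 5w1e4a2w4b1n
```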

Method 10

# I wrote the code using simple loops and if statements
s = 'feeekksssh'  # len(s) == 10
count = 1  # expected runs: f:1, e:3, k:2, s:3, h:1
l = []
for i in range(1, len(s)):  # range(1, 10)
    if s[i-1] == s[i]:
        count = count + 1
    else:
        l.append(count)
        count = 1
    if i == len(s) - 1:  # to count the last run, loop over the string in reverse order
        reverse_count = 1
        for j in range(-1, -len(s), -1):  # looping only over the last run
            if s[j] == s[j-1]:
                reverse_count = reverse_count + 1
            else:
                l.append(reverse_count)
                break
        else:  # no break: the whole string is a single run
            l.append(reverse_count)
print(l)  # [1, 3, 2, 3, 1]
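The reverse scan isn't strictly needed: after the forward loop finishes, count already holds the length of the last run, so appending it once handles every case, including a single-character or single-run string. A sketch; the name `run_lengths_simple` is hypothetical.

```python
def run_lengths_simple(s):
    """Run lengths in order, e.g. 'feeekksssh' -> [1, 3, 2, 3, 1]."""
    if not s:
        return []
    lengths = []
    count = 1
    for i in range(1, len(s)):
        if s[i - 1] == s[i]:
            count += 1
        else:
            lengths.append(count)
            count = 1
    lengths.append(count)  # after the loop, count holds the length of the last run
    return lengths


print(run_lengths_simple("feeekksssh"))  # [1, 3, 2, 3, 1]
```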

Method 11

Today I had an interview and was asked the same question. I struggled with the solution I originally had in mind:

s = 'abbcccda'

old = ''
cnt = 0
res = ''
for c in s:
    cnt += 1
    if old != c:
        res += f'{old}{cnt}'
        old = c
        cnt = 0  # default 0 or 1 neither work
print(res)
#  1a1b2c3d1

Sadly, this solution kept giving unexpected results on edge cases (can anyone fix the code? maybe I need to post another question), and I eventually ran out of time in the interview.
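One way the first attempt could be repaired, offered as a sketch rather than the author's own fix: emit a run only when the character changes *and* a previous run exists (skipping the empty placeholder before the first character), count after the comparison, and flush the final run once the loop ends.

```python
s = 'abbcccda'

old = ''
cnt = 0
res = ''
for c in s:
    if c != old:
        # Flush the previous run, skipping the empty placeholder before the first character
        if old:
            res += f'{old}{cnt}'
        old = c
        cnt = 0
    cnt += 1
# The last run is still pending after the loop ends
if old:
    res += f'{old}{cnt}'
print(res)  # a1b2c3d1a1
```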

After the interview I calmed down and soon came up with what I think is a stable solution (though I like the groupby approach best):

s = 'abbcccda'

olds = []
for c in s:
    if olds and c in olds[-1]:
        olds[-1].append(c)
    else:
        olds.append([c])
print(olds)
res = ''.join([f'{lst[0]}{len(lst)}' for lst in olds])
print(res)

#  [['a'], ['b', 'b'], ['c', 'c', 'c'], ['d'], ['a']]
#  a1b2c3d1a1

Method 12

Here is my simple solution:

def count_chars(s):
    size = len(s)
    count = 1
    op = ''
    for i in range(1, size):
        if s[i] == s[i-1]:
            count += 1
        else:
            op += "{}{}".format(count, s[i-1])
            count = 1
    if size:
        op += "{}{}".format(count, s[size-1])

    return op


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
