How to groupby consecutive values in pandas DataFrame

I have a column in a DataFrame with values:

[1, 1, -1, 1, -1, -1]

How can I group them like this?

[1,1] [-1] [1] [-1, -1]

Answers:

Method 1

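You can use groupby with a custom Series that labels each run of consecutive values. Compare the column with its shifted copy to mark where a new run starts, then take the cumulative sum: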

import pandas as pd

df = pd.DataFrame({'a': [1, 1, -1, 1, -1, -1]})
print (df)
   a
0  1
1  1
2 -1
3  1
4 -1
5 -1

print ((df.a != df.a.shift()).cumsum())
0    1
1    1
2    2
3    3
4    4
5    4
Name: a, dtype: int32

# pass the Series itself; wrapping it in a list makes the keys 1-tuples in recent pandas versions
for i, g in df.groupby((df.a != df.a.shift()).cumsum()):
    print (i)
    print (g)
    print (g.a.tolist())

1
   a
0  1
1  1
[1, 1]
2
   a
2 -1
[-1]
3
   a
3  1
[1]
4
   a
4 -1
5 -1
[-1, -1]
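
The loop above can be condensed into a small helper that returns the runs directly. This is only a minimal sketch: the function name consecutive_runs is hypothetical and not part of pandas.

```python
import pandas as pd


def consecutive_runs(series):
    """Return a list of lists, one per run of equal consecutive values."""
    # A new run starts wherever a value differs from the one before it.
    run_id = (series != series.shift()).cumsum()
    return [group.tolist() for _, group in series.groupby(run_id)]


df = pd.DataFrame({'a': [1, 1, -1, 1, -1, -1]})
runs = consecutive_runs(df['a'])
print(runs)  # [[1, 1], [-1], [1], [-1, -1]]
```

Because NaN never compares equal to itself, consecutive missing values would each start their own run with this approach.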

Method 2

Using groupby from itertools, with the same df as in Method 1 (Jez's data):

from itertools import groupby
[list(group) for key, group in groupby(df.a.values.tolist())]
# [[1, 1], [-1], [1], [-1, -1]]
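
itertools.groupby only merges adjacent equal items, which is why no sorting is needed here. As a small illustration on a plain list (the variable names are just for this sketch), the same call can also report the value and length of each run:

```python
from itertools import groupby

values = [1, 1, -1, 1, -1, -1]

# groupby yields a (key, iterator) pair for each run of equal adjacent items.
run_lengths = [(key, len(list(group))) for key, group in groupby(values)]
print(run_lengths)  # [(1, 2), (-1, 1), (1, 1), (-1, 2)]
```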

Method 3

Series.diff is another way to mark the group boundaries (for numeric data, a != a.shift() is equivalent to a.diff() != 0):

consecutives = df['a'].diff().ne(0).cumsum()

# 0    1
# 1    1
# 2    2
# 3    3
# 4    4
# 5    4
# Name: a, dtype: int64

And to turn these groups into a Series of lists (see the other answers for a list of lists), aggregate with groupby.agg or groupby.apply:

df['a'].groupby(consecutives).agg(list)

# a
# 1      [1, 1]
# 2        [-1]
# 3         [1]
# 4    [-1, -1]
# Name: a, dtype: object
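
The same grouping key can feed other aggregations. The sketch below, which assumes pandas 0.25 or later for named aggregation, summarises each run by its value and length; the column names value and length are chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, -1, 1, -1, -1]})
consecutives = df['a'].diff().ne(0).cumsum()

# One row per run: the run's value and how many non-missing entries it has.
summary = df['a'].groupby(consecutives).agg(value='first', length='count')

print(summary['value'].tolist())   # [1, -1, 1, -1]
print(summary['length'].tolist())  # [2, 1, 1, 2]
```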

Method 4

If you are dealing with string values:

import itertools

s = pd.DataFrame(['A','A','A','BB','BB','CC','A','A','BB'], columns=['a'])
string_groups = sum([['%s_%s' % (i,n) for i in g] for n,(k,g) in enumerate(itertools.groupby(s.a))],[])

>>> string_groups 
['A_0', 'A_0', 'A_0', 'BB_1', 'BB_1', 'CC_2', 'A_3', 'A_3', 'BB_4']

grouped = s.groupby(string_groups, sort=False).agg(list)
grouped.index = grouped.index.str.split('_').str[0]

>>> grouped
            a
A   [A, A, A]
BB   [BB, BB]
CC       [CC]
A      [A, A]
BB       [BB]

As a separate function:

def groupby_consec(df, col):
    string_groups = sum([['%s_%s' % (i, n) for i in g]
                         for n, (k, g) in enumerate(itertools.groupby(df[col]))], [])
    return df.groupby(string_groups, sort=False)
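
As an alternative sketch (not the method above, but the shift/cumsum key from Method 1), the comparison with the shifted column also works on strings, so the groups can be built without encoding the run number into string keys. The names run_id and labels are only illustrative.

```python
import pandas as pd

s = pd.DataFrame(['A', 'A', 'A', 'BB', 'BB', 'CC', 'A', 'A', 'BB'], columns=['a'])

# Comparing with the shifted column works for strings as well as numbers.
run_id = s['a'].ne(s['a'].shift()).cumsum()

grouped = s['a'].groupby(run_id).agg(list)

# Relabel each run with the value it contains instead of its run number.
labels = s['a'].groupby(run_id).first()
grouped.index = labels.to_numpy()

print(grouped.tolist())
# [['A', 'A', 'A'], ['BB', 'BB'], ['CC'], ['A', 'A'], ['BB']]
```

This avoids splitting on '_' afterwards, which would misbehave if the values themselves contained an underscore.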


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
