I have a column in a DataFrame with values:
[1, 1, -1, 1, -1, -1]
How can I group them like this?
[1,1] [-1] [1] [-1, -1]
Answers:
Method 1
You can group by a custom Series that starts a new group number each time the value changes:
import pandas as pd
df = pd.DataFrame({'a': [1, 1, -1, 1, -1, -1]})
print (df)
   a
0  1
1  1
2 -1
3  1
4 -1
5 -1
print ((df.a != df.a.shift()).cumsum())
0    1
1    1
2    2
3    3
4    4
5    4
Name: a, dtype: int32
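To see why this key works, it can help to look at the intermediate steps side by side. The following is only a sketch on the same data; the column names previous, changed and run_id are made up here for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, -1, 1, -1, -1]})

# Column names below are illustrative only.
steps = pd.DataFrame({
    'a': df.a,
    'previous': df.a.shift(),         # value from the row above (NaN for the first row)
    'changed': df.a != df.a.shift(),  # True wherever a new run of equal values starts
})

# Each True adds 1 to the running total, so every run gets its own number.
steps['run_id'] = steps['changed'].cumsum()
print(steps)
```

The run_id column is the same Series that is passed to groupby in this answer.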
# pass the Series itself rather than a one-element list, so that i is a plain group number
for i, g in df.groupby((df.a != df.a.shift()).cumsum()):
    print (i)
    print (g)
    print (g.a.tolist())
1
   a
0  1
1  1
[1, 1]
2
   a
2 -1
[-1]
3
   a
3  1
[1]
4
   a
4 -1
5 -1
[-1, -1]
Method 2
Using groupby from itertools, with the data from Jez's answer (Method 1):
from itertools import groupby
[list(group) for key, group in groupby(df.a.values.tolist())]
Out[361]: [[1, 1], [-1], [1], [-1, -1]]
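itertools.groupby only merges items that are next to each other, which is why it fits this problem without any extra key. As a small sketch on a plain list (the names values and run_lengths are chosen here for illustration), the key of each run is also available if you need it:

```python
from itertools import groupby

values = [1, 1, -1, 1, -1, -1]

# groupby yields (key, iterator) pairs, one pair per run of consecutive equal items
run_lengths = [(key, len(list(group))) for key, group in groupby(values)]
print(run_lengths)  # [(1, 2), (-1, 1), (1, 1), (-1, 2)]
```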
Method 3
Series.diff is another way to mark the group boundaries (for numeric data, a != a.shift() is equivalent to a.diff() != 0):
consecutives = df['a'].diff().ne(0).cumsum()
# 0    1
# 1    1
# 2    2
# 3    3
# 4    4
# 5    4
# Name: a, dtype: int64
And to turn these groups into a Series of lists (see the other answers for a list of lists), aggregate with groupby.agg or groupby.apply:
df['a'].groupby(consecutives).agg(list)
# a
# 1      [1, 1]
# 2        [-1]
# 3         [1]
# 4    [-1, -1]
# Name: a, dtype: object
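For completeness, here is a sketch of the groupby.apply variant mentioned above, followed by a conversion to the plain list of lists asked for in the question. The names as_series and as_lists are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, -1, 1, -1, -1]})
consecutives = df['a'].diff().ne(0).cumsum()

# apply(list) builds one list per run, much like agg(list)
as_series = df['a'].groupby(consecutives).apply(list)

# tolist() turns the Series of lists into a list of lists
as_lists = as_series.tolist()
print(as_lists)
```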
Method 4
If you are dealing with string values:
import itertools
s = pd.DataFrame(['A','A','A','BB','BB','CC','A','A','BB'], columns=['a'])
# label every value with the number of the run it belongs to, e.g. 'A_0', 'BB_1'
string_groups = sum([['%s_%s' % (i,n) for i in g] for n,(k,g) in enumerate(itertools.groupby(s.a))],[])
>>> string_groups
['A_0', 'A_0', 'A_0', 'BB_1', 'BB_1', 'CC_2', 'A_3', 'A_3', 'BB_4']
grouped = s.groupby(string_groups, sort=False).agg(list)
grouped.index = grouped.index.str.split('_').str[0]
>>> grouped
            a
A   [A, A, A]
BB   [BB, BB]
CC       [CC]
A      [A, A]
BB       [BB]
As a separate function:
def groupby_consec(df, col):
    # build one 'value_runnumber' label per row, then group by those labels
    string_groups = sum([['%s_%s' % (i, n) for i in g]
                         for n, (k, g) in enumerate(itertools.groupby(df[col]))], [])
    return df.groupby(string_groups, sort=False)
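A possible way to call this function is sketched below on the same string data. For comparison, the sketch also builds the shift/cumsum key from Method 1, which appears to give the same runs for strings without creating string labels. The names via_function, run_id and via_shift are made up for this example.

```python
import itertools
import pandas as pd

def groupby_consec(df, col):
    # build one 'value_runnumber' label per row, then group by those labels
    string_groups = sum([['%s_%s' % (i, n) for i in g]
                         for n, (k, g) in enumerate(itertools.groupby(df[col]))], [])
    return df.groupby(string_groups, sort=False)

s = pd.DataFrame(['A', 'A', 'A', 'BB', 'BB', 'CC', 'A', 'A', 'BB'], columns=['a'])

# the function returns a GroupBy object, so any aggregation can follow
via_function = groupby_consec(s, 'a')['a'].agg(list)

# same runs using the shift/cumsum key from Method 1 (comparison only)
run_id = s['a'].ne(s['a'].shift()).cumsum()
via_shift = s['a'].groupby(run_id).agg(list)

print(via_function.tolist())
print(via_shift.tolist())
```

The two results differ only in their index: the function labels runs 'A_0', 'BB_1' and so on, while the shift-based key numbers them from 1.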
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.