I feel like there is a better way than this:
import pandas as pd
df = pd.DataFrame(
columns=" index c1 c2 v1 ".split(),
data= [
[ 0, "A", "X", 3, ],
[ 1, "A", "X", 5, ],
[ 2, "A", "Y", 7, ],
[ 3, "A", "Y", 1, ],
[ 4, "B", "X", 3, ],
[ 5, "B", "X", 1, ],
[ 6, "B", "X", 3, ],
[ 7, "B", "Y", 1, ],
[ 8, "C", "X", 7, ],
[ 9, "C", "Y", 4, ],
[ 10, "C", "Y", 1, ],
[ 11, "C", "Y", 6, ],]).set_index("index", drop=True)
def callback(x):
x['seq'] = range(1, x.shape[0] + 1)
return x
df = df.groupby(['c1', 'c2']).apply(callback)
print df
To achieve this:
c1 c2 v1 seq 0 A X 3 1 1 A X 5 2 2 A Y 7 1 3 A Y 1 2 4 B X 3 1 5 B X 1 2 6 B X 3 3 7 B Y 1 1 8 C X 7 1 9 C Y 4 1 10 C Y 1 2 11 C Y 6 3
Is there a way to do it that avoids the callback?
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
use cumcount(), see docs here
In [4]: df.groupby(['c1', 'c2']).cumcount() Out[4]: 0 0 1 1 2 0 3 1 4 0 5 1 6 2 7 0 8 0 9 0 10 1 11 2 dtype: int64
If you want orderings starting at 1
In [5]: df.groupby(['c1', 'c2']).cumcount()+1 Out[5]: 0 1 1 2 2 1 3 2 4 1 5 2 6 3 7 1 8 1 9 1 10 2 11 3 dtype: int64
Method 2
This might be useful
df = df.sort_values(['userID', 'date'])
grp = df.groupby('userID')['ItemID'].aggregate(lambda x: '->'.join(tuple(x))).reset_index()
print(grp)
it will create a sequence like this

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0