I tried to use groupby to group rows with multiple values.
col val
A Cat
A Tiger
B Ball
B Bat
import pandas as pd
df = pd.read_csv("Inputfile.txt", sep='\t')
group = df.groupby(['col'])['val'].sum()
I got
A CatTiger
B BallBat
I want to introduce a delimiter, so that my output looks like
A Cat-Tiger
B Ball-Bat
I tried,
group = df.groupby(['col'])['val'].sum().apply(lambda x: '-'.join(x))
this yielded,
A C-a-t-T-i-g-e-r
B B-a-l-l-B-a-t
What is the issue here?
Thanks,
AP
Answers:
Method 1
Alternatively you can do it this way:
In [48]: df.groupby('col')['val'].agg('-'.join)
Out[48]:
col
A Cat-Tiger
B Ball-Bat
Name: val, dtype: object
UPDATE: answering question from the comment:
In [2]: df
Out[2]:
col val
0 A Cat
1 A Tiger
2 A Panda
3 B Ball
4 B Bat
5 B Mouse
6 B Egg
In [3]: df.groupby('col')['val'].agg('-'.join)
Out[3]:
col
A Cat-Tiger-Panda
B Ball-Bat-Mouse-Egg
Name: val, dtype: object
Finally, to convert the index (or MultiIndex) back into columns:
df1 = df.groupby('col')['val'].agg('-'.join).reset_index(name='new')
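To see what reset_index(name='new') produces, here is a minimal sketch. It assumes the four-row sample from the question, built inline rather than read from Inputfile.txt; the column name 'new' is simply the one used above.

```python
import pandas as pd

# Sample data from the question, built inline (assumption: the file holds these rows).
df = pd.DataFrame({"col": ["A", "A", "B", "B"],
                   "val": ["Cat", "Tiger", "Ball", "Bat"]})

# agg('-'.join) returns a Series indexed by 'col';
# reset_index(name='new') turns it into a DataFrame with columns 'col' and 'new'.
df1 = df.groupby("col")["val"].agg("-".join).reset_index(name="new")
print(df1)
```

The result is an ordinary two-column DataFrame, which is usually easier to merge or write back to a file than a Series with a named index.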
Method 2
Drop the sum() and apply the join to each group directly:
group = df.groupby(['col'])['val'].apply(lambda x: '-'.join(x))
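As for why the original attempt produced C-a-t-T-i-g-e-r: sum() has already concatenated each group into a single string, and joining a single string iterates over its characters. The sketch below contrasts the two, again assuming the question's sample rows built inline; the variable names are only illustrative.

```python
import pandas as pd

# Sample data from the question, built inline (assumption: the file holds these rows).
df = pd.DataFrame({"col": ["A", "A", "B", "B"],
                   "val": ["Cat", "Tiger", "Ball", "Bat"]})

# sum() concatenates the strings in each group, leaving one string per group.
summed = df.groupby("col")["val"].sum()

# '-'.join on a single string walks over its characters, hence C-a-t-...
split_chars = summed.apply(lambda s: "-".join(s))

# '-'.join on the group itself (a Series of strings) walks over the values.
joined = df.groupby("col")["val"].apply(lambda s: "-".join(s))

print(split_chars)
print(joined)
```

So the delimiter has to be applied before the strings are merged, not after.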
Method 3
You can first aggregate to list and then join with str.join:
df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2], 'B': ['a', 'b', 'c', 'd', 'e', 'f']})
df.groupby('A')['B'].agg(list).str.join('-')
Output:
A
1 a-b-c
2 d-e-f
Name: B, dtype: object
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.