I want to split the following dataframe based on column ZZ
df =
    N0_YLDF  ZZ        MAT
0  6.286333   2  11.669069
1  6.317000   6  11.669069
2  6.324889   6  11.516454
3  6.320667   5  11.516454
4  6.325556   5  11.516454
5  6.359000   6  11.516454
6  6.359000   6  11.516454
7  6.361111   7  11.516454
8  6.360778   7  11.516454
9  6.361111   6  11.516454
As output, I want a new DataFrame with the N0_YLDF column split into 4, one new column for each unique value of ZZ. How do I go about this? I can do groupby, but do not know what to do with the grouped object.
Answers:
Method 1
gb = df.groupby('ZZ')
# Build a list with one sub-DataFrame per unique value of ZZ
[gb.get_group(x) for x in gb.groups]
Method 2
Alternatively, iterating over a groupby object yields (key, frame) tuples, so a list comprehension can simply keep the second element of each tuple (the frame).
dfs = [x for _, x in df.groupby('ZZ')]
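To see what the iteration actually produces, here is a minimal self-contained sketch. It rebuilds the DataFrame from the question by hand (the construction code is my own, not part of the original answer) and collects both the keys and the frames:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "N0_YLDF": [6.286333, 6.317000, 6.324889, 6.320667, 6.325556,
                6.359000, 6.359000, 6.361111, 6.360778, 6.361111],
    "ZZ": [2, 6, 6, 5, 5, 6, 6, 7, 7, 6],
    "MAT": [11.669069, 11.669069] + [11.516454] * 8,
})

# Iterating over a GroupBy yields (key, sub-frame) pairs.
# With the default sort=True the keys come out in sorted order.
keys = []
dfs = []
for key, frame in df.groupby("ZZ"):
    keys.append(key)
    dfs.append(frame)

print(keys)      # the unique ZZ values
print(len(dfs))  # one sub-frame per unique ZZ value
print(dfs[0])    # the rows where ZZ equals the first key
```

Each sub-frame keeps the row labels it had in df, so the pieces can be traced back to the original rows.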
Method 3
In R there is a dataframe method called split. This is for all the R users out there:
def split(df, group):
    # Return one sub-DataFrame per group, like R's split()
    gb = df.groupby(group)
    return [gb.get_group(x) for x in gb.groups]
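A quick usage sketch for this helper, assuming the question's data (only the ZZ and N0_YLDF columns are rebuilt here, which is my own shortcut). It checks that every piece holds a single ZZ value and that no rows are lost:

```python
import pandas as pd

def split(df, group):
    # Return one sub-DataFrame per group, like R's split()
    gb = df.groupby(group)
    return [gb.get_group(x) for x in gb.groups]

# Two of the question's columns, rebuilt by hand for illustration.
df = pd.DataFrame({
    "N0_YLDF": [6.286333, 6.317000, 6.324889, 6.320667, 6.325556,
                6.359000, 6.359000, 6.361111, 6.360778, 6.361111],
    "ZZ": [2, 6, 6, 5, 5, 6, 6, 7, 7, 6],
})

parts = split(df, "ZZ")

for part in parts:
    # Each piece contains exactly one distinct ZZ value.
    print(part["ZZ"].iloc[0], len(part))

# Together the pieces account for every row of the original frame.
total_rows = sum(len(part) for part in parts)
print(total_rows == len(df))
```

Note that R's split returns a named list, whereas this helper returns a plain list, so the group keys have to be read back from the pieces themselves.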
Method 4
Store them in a dict, which allows you access to the group DataFrames based on the group keys.
d = dict(tuple(df.groupby('ZZ')))
d[6]
#     N0_YLDF  ZZ        MAT
# 1  6.317000   6  11.669069
# 2  6.324889   6  11.516454
# 5  6.359000   6  11.516454
# 6  6.359000   6  11.516454
# 9  6.361111   6  11.516454
If you need only a subset of the DataFrame, in this case just the 'N0_YLDF' Series, you can select it while building the dict.
d = dict((idx, gp['N0_YLDF']) for idx, gp in df.groupby('ZZ'))
d[6]
# 1    6.317000
# 2    6.324889
# 5    6.359000
# 6    6.359000
# 9    6.361111
# Name: N0_YLDF, dtype: float64
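The question asks for a single new DataFrame with one N0_YLDF column per unique ZZ value. None of the methods above builds that frame directly, but one possible way to get there from the dict of Series is pd.concat along the columns. This is a sketch of my own under that reading of the question, not something taken from the original answers:

```python
import pandas as pd

# Sample data copied from the question (MAT is not needed here).
df = pd.DataFrame({
    "N0_YLDF": [6.286333, 6.317000, 6.324889, 6.320667, 6.325556,
                6.359000, 6.359000, 6.361111, 6.360778, 6.361111],
    "ZZ": [2, 6, 6, 5, 5, 6, 6, 7, 7, 6],
})

# Dict of N0_YLDF Series keyed by ZZ value, as in Method 4.
d = {key: grp["N0_YLDF"] for key, grp in df.groupby("ZZ")}

# Concatenating along the columns aligns on the original row labels,
# so each column is NaN in rows that belong to a different group.
wide = pd.concat(d, axis=1)

# Variant: reset each Series' index first so the values are packed
# from the top of every column (shorter groups are padded with NaN).
packed = pd.concat(
    {key: s.reset_index(drop=True) for key, s in d.items()},
    axis=1,
)

print(wide)
print(packed)
```

Which of the two layouts is wanted depends on whether the original row positions matter: wide keeps them, packed discards them.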
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.