How do I access the corresponding groupby dataframe in a groupby object by the key?
With the following groupby:
rand = np.random.RandomState(1)
df = pd.DataFrame({'A': ['foo', 'bar'] * 3,
'B': rand.randn(6),
'C': rand.randint(0, 20, 6)})
gb = df.groupby(['A'])
I can iterate through it to get the keys and groups:
In [11]: for k, gp in gb:
print 'key=' + str(k)
print gp
key=bar
A B C
1 bar -0.611756 18
3 bar -1.072969 10
5 bar -2.301539 18
key=foo
A B C
0 foo 1.624345 5
2 foo -0.528172 11
4 foo 0.865408 14
I would like to be able to access a group by its key:
In [12]: gb['foo']
Out[12]:
A B C
0 foo 1.624345 5
2 foo -0.528172 11
4 foo 0.865408 14
But when I try doing that with gb[('foo',)] I get this weird pandas.core.groupby.DataFrameGroupBy object thing which doesn’t seem to have any methods that correspond to the DataFrame I want.
The best I could think of is:
In [13]: def gb_df_key(gb, key, orig_df):
ix = gb.indices[key]
return orig_df.ix[ix]
gb_df_key(gb, 'foo', df)
Out[13]:
A B C
0 foo 1.624345 5
2 foo -0.528172 11
4 foo 0.865408 14
but this is kind of nasty, considering how nice pandas usually is at these things.
What’s the built-in way of doing this?
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
You can use the get_group method:
In [21]: gb.get_group('foo')
Out[21]:
A B C
0 foo 1.624345 5
2 foo -0.528172 11
4 foo 0.865408 14
Note: This doesn’t require creating an intermediary dictionary / copy of every subdataframe for every group, so will be much more memory-efficient than creating the naive dictionary with dict(iter(gb)). This is because it uses data-structures already available in the groupby object.
You can select different columns using the groupby slicing:
In [22]: gb[["A", "B"]].get_group("foo")
Out[22]:
A B
0 foo 1.624345
2 foo -0.528172
4 foo 0.865408
In [23]: gb["C"].get_group("foo")
Out[23]:
0 5
2 11
4 14
Name: C, dtype: int64
Method 2
Wes McKinney (pandas’ author) in Python for Data Analysis provides the following recipe:
groups = dict(list(gb))
which returns a dictionary whose keys are your group labels and whose values are DataFrames, i.e.
groups['foo']
will yield what you are looking for:
A B C 0 foo 1.624345 5 2 foo -0.528172 11 4 foo 0.865408 14
Method 3
Rather than
gb.get_group('foo')
I prefer using gb.groups
df.loc[gb.groups['foo']]
Because in this way you can choose multiple columns as well. for example:
df.loc[gb.groups['foo'],('A','B')]
Method 4
gb = df.groupby(['A']) gb_groups = grouped_df.groups
If you are looking for selective groupby objects then, do: gb_groups.keys(), and input desired key into the following key_list..
gb_groups.keys()
key_list = [key1, key2, key3 and so on...]
for key, values in gb_groups.items():
if key in key_list:
print(df.ix[values], "n")
Method 5
I was looking for a way to sample a few members of the GroupBy obj – had to address the posted question to get this done.
create groupby object based on some_key column
grouped = df.groupby('some_key')
pick N dataframes and grab their indices
sampled_df_i = random.sample(grouped.indices, N)
grab the groups
df_list = map(lambda df_i: grouped.get_group(df_i), sampled_df_i)
optionally – turn it all back into a single dataframe object
sampled_df = pd.concat(df_list, axis=0, join='outer')
Method 6
df.groupby('A').get_group('foo')
is equivalent to:
df[df['A'] == 'foo']
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0