I have a df that looks like the following:
id  item   color
01  truck  red
02  truck  red
03  car    black
04  truck  blue
05  car    black
I am trying to create a df that looks like this:
item   color  count
truck  red    2
truck  blue   1
car    black  2
I have tried
df["count"] = df.groupby("item")["color"].transform('count')
But it is not quite what I am searching for.
Any guidance is appreciated
Answers:
Method 1
That’s not a new column, that’s a new DataFrame:
In [11]: df.groupby(["item", "color"]).count()
Out[11]:
             id
item  color
car   black   2
truck blue    1
      red     2
To get the result you want, use reset_index:
In [12]: df.groupby(["item", "color"])["id"].count().reset_index(name="count")
Out[12]:
    item  color  count
0    car  black      2
1  truck   blue      1
2  truck    red      2
To get a “new column” you could use transform:
In [13]: df.groupby(["item", "color"])["id"].transform("count")
Out[13]:
0    2
1    2
2    2
3    1
4    2
dtype: int64
I recommend reading the split-apply-combine section of the docs.
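As a rough sketch of the difference between the two approaches above, the following rebuilds the question's frame (the ids are assumed to be strings, which the question does not state) and produces both the aggregated DataFrame and the per-row count column:

```python
import pandas as pd

# Frame from the question; storing ids as strings is an assumption.
df = pd.DataFrame({
    "id": ["01", "02", "03", "04", "05"],
    "item": ["truck", "truck", "car", "truck", "car"],
    "color": ["red", "red", "black", "blue", "black"],
})

# Aggregation: one row per (item, color) pair.
summary = df.groupby(["item", "color"])["id"].count().reset_index(name="count")

# Transform: the result has the same length as df, so it fits as a column.
with_count = df.assign(
    count=df.groupby(["item", "color"])["id"].transform("count")
)

print(summary)
print(with_count)
```

The aggregated frame has three rows, while the transformed frame keeps all five original rows and repeats each group's count on every member of the group.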
Method 2
Another possible way to achieve the desired output is to use named aggregation, which lets you specify the name and the aggregation function for each output column.
Named aggregation
(New in version 0.25.0.)
To support column-specific aggregation with control over the output
column names, pandas accepts the special syntax in GroupBy.agg(),
known as “named aggregation”, where:
- The keywords are the output column names
- The values are tuples whose first element is the column to select and
the second element is the aggregation to apply to that column. Pandas
provides the pandas.NamedAgg named tuple with the fields ['column', 'aggfunc']
to make it clearer what the arguments are. As usual, the
aggregation can be a callable or a string alias.
So to get the desired output, you could try something like:
import pandas as pd

# Setup
df = pd.DataFrame([
    {"item": "truck", "color": "red"},
    {"item": "truck", "color": "red"},
    {"item": "car", "color": "black"},
    {"item": "truck", "color": "blue"},
    {"item": "car", "color": "black"},
])

# Count the rows in each (item, color) group and name the output column
df_grouped = df.groupby(["item", "color"]).agg(
    count_col=pd.NamedAgg(column="color", aggfunc="count")
)
print(df_grouped)
Which produces the following output:
             count_col
item  color
car   black          2
truck blue           1
      red            2
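To get the flat layout asked for in the question, the same idea can be written with the tuple shorthand and followed by reset_index. This is a minimal sketch: it counts the id column from the question's frame (ids assumed to be strings) and calls the output column count instead of count_col.

```python
import pandas as pd

# Frame from the question; storing ids as strings is an assumption.
df = pd.DataFrame({
    "id": ["01", "02", "03", "04", "05"],
    "item": ["truck", "truck", "car", "truck", "car"],
    "color": ["red", "red", "black", "blue", "black"],
})

# Keyword = output column name, tuple = (column to select, aggregation).
counts = (
    df.groupby(["item", "color"])
      .agg(count=("id", "count"))
      .reset_index()  # turn the item/color index levels back into columns
)
print(counts)
```

The tuple ("id", "count") is equivalent to pd.NamedAgg(column="id", aggfunc="count"); which one to use is mostly a matter of readability.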
Method 3
Here is another option:
import numpy as np

df['Counts'] = np.zeros(len(df))  # placeholder column whose rows get counted
grp_df = df.groupby(['item', 'color']).count()
which results in
             Counts
item  color
car   black       2
truck blue        1
      red         2
Method 4
You can use value_counts and name the column with reset_index:
In [3]: df[['item', 'color']].value_counts().reset_index(name='counts')
Out[3]:
    item  color  counts
0    car  black       2
1  truck    red       2
2  truck   blue       1
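A closely related variant, not taken from the answers above, is to group with as_index=False and call size(). This is a sketch assuming pandas 1.1 or later, where that call returns a DataFrame with a column named size; the rename to count is only there to match the question's layout.

```python
import pandas as pd

# Only the two grouping columns are needed for this variant.
df = pd.DataFrame({
    "item": ["truck", "truck", "car", "truck", "car"],
    "color": ["red", "red", "black", "blue", "black"],
})

counts = (
    df.groupby(["item", "color"], as_index=False)
      .size()                             # number of rows in each group
      .rename(columns={"size": "count"})  # match the requested column name
)
print(counts)
```

Note that size() counts rows, whereas count() skips missing values in the column being counted, so the two can differ when the data contains NaN.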
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.