I have this data frame:
Name    Date      Quantity
Apple   07/11/17  20
orange  07/14/17  20
Apple   07/14/17  70
Orange  07/25/17  40
Apple   07/20/17  30
I want to aggregate this by Name and Date to get the sum of the quantities.
Details:
Date: group by week; the result should be dated at the beginning of the week (i.e. on the Monday).
Quantity: sum, if two or more records have the same Name and their dates fall in the same week.
The desired output is given below:
Name    Date      Quantity
Apple   07/10/17  90
orange  07/10/17  20
Apple   07/17/17  30
orange  07/24/17  40
Thanks in advance
Answers:
Method 1
First convert the Date column with to_datetime and subtract one week, because we want each sum labeled with the Monday before the date, not the Monday after it.
Then use groupby with Grouper by W-MON and aggregate sum:
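import pandas as pd

# Shift every date back one week so the W-MON label is the Monday before it
df['Date'] = pd.to_datetime(df['Date']) - pd.to_timedelta(7, unit='d')
# The parentheses let the method chain span several lines
df = (df.groupby(['Name', pd.Grouper(key='Date', freq='W-MON')])['Quantity']
        .sum()
        .reset_index()
        .sort_values('Date'))
print(df)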
Name Date Quantity
0 Apple 2017-07-10 90
3 orange 2017-07-10 20
1 Apple 2017-07-17 30
2 Orange 2017-07-24 40
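One caveat with the one-week shift: a date that is itself a Monday ends up labeled with the previous Monday, because the default W-MON bins are closed on the right. As a minimal sketch of an alternative, assuming the question's data, the bins can be closed and labeled on the left instead, which gives the week-start Monday directly without subtracting a week. The name weekly is only illustrative.

```python
import pandas as pd

# Data frame from the question
df = pd.DataFrame({
    "Name": ["Apple", "orange", "Apple", "Orange", "Apple"],
    "Date": ["07/11/17", "07/14/17", "07/14/17", "07/25/17", "07/20/17"],
    "Quantity": [20, 20, 70, 40, 30],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%y")

# closed='left', label='left' makes each bin [Monday, next Monday)
# and labels it with the Monday that starts the week
week_start = pd.Grouper(key="Date", freq="W-MON", closed="left", label="left")

weekly = (
    df.groupby(["Name", week_start])["Quantity"]
      .sum()
      .reset_index()
      .sort_values(["Date", "Name"])
)
print(weekly)
```

Note that "orange" and "Orange" stay separate groups here, as they do in the output above, because the names differ in case.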
Method 2
Let’s use groupby, resample with W-Mon, and sum. Note that the resulting dates are the Mondays that end each week:
df.groupby('Name').resample('W-Mon', on='Date').sum().reset_index().sort_values(by='Date')
Output:
     Name       Date  Quantity
0   Apple 2017-07-17        90
3  orange 2017-07-17        20
1   Apple 2017-07-24        30
2  Orange 2017-07-31        40
Method 3
First convert the date column with to_datetime. This groups by week, with weeks starting on Monday, and outputs the week number (you can change that by choosing a different strftime format code):
df.groupby(['name', df['date'].dt.strftime('%W')])['quantity'].sum()
Output:
name date
apple 28 90
29 30
orange 28 20
30 40
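Because %W restarts every year, week 28 of one year would be merged with week 28 of another if the data spanned several years. A minimal sketch, assuming the lowercase column names used in this answer and sample rows adapted from the question, that puts the year into the grouping key as well (year_week and totals are illustrative names):

```python
import pandas as pd

# Sample rows adapted from the question; the lowercase names are an assumption
df = pd.DataFrame({
    "name": ["apple", "orange", "apple", "orange", "apple"],
    "date": pd.to_datetime(
        ["07/11/17", "07/14/17", "07/14/17", "07/25/17", "07/20/17"],
        format="%m/%d/%y",
    ),
    "quantity": [20, 20, 70, 40, 30],
})

# '%Y-%W' builds keys made of the year and the Monday-based week number
year_week = df["date"].dt.strftime("%Y-%W").rename("year_week")

totals = df.groupby(["name", year_week])["quantity"].sum()
print(totals)
```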
Method 4
This groups every row on the previous Monday (if the date is already Monday, nothing is changed). This has the effect of grouping by week:
import pandas as pd, datetime as dt
# df = ...
df['WeekDate'] = df.apply(lambda row: row['Date'] - dt.timedelta(days=row['Date'].weekday()), axis=1)
perweek = df['WeekDate'].groupby(df['WeekDate']).count()
Example:
Date        WeekDate
2020-06-20  2020-06-15  <- monday
2020-06-21  2020-06-15
2020-06-24  2020-06-22  <- monday
2020-06-25  2020-06-22
2020-06-26  2020-06-22
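The same "previous Monday" idea can also be written without a row-wise apply, and combined with Name to get the sums the question asks for. This is only a sketch under the assumption that Date is already a datetime column; per_week is an illustrative name.

```python
import pandas as pd

# Data frame from the question, with Date parsed as datetime
df = pd.DataFrame({
    "Name": ["Apple", "orange", "Apple", "Orange", "Apple"],
    "Date": pd.to_datetime(
        ["07/11/17", "07/14/17", "07/14/17", "07/25/17", "07/20/17"],
        format="%m/%d/%y",
    ),
    "Quantity": [20, 20, 70, 40, 30],
})

# dt.weekday is 0 for Monday ... 6 for Sunday, so subtracting that many
# days moves every date back to the Monday of its week
df["WeekDate"] = df["Date"] - pd.to_timedelta(df["Date"].dt.weekday, unit="D")

# Sum the quantities per name and week-start date
per_week = df.groupby(["Name", "WeekDate"], as_index=False)["Quantity"].sum()
print(per_week)
```

Operating on the whole column at once avoids calling a Python function per row, which tends to matter on larger frames.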
Method 5
You have already received a lot of good answers and the question is quite old, but since some of the solutions use deprecated functions, and I encountered the same problem and found a different solution, I think sharing it could be helpful to someone.
Given the dataframe you proposed:
Name    Date      Quantity
Apple   07/11/17  20
orange  07/14/17  20
Apple   07/14/17  70
Orange  07/25/17  40
Apple   07/20/17  30
We have to convert the values in 'Date' to pandas datetimes, since they are strings right now.
Then we can use the Series' dt accessor, which lets us handle datetime-like Series and extract information from them.
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%y')
Having a datetime column lets us use the dt accessor to extract the week number associated with each date.
To avoid losing any information, I prefer to add a new column with the week number.
Once we have the week number, we can group by it.
df['WeekNumber'] = df['Date'].dt.isocalendar().week
df.groupby(['Name', 'WeekNumber'])['Quantity'].sum()
Name WeekNumber
Apple 28 90
29 30
Orange 28 20
30 40
Small problem: what if we consider different years?
Our data may span a range of years. In that situation we cannot group on the week alone (otherwise we would mix data from one year with data from another), so it is useful to also extract the year column from isocalendar().
df['year'] = df['Date'].dt.isocalendar().year
df.groupby(['Name', 'WeekNumber', 'year'])[['Quantity']].sum()
Name WeekNumber year Quantity
Apple 28 2017 90
29 2017 30
Orange 28 2017 20
30 2017 40
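It matters that the year comes from isocalendar() rather than from dt.year, because an ISO week can straddle a calendar-year boundary. A small sketch with made-up dates (not taken from the question) that illustrates the difference; summary and its column names are illustrative:

```python
import pandas as pd

# Three illustrative dates around a year boundary
dates = pd.Series(pd.to_datetime(["2018-01-01", "2018-12-31", "2019-01-02"]))
iso = dates.dt.isocalendar()

summary = pd.DataFrame({
    "date": dates,
    "calendar_year": dates.dt.year,  # plain calendar year
    "iso_year": iso["year"],         # year that the ISO week belongs to
    "iso_week": iso["week"],
})
print(summary)
```

All three dates fall in ISO week 1, but 2018-12-31 belongs to ISO year 2019. Pairing the ISO week with the calendar year would therefore put 2018-01-01 and 2018-12-31 in the same group, while pairing it with the ISO year keeps them apart.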
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.