I have a DataFrame like the one below. I would like to group it into 60-minute intervals, with the grouping starting at 06:30.
                         data
index
2017-02-14 06:29:57  11198648
2017-02-14 06:30:01  11198650
2017-02-14 06:37:22  11198706
2017-02-14 23:11:13  11207728
2017-02-14 23:21:43  11207774
2017-02-14 23:22:36  11207776
I am using:
df.groupby(pd.TimeGrouper(freq='60Min'))
I get this grouping:
                     data
index
2017-02-14 06:00:00    x1
2017-02-14 07:00:00    x2
2017-02-14 08:00:00    x3
2017-02-14 09:00:00    x4
2017-02-14 10:00:00    x5
but I am looking for this result:
                     data
index
2017-02-14 06:30:00    x1
2017-02-14 07:30:00    x2
2017-02-14 08:30:00    x3
2017-02-14 09:30:00    x4
2017-02-14 10:30:00    x5
How can I tell the function to start grouping at 6:30 at one-hour intervals?
If it cannot be done with .groupby(pd.TimeGrouper(freq='60Min')), what is the best way to do it?
Regards, and thanks very much in advance.
Answers:
Method 1
Use the base=30 and label='right' parameters of pd.Grouper.
Specifying label='right' labels each bin with its right (upper) edge, so the first group is labelled 6:30 rather than 5:30.
base defaults to 0, which aligns the bin edges to the top of the hour; setting it to 30 shifts the edges forward by 30 minutes. Note that base was deprecated in pandas 1.1 and removed in pandas 2.0; on those versions offset='30min' is the replacement.
Suppose you want the first element of every sub-group:
df.groupby(pd.Grouper(freq='60Min', base=30, label='right')).first()
# same thing using resample - df.resample('60Min', base=30, label='right').first()
yields:
                           data
index
2017-02-14 06:30:00  11198648.0
2017-02-14 07:30:00  11198650.0
2017-02-14 08:30:00         NaN
2017-02-14 09:30:00         NaN
2017-02-14 10:30:00         NaN
2017-02-14 11:30:00         NaN
2017-02-14 12:30:00         NaN
2017-02-14 13:30:00         NaN
2017-02-14 14:30:00         NaN
2017-02-14 15:30:00         NaN
2017-02-14 16:30:00         NaN
2017-02-14 17:30:00         NaN
2017-02-14 18:30:00         NaN
2017-02-14 19:30:00         NaN
2017-02-14 20:30:00         NaN
2017-02-14 21:30:00         NaN
2017-02-14 22:30:00         NaN
2017-02-14 23:30:00  11207728.0
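To make the effect of label concrete, here is a minimal sketch that groups the question's sample data twice, once per label. It assumes pandas 1.1 or later and therefore uses offset="30min" in place of base=30; the variable names (df, left, right) are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
idx = pd.to_datetime([
    "2017-02-14 06:29:57",
    "2017-02-14 06:30:01",
    "2017-02-14 06:37:22",
    "2017-02-14 23:11:13",
    "2017-02-14 23:21:43",
    "2017-02-14 23:22:36",
])
df = pd.DataFrame(
    {"data": [11198648, 11198650, 11198706, 11207728, 11207774, 11207776]},
    index=idx,
)
df.index.name = "index"

# Same hourly bins shifted by 30 minutes; only the label differs.
left = df.groupby(pd.Grouper(freq="60min", offset="30min", label="left")).first()
right = df.groupby(pd.Grouper(freq="60min", offset="30min", label="right")).first()

# The 06:29:57 row falls in the bin [05:30, 06:30) in both cases.
print(left.index[0])   # bin labelled by its left edge
print(right.index[0])  # bin labelled by its right edge
```

Both results contain the same values; label only decides which edge of each bin appears in the index, so it does not move any row into a different group.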
Method 2
Use DataFrame.resample, a dedicated method for resampling time series. This way we don't need DataFrame.groupby and pd.Grouper:
df.resample('60min', base=30, label='right').first()
Output
                           data
index
2017-02-14 06:30:00  11198648.0
2017-02-14 07:30:00  11198650.0
2017-02-14 08:30:00         NaN
2017-02-14 09:30:00         NaN
2017-02-14 10:30:00         NaN
2017-02-14 11:30:00         NaN
2017-02-14 12:30:00         NaN
2017-02-14 13:30:00         NaN
2017-02-14 14:30:00         NaN
2017-02-14 15:30:00         NaN
2017-02-14 16:30:00         NaN
2017-02-14 17:30:00         NaN
2017-02-14 18:30:00         NaN
2017-02-14 19:30:00         NaN
2017-02-14 20:30:00         NaN
2017-02-14 21:30:00         NaN
2017-02-14 22:30:00         NaN
2017-02-14 23:30:00  11207728.0
Note: without a column selection, first() aggregates every column. If your DataFrame has multiple columns and you only want one of them, select it before aggregating:
df.resample('60min', base=30, label='right')['data'].first()
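As a rough, self-contained sketch of the column selection on a multi-column frame: the example below adds a made-up second column ("other", purely hypothetical) to the question's data and resamples only "data". It assumes pandas 1.1 or later, so it passes offset="30min" where the snippets above pass base=30.

```python
import pandas as pd

# Sample data from the question, plus a hypothetical extra column.
idx = pd.to_datetime([
    "2017-02-14 06:29:57",
    "2017-02-14 06:30:01",
    "2017-02-14 06:37:22",
    "2017-02-14 23:11:13",
    "2017-02-14 23:21:43",
    "2017-02-14 23:22:36",
])
df = pd.DataFrame(
    {
        "data": [11198648, 11198650, 11198706, 11207728, 11207774, 11207776],
        "other": [1, 2, 3, 4, 5, 6],  # hypothetical, for illustration only
    },
    index=idx,
)

# Selecting ['data'] before aggregating returns a Series for that column only.
hourly = df.resample("60min", offset="30min", label="right")["data"].first()

# Empty hourly bins come back as NaN; drop them if only populated bins matter.
non_empty = hourly.dropna()
print(non_empty)
```

Whether to keep or drop the empty bins depends on the use case; dropna() is just one option, shown here to make the three populated groups easy to see.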
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.