Is there an easy way in pandas to group by ranges of values in fixed increments? For instance, given the example below, can I bin column B in increments of 0.155 and group by those bins, so that the first couple of groups cover the ranges 0–0.155, 0.155–0.31, …?
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': np.random.random(20), 'B': np.random.random(20)})
A B
0 0.383493 0.250785
1 0.572949 0.139555
2 0.652391 0.401983
3 0.214145 0.696935
4 0.848551 0.516692
Alternatively, could I first categorize the data by those increments into a new column and then use groupby to compute whatever statistics I need for column A?
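A minimal sketch of that two-step approach, assuming a recent pandas: store the bin labels produced by pd.cut in a new column (named B_bin here purely for illustration), then group on that column and summarize A. The small hand-written DataFrame is made-up sample data so the result is deterministic.

```python
import numpy as np
import pandas as pd

# Made-up sample data (not the random data from the question)
df = pd.DataFrame({'A': [0.10, 0.20, 0.30, 0.40, 0.50],
                   'B': [0.05, 0.10, 0.20, 0.50, 0.90]})

step = 0.155
# Bin edges 0, 0.155, 0.31, ..., 1.085 so that every value in [0, 1] is covered
edges = np.arange(0, 1.0 + step, step)

# Step 1: label each row with the interval its B value falls into.
# include_lowest=True makes the first interval include its left edge, so B == 0 is kept.
df['B_bin'] = pd.cut(df['B'], edges, include_lowest=True)

# Step 2: group on the new column and summarize column A.
# observed=False keeps empty bins in the result (count 0, mean NaN).
stats = df.groupby('B_bin', observed=False)['A'].agg(['count', 'mean'])
print(stats)
```

Keeping the labels in a column means the same binning can be reused for several different groupby calls without recomputing it.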
Answers:
Method 1
You might be interested in pd.cut:
>>> df.groupby(pd.cut(df["B"], np.arange(0, 1.0+0.155, 0.155))).sum()
A B
B
(0, 0.155] 2.775458 0.246394
(0.155, 0.31] 1.123989 0.471618
(0.31, 0.465] 2.051814 1.882763
(0.465, 0.62] 2.277960 1.528492
(0.62, 0.775] 1.577419 2.810723
(0.775, 0.93] 0.535100 1.694955
(0.93, 1.085] NaN NaN
[7 rows x 2 columns]
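One caveat with the call above, based on pd.cut's defaults: intervals are closed on the right (right=True), so a B value of exactly 0 falls outside the first bin (0, 0.155] and is labeled NaN, which groupby then drops. The sketch below, on a made-up Series containing both edges of the first bin, compares the default with include_lowest=True and with right=False (which produces [0, 0.155)-style bins instead).

```python
import numpy as np
import pandas as pd

s = pd.Series([0.0, 0.155, 0.2])          # made-up values, including both edges of the first bin
edges = np.arange(0, 1.0 + 0.155, 0.155)  # same edges as in the answer

default = pd.cut(s, edges)                      # (0, 0.155], (0.155, 0.31], ...
lowest = pd.cut(s, edges, include_lowest=True)  # first bin also includes 0
left = pd.cut(s, edges, right=False)            # [0, 0.155), [0.155, 0.31), ...

# .cat.codes gives the bin index of each value; -1 means "not in any bin" (NaN)
print(default.cat.codes.tolist())  # [-1, 0, 1]
print(lowest.cat.codes.tolist())   # [0, 0, 1]
print(left.cat.codes.tolist())     # [0, 1, 1]
```

With continuous random data, hitting an edge exactly is unlikely, but right=False matches the "0 to 0.155, 0.155 to 0.31" reading of the question more literally.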
Method 2
Try this:
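# Sorting only keeps each group's rows ordered by B; grouping does not require it
df = df.sort_values('B')
# Left bin edges 0, 0.155, ..., 0.93
bins = np.arange(0, 1.0, 0.155)
# Bin index i means bins[i-1] <= B < bins[i]
ind = np.digitize(df['B'], bins)
print(df.groupby(ind).head())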
Of course, you can apply any function to the groups, not just head (for example mean, sum, or agg).
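For example, a minimal sketch that aggregates column A per bin instead of calling head, and then maps each np.digitize index back to the lower edge of its bin. With the bins defined as in this answer, index i means bins[i-1] <= B < bins[i], and the last index (7) collects everything at or above 0.93. The DataFrame is made-up sample data, and the index name lower_edge is just an illustrative choice.

```python
import numpy as np
import pandas as pd

# Made-up sample data
df = pd.DataFrame({'A': [0.10, 0.20, 0.30, 0.40],
                   'B': [0.05, 0.10, 0.20, 0.95]})

bins = np.arange(0, 1.0, 0.155)    # left edges 0, 0.155, ..., 0.93
ind = np.digitize(df['B'], bins)   # 1-based: ind == i means bins[i-1] <= B < bins[i]

# Any aggregation works on the groups, e.g. count and mean of A per bin
stats = df.groupby(ind)['A'].agg(['count', 'mean'])

# Replace the 1-based bin index with the lower edge of each bin
stats.index = bins[stats.index.to_numpy() - 1]
stats.index.name = 'lower_edge'
print(stats)
```

Unlike pd.cut, this only lists bins that actually contain data, so empty ranges simply do not appear in the result.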
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.