I'm working with a pandas DataFrame that looks like this:
Year Value
2012 10
2013 20
2013 25
2014 30
I want an equivalent of the SQL DENSE_RANK() OVER (ORDER BY year) function, to add a column like this:
Year Value Rank
2012 10 1
2013 20 2
2013 25 2
2014 30 3
How can it be done in pandas?
Thanks!
Answers:
Method 1
Use pd.Series.rank with method='dense':
df['Rank'] = df.Year.rank(method='dense').astype(int)
df
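As a minimal sketch of the dense-rank approach, the snippet below rebuilds the question's data with the rows shuffled (the shuffling is my assumption, added to show that row order does not affect the result).

```python
import pandas as pd

# Data from the question, with rows shuffled on purpose.
df = pd.DataFrame({"Year": [2013, 2012, 2014, 2013], "Value": [20, 10, 30, 25]})

# method="dense" gives tied years the same rank and leaves no gaps between ranks.
df["Rank"] = df["Year"].rank(method="dense").astype(int)

ranks = df["Rank"].tolist()
print(df)
```

The rank depends only on the Year values, so both 2013 rows get rank 2 wherever they sit in the frame.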
Method 2
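The fastest solution is pd.factorize. It numbers values in order of first appearance, which matches DENSE_RANK here because Year is already sorted; for unsorted data, pass sort=True to pd.factorize.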
df['Rank'] = pd.factorize(df.Year)[0] + 1
Timings:
#len(df)=40k
df = pd.concat([df]*10000).reset_index(drop=True)
In [13]: %timeit df['Rank'] = df.Year.rank(method='dense').astype(int)
1000 loops, best of 3: 1.55 ms per loop
In [14]: %timeit df['Rank1'] = df.Year.astype('category').cat.codes + 1
1000 loops, best of 3: 1.22 ms per loop
In [15]: %timeit df['Rank2'] = pd.factorize(df.Year)[0] + 1
1000 loops, best of 3: 737 µs per loop
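The difference between the default behaviour of pd.factorize and sort=True can be sketched on a small unsorted Series (the sample values are an assumption chosen for illustration, not taken from the timings above).

```python
import pandas as pd

# Unsorted years, chosen only to illustrate the ordering difference.
years = pd.Series([2013, 2012, 2014, 2013])

# Default: codes follow the order in which each value first appears.
codes_appearance, _ = pd.factorize(years)

# sort=True: codes follow the sorted unique values, like a dense rank.
codes_sorted, uniques = pd.factorize(years, sort=True)

rank_appearance = (codes_appearance + 1).tolist()
rank_sorted = (codes_sorted + 1).tolist()
print(rank_appearance, rank_sorted)
```

Only the sort=True variant reproduces DENSE_RANK on unsorted input. The timings above were measured without sort=True, so they may not carry over to the sorted variant.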
Method 3
You can convert the year to a categorical and then take its codes, adding one because the codes are zero-indexed and your example starts at one. Categories are sorted by default, so the codes follow year order.
df['Rank'] = df.Year.astype('category').cat.codes + 1
>>> df
Year Value Rank
0 2012 10 1
1 2013 20 2
2 2013 25 2
3 2014 30 3
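One caveat worth checking if the column can contain missing values: categorical codes use -1 for NaN, so after adding one those rows end up with 0. A minimal sketch, assuming a Year column with one missing entry (the data is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing year.
years = pd.Series([2013, np.nan, 2012, 2013])

# NaN is not a category, so its code is -1 and becomes 0 after adding one.
ranks = (years.astype("category").cat.codes + 1).tolist()
print(ranks)
```

Whether 0 is an acceptable placeholder for missing years depends on the use case, so it may be worth handling those rows explicitly.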
Method 4
Groupby.ngroup
groupby sorts the keys by default, so smaller years get lower labels. Set sort=False to rank groups by order of first occurrence instead.
df['Rank'] = df.groupby('Year', sort=True).ngroup()+1
np.unique
np.unique also sorts its output, so the inverse indices from return_inverse=True rank the smaller values lowest.
df['Rank'] = np.unique(df['Year'], return_inverse=True)[1]+1
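To compare the two ngroup orderings and np.unique side by side, here is a small sketch on the question's data with the rows shuffled (the shuffling is my assumption, added so the orderings differ).

```python
import numpy as np
import pandas as pd

# Data from the question, with rows shuffled on purpose.
df = pd.DataFrame({"Year": [2013, 2012, 2014, 2013], "Value": [20, 10, 30, 25]})

# sort=True (the default): groups are numbered by sorted key.
rank_sorted = (df.groupby("Year", sort=True).ngroup() + 1).tolist()

# sort=False: groups are numbered by order of first occurrence.
rank_seen = (df.groupby("Year", sort=False).ngroup() + 1).tolist()

# np.unique sorts the unique values; the inverse maps each row to its position.
rank_unique = (np.unique(df["Year"], return_inverse=True)[1] + 1).tolist()

print(rank_sorted, rank_seen, rank_unique)
```

The sorted ngroup and np.unique variants agree with each other, while sort=False only matches them when the years already appear in ascending order.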
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
