I have the following pandas dataframe:
token year uses books 386 xanthos 1830 3 3 387 xanthos 1840 1 1 388 xanthos 1840 2 2 389 xanthos 1868 2 2 390 xanthos 1875 1 1
I aggregate the rows with duplicate token
and years
like so:
dfalph = dfalph[['token','year','uses','books']].groupby(['token', 'year']).agg([np.sum]) dfalph.columns = dfalph.columns.droplevel(1) uses books token year xanthos 1830 3 3 1840 3 3 1867 2 2 1868 2 2 1875 1 1
Instead of having the ‘token’ and ‘year’ fields in the index, I would like to return them to columns and have an integer index.
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
Method #1: reset_index()
>>> g uses books sum sum token year xanthos 1830 3 3 1840 3 3 1868 2 2 1875 1 1 [4 rows x 2 columns] >>> g = g.reset_index() >>> g token year uses books sum sum 0 xanthos 1830 3 3 1 xanthos 1840 3 3 2 xanthos 1868 2 2 3 xanthos 1875 1 1 [4 rows x 4 columns]
Method #2: don’t make the index in the first place, using as_index=False
>>> g = dfalph[['token', 'year', 'uses', 'books']].groupby(['token', 'year'], as_index=False).sum() >>> g token year uses books 0 xanthos 1830 3 3 1 xanthos 1840 3 3 2 xanthos 1868 2 2 3 xanthos 1875 1 1 [4 rows x 4 columns]
Method 2
I defer form the accepted answer.
While there are 2 ways to do this, these will not necessarily result in same output. Specially when you are using Grouper
in groupby
index=False
reset_index()
example df
+---------+---------+-------------+------------+ | column1 | column2 | column_date | column_sum | +---------+---------+-------------+------------+ | A | M | 26-10-2018 | 2 | | B | M | 28-10-2018 | 3 | | A | M | 30-10-2018 | 6 | | B | M | 01-11-2018 | 3 | | C | N | 03-11-2018 | 4 | +---------+---------+-------------+------------+
They do not work the same way.
df = df.groupby( by=[ 'column1', 'column2', pd.Grouper(key='column_date', freq='M') ], as_index=False ).sum()
The above will give
+---------+---------+------------+ | column1 | column2 | column_sum | +---------+---------+------------+ | A | M | 8 | | B | M | 3 | | B | M | 3 | | C | N | 4 | +---------+---------+------------+
While,
df = df.groupby( by=[ 'column1', 'column2', pd.Grouper(key='column_date', freq='M') ] ).sum().reset_index()
Will give
+---------+---------+-------------+------------+ | column1 | column2 | column_date | column_sum | +---------+---------+-------------+------------+ | A | M | 31-10-2018 | 8 | | B | M | 31-10-2018 | 3 | | B | M | 30-11-2018 | 3 | | C | N | 30-11-2018 | 4 | +---------+---------+-------------+------------+
Method 3
You need to add drop=True
:
df.reset_index(drop=True) df = df.groupby( by=[ 'column1', 'column2', pd.Grouper(key='column_date', freq='M') ] ).sum().reset_index(drop=True)
Method 4
If you have the MultiIndex
and want to reset only a specific index level you can use the parameter level
in reset_index
. For example:
index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'), ('two', 'a'), ('two', 'b')], names=['A', 'B']) s = pd.DataFrame(np.arange(1.0, 5.0), index=index, columns=['C']) C A B one a 1.0 b 2.0 two a 3.0 b 4.0
Reset the first level:
df.reset_index(level=0)
Output:
A C B a one 1.0 b one 2.0 a two 3.0 b two 4.0
Reset the second level:
df.reset_index(level=1)
Output:
B C A one a 1.0 one b 2.0 two a 3.0 two b 4.0
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0