I have the following pandas dataframe:
       token  year  uses  books
386  xanthos  1830     3      3
387  xanthos  1840     1      1
388  xanthos  1840     2      2
389  xanthos  1868     2      2
390  xanthos  1875     1      1
I aggregate the rows with duplicate token and year values like so:
dfalph = dfalph[['token', 'year', 'uses', 'books']].groupby(['token', 'year']).agg(['sum'])
dfalph.columns = dfalph.columns.droplevel(1)
              uses  books
token   year
xanthos 1830     3      3
        1840     3      3
        1867     2      2
        1868     2      2
        1875     1      1
Instead of having the ‘token’ and ‘year’ fields in the index, I would like to return them to columns and have an integer index.
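For anyone who wants to try the answers, here is a minimal self-contained reproduction of the setup. It assumes only the five sample rows shown above (so the result has four groups) and uses the string alias 'sum' rather than np.sum; the variable name grouped is illustrative.

```python
import pandas as pd

# Rebuild the five sample rows, keeping their original index labels 386-390
dfalph = pd.DataFrame(
    {
        "token": ["xanthos"] * 5,
        "year": [1830, 1840, 1840, 1868, 1875],
        "uses": [3, 1, 2, 2, 1],
        "books": [3, 1, 2, 2, 1],
    },
    index=range(386, 391),
)

# Sum duplicate (token, year) pairs; agg(['sum']) adds a second column level
grouped = dfalph[["token", "year", "uses", "books"]].groupby(["token", "year"]).agg(["sum"])
grouped.columns = grouped.columns.droplevel(1)  # drop the 'sum' level

print(grouped)
print(grouped.index.names)  # token and year now live in a MultiIndex
```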
Answers:
Method 1
Method #1: reset_index()
>>> g
              uses  books
               sum    sum
token   year
xanthos 1830     3      3
        1840     3      3
        1868     2      2
        1875     1      1
[4 rows x 2 columns]
>>> g = g.reset_index()
>>> g
     token  year  uses  books
                   sum    sum
0  xanthos  1830     3      3
1  xanthos  1840     3      3
2  xanthos  1868     2      2
3  xanthos  1875     1      1
[4 rows x 4 columns]
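A sketch of Method #1 on the sample rows, assuming g is the groupby result before droplevel (as the two-level header in the output suggests). It shows that reset_index() alone keeps the two-level column labels, and that dropping the 'sum' level first gives plain names. The names df, g and flat are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "token": ["xanthos"] * 5,
        "year": [1830, 1840, 1840, 1868, 1875],
        "uses": [3, 1, 2, 2, 1],
        "books": [3, 1, 2, 2, 1],
    }
)

# Same shape as g in the answer: MultiIndex rows, two-level columns
g = df.groupby(["token", "year"]).agg(["sum"])

# reset_index() moves token and year into columns and adds a RangeIndex,
# but the column labels stay two-level, e.g. ('token', '') and ('uses', 'sum')
flat = g.reset_index()

# Dropping the 'sum' level first leaves plain single-level column names
g.columns = g.columns.droplevel(1)
g = g.reset_index()
print(g)
```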
Method #2: don’t make the index in the first place, using as_index=False
>>> g = dfalph[['token', 'year', 'uses', 'books']].groupby(['token', 'year'], as_index=False).sum()
>>> g
     token  year  uses  books
0  xanthos  1830     3      3
1  xanthos  1840     3      3
2  xanthos  1868     2      2
3  xanthos  1875     1      1
[4 rows x 4 columns]
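A small check, built on the question's sample rows, that Method #2 and reset_index() agree for ordinary column keys like token and year. The names via_as_index and via_reset are illustrative; pd.testing.assert_frame_equal raises an AssertionError if the two frames differ.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "token": ["xanthos"] * 5,
        "year": [1830, 1840, 1840, 1868, 1875],
        "uses": [3, 1, 2, 2, 1],
        "books": [3, 1, 2, 2, 1],
    }
)

# Keys come back as ordinary columns straight away; no MultiIndex is built
via_as_index = df.groupby(["token", "year"], as_index=False).sum()

# The same result built the other way round
via_reset = df.groupby(["token", "year"]).sum().reset_index()

# For plain column keys like these, the two frames should match exactly
pd.testing.assert_frame_equal(via_as_index, via_reset)
print(via_as_index)
```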
Method 2
I differ from the accepted answer.
While there are two ways to do this, as_index=False and reset_index(), they will not necessarily produce the same output, especially when you use a Grouper in groupby.
Example df:
+---------+---------+-------------+------------+
| column1 | column2 | column_date | column_sum |
+---------+---------+-------------+------------+
| A       | M       | 26-10-2018  | 2          |
| B       | M       | 28-10-2018  | 3          |
| A       | M       | 30-10-2018  | 6          |
| B       | M       | 01-11-2018  | 3          |
| C       | N       | 03-11-2018  | 4          |
+---------+---------+-------------+------------+
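They do not work the same way. (Both snippets assume column_date has already been converted to datetime, e.g. with pd.to_datetime, because pd.Grouper with a freq needs datetime values.)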
df = df.groupby(
by=[
'column1',
'column2',
pd.Grouper(key='column_date', freq='M')
],
as_index=False
).sum()
The above will give:
+---------+---------+------------+
| column1 | column2 | column_sum |
+---------+---------+------------+
| A       | M       | 8          |
| B       | M       | 3          |
| B       | M       | 3          |
| C       | N       | 4          |
+---------+---------+------------+
While,
df = df.groupby(
by=[
'column1',
'column2',
pd.Grouper(key='column_date', freq='M')
]
).sum().reset_index()
will give:
+---------+---------+-------------+------------+
| column1 | column2 | column_date | column_sum |
+---------+---------+-------------+------------+
| A       | M       | 31-10-2018  | 8          |
| B       | M       | 31-10-2018  | 3          |
| B       | M       | 30-11-2018  | 3          |
| C       | N       | 30-11-2018  | 4          |
+---------+---------+-------------+------------+
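A runnable sketch of the reset_index() variant on the example df. Two assumptions: the day-first date strings are parsed with pd.to_datetime, and freq='MS' (month start) stands in for freq='M', which pandas 2.2 deprecated in favour of 'ME'. The month bins are therefore labelled with the first day of each month rather than the last; the sums are unchanged. The name out is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "column1": ["A", "B", "A", "B", "C"],
        "column2": ["M", "M", "M", "M", "N"],
        "column_date": ["26-10-2018", "28-10-2018", "30-10-2018", "01-11-2018", "03-11-2018"],
        "column_sum": [2, 3, 6, 3, 4],
    }
)
# pd.Grouper with a freq needs real datetimes, so parse the day-first strings
df["column_date"] = pd.to_datetime(df["column_date"], format="%d-%m-%Y")

# Assumption: 'MS' (month start) is used instead of the deprecated 'M'
out = (
    df.groupby(["column1", "column2", pd.Grouper(key="column_date", freq="MS")])
    .sum()
    .reset_index()  # column_date survives as a regular column
)
print(out)
```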
Method 3
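You can also add drop=True. This still gives an integer index, but it discards the old index levels instead of turning them back into columns: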
df.reset_index(drop=True)
df = df.groupby(
by=[
'column1',
'column2',
pd.Grouper(key='column_date', freq='M')
]
).sum().reset_index(drop=True)
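A quick comparison on the question's sample rows of reset_index() with and without drop=True; the names df, g, kept and dropped are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "token": ["xanthos"] * 5,
        "year": [1830, 1840, 1840, 1868, 1875],
        "uses": [3, 1, 2, 2, 1],
        "books": [3, 1, 2, 2, 1],
    }
)
g = df.groupby(["token", "year"]).sum()

kept = g.reset_index()              # token and year become columns
dropped = g.reset_index(drop=True)  # token and year are thrown away

print(list(kept.columns))
print(list(dropped.columns))
```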
Method 4
If you have a MultiIndex and want to reset only a specific index level, you can use the level parameter of reset_index. For example:
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'), ('two', 'a'), ('two', 'b')], names=['A', 'B'])
df = pd.DataFrame(np.arange(1.0, 5.0), index=index, columns=['C'])
         C
A   B
one a  1.0
    b  2.0
two a  3.0
    b  4.0
Reset the first level:
df.reset_index(level=0)
Output:
     A    C
B
a  one  1.0
b  one  2.0
a  two  3.0
b  two  4.0
Reset the second level:
df.reset_index(level=1)
Output:
     B    C
A
one  a  1.0
one  b  2.0
two  a  3.0
two  b  4.0
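The level parameter also accepts level names, or a list of levels. A minimal sketch reusing the MultiIndex above (rebuilt so the block runs on its own); the names by_name and both are illustrative.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("one", "a"), ("one", "b"), ("two", "a"), ("two", "b")], names=["A", "B"]
)
df = pd.DataFrame(np.arange(1.0, 5.0), index=index, columns=["C"])

# Reset by name instead of position: same result as level=1
by_name = df.reset_index(level="B")

# Resetting every level gives a fresh integer index, like a plain reset_index()
both = df.reset_index(level=["A", "B"])

print(by_name)
print(both)
```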
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.