I have a pandas dataframe:
import pandas as pnd
d = pnd.Timestamp('2013-01-01 16:00')
dates = pnd.bdate_range(start=d, end = d+pnd.DateOffset(days=10), normalize = False)
df = pnd.DataFrame(index=dates, columns=['a'])
df['a'] = 6
print(df)
                     a
2013-01-01 16:00:00  6
2013-01-02 16:00:00  6
2013-01-03 16:00:00  6
2013-01-04 16:00:00  6
2013-01-07 16:00:00  6
2013-01-08 16:00:00  6
2013-01-09 16:00:00  6
2013-01-10 16:00:00  6
2013-01-11 16:00:00  6
I am interested in finding the location of one of the labels, say,
ds = pnd.Timestamp('2013-01-02 16:00')
Looking at the index values, I know that the integer location of this label is 1. How can I get pandas to tell me the integer location of this label?
Answers:
Method 1
You’re looking for the index method get_loc:
In [11]: df.index.get_loc(ds)
Out[11]: 1
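For anyone who wants to paste and run it, here is a minimal self-contained sketch of the same call on the frame from the question. The names `start`, `pos`, `row` and `missing` are only for illustration. It also shows that the returned position can be fed to `iloc`, and that a label missing from the index raises `KeyError`.

```python
import pandas as pd

# Rebuild the frame from the question: business days at 16:00.
start = pd.Timestamp("2013-01-01 16:00")
dates = pd.bdate_range(start=start, end=start + pd.DateOffset(days=10), normalize=False)
df = pd.DataFrame({"a": 6}, index=dates)

ds = pd.Timestamp("2013-01-02 16:00")
pos = df.index.get_loc(ds)   # integer position of the label in the index
row = df.iloc[pos]           # the same row that df.loc[ds] would return

# A label that is not in the index (a Saturday) raises KeyError.
try:
    df.index.get_loc(pd.Timestamp("2013-01-05 16:00"))
    missing = False
except KeyError:
    missing = True
```

Note that the plain integer result relies on the index being unique, which is the case here.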
Method 2
Get dataframe integer index given a date key:
>>> import pandas as pd
>>> # pd.datetime was removed from pandas; pd.Timestamp works as a replacement
>>> df = pd.DataFrame(
...     index=pd.date_range(pd.Timestamp(2008, 1, 1), pd.Timestamp(2008, 1, 5)),
...     columns=("foo", "bar"))
>>> df["foo"] = [10,20,40,15,10]
>>> df["bar"] = [100,200,40,-50,-38]
>>> df
            foo  bar
2008-01-01   10  100
2008-01-02   20  200
2008-01-03   40   40
2008-01-04   15  -50
2008-01-05   10  -38
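>>> # idxmax() returns the label of the maximum (argmax() returns a position in recent pandas)
>>> df.index.get_loc(df["bar"].idxmax())
1
>>> df.index.get_loc(df["foo"].idxmax())
2
In column bar, the maximum value is at integer position 1.
In column foo, the maximum value is at integer position 2.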
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Index.get_loc.html
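One thing to watch with this method: on a recent pandas (1.0 or later, as far as I know), `Series.argmax` already returns the integer position, while `Series.idxmax` returns the label. So the label passed to `get_loc` has to come from `idxmax`. A rough sketch of both routes, with variable names that are mine rather than the answer's:

```python
import pandas as pd

df = pd.DataFrame(
    {"foo": [10, 20, 40, 15, 10], "bar": [100, 200, 40, -50, -38]},
    index=pd.date_range("2008-01-01", "2008-01-05"),
)

# Route 1: label of the maximum, then label -> position.
label = df["bar"].idxmax()
pos_from_label = df.index.get_loc(label)

# Route 2: ask for the position directly (assumes pandas >= 1.0 semantics).
pos_direct = df["bar"].argmax()
```

Both routes should agree whenever the index is unique.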
Method 3
get_loc works on both the row index and the column index:
import pandas as pnd
d = pnd.Timestamp('2013-01-01 16:00')
dates = pnd.bdate_range(start=d, end = d+pnd.DateOffset(days=10), normalize = False)
df = pnd.DataFrame(index=dates)
df['a'] = 5
df['b'] = 6
print(df.head())
                     a  b
2013-01-01 16:00:00  5  6
2013-01-02 16:00:00  5  6
2013-01-03 16:00:00  5  6
2013-01-04 16:00:00  5  6
2013-01-07 16:00:00  5  6
#for rows
print(df.index.get_loc('2013-01-01 16:00:00'))
0
#for columns
print(df.columns.get_loc('b'))
1
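If you need the positions of several labels at once, `Index.get_indexer` may be more convenient than calling `get_loc` in a loop. This is a sketch on the same kind of frame, assuming a unique index. The labels in `wanted` and the column name `"zzz"` are made up for the example, and labels that are not found come back as -1 instead of raising.

```python
import pandas as pd

start = pd.Timestamp("2013-01-01 16:00")
dates = pd.bdate_range(start=start, end=start + pd.DateOffset(days=10), normalize=False)
df = pd.DataFrame({"a": 5, "b": 6}, index=dates)

# Two labels that exist and one (a Saturday) that does not.
wanted = pd.to_datetime(["2013-01-02 16:00", "2013-01-07 16:00", "2013-01-05 16:00"])

row_positions = df.index.get_indexer(wanted)               # -1 marks a missing label
col_positions = df.columns.get_indexer(["b", "a", "zzz"])  # works for columns too
```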
Method 4
Because get_loc returns a mask rather than a list of integer index locations when there are multiple instances of the key in the index, I was toying with an answer using reset_index():
# Add a duplicate!!!
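import pandas as pd
dup = pd.Timestamp('2013-01-07 16:00')
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df = pd.concat([df, pd.DataFrame([7], columns=['a'], index=[dup])])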
df
                     a
2013-01-01 16:00:00  6
2013-01-02 16:00:00  6
2013-01-03 16:00:00  6
2013-01-04 16:00:00  6
2013-01-07 16:00:00  6
2013-01-08 16:00:00  6
2013-01-09 16:00:00  6
2013-01-10 16:00:00  6
2013-01-11 16:00:00  6
2013-01-07 16:00:00  7
2013-01-08 16:00:00  3
# Only use this method if the key has duplicates
if df.loc[dup].index.has_duplicates:
    df.reset_index().loc[df.index.get_loc(dup)].index.to_list()
[4, 9]
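An alternative that avoids the special case, offered as a sketch rather than as the answerer's method: compare the index against the key and let NumPy report where the comparison is true. This gives an array of integer positions whether the key occurs once or several times. The frame is rebuilt with `pd.concat` because `DataFrame.append` is no longer available in pandas 2.0.

```python
import numpy as np
import pandas as pd

start = pd.Timestamp("2013-01-01 16:00")
dates = pd.bdate_range(start=start, end=start + pd.DateOffset(days=10), normalize=False)
df = pd.DataFrame({"a": 6}, index=dates)

# Add one duplicate label at the end of the frame.
dup = pd.Timestamp("2013-01-07 16:00")
df = pd.concat([df, pd.DataFrame({"a": [7]}, index=[dup])])

# Elementwise comparison gives a boolean array; flatnonzero lists the True positions.
positions = np.flatnonzero(df.index == dup)
single = np.flatnonzero(df.index == pd.Timestamp("2013-01-02 16:00"))
```

The trade-off is a full scan of the index on every lookup, so for many lookups on a unique index `get_loc` is probably the better choice.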
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.