python pandas dataframe slicing by date conditions

I am able to read and slice pandas dataframe using python datetime objects, however I am forced to use only existing dates in index. For example, this works:

>>> data
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 252 entries, 2010-12-31 00:00:00 to 2010-04-01 00:00:00
Data columns:
Adj Close    252  non-null values
dtypes: float64(1)

>>> st = datetime.datetime(2010, 12, 31, 0, 0)
>>> en = datetime.datetime(2010, 12, 28, 0, 0)

>>> data[st:en]
            Adj Close
Date                 
2010-12-31     593.97
2010-12-30     598.86
2010-12-29     601.00
2010-12-28     598.92

However if I use a start or end date that is not present in the DF, I get python KeyError.

My Question : How do I query the dataframe object for a date range; even when the start and end dates are not present in the DataFrame. Does pandas allow for range based slicing?

I am using pandas version 0.10.1

Contents hide

Answers:

Method 1

Method 2

Method 3

Method 4

Method 5

Method 6

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

Use searchsorted to find the nearest times first, and then use it to slice.

In [15]: df = pd.DataFrame([1, 2, 3], index=[dt.datetime(2013, 1, 1), dt.datetime(2013, 1, 3), dt.datetime(2013, 1, 5)])

In [16]: df
Out[16]: 
            0
2013-01-01  1
2013-01-03  2
2013-01-05  3

In [22]: start = df.index.searchsorted(dt.datetime(2013, 1, 2))

In [23]: end = df.index.searchsorted(dt.datetime(2013, 1, 4))

In [24]: df.iloc[start:end]
Out[24]: 
            0
2013-01-03  2

Method 2

Short answer: Sort your data (data.sort()) and then I think everything will work the way you are expecting.

Yes, you can slice using datetimes not present in the DataFrame. For example:

In [12]: df
Out[12]: 
                   0
2013-04-20  1.120024
2013-04-21 -0.721101
2013-04-22  0.379392
2013-04-23  0.924535
2013-04-24  0.531902
2013-04-25 -0.957936

In [13]: df['20130419':'20130422']
Out[13]: 
                   0
2013-04-20  1.120024
2013-04-21 -0.721101
2013-04-22  0.379392

As you can see, you don’t even have to build datetime objects; strings work.

Because the datetimes in your index are not sequential, the behavior is weird. If we shuffle the index of my example here…

In [17]: df
Out[17]: 
                   0
2013-04-22  1.120024
2013-04-20 -0.721101
2013-04-24  0.379392
2013-04-23  0.924535
2013-04-21  0.531902
2013-04-25 -0.957936

…and take the same slice, we get a different result. It returns the first element inside the range and stops at the first element outside the range.

In [18]: df['20130419':'20130422']
Out[18]: 
                   0
2013-04-22  1.120024
2013-04-20 -0.721101
2013-04-24  0.379392

This is probably not useful behavior. If you want to select ranges of dates, would it make sense to sort it by date first?

df.sort_index()

Method 3

You can use a simple mask to accomplish this:

date_mask = (data.index > start) & (data.index < end)
dates = data.index[date_mask]
data.ix[dates]

By the way, this works for hierarchical indexing as well. In that case data.index would be replaced with data.index.levels[0] or similar.

Method 4

I had difficulty with other approaches but I found that the following approach worked for me:

# Set the Index to be the Date
df['Date'] = pd.to_datetime(df['Date_1'], format='%d/%m/%Y')
df.set_index('Date', inplace=True)

# Sort the Data
df = df.sort_values('Date_1')

# Slice the Data
From = '2017-05-07'
To   = '2017-06-07'
df_Z = df.loc[From:To,:]

Method 5

target_date = pd.Timestamp('2020-05-10')

df[df['DATE'].dt.date == (dt.date(target_date.year, target_date.month, target_date.day))]

This line will let you select all the dates that matches the condition, in case you don’t neet to set “Date Column” as index.

Method 6

Given a dataframe df where the date to query on is in the index, you can use query:

In [5]: df = pd.DataFrame({'Close': {'2010-12-31': 593.97, '2010-12-30': 598.86, '2010-12-29': 601.0, '2010-12-28': 598.92}})

In [6]: df
Out[6]: 
             Close
Date              
2010-12-31  593.97
2010-12-30  598.86
2010-12-29  601.00
2010-12-28  598.92

In [7]: df.query('"2010-12-01" < index <= "2010-12-29"')
Out[7]: 
             Close
Date              
2010-12-29  601.00
2010-12-28  598.92

The start and/or end date don’t need to be present in the index of df.

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes

Article Rating