I import a dataframe via read_csv, but for some reason can’t extract the year or month from the series df['date'], trying that gives AttributeError: 'Series' object has no attribute 'year':
date Count
6/30/2010 525
7/30/2010 136
8/31/2010 125
9/30/2010 84
10/29/2010 4469
df = pd.read_csv('sample_data.csv', parse_dates=True)
df['date'] = pd.to_datetime(df['date'])
df['year'] = df['date'].year
df['month'] = df['date'].month
UPDATE:
and when I try solutions with df['date'].dt on my pandas version 0.14.1, I get “AttributeError: ‘Series’ object has no attribute ‘dt’ “:
df = pd.read_csv('sample_data.csv',parse_dates=True)
df['date'] = pd.to_datetime(df['date'])
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
Sorry for this question that seems repetitive – I expect the answer will make me feel like a bonehead… but I have not had any luck using answers to the similar questions on SO.
FOLLOWUP: I can’t seem to update my pandas 0.14.1 to a newer release in my Anaconda environment, each of the attempts below generates an invalid syntax error. I’m using Python 3.4.1 64bit.
conda update pandas conda install pandas==0.15.2 conda install -f pandas
Any ideas?
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
If you’re running a recent-ish version of pandas then you can use the datetime attribute dt to access the datetime components:
In [6]:
df['date'] = pd.to_datetime(df['date'])
df['year'], df['month'] = df['date'].dt.year, df['date'].dt.month
df
Out[6]:
date Count year month
0 2010-06-30 525 2010 6
1 2010-07-30 136 2010 7
2 2010-08-31 125 2010 8
3 2010-09-30 84 2010 9
4 2010-10-29 4469 2010 10
EDIT
It looks like you’re running an older version of pandas in which case the following would work:
In [18]:
df['date'] = pd.to_datetime(df['date'])
df['year'], df['month'] = df['date'].apply(lambda x: x.year), df['date'].apply(lambda x: x.month)
df
Out[18]:
date Count year month
0 2010-06-30 525 2010 6
1 2010-07-30 136 2010 7
2 2010-08-31 125 2010 8
3 2010-09-30 84 2010 9
4 2010-10-29 4469 2010 10
Regarding why it didn’t parse this into a datetime in read_csv you need to pass the ordinal position of your column ([0]) because when True it tries to parse columns [1,2,3] see the docs
In [20]: t="""date Count 6/30/2010 525 7/30/2010 136 8/31/2010 125 9/30/2010 84 10/29/2010 4469""" df = pd.read_csv(io.StringIO(t), sep='s+', parse_dates=[0]) df.info() <class 'pandas.core.frame.DataFrame'> Int64Index: 5 entries, 0 to 4 Data columns (total 2 columns): date 5 non-null datetime64[ns] Count 5 non-null int64 dtypes: datetime64[ns](1), int64(1) memory usage: 120.0 bytes
So if you pass param parse_dates=[0] to read_csv there shouldn’t be any need to call to_datetime on the ‘date’ column after loading.
Method 2
This works:
df['date'].dt.year
Now:
df['year'] = df['date'].dt.year df['month'] = df['date'].dt.month
gives this data frame:
date Count year month 0 2010-06-30 525 2010 6 1 2010-07-30 136 2010 7 2 2010-08-31 125 2010 8 3 2010-09-30 84 2010 9 4 2010-10-29 4469 2010 10
Method 3
When to use dt accessor
A common source of confusion revolves around when to use .year and when to use .dt.year.
The former is an attribute for pd.DatetimeIndex objects; the latter for pd.Series objects. Consider this dataframe:
df = pd.DataFrame({'Dates': pd.to_datetime(['2018-01-01', '2018-10-20', '2018-12-25'])},
index=pd.to_datetime(['2000-01-01', '2000-01-02', '2000-01-03']))
The definition of the series and index look similar, but the pd.DataFrame constructor converts them to different types:
type(df.index) # pandas.tseries.index.DatetimeIndex type(df['Dates']) # pandas.core.series.Series
The DatetimeIndex object has a direct year attribute, while the Series object must use the dt accessor. Similarly for month:
df.index.month # array([1, 1, 1]) df['Dates'].dt.month.values # array([ 1, 10, 12], dtype=int64)
A subtle but important difference worth noting is that df.index.month gives a NumPy array, while df['Dates'].dt.month gives a Pandas series. Above, we use pd.Series.values to extract the NumPy array representation.
Method 4
Probably already too late to answer but since you have already parse the dates while loading the data, you can just do this to get the day
df['date'] = pd.DatetimeIndex(df['date']).year
Method 5
What worked for me was upgrading pandas to latest version:
From Command Line do:
conda update pandas
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0