I need to process a huge number of CSV files in which the timestamp is always a string representing the Unix time in milliseconds. I have not yet found a way to convert these columns efficiently.
This is what I came up with. However, it only produces a converted copy of the column, which I then have to put back into the original dataset. I'm sure this can be done when creating the DataFrame?
import sys
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO
import datetime
import pandas as pd
data = 'RUN,UNIXTIME,VALUE\n1,1447160702320,10\n2,1447160702364,20\n3,1447160722364,42'
df = pd.read_csv(StringIO(data))
# Convert milliseconds since the epoch to a datetime in the machine's local time
convert = lambda x: datetime.datetime.fromtimestamp(x / 1e3)
converted_df = df['UNIXTIME'].apply(convert)
This will pick the column 'UNIXTIME' and change it from
0    1447160702320
1    1447160702364
2    1447160722364
Name: UNIXTIME, dtype: int64
into this
0   2015-11-10 14:05:02.320
1   2015-11-10 14:05:02.364
2   2015-11-10 14:05:22.364
Name: UNIXTIME, dtype: datetime64[ns]
However, I would like to use something like apply() to get the whole dataset back with the converted column or, as I already wrote, simply create the datetimes when generating the DataFrame from the CSV.
Answers:
Method 1
You can do this as a post-processing step using to_datetime and passing the argument unit='ms':
In [5]: df['UNIXTIME'] = pd.to_datetime(df['UNIXTIME'], unit='ms')
        df
Out[5]:
   RUN                UNIXTIME  VALUE
0    1 2015-11-10 13:05:02.320     10
1    2 2015-11-10 13:05:02.364     20
2    3 2015-11-10 13:05:22.364     42
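The question mentions that the timestamps arrive as strings. The following is a minimal sketch of the same post-processing step for that case, assuming the column really is read as text (simulated here with `dtype=str`); the intermediate `pd.to_numeric` call is an addition for illustration, not part of the original answer.

```python
from io import StringIO

import pandas as pd

data = "RUN,UNIXTIME,VALUE\n1,1447160702320,10\n2,1447160702364,20\n3,1447160722364,42"

# Force UNIXTIME to be read as text to mimic a string-typed timestamp column
df = pd.read_csv(StringIO(data), dtype={"UNIXTIME": str})

# Turn the digit strings into integers, then interpret them as
# milliseconds since the Unix epoch (the result is timezone-naive UTC)
df["UNIXTIME"] = pd.to_datetime(pd.to_numeric(df["UNIXTIME"]), unit="ms")

print(df)
```

Assigning back to `df['UNIXTIME']` replaces the column in place, so the rest of the DataFrame is returned unchanged.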
Method 2
I use @EdChum's solution, but add time zone handling:
df['UNIXTIME'] = (pd.DatetimeIndex(pd.to_datetime(df['UNIXTIME'], unit='ms'))
                  .tz_localize('UTC')
                  .tz_convert('America/New_York'))
tz_localize marks the timestamps as being in UTC, then tz_convert moves the date/time to the target time zone (in this case 'America/New_York').
Note that the column is converted to a DatetimeIndex because the tz_ methods work only on the index of a Series. Since pandas 0.15 one can use .dt:
df['UNIXTIME'] = (pd.to_datetime(df['UNIXTIME'], unit='ms')
                  .dt.tz_localize('UTC')
                  .dt.tz_convert('America/New_York'))
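As a variation on the above, `pd.to_datetime` also accepts `utc=True`, which yields timezone-aware UTC values directly so that the separate `tz_localize` step is not needed. This is a small sketch on a standalone Series rather than the answer's own code:

```python
import pandas as pd

s = pd.Series([1447160702320, 1447160702364, 1447160722364], name="UNIXTIME")

# utc=True makes the result timezone-aware (UTC) straight away
utc_times = pd.to_datetime(s, unit="ms", utc=True)

# Convert the aware timestamps to the desired zone
ny_times = utc_times.dt.tz_convert("America/New_York")

print(ny_times)
```

The instants are unchanged by `tz_convert`; only the wall-clock representation shifts (here by five hours, since New York is on standard time in November).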
Method 3
I think I came up with a solution:
# Note: fromtimestamp returns local time; date_parser is deprecated as of pandas 2.0
convert = lambda x: datetime.datetime.fromtimestamp(float(x) / 1e3)
df = pd.read_csv(StringIO(data), parse_dates=['UNIXTIME'], date_parser=convert)
I'm still not sure whether this is the best one, though.
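Since `date_parser` is deprecated in recent pandas versions, one alternative for handling many files is to wrap the read and the conversion in a small helper. The sketch below assumes every file shares the `UNIXTIME` column; the helper name `read_csv_with_ms_timestamps` and the in-memory stand-ins for files are hypothetical.

```python
from io import StringIO

import pandas as pd


def read_csv_with_ms_timestamps(source, column="UNIXTIME"):
    """Read one CSV and convert `column` from epoch milliseconds to datetimes."""
    df = pd.read_csv(source)
    df[column] = pd.to_datetime(df[column], unit="ms")
    return df


# Hypothetical stand-ins for files on disk; real code would pass file paths
sources = [
    StringIO("RUN,UNIXTIME,VALUE\n1,1447160702320,10\n2,1447160702364,20"),
    StringIO("RUN,UNIXTIME,VALUE\n3,1447160722364,42\n4,1447160722400,7"),
]

# Load every source and stack the results into a single DataFrame
combined = pd.concat(
    [read_csv_with_ms_timestamps(src) for src in sources],
    ignore_index=True,
)

print(combined)
```

Converting the whole column with `pd.to_datetime` is vectorized, which generally scales better than calling a Python function once per value.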
Method 4
If you know the timestamp unit, use Series.astype:
df['UNIXTIME'].astype('datetime64[ms]')
0 2015-11-10 13:05:02.320
1 2015-11-10 13:05:02.364
2 2015-11-10 13:05:22.364
Name: UNIXTIME, dtype: datetime64[ns]
To return the entire DataFrame, use
df.astype({'UNIXTIME': 'datetime64[ms]'})
RUN UNIXTIME VALUE
0 1 2015-11-10 13:05:02.320 10
1 2 2015-11-10 13:05:02.364 20
2 3 2015-11-10 13:05:22.364 42
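A quick way to check that `astype` and `to_datetime` agree is to compare their results. This sketch assumes an integer column; note that the resulting dtype resolution (`datetime64[ns]` versus `datetime64[ms]`) can differ between pandas versions, so both results are cast to nanoseconds before comparing.

```python
import pandas as pd

s = pd.Series([1447160702320, 1447160702364, 1447160722364], name="UNIXTIME")

via_astype = s.astype("datetime64[ms]")
via_to_datetime = pd.to_datetime(s, unit="ms")

# The dtypes may differ depending on the pandas version
print(via_astype.dtype, via_to_datetime.dtype)

# Normalize both to nanosecond resolution so only the values are compared
same = via_astype.astype("datetime64[ns]").equals(
    via_to_datetime.astype("datetime64[ns]")
)
print(same)
```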
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.