Pandas converting row with unix timestamp (in milliseconds) to datetime

I need to process a large number of CSV files in which the timestamp is always a string representing the Unix timestamp in milliseconds. I have not yet found a way to convert these columns efficiently.

This is what I came up with. However, it only produces a converted copy of the column, which I then have to put back into the original dataset somehow. I'm sure this can be done while creating the DataFrame?

import sys
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO
import datetime
import pandas as pd

data = 'RUN,UNIXTIME,VALUE\n1,1447160702320,10\n2,1447160702364,20\n3,1447160722364,42'

df = pd.read_csv(StringIO(data))

convert = lambda x: datetime.datetime.fromtimestamp(x / 1e3)
converted_df = df['UNIXTIME'].apply(convert)

This will pick the column ‘UNIXTIME’ and change it from

0    1447160702320
1    1447160702364
2    1447160722364
Name: UNIXTIME, dtype: int64

into this

0   2015-11-10 14:05:02.320
1   2015-11-10 14:05:02.364
2   2015-11-10 14:05:22.364
Name: UNIXTIME, dtype: datetime64[ns]

However, I would like to use something like DataFrame.apply() to get the whole dataset back with the converted column or, as I already wrote, simply create the datetimes when generating the DataFrame from the CSV.

Answers:

Method 1

You can do this as a post-processing step using to_datetime and passing the argument unit='ms':

In [5]:
df['UNIXTIME'] = pd.to_datetime(df['UNIXTIME'], unit='ms')
df

Out[5]:
   RUN                UNIXTIME  VALUE
0    1 2015-11-10 13:05:02.320     10
1    2 2015-11-10 13:05:02.364     20
2    3 2015-11-10 13:05:22.364     42
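The output above shows 13:05 where the question's own conversion shows 14:05. A minimal sketch of why, assuming the asker's machine was set to a UTC+1 timezone (the source does not say so): pd.to_datetime with unit='ms' counts from the Unix epoch in UTC and returns timezone-naive values, while datetime.datetime.fromtimestamp without a tz argument converts to the machine's local time. The variable names here are illustrative only.

```python
import datetime

import pandas as pd

ms = 1447160702320  # first UNIXTIME value from the question

# pandas interprets the integer as milliseconds since the epoch in UTC
# and returns a timezone-naive Timestamp.
utc_naive = pd.to_datetime(ms, unit='ms')

# Without a tz argument, fromtimestamp() converts to the local timezone
# of the machine running the code, so this value varies by machine.
local_naive = datetime.datetime.fromtimestamp(ms / 1e3)

# Passing an explicit timezone makes the standard-library result
# independent of the machine's settings.
utc_aware = datetime.datetime.fromtimestamp(ms / 1e3, tz=datetime.timezone.utc)

print(utc_naive)
print(local_naive)
print(utc_aware)
```

If the two approaches are mixed across files or machines, the results can silently differ by the local UTC offset, so it is probably safest to pick one and stick with it.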

Method 2

I use @EdChum's solution, but add timezone handling:

df['UNIXTIME'] = (pd.DatetimeIndex(pd.to_datetime(df['UNIXTIME'], unit='ms'))
                  .tz_localize('UTC')
                  .tz_convert('America/New_York'))

tz_localize marks the timestamps as being in ‘UTC’; tz_convert then shifts the date/time to the target timezone (in this case ‘America/New_York’).

Note that the column is wrapped in a DatetimeIndex because, in older pandas versions, the tz_ methods work only on an index, not on a Series. Since pandas 0.15 one can use the .dt accessor instead:

df['UNIXTIME'] = (pd.to_datetime(df['UNIXTIME'], unit='ms')
                  .dt.tz_localize('UTC')
                  .dt.tz_convert('America/New_York'))
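As a possible shortcut, pd.to_datetime also accepts utc=True, which returns timezone-aware UTC timestamps directly and makes the tz_localize step unnecessary. A minimal sketch, rebuilding the question's UNIXTIME column by hand so it runs on its own:

```python
import pandas as pd

# The three millisecond timestamps from the question.
df = pd.DataFrame({'UNIXTIME': [1447160702320, 1447160702364, 1447160722364]})

# utc=True yields timezone-aware UTC values, so only tz_convert is needed
# to shift them to the target timezone.
df['UNIXTIME'] = (pd.to_datetime(df['UNIXTIME'], unit='ms', utc=True)
                  .dt.tz_convert('America/New_York'))

print(df)
```

New York is five hours behind UTC in November, so the first value should appear as 08:05:02.320 local time.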

Method 3

I came up with a solution I guess:

convert = lambda x: datetime.datetime.fromtimestamp(float(x) / 1e3)

df = pd.read_csv(StringIO(data), parse_dates=['UNIXTIME'], date_parser=convert)

I’m still not sure if this is the best one though.
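One caveat: the date_parser argument is deprecated as of pandas 2.0, and a per-value Python callback tends to be slow on large files. A possible alternative is to read the file normally and convert the column in a vectorized step inside a small helper, which still returns the whole DataFrame in one call. This is only a sketch: read_with_timestamps is a hypothetical helper name, not a pandas function, and it assumes the column holds millisecond values that read_csv parses as integers.

```python
from io import StringIO

import pandas as pd


def read_with_timestamps(source, column='UNIXTIME'):
    """Read a CSV and convert `column` from epoch milliseconds to datetimes.

    Hypothetical helper for illustration; not part of pandas.
    """
    df = pd.read_csv(source)
    # assign() returns a new DataFrame with the column replaced, so the
    # caller gets the full dataset back with the converted column.
    return df.assign(**{column: pd.to_datetime(df[column], unit='ms')})


data = 'RUN,UNIXTIME,VALUE\n1,1447160702320,10\n2,1447160702364,20\n3,1447160722364,42'
df = read_with_timestamps(StringIO(data))
print(df)
```

For many files, the same helper could be called in a loop over the file paths and the results combined with pd.concat.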

Method 4

If you know the timestamp unit, use Series.astype:

df['UNIXTIME'].astype('datetime64[ms]')

0   2015-11-10 13:05:02.320
1   2015-11-10 13:05:02.364
2   2015-11-10 13:05:22.364
Name: UNIXTIME, dtype: datetime64[ns]

To return the entire DataFrame, use

df.astype({'UNIXTIME': 'datetime64[ms]'})

   RUN                UNIXTIME  VALUE
0    1 2015-11-10 13:05:02.320     10
1    2 2015-11-10 13:05:02.364     20
2    3 2015-11-10 13:05:22.364     42

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
