A pandas DataFrame column duration contains timedelta64[ns] values as shown. How can you convert them to seconds?
0   00:20:32
1   00:23:10
2   00:24:55
3   00:13:17
4   00:18:52
Name: duration, dtype: timedelta64[ns]
I tried the following
print df[:5]['duration'] / np.timedelta64(1, 's')
but got the error
Traceback (most recent call last):
  File "test.py", line 16, in <module>
    print df[0:5]['duration'] / np.timedelta64(1, 's')
  File "C:\Python27\lib\site-packages\pandas\core\series.py", line 130, in wrapper
    "addition and subtraction, but the operator [%s] was passed" % name)
TypeError: can only operate on a timedeltas for addition and subtraction, but the operator [__div__] was passed
Also tried
print df[:5]['duration'].astype('timedelta64[s]')
but received the error
Traceback (most recent call last):
  File "test.py", line 17, in <module>
    print df[:5]['duration'].astype('timedelta64[s]')
  File "C:\Python27\lib\site-packages\pandas\core\series.py", line 934, in astype
    values = com._astype_nansafe(self.values, dtype)
  File "C:\Python27\lib\site-packages\pandas\core\common.py", line 1653, in _astype_nansafe
    raise TypeError("cannot astype a timedelta from [%s] to [%s]" % (arr.dtype,dtype))
TypeError: cannot astype a timedelta from [timedelta64[ns]] to [timedelta64[s]]
Answers:
Method 1
This works properly in pandas 0.14 and later:
In [132]: df[:5]['duration'] / np.timedelta64(1, 's')
Out[132]:
0    1232
1    1390
2    1495
3     797
4    1132
Name: duration, dtype: float64
Here is a workaround for older versions of Pandas/NumPy:
In [131]: df[:5]['duration'].values.view('<i8') // 10**9
Out[131]: array([1232, 1390, 1495, 797, 1132], dtype=int64)
timedelta64 and datetime64 data are stored internally as 8-byte ints (dtype
'<i8'). So the above views the timedelta64s as 8-byte ints and then does integer
division to convert nanoseconds to seconds.
Note that you need NumPy version 1.7 or newer to work with datetime64/timedelta64s.
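As a rough, self-contained check of the two approaches above, the sketch below rebuilds the first five rows of the question. The variable names are made up for illustration, and the explicit cast to timedelta64[ns] is an assumption added so that the integer view really is in nanoseconds.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data copied from the question's first five rows.
duration = pd.Series(
    pd.to_timedelta(["00:20:32", "00:23:10", "00:24:55", "00:13:17", "00:18:52"]),
    name="duration",
).astype("timedelta64[ns]")  # make sure the storage unit is nanoseconds

# Dividing by a one-second timedelta yields float seconds.
seconds_float = duration / np.timedelta64(1, "s")

# Viewing the raw 8-byte integers (nanoseconds) and floor-dividing
# by 10**9 yields whole seconds as integers.
seconds_int = duration.values.view("<i8") // 10**9

print(seconds_float.tolist())
print(seconds_int.tolist())
```

The division form keeps fractional seconds, while the integer view truncates them, so the two only agree exactly when the durations are whole seconds, as they are here.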
Method 2
Use the Series dt accessor to get access to the methods and attributes of a datetime (timedelta) series.
>>> s
0   -1 days +23:45:14.304000
1   -1 days +23:46:57.132000
2   -1 days +23:49:25.913000
3   -1 days +23:59:48.913000
4            00:00:00.820000
dtype: timedelta64[ns]
>>>
>>> s.dt.total_seconds()
0   -885.696
1   -782.868
2   -634.087
3    -11.087
4      0.820
dtype: float64
There are other Pandas Series Accessors for String, Categorical, and Sparse data types.
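For anyone who wants to reproduce the dt accessor call, here is a minimal sketch. The two sample values are taken from the first and last rows of the output above and are built from integer milliseconds, which is just one convenient way to construct them.

```python
import pandas as pd

# Two of the durations shown above, expressed in milliseconds:
# -885696 ms is "-1 days +23:45:14.304000", 820 ms is "00:00:00.820000".
s = pd.Series(pd.to_timedelta([-885696, 820], unit="ms"))

# total_seconds() returns the whole duration as float seconds,
# including the sign and the fractional part.
total = s.dt.total_seconds()

print(total)
```

Note that the negative duration comes back as a single negative number of seconds, not as a "-1 days" part plus a positive remainder.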
Method 3
I just realized this is an old thread, but I am leaving this here for
wanderers like me who click only on the top 5 search results and end up
here.
Make sure that your types are correct.
- If you want to convert a datetime to seconds, and the values fall within a single date, sum the seconds contributed by the hour, minute, and second components of the datetime object:
  - hours: hours x 3600 = seconds
  - minutes: minutes x 60 = seconds
  - seconds: seconds
linear_df['duration'].dt.hour*3600 + linear_df['duration'].dt.minute*60 + linear_df['duration'].dt.second
- If you want to convert a timedelta to seconds, use the one below.
linear_df[:5]['duration'].astype('timedelta64[s]')
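For the datetime case in the first bullet, the hour, minute, and second arithmetic can be sketched as follows. The `timestamps` series is a hypothetical stand-in for a datetime64 column, and the result is the number of seconds since midnight of each value's own date.

```python
import pandas as pd

# Hypothetical datetime column (not a timedelta column).
timestamps = pd.Series(
    pd.to_datetime(["2016-12-30 17:47:33", "2016-12-31 09:33:27"])
)

# hours x 3600 + minutes x 60 + seconds = seconds since midnight.
seconds_since_midnight = (
    timestamps.dt.hour * 3600
    + timestamps.dt.minute * 60
    + timestamps.dt.second
)

print(seconds_since_midnight.tolist())
```

The .dt.hour, .dt.minute, and .dt.second attributes exist on datetime series only, which is why checking the column's dtype first matters.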
I got it to work like this:
start_dt and end_dt columns are in this format:
import datetime
linear_df[:5]['start_dt']
0   1970-02-22 21:32:48.000
1   2016-12-30 17:47:33.216
2   2016-12-31 09:33:27.931
3   2016-12-31 09:52:53.486
4   2016-12-31 10:29:44.611
Name: start_dt, dtype: datetime64[ns]
My duration column was in timedelta64[ns] format, obtained by subtracting the start datetime values from the end datetime values.
linear_df['duration'] = linear_df['end_dt'] - linear_df['start_dt']
The resulting duration column looks like this:
linear_df[:5]['duration']
0          0 days 00:00:14
1   2 days 17:44:50.558000
2   0 days 15:37:28.418000
3   0 days 18:45:45.727000
4   0 days 19:21:27.159000
Name: duration, dtype: timedelta64[ns]
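With pandas I converted the duration between the two dates to seconds as a float, which makes it easier to compare or filter durations afterwards. Note that the float64 result below reflects older pandas releases; from pandas 2.0 onward the same astype call keeps a timedelta64[s] dtype, so .dt.total_seconds() is the more portable way to get float seconds.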
linear_df[:5]['duration'].astype('timedelta64[s]')
0 14.0
1 236690.0
2 56248.0
3 67545.0
4 69687.0
Name: duration, dtype: float64
In my case, I wanted to select all durations longer than 1 second.
Hope it helps.
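The answer does not show the filtering step itself, so the following is only a sketch of one way it might look. The sample rows are invented for illustration, and .dt.total_seconds() is used instead of astype so that the result is float seconds regardless of the pandas version.

```python
import pandas as pd

# Hypothetical start/end columns; only the column names come from the answer.
linear_df = pd.DataFrame({
    "start_dt": pd.to_datetime([
        "2016-12-30 17:47:33.000",
        "2016-12-31 09:33:27.000",
        "2016-12-31 09:52:53.000",
    ]),
    "end_dt": pd.to_datetime([
        "2016-12-30 17:47:47.000",
        "2016-12-31 09:33:27.500",
        "2016-12-31 10:00:00.000",
    ]),
})

# Subtracting two datetime columns gives a timedelta column.
linear_df["duration"] = linear_df["end_dt"] - linear_df["start_dt"]

# Convert to float seconds, then keep rows lasting more than 1 second.
duration_seconds = linear_df["duration"].dt.total_seconds()
longer_than_one_second = linear_df[duration_seconds > 1]

print(longer_than_one_second)
```

Comparing the timedelta column directly against pd.Timedelta(seconds=1) would also work; converting to seconds first is simply closer to what the answer describes.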
Method 4
Use the total_seconds() function:
df['durationSeconds'] = df['duration'].dt.total_seconds()
Method 5
We can simply use the pandas apply() function:
def get_seconds(time_delta):
    # .seconds is only the seconds component (0 to 86399); whole days are excluded
    return time_delta.seconds
def get_microseconds(time_delta):
    # the attribute is named microseconds, not micro_seconds
    return time_delta.microseconds
time_delta_series = df['duration']
converted_series = time_delta_series.apply(get_seconds)
print(converted_series)
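One caveat with this approach is that the seconds attribute is a component, not a total. The sketch below uses two made-up durations to show how it differs from total_seconds() once a duration exceeds one day.

```python
import pandas as pd

# Hypothetical durations: one short, one longer than a day.
duration = pd.Series(pd.to_timedelta(["0 days 00:00:14", "2 days 17:44:50"]))

# Seconds component only: the "2 days" part is left out.
component_seconds = duration.dt.seconds

# Full length of each duration in seconds, days included.
total_seconds = duration.dt.total_seconds()

print(component_seconds.tolist())
print(total_seconds.tolist())
```

For durations shorter than a day the two agree, so the apply() version above is fine in that case; for anything longer, total_seconds() is likely what is wanted.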
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.