Pandas: Converting to numeric, creating NaNs when necessary

Say I have a column in a dataframe that contains some numbers and some non-numeric values:

>> df['foo']
0       0.0
1     103.8
2     751.1
3       0.0
4       0.0
5         -
6         -
7       0.0
8         -
9       0.0
Name: foo, Length: 10, dtype: object

How can I convert this column to np.float and have everything that is not a float converted to NaN?

When I try:

>> df['foo'].astype(np.float)

or

>> df['foo'].apply(np.float)

I get ValueError: could not convert string to float: -

Answers:

Method 1

In pandas 0.17.0 convert_objects raises a warning:

FutureWarning: convert_objects is deprecated. Use the data-type
specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.

You can use pd.to_numeric instead and apply it to the whole dataframe, passing 'coerce' as the errors argument so that unparseable values become NaN.

df1 = df.apply(pd.to_numeric, args=('coerce',))

or maybe more appropriately:

df1 = df.apply(pd.to_numeric, errors='coerce')

EDIT

The above method is only valid for pandas version >= 0.17.0. From the "What's New in pandas 0.17.0" docs:

pd.to_numeric is a new function to coerce strings to numbers (possibly with coercion) (GH11133)
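A minimal sketch of the apply approach, assuming a made-up frame with a text column ("label") next to two string-typed numeric columns. The column names and values are illustrative only. Applying pd.to_numeric with errors='coerce' to the whole frame would also turn a genuine text column into NaN, so this sketch restricts the call to the columns that should be numeric.

```python
import pandas as pd

# Hypothetical data: "-" marks a missing value, "label" is real text.
df = pd.DataFrame({
    "label": ["a", "b", "c", "d"],
    "foo": ["0.0", "103.8", "-", "751.1"],
    "bar": ["1", "-", "3", "4"],
})

# Only convert the columns that are meant to be numeric.
numeric_cols = ["foo", "bar"]

# apply forwards errors="coerce" to pd.to_numeric for each column,
# so anything that cannot be parsed becomes NaN.
converted = df[numeric_cols].apply(pd.to_numeric, errors="coerce")

print(converted.dtypes)
print(converted)
```

Both converted columns end up as float64, because NaN cannot be stored in an integer column by default.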

Method 2

Use the convert_objects Series method (and convert_numeric):

In [11]: s
Out[11]: 
0    103.8
1    751.1
2      0.0
3      0.0
4        -
5        -
6      0.0
7        -
8      0.0
dtype: object

In [12]: s.convert_objects(convert_numeric=True)
Out[12]: 
0    103.8
1    751.1
2      0.0
3      0.0
4      NaN
5      NaN
6      0.0
7      NaN
8      0.0
dtype: float64

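Note: this is also available as a DataFrame method. As mentioned in Method 1, convert_objects is deprecated as of pandas 0.17.0 in favor of pd.to_numeric, and it is no longer available in current pandas releases.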

Method 3

You can simply call pd.to_numeric on the column directly, with errors set to 'coerce', without using apply:

df['foo'] = pd.to_numeric(df['foo'], errors='coerce')
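For reference, here is a small self-contained sketch that rebuilds the column from the question (assuming the values are stored as strings, which is what dtype: object suggests) and runs the same call on it:

```python
import pandas as pd

# Rebuild the question's column; values are assumed to be strings.
df = pd.DataFrame({
    "foo": ["0.0", "103.8", "751.1", "0.0", "0.0", "-", "-", "0.0", "-", "0.0"]
})

# Unparseable entries such as "-" are coerced to NaN.
df["foo"] = pd.to_numeric(df["foo"], errors="coerce")

print(df["foo"].dtype)          # float64
print(df["foo"].isna().sum())   # 3
```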

Method 4

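First replace the '-' placeholder strings with None to mark them as missing values, and then convert the column to float. Using .loc avoids the chained assignment df['foo'][...] = None, which can trigger a SettingWithCopyWarning and may not modify the original dataframe.

df.loc[df['foo'] == '-', 'foo'] = None
df['foo'] = df['foo'].astype(float)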
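A short sketch of this approach and its main limitation, using made-up data: only the placeholder you explicitly replace is handled, so any other non-numeric string still makes astype(float) raise. The 'n/a' value below is a hypothetical second placeholder, not something from the question.

```python
import pandas as pd

# Hypothetical object column where "-" marks missing values.
df = pd.DataFrame({"foo": ["0.0", "103.8", "-", "751.1", "-"]}, dtype=object)

# Mark the known placeholder as missing, then convert.
df.loc[df["foo"] == "-", "foo"] = None
df["foo"] = df["foo"].astype(float)
print(df["foo"])

# A placeholder that was not replaced still breaks the conversion.
other = pd.Series(["1.5", "n/a"], dtype=object)
try:
    other.astype(float)
    raised = False
except ValueError as exc:
    raised = True
    print("astype failed:", exc)
```

If the set of placeholder strings is not known in advance, pd.to_numeric with errors='coerce' is the more forgiving choice.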


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
