Say I have a column in a dataframe that has some numbers and some non-numbers
>> df['foo']
0      0.0
1    103.8
2    751.1
3      0.0
4      0.0
5        -
6        -
7      0.0
8        -
9      0.0
Name: foo, Length: 9, dtype: object
How can I convert this column to np.float, and have everything that is not a float converted to NaN?
When I try:
>> df['foo'].astype(np.float)
or
>> df['foo'].apply(np.float)
I get ValueError: could not convert string to float: -
Answers:
Method 1
In pandas 0.17.0 convert_objects raises a warning:
FutureWarning: convert_objects is deprecated. Use the data-type
specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
You could use the pd.to_numeric function and apply it to the whole DataFrame, passing 'coerce' as the errors argument:
df1 = df.apply(pd.to_numeric, args=('coerce',))
or maybe more appropriately:
df1 = df.apply(pd.to_numeric, errors='coerce')
EDIT
The above method is only valid for pandas >= 0.17.0. From the "What's New" docs for pandas 0.17.0:
pd.to_numeric is a new function to coerce strings to numbers (possibly with coercion) (GH11133)
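As a rough illustration of the DataFrame-wide approach in this method, here is a minimal sketch. The two-column frame and its values are made up for the example (the question only shows a single column, foo), so treat the data as an assumption.

```python
import pandas as pd

# Hypothetical frame: both columns hold strings, with "-" as a placeholder.
df = pd.DataFrame({
    "foo": ["0.0", "103.8", "-", "751.1"],
    "bar": ["1", "-", "3", "4"],
})

# apply() forwards errors="coerce" to pd.to_numeric for every column,
# so anything that cannot be parsed as a number becomes NaN.
df1 = df.apply(pd.to_numeric, errors="coerce")

print(df1.dtypes)
```

Note that this coerces every column, so it is probably only what you want when all columns are meant to be numeric.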
Method 2
Use the convert_objects Series method with convert_numeric=True (deprecated since pandas 0.17.0, as the warning quoted in Method 1 says):
In [11]: s
Out[11]:
0    103.8
1    751.1
2      0.0
3      0.0
4        -
5        -
6      0.0
7        -
8      0.0
dtype: object

In [12]: s.convert_objects(convert_numeric=True)
Out[12]:
0    103.8
1    751.1
2      0.0
3      0.0
4      NaN
5      NaN
6      0.0
7      NaN
8      0.0
dtype: float64
Note: this is also available as a DataFrame method.
Method 3
You can simply call pd.to_numeric on the column with errors='coerce'; there is no need for apply:
df['foo'] = pd.to_numeric(df['foo'], errors='coerce')
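For completeness, a small self-contained sketch of this method. The DataFrame is rebuilt here from the values shown in the question, on the assumption that the column mixes real floats with "-" strings.

```python
import pandas as pd

# Mirrors the column in the question: object dtype, floats mixed with "-" strings.
df = pd.DataFrame({"foo": [0.0, 103.8, 751.1, 0.0, 0.0, "-", "-", 0.0, "-", 0.0]})

# Values that cannot be parsed as numbers are replaced with NaN.
df["foo"] = pd.to_numeric(df["foo"], errors="coerce")

print(df["foo"].dtype)         # float64
print(df["foo"].isna().sum())  # 3
```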
Method 4
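First replace the '-' strings with None to mark them as missing values, then convert the column to float. Use a single .loc assignment rather than chained indexing, which may end up modifying a temporary copy instead of df:
df.loc[df['foo'] == '-', 'foo'] = None
df['foo'] = df['foo'].astype(float)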
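A short sketch of the replace-then-cast idea, with one caveat worth knowing: astype(float) does not coerce, so any non-numeric string you did not replace still raises. The sample values, including the extra "n/a" placeholder, are assumptions made for the example.

```python
import pandas as pd

# Object column with floats and "-" placeholders, as in the question.
df = pd.DataFrame({"foo": [0.0, 103.8, "-", 751.1, "-"]})

# Mark the placeholders as missing, then cast; None becomes NaN.
df.loc[df["foo"] == "-", "foo"] = None
df["foo"] = df["foo"].astype(float)

# Caveat: a placeholder that was not replaced still breaks the cast.
s = pd.Series([1.0, "-", "n/a"], dtype=object)
s[s == "-"] = None
try:
    s.astype(float)
    raised = False
except ValueError:
    raised = True  # "n/a" cannot be converted to float
```

If the set of placeholder strings is not known in advance, pd.to_numeric with errors='coerce' is likely the safer choice.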
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.