converting currency with $ to numbers in Python pandas

I have the following data in pandas dataframe:

    state        1st        2nd             3rd
0   California  $11,593,820 $109,264,246    $8,496,273
1   New York    $10,861,680 $45,336,041     $6,317,300
2   Florida     $7,942,848  $69,369,589     $4,697,244
3   Texas       $7,536,817  $61,830,712     $5,736,941

I want to perform some simple analysis (e.g., sum, groupby) with three columns (1st, 2nd, 3rd), but the data type of those three columns is object (or string).

So I used the following code for data conversion:

data = data.convert_objects(convert_numeric=True)

But, conversion does not work, perhaps, due to the dollar sign. Any suggestion?

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

@EdChum’s answer is clever and works well. But since there’s more than one way to bake a cake…. why not use regex? For example:

df[df.columns[1:]] = df[df.columns[1:]].replace('[$,]', '', regex=True).astype(float)

To me, that is a little bit more readable.

Method 2

You can also use locale as follows

import locale
import pandas as pd
locale.setlocale(locale.LC_ALL,'')
df['1st']=df.1st.map(lambda x: locale.atof(x.strip('$')))

Note the above code was tested in Python 3 and Windows environment

Method 3

You can use the vectorised str methods to replace the unwanted characters and then cast the type to int:

In [81]:
df[df.columns[1:]] = df[df.columns[1:]].apply(lambda x: x.str.replace('$','')).apply(lambda x: x.str.replace(',','')).astype(np.int64)
df

Out[81]:
            state       1st        2nd      3rd
index                                          
0      California  11593820  109264246  8496273
1        New York  10861680   45336041  6317300
2         Florida   7942848   69369589  4697244
3           Texas   7536817   61830712  5736941

dtype change is now confirmed:

In [82]:

df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 4 columns):
state    4 non-null object
1st      4 non-null int64
2nd      4 non-null int64
3rd      4 non-null int64
dtypes: int64(3), object(1)
memory usage: 160.0+ bytes

Another way:

In [108]:

df[df.columns[1:]] = df[df.columns[1:]].apply(lambda x: x.str[1:].str.split(',').str.join('')).astype(np.int64)
df
Out[108]:
            state       1st        2nd      3rd
index                                          
0      California  11593820  109264246  8496273
1        New York  10861680   45336041  6317300
2         Florida   7942848   69369589  4697244
3           Texas   7536817   61830712  5736941

Method 4

To convert into integer, use:

carSales["Price"] = carSales["Price"].replace("[$,]", "", regex=True).astype(int)

Method 5

You can use the methodstr.replace and the regex 'D' to remove all nondigit characters or '[^-.0-9]' to keep minus signs, decimal points and digits:

for col in df.columns[1:]:
    df[col] = pd.to_numeric(df[col].str.replace('[^-.0-9]', ''))


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x