I’m working with the following df:
c.sort_values('2005', ascending=False).head(3)
GeoName ComponentName IndustryId IndustryClassification Description 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
37926 Alabama Real GDP by state 9 213 Support activities for mining 99 98 117 117 115 87 96 95 103 102 (NA)
37951 Alabama Real GDP by state 34 42 Wholesale trade 9898 10613 10952 11034 11075 9722 9765 9703 9600 9884 10199
37932 Alabama Real GDP by state 15 327 Nonmetallic mineral products manufacturing 980 968 940 1084 861 724 714 701 589 641 (NA)
I want to force numeric on all of the years:
c['2014'] = pd.to_numeric(c['2014'], errors='coerce')
is there an easy way to do this or do I have to type them all out?
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
UPDATE: you don’t need to convert your values afterwards, you can do it on-the-fly when reading your CSV:
In [165]: df=pd.read_csv(url, index_col=0, na_values=['(NA)']).fillna(0) In [166]: df.dtypes Out[166]: GeoName object ComponentName object IndustryId int64 IndustryClassification object Description object 2004 int64 2005 int64 2006 int64 2007 int64 2008 int64 2009 int64 2010 int64 2011 int64 2012 int64 2013 int64 2014 float64 dtype: object
If you need to convert multiple columns to numeric dtypes – use the following technique:
Sample source DF:
In [271]: df
Out[271]:
id a b c d e f
0 id_3 AAA 6 3 5 8 1
1 id_9 3 7 5 7 3 BBB
2 id_7 4 2 3 5 4 2
3 id_0 7 3 5 7 9 4
4 id_0 2 4 6 4 0 2
In [272]: df.dtypes
Out[272]:
id object
a object
b int64
c int64
d int64
e int64
f object
dtype: object
Converting selected columns to numeric dtypes:
In [273]: cols = df.columns.drop('id')
In [274]: df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
In [275]: df
Out[275]:
id a b c d e f
0 id_3 NaN 6 3 5 8 1.0
1 id_9 3.0 7 5 7 3 NaN
2 id_7 4.0 2 3 5 4 2.0
3 id_0 7.0 3 5 7 9 4.0
4 id_0 2.0 4 6 4 0 2.0
In [276]: df.dtypes
Out[276]:
id object
a float64
b int64
c int64
d int64
e int64
f float64
dtype: object
PS if you want to select all string (object) columns use the following simple trick:
cols = df.columns[df.dtypes.eq('object')]
Method 2
another way is using apply, one liner:
cols = ['col1', 'col2', 'col3'] data[cols] = data[cols].apply(pd.to_numeric, errors='coerce', axis=1)
Method 3
You can use:
print df.columns[5:]
Index([u'2004', u'2005', u'2006', u'2007', u'2008', u'2009', u'2010', u'2011',
u'2012', u'2013', u'2014'],
dtype='object')
for col in df.columns[5:]:
df[col] = pd.to_numeric(df[col], errors='coerce')
print df
GeoName ComponentName IndustryId IndustryClassification
37926 Alabama Real GDP by state 9 213
37951 Alabama Real GDP by state 34 42
37932 Alabama Real GDP by state 15 327
Description 2004 2005 2006 2007
37926 Support activities for mining 99 98 117 117
37951 Wholesale trade 9898 10613 10952 11034
37932 Nonmetallic mineral products manufacturing 980 968 940 1084
2008 2009 2010 2011 2012 2013 2014
37926 115 87 96 95 103 102 NaN
37951 11075 9722 9765 9703 9600 9884 10199.0
37932 861 724 714 701 589 641 NaN
Another solution with filter:
print df.filter(like='20')
2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
37926 99 98 117 117 115 87 96 95 103 102 (NA)
37951 9898 10613 10952 11034 11075 9722 9765 9703 9600 9884 10199
37932 980 968 940 1084 861 724 714 701 589 641 (NA)
for col in df.filter(like='20').columns:
df[col] = pd.to_numeric(df[col], errors='coerce')
print df
GeoName ComponentName IndustryId IndustryClassification
37926 Alabama Real GDP by state 9 213
37951 Alabama Real GDP by state 34 42
37932 Alabama Real GDP by state 15 327
Description 2004 2005 2006 2007
37926 Support activities for mining 99 98 117 117
37951 Wholesale trade 9898 10613 10952 11034
37932 Nonmetallic mineral products manufacturing 980 968 940 1084
2008 2009 2010 2011 2012 2013 2014
37926 115 87 96 95 103 102 NaN
37951 11075 9722 9765 9703 9600 9884 10199.0
37932 861 724 714 701 589 641 NaN
Method 4
df[cols] = pd.to_numeric(df[cols].stack(), errors='coerce').unstack()
Method 5
If you are looking for a range of columns, you can try this:
df.iloc[7:] = df.iloc[7:].astype(float)
The examples above will convert type to be float, for all the columns begin with the 7th to the end. You of course can use different type or different range.
I think this is useful when you have a big range of columns to convert and a lot of rows. It doesn’t make you go over each row by yourself – I believe numpy do it more efficiently.
This is useful only if you know that all the required columns contain numbers only – it will not change “bad values” (like string) to be NaN for you.
Method 6
df.loc[:,'col':] = df.loc[:,'col':].apply(pd.to_numeric, errors = 'coerce')
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0