Import pandas dataframe column as string not int

I would like to import the following csv as strings not as int64. Pandas read_csv automatically converts it to int64, but I need this column as string.

ID
00013007854817840016671868
00013007854817840016749251
00013007854817840016754630
00013007854817840016781876
00013007854817840017028824
00013007854817840017963235
00013007854817840018860166


df = read_csv('sample.csv')

df.ID
>>

0   -9223372036854775808
1   -9223372036854775808
2   -9223372036854775808
3   -9223372036854775808
4   -9223372036854775808
5   -9223372036854775808
6   -9223372036854775808
Name: ID

Unfortunately using converters gives the same result.

df = read_csv('sample.csv', converters={'ID': str})
df.ID
>>

0   -9223372036854775808
1   -9223372036854775808
2   -9223372036854775808
3   -9223372036854775808
4   -9223372036854775808
5   -9223372036854775808
6   -9223372036854775808
Name: ID

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

Just want to reiterate this will work in pandas >= 0.9.1:

In [2]: read_csv('sample.csv', dtype={'ID': object})
Out[2]: 
                           ID
0  00013007854817840016671868
1  00013007854817840016749251
2  00013007854817840016754630
3  00013007854817840016781876
4  00013007854817840017028824
5  00013007854817840017963235
6  00013007854817840018860166

I’m creating an issue about detecting integer overflows also.

EDIT: See resolution here: https://github.com/pydata/pandas/issues/2247

Update as it helps others:

To have all columns as str, one can do this (from the comment):

pd.read_csv('sample.csv', dtype = str)

To have most or selective columns as str, one can do this:

# lst of column names which needs to be string
lst_str_cols = ['prefix', 'serial']
# use dictionary comprehension to make dict of dtypes
dict_dtypes = {x : 'str'  for x in lst_str_cols}
# use dict on dtypes
pd.read_csv('sample.csv', dtype=dict_dtypes)

Method 2

This probably isn’t the most elegant way to do it, but it gets the job done.

In[1]: import numpy as np

In[2]: import pandas as pd

In[3]: df = pd.DataFrame(np.genfromtxt('/Users/spencerlyon2/Desktop/test.csv', dtype=str)[1:], columns=['ID'])

In[4]: df
Out[4]: 
                       ID
0  00013007854817840016671868
1  00013007854817840016749251
2  00013007854817840016754630
3  00013007854817840016781876
4  00013007854817840017028824
5  00013007854817840017963235
6  00013007854817840018860166

Just replace '/Users/spencerlyon2/Desktop/test.csv' with the path to your file

Method 3

Since pandas 1.0 it became much more straightforward. This will read column ‘ID’ as dtype ‘string’:

pd.read_csv('sample.csv',dtype={'ID':'string'})

As we can see in this Getting started guide, ‘string’ dtype has been introduced (before strings were treated as dtype ‘object’).


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x