I have a pandas data frame where the first 3 columns are strings:
            ID  text1 text 2
0      2345656   blah   blah
1         3456   blah   blah
2       541304   blah   blah
3       201306     hi   blah
4  12313201308  hello   blah
I want to add leading zeros to the ID:
                ID  text1 text 2
0  000000002345656   blah   blah
1  000000000003456   blah   blah
2  000000000541304   blah   blah
3  000000000201306     hi   blah
4  000012313201308  hello   blah
I have tried:
df['ID'] = df.ID.zfill(15)
df['ID'] = '{0:0>15}'.format(df['ID'])
Answers:
Method 1
Try:
df['ID'] = df['ID'].apply(lambda x: '{0:0>15}'.format(x))
or even
df['ID'] = df['ID'].apply(lambda x: x.zfill(15))
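To check both variants against the data in the question, here is a minimal sketch. The frame is rebuilt by hand from the question's table, and the variable names padded_format and padded_zfill are only illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the question; every ID is a string.
df = pd.DataFrame({
    'ID': ['2345656', '3456', '541304', '201306', '12313201308'],
    'text1': ['blah', 'blah', 'blah', 'hi', 'hello'],
    'text 2': ['blah'] * 5,
})

# Variant 1: format-spec padding ('0' fill, right-aligned, width 15).
padded_format = df['ID'].apply(lambda x: '{0:0>15}'.format(x))

# Variant 2: Python's built-in str.zfill applied to each element.
padded_zfill = df['ID'].apply(lambda x: x.zfill(15))

# Both variants agree on this data.
assert (padded_format == padded_zfill).all()

df['ID'] = padded_format
print(df)
```

IDs that are already 15 characters or longer are left unchanged by both variants, since neither one truncates.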
Method 2
The .str accessor of a Series exposes most of Python's string methods, applied element by element.
df['ID'] = df['ID'].str.zfill(15)
See more: http://pandas.pydata.org/pandas-docs/stable/text.html
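One practical difference from the format-based approach is how missing values are treated. The sketch below assumes an object column with a NaN in it (the sample data is made up): .str.zfill leaves the missing value missing, while the format spec turns it into a padded 'nan' string.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing ID.
ids = pd.Series(['3456', np.nan], dtype=object)

# The .str accessor skips missing values and keeps them missing.
via_str = ids.str.zfill(15)

# Formatting each element also formats the NaN float itself.
via_format = ids.map('{:0>15}'.format)

print(via_str.isna().tolist())
print(via_format.tolist())
```

If the column may contain gaps, the .str version is usually the safer choice.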
Method 3
This can be done in a single line when the data is first loaded. Just use the converters argument, which maps a column name to a function applied to every value in that column.
df = pd.read_excel('filename.xlsx', converters={'ID': '{:0>15}'.format})
so you’ll reduce the code length by half 🙂
PS: read_csv has this argument as well.
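A small sketch of the read_csv variant, assuming the file has a header row with an ID column. An in-memory io.StringIO stands in for a real file here, and the CSV contents are made up from the question's data.

```python
import io

import pandas as pd

# Stand-in for a file on disk (hypothetical contents).
csv_text = (
    "ID,text1,text 2\n"
    "2345656,blah,blah\n"
    "3456,blah,blah\n"
)

# The converter receives each raw ID cell and returns the padded string,
# so the column is never parsed as a number.
df = pd.read_csv(io.StringIO(csv_text), converters={'ID': '{:0>15}'.format})

print(df['ID'].tolist())
```

Because the converter runs while parsing, this also keeps any leading zeros that were already present in the file.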
Method 4
With Python 3.6+, you can also use f-strings:
df['ID'] = df['ID'].map(lambda x: f'{x:0>15}')
Performance is comparable or slightly worse versus df['ID'].map('{:0>15}'.format). On the other hand, f-strings permit more complex output, and you can use them more efficiently via a list comprehension.
Performance benchmarking
# Python 3.6.0, Pandas 0.19.2
df = pd.concat([df]*1000)
%timeit df['ID'].map('{:0>15}'.format) # 4.06 ms per loop
%timeit df['ID'].map(lambda x: f'{x:0>15}') # 5.46 ms per loop
%timeit df['ID'].astype(str).str.zfill(15) # 18.6 ms per loop
%timeit list(map('{:0>15}'.format, df['ID'].values)) # 7.91 ms per loop
%timeit ['{:0>15}'.format(x) for x in df['ID'].values] # 7.63 ms per loop
%timeit [f'{x:0>15}' for x in df['ID'].values] # 4.87 ms per loop
%timeit [str(x).zfill(15) for x in df['ID'].values] # 21.2 ms per loop
# check results are the same
x = df['ID'].map('{:0>15}'.format)
y = df['ID'].map(lambda x: f'{x:0>15}')
z = df['ID'].astype(str).str.zfill(15)
assert (x == y).all() and (x == z).all()
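The %timeit lines above only run in IPython or Jupyter. A rough equivalent for a plain script, using the standard timeit module, might look like the sketch below. The sample data, the labels and the repeat counts are my own choices, and the numbers will vary with machine and library versions, so no timings are quoted.

```python
import timeit

import pandas as pd

# 5,000 string IDs, built from the question's values.
ids = pd.Series(['2345656', '3456', '541304', '201306', '12313201308'] * 1000)

candidates = {
    'map + str.format': lambda: ids.map('{:0>15}'.format),
    'map + f-string': lambda: ids.map(lambda x: f'{x:0>15}'),
    '.str.zfill': lambda: ids.str.zfill(15),
    'list comprehension': lambda: [f'{x:0>15}' for x in ids.values],
}

# Best of 3 repeats, 20 calls each, reported as seconds per call.
timings = {}
for name, func in candidates.items():
    timings[name] = min(timeit.repeat(func, number=20, repeat=3)) / 20
    print(f'{name:20s} {timings[name] * 1e3:.3f} ms per call')

# Check that every candidate produces the same padded values.
reference = list(candidates['map + str.format']())
for func in candidates.values():
    assert list(func()) == reference
```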
Method 5
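If you are encountering the error:
Pandas error: Can only use .str accessor with string values, which use np.object_ dtype in pandas
the column does not hold strings (for example, the IDs were read as integers). Convert it with astype(str) first: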
df['ID'] = df['ID'].astype(str).str.zfill(15)
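To see where the error comes from, here is a minimal sketch that assumes the IDs were loaded as integers. The variable names are illustrative only.

```python
import pandas as pd

# IDs parsed as numbers (int64), not as strings.
ids = pd.Series([2345656, 3456, 541304])

# The .str accessor refuses non-string data and raises AttributeError.
try:
    ids.str.zfill(15)
    raised = False
except AttributeError as exc:
    raised = True
    print(exc)

# Converting to str first makes the accessor available.
fixed = ids.astype(str).str.zfill(15)
print(fixed.tolist())
```

Note that a float column needs more care: astype(str) turns 3456.0 into '3456.0', so you would probably want to convert to an integer type before converting to str.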
Method 6
If you want a more customizable solution to this problem, you can try pandas.Series.str.pad, which lets you choose the width, the side to pad and the fill character:
df['ID'] = df['ID'].astype(str).str.pad(15, side='left', fillchar='0')
For plain digit strings like these IDs, str.zfill(n) gives the same result as str.pad(n, side='left', fillchar='0').
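A short sketch of what the extra arguments of str.pad allow. The width of 8 and the fill characters are arbitrary choices for illustration.

```python
import pandas as pd

ids = pd.Series(['3456', '201306'])

# Pad on the left with zeros, as in the answer above.
left = ids.str.pad(8, side='left', fillchar='0')

# Pad on the right with a different fill character.
right = ids.str.pad(8, side='right', fillchar='_')

# Pad on both sides, centring the original value.
both = ids.str.pad(8, side='both', fillchar='*')

print(left.tolist())   # ['00003456', '00201306']
print(right.tolist())  # ['3456____', '201306__']
print(both.tolist())   # ['**3456**', '*201306*']
```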
Method 7
rjust worked for me:
df['ID'] = df['ID'].str.rjust(15, '0')
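One thing to keep in mind if the values can carry a sign: rjust pads blindly on the left, whereas Python's built-in str.zfill keeps a leading '+' or '-' in front of the zeros. The sketch below shows this with a made-up value. The zfill comparison uses plain Python strings only, since I would not rely on the pandas .str.zfill behaviour for signs being the same across versions.

```python
import pandas as pd

signed = '-42'

# Built-in zfill keeps the sign in front of the padding.
builtin_zfill = signed.zfill(6)
print(builtin_zfill)          # -00042

# rjust treats the sign like any other character.
builtin_rjust = signed.rjust(6, '0')
print(builtin_rjust)          # 000-42

# The pandas accessor version of rjust behaves like the built-in one.
pandas_rjust = pd.Series([signed]).str.rjust(6, '0')
print(pandas_rjust.tolist())  # ['000-42']
```

For unsigned IDs like the ones in the question, none of this matters and the results are identical.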
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.