Add Leading Zeros to Strings in Pandas Dataframe

I have a pandas data frame where the first 3 columns are strings:

         ID        text1    text 2
0       2345656     blah      blah
1          3456     blah      blah
2        541304     blah      blah        
3        201306       hi      blah        
4   12313201308    hello      blah

I want to add leading zeros to the ID:

                ID    text1    text 2
0  000000002345656     blah      blah
1  000000000003456     blah      blah
2  000000000541304     blah      blah        
3  000000000201306       hi      blah        
4  000012313201308    hello      blah

I have tried:

df['ID'] = df.ID.zfill(15)
df['ID'] = '{0:0>15}'.format(df['ID'])

Answers:


Method 1

Try:

df['ID'] = df['ID'].apply(lambda x: '{0:0>15}'.format(x))

or even

df['ID'] = df['ID'].apply(lambda x: x.zfill(15))
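A minimal runnable sketch of both variants, assuming the ID column holds strings as in the question. The sample frame is rebuilt here only so the example is self-contained:

```python
import pandas as pd

# Sample frame mirroring the question; IDs are stored as strings.
df = pd.DataFrame({
    "ID": ["2345656", "3456", "541304", "201306", "12313201308"],
    "text1": ["blah", "blah", "blah", "hi", "hello"],
    "text 2": ["blah"] * 5,
})

# '{0:0>15}' right-aligns the value in a 15-character field filled with '0'.
padded_format = df["ID"].apply(lambda x: "{0:0>15}".format(x))

# str.zfill gives the same result for plain digit strings.
padded_zfill = df["ID"].apply(lambda x: x.zfill(15))

print(padded_format.tolist())
```

Both calls work element by element, which is what the attempts in the question were missing: they applied the string operation to the Series as a whole.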

Method 2

The .str accessor on a Series exposes most of Python's string methods and applies them to each element:

df['ID'] = df['ID'].str.zfill(15)

See more: http://pandas.pydata.org/pandas-docs/stable/text.html
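One practical difference from the apply approach is that .str methods skip missing values instead of raising. A small sketch, assuming a column that contains a missing entry (the sample values are made up for illustration):

```python
import pandas as pd

# Hypothetical column with one missing ID.
ids = pd.Series(["3456", None, "541304"])

# The .str accessor pads the strings and leaves the missing entry missing.
padded = ids.str.zfill(15)

print(padded.isna().tolist())
```

With apply(lambda x: x.zfill(15)), the same column would fail on the missing entry, because that value is not a string.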

Method 3

This can be done in a single line when the data is first loaded. Just use the converters argument:

df = pd.read_excel('filename.xlsx', converters={'ID': '{:0>15}'.format})

so you’ll reduce the code length by half 🙂

PS: read_csv has this argument as well.
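A self-contained sketch of the read_csv variant, using an in-memory CSV in place of a real file (the CSV text below is invented for the example):

```python
import io

import pandas as pd

# Stand-in for a file on disk; a real call would pass a path instead.
csv_text = "ID,text1\n2345656,blah\n3456,blah\n12313201308,hello\n"

# The converter is called on each raw cell of the ID column while parsing,
# so the column is already padded when the frame is created.
df = pd.read_csv(io.StringIO(csv_text), converters={"ID": "{:0>15}".format})

print(df["ID"].tolist())
```

Because the converter handles the raw cell text, the IDs are never parsed as integers, so any leading zeros already present in the file are preserved as well.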

Method 4

With Python 3.6+, you can also use f-strings:

df['ID'] = df['ID'].map(lambda x: f'{x:0>15}')

In the benchmark below, this is slightly slower than df['ID'].map('{:0>15}'.format) (5.46 ms versus 4.06 ms per loop). On the other hand, f-strings permit more complex output, and they run faster inside a list comprehension (4.87 ms per loop).
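As a sketch of the list-comprehension form, the result can be assigned straight back to the column. The "ID-" prefix and the label column are hypothetical, added only to illustrate the more complex output that f-strings allow:

```python
import pandas as pd

df = pd.DataFrame({"ID": ["2345656", "3456"]})

# Pad each ID with an f-string inside a list comprehension.
df["ID"] = [f"{x:0>15}" for x in df["ID"]]

# Hypothetical extra formatting: prepend a fixed "ID-" prefix.
df["label"] = [f"ID-{x}" for x in df["ID"]]

print(df)
```

Assigning a list works here because it has one element per row, in row order.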

Performance benchmarking

# Python 3.6.0, Pandas 0.19.2

df = pd.concat([df]*1000)

%timeit df['ID'].map('{:0>15}'.format)                  # 4.06 ms per loop
%timeit df['ID'].map(lambda x: f'{x:0>15}')             # 5.46 ms per loop
%timeit df['ID'].astype(str).str.zfill(15)              # 18.6 ms per loop

%timeit list(map('{:0>15}'.format, df['ID'].values))    # 7.91 ms per loop
%timeit ['{:0>15}'.format(x) for x in df['ID'].values]  # 7.63 ms per loop
%timeit [f'{x:0>15}' for x in df['ID'].values]          # 4.87 ms per loop
%timeit [str(x).zfill(15) for x in df['ID'].values]     # 21.2 ms per loop

# check results are the same
x = df['ID'].map('{:0>15}'.format)
y = df['ID'].map(lambda x: f'{x:0>15}')
z = df['ID'].astype(str).str.zfill(15)

assert (x == y).all() and (x == z).all()
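The %timeit lines above only run inside IPython or Jupyter. A rough equivalent as a plain script, using the standard timeit module, could look like the sketch below. The labels and repeat counts are arbitrary choices, and the timings will differ by machine and pandas version:

```python
import timeit

import pandas as pd

# 5,000 string IDs, similar in size to the benchmark above.
ids = pd.Series(["2345656", "3456", "541304", "201306", "12313201308"] * 1000)

candidates = {
    "map + str.format": lambda: ids.map("{:0>15}".format),
    "map + f-string": lambda: ids.map(lambda x: f"{x:0>15}"),
    "str.zfill": lambda: ids.str.zfill(15),
    "list comprehension": lambda: [f"{x:0>15}" for x in ids.values],
}

timings = {}
for name, func in candidates.items():
    # Best of 3 repeats, 10 calls each, converted to milliseconds per call.
    best = min(timeit.repeat(func, number=10, repeat=3))
    timings[name] = best / 10 * 1000

for name, ms in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name:<20} {ms:.2f} ms per call")
```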

Method 5

If you encounter the following error, the column does not hold strings, so cast it to str before using the .str accessor:

Pandas error: Can only use .str accessor with string values, which use np.object_ dtype in pandas

df['ID'] = df['ID'].astype(str).str.zfill(15)
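A small sketch that reproduces the situation, assuming the IDs were loaded as integers. The exact wording of the error message may vary between pandas versions:

```python
import pandas as pd

# IDs stored as integers rather than strings.
df = pd.DataFrame({"ID": [2345656, 3456, 12313201308]})

error_message = None
try:
    df["ID"].str.zfill(15)
except AttributeError as exc:
    # The .str accessor refuses non-string columns.
    error_message = str(exc)

# Casting to str first makes the accessor available.
df["ID"] = df["ID"].astype(str).str.zfill(15)

print(error_message)
print(df["ID"].tolist())
```

Note that astype(str) converts whatever is in the column, so a float column would produce strings such as '3456.0', and it is worth checking the dtype before casting.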

Method 6

If you want a more customizable solution to this problem, you can try pandas.Series.str.pad:

df['ID'] = df['ID'].astype(str).str.pad(15, side='left', fillchar='0')

For strings without a leading sign, such as these IDs, str.zfill(n) is equivalent to str.pad(n, side='left', fillchar='0').
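The extra flexibility comes from the side and fillchar parameters. A brief sketch with made-up widths and fill characters:

```python
import pandas as pd

ids = pd.Series(["3456", "541304"])

# Same result as zfill for unsigned digit strings.
left_zeros = ids.str.pad(8, side="left", fillchar="0")

# Pad on the right with a different fill character.
right_stars = ids.str.pad(8, side="right", fillchar="*")

print(left_zeros.tolist())
print(right_stars.tolist())
```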

Method 7

rjust worked for me:

df['ID'] = df['ID'].str.rjust(15, '0')
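For unsigned IDs like these, rjust and zfill give the same result. The two differ on signed strings, as this sketch shows with Python's built-in string methods. How the pandas .str versions treat a leading sign may depend on the pandas version, so it is worth checking if the data can contain one:

```python
import pandas as pd

ids = pd.Series(["3456", "12313201308"])

# rjust pads on the left with the given fill character.
padded = ids.str.rjust(15, "0")

# Built-in str methods on a signed value: rjust pads blindly,
# while zfill keeps the sign in front of the zeros.
rjust_signed = "-42".rjust(6, "0")
zfill_signed = "-42".zfill(6)

print(padded.tolist())
print(rjust_signed, zfill_signed)
```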

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
