Replace all occurrences of a string in a pandas dataframe (Python)

I have a pandas dataframe with about 20 columns.

It is possible to replace all occurrences of a string (here a newline) by manually writing all column names:

df['columnname1'] = df['columnname1'].str.replace("\n","<br>")
df['columnname2'] = df['columnname2'].str.replace("\n","<br>")
df['columnname3'] = df['columnname3'].str.replace("\n","<br>")
...
df['columnname20'] = df['columnname20'].str.replace("\n","<br>")

This unfortunately does not work:

df = df.replace("\n","<br>")

Is there any other, more elegant solution?

Answers:

Method 1

You can use replace and pass the strings to find and their replacements as the keys and values of a dictionary:

df.replace({'\n': '<br>'}, regex=True)

For example:

>>> df = pd.DataFrame({'a': ['1\n', '2\n', '3'], 'b': ['4\n', '5', '6\n']})
>>> df
     a    b
0  1\n  4\n
1  2\n    5
2    3  6\n

>>> df.replace({'\n': '<br>'}, regex=True)
       a      b
0  1<br>  4<br>
1  2<br>      5
2      3  6<br>

Note that this method returns a new DataFrame instance by default (it does not modify the original), so you’ll need to either reassign the output:

df = df.replace({'\n': '<br>'}, regex=True)

or specify inplace=True:

df.replace({'\n': '<br>'}, regex=True, inplace=True)
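
To illustrate why the plain df.replace call from the question leaves the data unchanged, here is a minimal sketch. It assumes a small made-up DataFrame (the column names and values are illustrative only). Without regex=True, replace only swaps cells whose entire value equals the search string; with regex=True the pattern is matched inside each string.

```python
import pandas as pd

# Illustrative data: the last cell of column "b" is exactly a newline.
df = pd.DataFrame({"a": ["1\n", "2\n", "3"], "b": ["4\n", "5", "\n"]})

# Default behaviour: only cells that are exactly "\n" are replaced.
exact = df.replace("\n", "<br>")

# regex=True: every newline inside a string cell is replaced.
substring = df.replace({"\n": "<br>"}, regex=True)

print(exact["b"].tolist())      # only the cell that was exactly "\n" changes
print(substring["b"].tolist())  # every newline in the column is replaced
```

Neither call modifies df itself; both return new DataFrames.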

Method 2

It seems pandas has changed its API to avoid ambiguity when handling regex. Now you should use:

df.replace({'\n': '<br>'}, regex=True)

For example:

>>> df = pd.DataFrame({'a': ['1\n', '2\n', '3'], 'b': ['4\n', '5', '6\n']})
>>> df
     a    b
0  1\n  4\n
1  2\n    5
2    3  6\n

>>> df.replace({'\n': '<br>'}, regex=True)
       a      b
0  1<br>  4<br>
1  2<br>      5
2      3  6<br>

Method 3

You can iterate over all columns and use the method str.replace:

for col in df.columns:
    df[col] = df[col].str.replace('\n', '<br>')

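Before pandas 2.0, str.replace treated the pattern as a regular expression by default; from 2.0 onward the default is regex=False, so pass regex explicitly if the distinction matters. The .str accessor also works only on columns that hold strings, so this loop raises an error if the DataFrame contains numeric columns.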
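
As a hedged variant of the loop above, the sketch below skips non-string columns and passes regex=False explicitly so the behaviour does not depend on the pandas version. The DataFrame and its column names are made up for illustration, and using is_string_dtype as the column filter is an assumption of this example, not part of the original answer.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

# Illustrative data: one text column and one numeric column.
df = pd.DataFrame({"text": ["a\nb", "c"], "count": [1, 2]})

for col in df.columns:
    # Only touch columns holding strings; .str fails on numeric columns.
    if is_string_dtype(df[col]):
        # regex=False makes this a literal substring replacement.
        df[col] = df[col].str.replace("\n", "<br>", regex=False)

print(df)
```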

Method 4

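This removes all whitespace, including newlines, from every value in the column. To collapse each run of whitespace into a single space instead, change ''.join to ' '.join, or join on any other replacement string: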

    df['columnname'] = [''.join(c.split()) for c in df['columnname'].astype(str)]
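
The difference between the two join strings can be seen on plain Python strings. This is a small sketch with made-up sample values; split() with no arguments breaks on any run of whitespace, newlines included.

```python
# Illustrative values containing a newline and extra spaces.
values = ["line one\nline   two", "  padded  "]

# Joining on the empty string removes all whitespace.
removed = ["".join(v.split()) for v in values]

# Joining on a single space collapses each whitespace run to one space.
collapsed = [" ".join(v.split()) for v in values]

print(removed)
print(collapsed)
```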

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
