I would like to prepend a string to each value in a given column of a pandas dataframe (elegantly).
I already figured out how to kind-of do this and I am currently using:
df.loc[df['col'] != False, 'col'] = 'str' + df.loc[df['col'] != False, 'col']
This seems one hell of an inelegant thing to do – do you know any other way (which maybe also adds the character to rows where that column is 0 or NaN)?
In case this is yet unclear, I would like to turn:
  col
1   a
2   0
into:
    col
1  stra
2  str0
Answers:
Method 1
df['col'] = 'str' + df['col'].astype(str)
Example:
>>> df = pd.DataFrame({'col':['a',0]})
>>> df
col
0 a
1 0
>>> df['col'] = 'str' + df['col'].astype(str)
>>> df
col
0 stra
1 str0
Method 2
As an alternative, you can use apply combined with format (or, better, with f-strings), which I find slightly more readable if you also want to add a suffix or manipulate the element itself:
df = pd.DataFrame({'col':['a', 0]})
df['col'] = df['col'].apply(lambda x: "{}{}".format('str', x))
which also yields the desired output:
    col
0  stra
1  str0
If you are using Python 3.6+, you can also use f-strings:
df['col'] = df['col'].apply(lambda x: f"str{x}")
yielding the same output.
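To illustrate the suffix and element-manipulation case mentioned above, here is a minimal sketch. The suffix '_end' and the upper-casing step are assumptions chosen only for illustration; they are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'col': ['a', 0]})

# Hypothetical prefix/suffix values, used only for this illustration.
prefix, suffix = 'str', '_end'

# str(x) makes the non-string value 0 safe to upper-case before formatting.
df['col'] = df['col'].apply(lambda x: f"{prefix}{str(x).upper()}{suffix}")

print(df['col'].tolist())  # ['strA_end', 'str0_end']
```

Because the lambda receives the raw element, any per-value logic can go inside the f-string expression.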
The f-string version is almost as fast as @RomanPekar’s solution from Method 1 (Python 3.6.4):
df = pd.DataFrame({'col':['a', 0]*200000})
%timeit df['col'].apply(lambda x: f"str{x}")
117 ms ± 451 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit 'str' + df['col'].astype(str)
112 ms ± 1.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Using format, however, is noticeably slower (roughly 1.6 times the f-string time in this run):
%timeit df['col'].apply(lambda x: "{}{}".format('str', x))
185 ms ± 1.07 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
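The %timeit lines above only work in IPython/Jupyter. Outside of it, a comparison along the same lines could be sketched with the standard timeit module. The smaller frame size and the repeat/number settings here are assumptions to keep the run short, so absolute numbers will differ from those quoted above.

```python
import timeit

import pandas as pd

# Smaller than the 400,000 rows used above, purely to keep the sketch quick.
df = pd.DataFrame({'col': ['a', 0] * 20000})

candidates = {
    'f-string apply': lambda: df['col'].apply(lambda x: f"str{x}"),
    'astype + concat': lambda: 'str' + df['col'].astype(str),
    'format apply': lambda: df['col'].apply(lambda x: "{}{}".format('str', x)),
}

loops = 5
timings = {}
for name, fn in candidates.items():
    # Best of 3 repeats, each running the statement `loops` times.
    best = min(timeit.repeat(fn, number=loops, repeat=3))
    timings[name] = best / loops

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.2f} ms per loop")
```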
Method 3
You can use pandas.Series.map:
df['col'].map('str{}'.format)
In this example, it prepends the string str to every value. Note that map returns a new Series, so assign the result back to df['col'] if you want to keep it.
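A minimal sketch of the map approach follows, assuming a column that also contains a NaN (which the original example does not). It uses the na_action parameter of Series.map to leave missing values untouched; the new column names are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ['a', 0, np.nan]})

# Plain map: every element, including NaN, is passed to str.format.
df['prefixed'] = df['col'].map('str{}'.format)

# na_action='ignore' skips missing values, so NaN stays NaN.
df['prefixed_skip_nan'] = df['col'].map('str{}'.format, na_action='ignore')

print(df)
```

Whether NaN should become a prefixed string or stay missing depends on what the column is used for afterwards.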
Method 4
If you load your table file with dtype=str
or convert the column type to string with df['a'] = df['a'].astype(str),
then you can use this approach:
df['a']= 'col' + df['a'].str[:]
This approach lets you prepend to, append to, and slice the strings in a dataframe column.
Works on pandas v0.23.4 and v0.24.1. I don’t know about earlier versions.
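The three operations mentioned above (prepend, append, subset) could look like the following sketch. The sample values and the '_end' suffix are assumptions for illustration. Since .str[:] is a full slice, it returns the strings unchanged, so plain df['a'] works equally well once the column holds strings.

```python
import pandas as pd

df = pd.DataFrame({'a': ['alpha', 10]})
df['a'] = df['a'].astype(str)  # make sure every value is a string first

prepended = 'col' + df['a'].str[:]    # full slice: same as 'col' + df['a']
appended = df['a'].str[:] + '_end'    # hypothetical suffix
subset = 'col' + df['a'].str[:3]      # keep only the first three characters

print(prepended.tolist())  # ['colalpha', 'col10']
print(appended.tolist())   # ['alpha_end', '10_end']
print(subset.tolist())     # ['colalp', 'col10']
```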
Method 5
Another solution with .loc:
df = pd.DataFrame({'col': ['a', 0]})
df.loc[df.index, 'col'] = 'string' + df['col'].astype(str)
This is not as quick as the solutions above (more than 1 ms per loop slower) but may be useful if you need a conditional change, like:
mask = (df['col'] == 0)
df.loc[mask, 'col'] = 'string' + df['col'].astype(str)
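As a runnable sketch of the conditional change, the example below adds a third row ('b', an assumption for illustration) so the effect of the mask is visible. It also restricts the right-hand side to the masked rows, which is optional because pandas aligns on the index during assignment.

```python
import pandas as pd

df = pd.DataFrame({'col': ['a', 0, 'b']})

# Only rows whose value equals 0 should receive the prefix.
mask = df['col'] == 0

# Build the new values from the masked rows and write them back in place.
df.loc[mask, 'col'] = 'string' + df.loc[mask, 'col'].astype(str)

print(df['col'].tolist())  # ['a', 'string0', 'b']
```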
Method 6
Adding a way to prefix columns while controlling how NaNs are rendered, for things like human-readable values in a CSV export:
"_" + df['col1'].replace(np.nan,'').astype(str)
Example:
import sys
import platform
import pandas as pd
import numpy as np
print("python {}".format(platform.python_version(), sys.executable))
print("pandas {}".format(pd.__version__))
print("numpy {}".format(np.__version__))
df = pd.DataFrame({
'col1':["1a","1b","1c",np.nan],
'col2':["2a","2b",np.nan,"2d"],
'col3':[31,32,33,34],
'col4':[np.nan,42,43,np.nan]})
df['col1_prefixed'] = "_" + df['col1'].replace(np.nan,'no value').astype(str)
df['col4_prefixed'] = "_" + df['col4'].replace(np.nan,'no value').astype(str)
print(df)
python 3.7.3
pandas 1.2.3
numpy 1.18.5
  col1 col2  col3  col4 col1_prefixed col4_prefixed
0   1a   2a    31   NaN           _1a     _no value
1   1b   2b    32  42.0           _1b         _42.0
2   1c  NaN    33  43.0           _1c         _43.0
3  NaN   2d    34   NaN     _no value     _no value
(Sorry for the verbosity, I found this Q while working on an unrelated column type issue and this is my reproduction code)
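For the CSV export case mentioned in Method 6, a shorter sketch might look like this. It swaps in fillna('') for replace(np.nan, '') as an equivalent way to blank out missing values, and it writes the CSV to a string rather than a file; the two-row frame is an assumption to keep the example small.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['1a', np.nan]})

# Blank out missing values first so they do not end up as the text "nan".
df['col1_prefixed'] = '_' + df['col1'].fillna('').astype(str)

# With no path given, to_csv returns the CSV content as a string.
csv_text = df[['col1_prefixed']].to_csv(index=False)
print(csv_text)
```

The missing value is exported as a bare "_" here; pass a placeholder such as 'no value' to fillna if that reads better in the exported file.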
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.