I am parsing data from an Excel file that has extra white space in some of the column headings.
When I check the columns of the resulting dataframe, with df.columns, I see:
Index(['Year', 'Month ', 'Value'])
                     ^
# Note the unwanted trailing space on 'Month '
Consequently, I can’t do:
df["Month"]
This tells me the column is not found, because I asked for “Month”, not “Month ”.
My question, then, is how can I strip out the unwanted white space from the column headings?
Answers:
Method 1
You can give functions to the rename method. The str.strip() method should do what you want:
In [5]: df
Out[5]:
   Year  Month   Value
0     1       2      3

[1 rows x 3 columns]

In [6]: df.rename(columns=lambda x: x.strip())
Out[6]:
   Year  Month  Value
0     1      2      3

[1 rows x 3 columns]
Note that this returns a new DataFrame, which is what is shown as output on screen; the columns of your original df are not changed. To keep the changes, either use this in a method chain or re-assign the df variable:
df = df.rename(columns=lambda x: x.strip())
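A minimal sketch of the point above, using a made-up one-row frame: rename leaves the original frame untouched until the result is assigned back. The variable names here are only for illustration.

```python
import pandas as pd

# Hypothetical one-row frame with a trailing space in 'Month '
df = pd.DataFrame([[1, 2, 3]], columns=['Year', 'Month ', 'Value'])

# rename returns a new DataFrame; df itself is unchanged at this point
renamed = df.rename(columns=lambda x: x.strip())
original_cols = df.columns.tolist()
renamed_cols = renamed.columns.tolist()

# Re-assign to keep the cleaned headings, after which df["Month"] works
df = df.rename(columns=lambda x: x.strip())
month = df["Month"]
```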
Method 2
Since pandas 0.16.1 you can call .str.strip directly on the columns:
df.columns = df.columns.str.strip()
Here is a small example:
In [5]: df = pd.DataFrame(columns=['Year', 'Month ', 'Value'])
        print(df.columns.tolist())
        df.columns = df.columns.str.strip()
        df.columns.tolist()
['Year', 'Month ', 'Value']
Out[5]: ['Year', 'Month', 'Value']
Timings
In [26]: df = pd.DataFrame(columns=[' year', ' month ', ' day', ' asdas ', ' asdas', 'as ', ' sa', ' asdas '])
         df
Out[26]:
Empty DataFrame
Columns: [ year,  month ,  day,  asdas ,  asdas, as ,  sa,  asdas ]

%timeit df.rename(columns=lambda x: x.strip())
%timeit df.columns.str.strip()
1000 loops, best of 3: 293 µs per loop
10000 loops, best of 3: 143 µs per loop
So str.strip is ~2x faster here, and I expect it to scale better for larger dfs.
Method 3
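If you export from Excel to CSV and read the file into a pandas DataFrame, you can specify:
skipinitialspace=True
when calling pd.read_csv. Note that this only skips spaces that follow a delimiter, so it removes leading spaces but not a trailing one like the one in 'Month '.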
From the documentation:
skipinitialspace : bool, default False
Skip spaces after delimiter.
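As a rough illustration of what skipinitialspace does and does not cover, here is a sketch that reads a made-up CSV string (the data and the csv_text name are assumptions, not from the question). Spaces after each comma are skipped, while the space after Month is still there until the headings are stripped.

```python
import io
import pandas as pd

# Made-up CSV text: a space after each comma and a trailing space after Month
csv_text = "Year, Month , Value\n2020, 1, 10\n"

# skipinitialspace=True skips the spaces that follow each delimiter
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
cols_after_read = df.columns.tolist()

# The trailing space survives the read, so strip the headings as well
df.columns = df.columns.str.strip()
cols_after_strip = df.columns.tolist()
```

Combining the option with a strip of the headings covers both leading and trailing spaces.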
Method 4
You can also pass str.strip directly to rename:
df.rename(str.strip, axis = 'columns')
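This form is shown in the pandas documentation for rename. As with Method 1, it returns a new DataFrame, so assign the result back to df to keep the change.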
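A short sketch of this form on a made-up frame, assuming a pandas version whose rename accepts the axis keyword:

```python
import pandas as pd

# Hypothetical one-row frame with a trailing space in 'Month '
df = pd.DataFrame([[1, 2, 3]], columns=['Year', 'Month ', 'Value'])

# str.strip is called on each column label; assign the result to keep it
df = df.rename(str.strip, axis='columns')
cols = df.columns.tolist()
```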
Method 5
If you are looking for an unbreakable way to do it, one that also works when some column labels are not strings, I would suggest:
data_frame.rename(columns=lambda x: x.strip() if isinstance(x, str) else x, inplace=True)
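To see why the isinstance check matters, here is a sketch with a made-up frame whose labels mix strings and an integer (the 2020 label is an assumption for illustration). The plain lambda from Method 1 fails on the integer label, while the guarded version leaves it alone.

```python
import pandas as pd

# Hypothetical frame whose column labels mix strings and an integer
data_frame = pd.DataFrame([[1, 2, 3]], columns=['Year', 'Month ', 2020])

# The plain lambda fails because an int label has no .strip() method
try:
    data_frame.rename(columns=lambda x: x.strip())
    plain_lambda_failed = False
except AttributeError:
    plain_lambda_failed = True

# The guarded version strips string labels and passes other labels through
data_frame.rename(columns=lambda x: x.strip() if isinstance(x, str) else x, inplace=True)
cols = data_frame.columns.tolist()
```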
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.