I have a data file with columns A-G like the one below, but when I read it with pd.read_csv('data.csv') the result has an extra unnamed column at the end for no apparent reason.
colA ColB colC colD colE colF colG Unnamed: 7
44   45   26   26   40   26   46   NaN
47   16   38   47   48   22   37   NaN
19   28   36   18   40   18   46   NaN
50   14   12   33   12   44   23   NaN
39   47   16   42   33   48   38   NaN
I have checked my data file several times, and there is no extra data in any other column. How should I remove this extra column while reading? Thanks
Answers:
Method 1
df = df.loc[:, ~df.columns.str.contains('^Unnamed')]
In [162]: df
Out[162]:
colA ColB colC colD colE colF colG
0 44 45 26 26 40 26 46
1 47 16 38 47 48 22 37
2 19 28 36 18 40 18 46
3 50 14 12 33 12 44 23
4 39 47 16 42 33 48 38
If the first column in the CSV file holds index values, you can do this instead:
df = pd.read_csv('data.csv', index_col=0)
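The question does not show the raw file, so the cause is not confirmed. One common cause is a trailing delimiter at the end of every line, which makes pandas see an eighth, empty field. The sketch below assumes that situation: the in-memory text `raw` is a hypothetical stand-in for data.csv. It shows the filter from this method, plus a variant that skips the column while reading by passing a callable to `usecols`.

```python
import io
import pandas as pd

# Hypothetical file contents: every line ends with a trailing comma.
raw = (
    "colA,ColB,colC,colD,colE,colF,colG,\n"
    "44,45,26,26,40,26,46,\n"
    "47,16,38,47,48,22,37,\n"
)

# A plain read produces the extra all-NaN column.
df = pd.read_csv(io.StringIO(raw))
print(df.columns.tolist())

# Option A: filter out the "Unnamed" columns after reading.
cleaned = df.loc[:, ~df.columns.str.contains('^Unnamed')]

# Option B: skip them while reading; the callable receives each column name.
at_read = pd.read_csv(
    io.StringIO(raw),
    usecols=lambda name: not name.startswith("Unnamed"),
)
print(at_read.columns.tolist())
```

If the trailing delimiter comes from the program that exports the file, fixing the export is the cleaner long-term option.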
Method 2
First, find the columns whose names contain 'unnamed' (case-insensitive), then drop those columns. Note: passing inplace=True to .drop modifies df directly; without it, .drop returns a new DataFrame that you have to assign.
df.drop(df.columns[df.columns.str.contains('unnamed', case=False)], axis=1, inplace=True)
Method 3
The pandas.DataFrame.dropna function removes missing values (e.g. NaN, NaT).
For example, the following code removes every column of your dataframe in which all of the elements are missing. dropna returns a new DataFrame, so assign the result back:
df = df.dropna(how='all', axis='columns')
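One thing to keep in mind with this method is that it looks at values, not names: a named column that happens to be entirely NaN is dropped too, while a partially filled column is kept. A minimal sketch with a made-up three-column frame (the column names are illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "colA": [44, 47],
    "colB": [45, np.nan],             # partially missing: kept
    "Unnamed: 2": [np.nan, np.nan],   # entirely missing: dropped
})

# dropna returns a new DataFrame; the original is left untouched.
result = df.dropna(how="all", axis="columns")
print(result.columns.tolist())
```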
Method 4
The approved solution doesn’t work in my case, so my solution is the following one:
# The column name in the example case is "Unnamed: 7",
# but it works with any other name ("Unnamed: 0", for example).
df.rename({"Unnamed: 7": "a"}, axis="columns", inplace=True)
# Then, drop the column as usual.
df.drop(["a"], axis=1, inplace=True)
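When the column label is known exactly, the rename step may not be needed. The following sketch assumes the label is literally "Unnamed: 7" and drops it directly; `errors="ignore"` keeps the call from raising if the column is absent. The small frame is invented for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "colA": [44, 47],
    "Unnamed: 7": [np.nan, np.nan],
})

# Drop by exact label; a missing label is silently ignored.
df = df.drop(columns=["Unnamed: 7"], errors="ignore")

# Running it a second time is harmless because of errors="ignore".
df = df.drop(columns=["Unnamed: 7"], errors="ignore")
print(df.columns.tolist())
```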
Hope it helps others.
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.