I have a pandas dataframe with the following column names:
Result1, Test1, Result2, Test2, Result3, Test3, etc…
I want to drop all the columns whose name contains the word “Test”. The number of such columns is not static but depends on a previous function.
How can I do that?
Answers:
Method 1
Here is one way to do this:
df = df[df.columns.drop(list(df.filter(regex='Test')))]
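To see what each piece of that one-liner does, here is a minimal sketch on a small made-up frame. The data and the names test_cols and kept are only illustrative; the column layout follows the question.

```python
import pandas as pd

# Hypothetical frame mirroring the question's column layout.
df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    columns=["Result1", "Test1", "Result2", "Test2"],
)

# df.filter(regex="Test") keeps the columns whose name contains "Test";
# list(...) of a DataFrame yields its column labels.
test_cols = list(df.filter(regex="Test"))

# Index.drop removes those labels, leaving only the names to keep.
kept = df[df.columns.drop(test_cols)]
print(kept.columns.tolist())
```

Note that the regex is case-sensitive, so a column named test3 would survive this version.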
Method 2
import pandas as pd
import numpy as np
array=np.random.random((2,4))
df=pd.DataFrame(array, columns=('Test1', 'toto', 'test2', 'riri'))
print(df)
Test1 toto test2 riri
0 0.923249 0.572528 0.845464 0.144891
1 0.020438 0.332540 0.144455 0.741412
cols = [c for c in df.columns if c.lower()[:4] != 'test']
df=df[cols]
print(df)
toto riri
0 0.572528 0.144891
1 0.332540 0.741412
Method 3
Cheaper, Faster, and Idiomatic: str.contains
In recent versions of pandas, you can use string methods on the index and columns. Here, str.startswith seems like a good fit.
To remove all columns starting with a given substring:
df.columns.str.startswith('Test')
# array([ True, False, False, False])
df.loc[:,~df.columns.str.startswith('Test')]
toto test2 riri
0 x x x
1 x x x
For case-insensitive matching, you can use regex-based matching with str.contains and a start-of-line anchor (^):
df.columns.str.contains('^test', case=False)
# array([ True, False, True, False])
df.loc[:,~df.columns.str.contains('^test', case=False)]
toto riri
0 x x
1 x x
If the column labels may be of mixed types (for example, some are integers), specify na=False as well.
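A minimal sketch of the mixed-type case, assuming a made-up frame in which one column label is the integer 2. The frame and the names mask and result are illustrative only.

```python
import pandas as pd

# Hypothetical frame with one non-string column label (the integer 2).
df = pd.DataFrame([[1, 2, 3, 4]], columns=["Test1", 2, "test2", "riri"])

# Non-string labels yield NaN from .str methods; na=False maps them to False,
# so the mask stays a plain boolean array that can be negated with ~.
mask = df.columns.str.contains("^test", case=False, na=False)

result = df.loc[:, ~mask]
print(result.columns.tolist())
```

Without na=False the mask would contain a missing value for the integer label, and negating it would not give a usable boolean selector.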
Method 4
This can be done neatly in one line with:
df = df.drop(df.filter(regex='Test').columns, axis=1)
Method 5
Alternatively, you can keep only the columns you DO want using filter:
import pandas as pd
import numpy as np
data2 = [{'test2': 1, 'result1': 2}, {'test': 5, 'result34': 10, 'c': 20}]
df = pd.DataFrame(data2)
df
c result1 result34 test test2
0 NaN 2.0 NaN NaN 1.0
1 20.0 NaN 10.0 5.0 NaN
Now filter
df.filter(like='result',axis=1)
Get..
result1 result34
0 2.0 NaN
1 NaN 10.0
Method 6
Use the DataFrame.select method (deprecated in pandas 0.21 and removed in pandas 1.0, so this only works on older versions):
In [38]: df = DataFrame({'Test1': randn(10), 'Test2': randn(10), 'awesome': randn(10)})
In [39]: df.select(lambda x: not re.search(r'Test\d+', x), axis=1)
Out[39]:
awesome
0 1.215
1 1.247
2 0.142
3 0.169
4 0.137
5 -0.971
6 0.736
7 0.214
8 0.111
9 -0.214
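On pandas versions that do not provide select, the same predicate can be applied by building a boolean list over the column names and passing it to .loc. This is a sketch of an equivalent, not the original answer's code; the random data and the names keep and result are illustrative.

```python
import re

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Test1": rng.standard_normal(3),
    "Test2": rng.standard_normal(3),
    "awesome": rng.standard_normal(3),
})

# True for every column whose name does NOT match Test followed by digits.
keep = [not re.search(r"Test\d+", name) for name in df.columns]

result = df.loc[:, keep]
print(result.columns.tolist())
```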
Method 7
Using a regex to match all columns not containing the unwanted word:
df = df.filter(regex='^((?!badword).)*$')
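Applied to the question, badword becomes Test. The negative lookahead (?!Test) is checked before every character, so the pattern only matches names in which Test never appears. A minimal sketch on a made-up frame:

```python
import pandas as pd

# Hypothetical frame mirroring the question's column layout.
df = pd.DataFrame([[1, 2, 3]], columns=["Result1", "Test1", "Result2"])

# Matches a whole name only if "Test" does not start at any position in it.
pattern = "^((?!Test).)*$"

result = df.filter(regex=pattern)
print(result.columns.tolist())
```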
Method 8
This method modifies df in place instead of assigning the result to a new object; note that inplace=True does not guarantee that pandas avoids an internal copy:
df.drop(df.columns[df.columns.str.contains('Test')], axis=1, inplace=True)
Method 9
Question states ‘I want to drop all the columns whose name contains the word “Test”.’
test_columns = [col for col in df if 'Test' in col]
df.drop(columns=test_columns, inplace=True)
Method 10
The shortest way to select the columns whose name contains “Test” (the opposite of dropping them) is:
resdf = df.filter(like='Test',axis=1)
Method 11
A solution for dropping columns based on a list of names that may contain regex patterns. I prefer this approach because I’m frequently editing the drop list. It builds a negative-lookahead regex from the drop list and passes it to filter.
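import re
drop_column_names = ['A','B.+','C.*']
drop_columns_regex = '^(?!(?:'+'|'.join(drop_column_names)+')$)'
# The regex matches the columns to KEEP, so the dropped ones are those it does not match.
print('Dropping columns:',', '.join([c for c in df.columns if not re.search(drop_columns_regex,c)]))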
df = df.filter(regex=drop_columns_regex,axis=1)
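To check which names such a drop list actually removes, here is a minimal sketch on a made-up frame. The column names and the variables dropped and kept are illustrative only. Each pattern has to match the whole column name because of the trailing $ inside the lookahead.

```python
import re

import pandas as pd

# Hypothetical frame; the values are irrelevant, only the labels matter.
df = pd.DataFrame([[0] * 7], columns=["A", "AB", "B", "B1", "C", "Cat", "D"])

drop_column_names = ["A", "B.+", "C.*"]
# Matches a name only if NO drop pattern matches the entire name.
drop_columns_regex = "^(?!(?:" + "|".join(drop_column_names) + ")$)"

dropped = [c for c in df.columns if not re.search(drop_columns_regex, c)]
kept = df.filter(regex=drop_columns_regex, axis=1)

print("Dropping columns:", ", ".join(dropped))
print("Keeping columns:", ", ".join(kept.columns))
```

Here AB and B are kept: A only matches the exact name A, and B.+ needs at least one character after the B.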
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.