How to drop rows from pandas data frame that contains a particular string in a particular column?

I have a very large data frame in python and I want to drop all rows that have a particular string inside a particular column.

For example, I want to drop all rows which have the string “XYZ” as a substring in the column C of the data frame.

Can this be implemented in an efficient way using .drop() method?

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

pandas has vectorized string operations, so you can just filter out the rows that contain the string you don’t want:

In [91]: df = pd.DataFrame(dict(A=[5,3,5,6], C=["foo","bar","fooXYZbar", "bat"]))

In [92]: df
Out[92]:
   A          C
0  5        foo
1  3        bar
2  5  fooXYZbar
3  6        bat

In [93]: df[~df.C.str.contains("XYZ")]
Out[93]:
   A    C
0  5  foo
1  3  bar
3  6  bat

Method 2

If your string constraint is not just one string you can drop those corresponding rows with:

df = df[~df['your column'].isin(['list of strings'])]

The above will drop all rows containing elements of your list

Method 3

This will only work if you want to compare exact strings.
It will not work in case you want to check if the column string contains any of the strings in the list.

The right way to compare with a list would be :

searchfor = ['john', 'doe']
df = df[~df.col.str.contains('|'.join(searchfor))]

Method 4

Slight modification to the code. Having na=False will skip empty values. Otherwise you can get an error TypeError: bad operand type for unary ~: float

df[~df.C.str.contains("XYZ", na=False)]

Source: TypeError: bad operand type for unary ~: float

Method 5

new_df = df[df.C != 'XYZ']

Reference: https://chrisalbon.com/python/data_wrangling/pandas_dropping_column_and_rows/

Method 6

The below code will give you list of all the rows:-

df[df['C'] != 'XYZ']

To store the values from the above code into a dataframe :-

newdf = df[df['C'] != 'XYZ']

Method 7

if you do not want to delete all NaN, use

df[~df.C.str.contains("XYZ") == True]


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x