Check if a string in a Pandas DataFrame column is in a list of strings

If I have a frame like this

frame = pd.DataFrame({
    "a": ["the cat is blue", "the sky is green", "the dog is black"]
})

and I want to check whether each row contains any of several words, I can do this:

frame["b"] = (
   frame.a.str.contains("dog") |
   frame.a.str.contains("cat") |
   frame.a.str.contains("fish")
)

frame["b"] outputs:

0     True
1    False
2     True
Name: b, dtype: bool

If I decide to make a list:

mylist = ["dog", "cat", "fish"]

How would I check whether each row contains any of the words in the list?

Answers:


Method 1

frame = pd.DataFrame({'a' : ['the cat is blue', 'the sky is green', 'the dog is black']})

frame
                  a
0   the cat is blue
1  the sky is green
2  the dog is black

The str.contains method accepts a regular expression pattern:

mylist = ['dog', 'cat', 'fish']
pattern = '|'.join(mylist)

pattern
'dog|cat|fish'

frame.a.str.contains(pattern)
0     True
1    False
2     True
Name: a, dtype: bool
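
One caveat with joining a list into a pattern: entries that contain regex metacharacters (such as $, ., or parentheses) are interpreted as regex syntax rather than literal text. A minimal sketch of one way to guard against that, using re.escape on each entry; the sample row and the terms '$5' and '(approx.)' are made up for illustration:

```python
import re
import pandas as pd

# Hypothetical data: the last row contains regex metacharacters.
frame = pd.DataFrame({'a': ['the cat is blue', 'the sky is green', 'price is $5 (approx.)']})

# Hypothetical search terms; '$5' and '(approx.)' are not plain words.
terms = ['dog', 'cat', '$5', '(approx.)']

# re.escape makes each term match literally before the terms are joined with '|'.
safe_pattern = '|'.join(re.escape(term) for term in terms)

mask = frame['a'].str.contains(safe_pattern)
print(mask.tolist())  # [True, False, True]
```

For a list of plain words like the one in the question, escaping changes nothing, so it is safe to apply by default.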

Because regex patterns are supported, you can also embed flags:

frame = pd.DataFrame({'a' : ['Cat Mr. Nibbles is blue', 'the sky is green', 'the dog is black']})

frame
                         a
0  Cat Mr. Nibbles is blue
1         the sky is green
2         the dog is black

pattern = '(?i)' + '|'.join(mylist)  # a global flag must appear once, at the start (Python 3.11+ rejects it elsewhere)

pattern
'(?i)dog|cat|fish'
 
frame.a.str.contains(pattern)
0     True  # Because of the (?i) flag, 'Cat' is also matched to 'cat'
1    False
2     True
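
For case-insensitive matching specifically, str.contains also takes a case parameter, which avoids embedding a flag in the pattern. A short sketch on the same data, assuming the list holds plain words:

```python
import pandas as pd

frame = pd.DataFrame({'a': ['Cat Mr. Nibbles is blue', 'the sky is green', 'the dog is black']})
mylist = ['dog', 'cat', 'fish']

# case=False makes the whole pattern case-insensitive.
result = frame['a'].str.contains('|'.join(mylist), case=False)
print(result.tolist())  # [True, False, True]
```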

Method 2

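For exact matches against a list, isin can be used. Note that it compares whole cell values and does not search for substrings, so for the frame above every result is False.

print(frame.isin(mylist))

See pandas.DataFrame.isin().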
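
To make the difference concrete, here is a small comparison on the question's data. isin tests each cell for equality with a list entry, while str.contains searches inside each string:

```python
import pandas as pd

frame = pd.DataFrame({'a': ['the cat is blue', 'the sky is green', 'the dog is black']})
mylist = ['dog', 'cat', 'fish']

# Whole-value equality: no cell is exactly 'dog', 'cat' or 'fish'.
exact = frame['a'].isin(mylist)

# Substring (regex) search inside each cell.
partial = frame['a'].str.contains('|'.join(mylist))

print(exact.tolist())    # [False, False, False]
print(partial.tolist())  # [True, False, True]
```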

Method 3

Following the comments on the accepted answer about extracting the matched string, this approach can also be tried.

frame = pd.DataFrame({'a' : ['the cat is blue', 'the sky is green', 'the dog is black']})

frame
                  a
0   the cat is blue
1  the sky is green
2  the dog is black

Let us create the list of strings that need to be matched and extracted.

mylist = ['dog', 'cat', 'fish']
pattern = '|'.join(mylist)

Now let us create a function that finds and extracts the matching substring.

import re

def pattern_searcher(search_str: str, search_list: str) -> str:
    # search_list is the '|'-joined regex pattern, not a Python list
    search_obj = re.search(search_list, search_str)
    if search_obj:
        # slice out the text of the first match
        return_str = search_str[search_obj.start():search_obj.end()]
    else:
        return_str = 'NA'
    return return_str

We will use this function with pandas.Series.apply, since frame['a'] is a Series:

frame['matched_str'] = frame['a'].apply(lambda x: pattern_searcher(search_str=x, search_list=pattern))

Result:

                  a  matched_str
0   the cat is blue          cat
1  the sky is green           NA
2  the dog is black          dog
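
A similar result can probably be had without a custom function by using Series.str.extract with a single capture group. This is a sketch of an alternative, not the method above; the fillna('NA') step is only there to mirror the 'NA' placeholder:

```python
import pandas as pd

frame = pd.DataFrame({'a': ['the cat is blue', 'the sky is green', 'the dog is black']})
mylist = ['dog', 'cat', 'fish']
pattern = '|'.join(mylist)

# The parentheses form one capture group; expand=False returns a Series.
# Rows without a match come back as missing values.
matched = frame['a'].str.extract(f'({pattern})', expand=False)

# Replace missing values with the same 'NA' placeholder used above.
matched_na = matched.fillna('NA')
print(matched_na.tolist())  # ['cat', 'NA', 'dog']
```

Like re.search, str.extract returns only the first match in each string.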

Method 4

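We can check for several patterns at once by joining them with the pipe character (|), for example:

import re

# df: the text to search is in the second column, the result goes in the third
for i in range(len(df)):
    if re.findall(r'car|oxide|gen', df.iat[i, 1]):
        df.iat[i, 2] = 'Yes'
    else:
        df.iat[i, 2] = 'No'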
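
The row-by-row loop can likely be replaced with a vectorized call. A minimal sketch, assuming a frame with hypothetical column names 'text' and 'flag' and made-up sample values:

```python
import numpy as np
import pandas as pd

# Hypothetical data; the column names 'text' and 'flag' are assumptions.
df = pd.DataFrame({'text': ['carbon dioxide', 'nitrogen', 'helium']})

# str.contains builds a boolean mask; np.where maps it to 'Yes' / 'No'.
df['flag'] = np.where(df['text'].str.contains(r'car|oxide|gen'), 'Yes', 'No')
print(df['flag'].tolist())  # ['Yes', 'Yes', 'No']
```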


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
