If I have a frame like this:
import pandas as pd

frame = pd.DataFrame({
    "a": ["the cat is blue", "the sky is green", "the dog is black"]
})
and I want to check whether each row contains any of several words, I can do this:
frame["b"] = (
    frame.a.str.contains("dog") |
    frame.a.str.contains("cat") |
    frame.a.str.contains("fish")
)
frame["b"] outputs:
0 True
1 False
2 True
Name: b, dtype: bool
If I put the words in a list instead:
mylist = ["dog", "cat", "fish"]
how can I check whether each row contains any of the words in the list?
Answers:
Method 1
frame = pd.DataFrame({'a' : ['the cat is blue', 'the sky is green', 'the dog is black']})
frame
a
0 the cat is blue
1 the sky is green
2 the dog is black
The str.contains method accepts a regular expression pattern:
mylist = ['dog', 'cat', 'fish']
pattern = '|'.join(mylist)
pattern
'dog|cat|fish'
frame.a.str.contains(pattern)
0     True
1    False
2     True
Name: a, dtype: bool
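Joining the list directly treats every word as a regex, and a word such as 'cat' also matches inside 'category'. If whole-word matching is what you want (an assumption about your use case), a minimal sketch escapes each word with re.escape and wraps the alternation in \b word boundaries. The extra "a category error" row is made-up data for illustration:

```python
import re

import pandas as pd

frame = pd.DataFrame(
    {"a": ["the cat is blue", "the sky is green", "the dog is black", "a category error"]}
)
mylist = ["dog", "cat", "fish"]

# Escape each word so characters such as '.' or '+' are matched literally,
# and wrap the alternation in \b word boundaries so 'cat' does not match 'category'.
# The (?:...) group is non-capturing, which keeps str.contains from warning about groups.
pattern = r"\b(?:" + "|".join(map(re.escape, mylist)) + r")\b"

matches = frame["a"].str.contains(pattern)
print(pattern)            # \b(?:dog|cat|fish)\b
print(matches.tolist())   # [True, False, True, False]
```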
Because regex patterns are supported, you can also embed flags:
frame = pd.DataFrame({'a' : ['Cat Mr. Nibbles is blue', 'the sky is green', 'the dog is black']})
frame
a
0 Cat Mr. Nibbles is blue
1 the sky is green
2 the dog is black
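A global flag such as (?i) applies to the whole pattern, so it should appear once at the very start; repeating it in front of every alternative is deprecated since Python 3.6 and raises re.error from Python 3.11.
pattern = '(?i)' + '|'.join(mylist)
pattern
'(?i)dog|cat|fish'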
frame.a.str.contains(pattern)
0     True    # 'Cat' matches 'cat' because of the (?i) flag
1    False
2     True
Name: a, dtype: bool
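str.contains can also do case-insensitive matching through its case and flags parameters, so you don't need an inline flag in the pattern at all. A minimal sketch on the same frame:

```python
import re

import pandas as pd

frame = pd.DataFrame({"a": ["Cat Mr. Nibbles is blue", "the sky is green", "the dog is black"]})
mylist = ["dog", "cat", "fish"]
pattern = "|".join(mylist)

# Option 1: let pandas handle case-insensitivity via the case parameter.
by_case = frame["a"].str.contains(pattern, case=False)

# Option 2: pass a flag from the re module instead.
by_flags = frame["a"].str.contains(pattern, flags=re.IGNORECASE)

print(by_case.tolist())   # [True, False, True]
print(by_flags.tolist())  # [True, False, True]
```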
Method 2
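DataFrame.isin also accepts a list, but it tests whether each cell equals one of the list items exactly, not whether the cell contains one:
print(frame.isin(mylist))
On the frame above this prints False for every row, because no cell is exactly 'dog', 'cat' or 'fish'. It is only useful when the cells hold the bare words themselves.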
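To make the difference concrete, this sketch compares isin on the sentences with isin on a hypothetical Series of bare words:

```python
import pandas as pd

frame = pd.DataFrame({"a": ["the cat is blue", "the sky is green", "the dog is black"]})
mylist = ["dog", "cat", "fish"]

# isin compares whole cell values, and no sentence equals 'dog', 'cat' or 'fish'.
exact = frame["a"].isin(mylist)

# isin does match when the cells hold the bare words (hypothetical data).
words = pd.Series(["dog", "sky", "cat"])
exact_words = words.isin(mylist)

print(exact.tolist())        # [False, False, False]
print(exact_words.tolist())  # [True, False, True]
```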
Method 3
Based on the comments on the accepted answer about extracting the matched string, this approach can also be tried.
frame = pd.DataFrame({'a' : ['the cat is blue', 'the sky is green', 'the dog is black']})
frame
a
0 the cat is blue
1 the sky is green
2 the dog is black
Let us create the list of strings that need to be matched and extracted:
mylist = ['dog', 'cat', 'fish']
pattern = '|'.join(mylist)
Now let's create a function that finds and extracts the matching substring.
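import re

def pattern_searcher(search_str: str, search_list: str) -> str:
    # search_list is a regex pattern such as 'dog|cat|fish'
    search_obj = re.search(search_list, search_str)
    if search_obj:
        # slice out the leftmost substring that matched the pattern
        return_str = search_str[search_obj.start():search_obj.end()]
    else:
        return_str = 'NA'
    return return_str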
We will apply this function to the column with pandas.Series.apply:
frame['matched_str'] = frame['a'].apply(lambda x: pattern_searcher(search_str=x, search_list=pattern))
Result:
                  a matched_str
0   the cat is blue         cat
1  the sky is green          NA
2  the dog is black         dog
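The same extraction can be done without a Python-level function: Series.str.extract returns the text matched by a capture group in each row, or NaN where nothing matches. A minimal sketch, filling the misses with the same 'NA' placeholder that pattern_searcher uses:

```python
import pandas as pd

frame = pd.DataFrame({"a": ["the cat is blue", "the sky is green", "the dog is black"]})
mylist = ["dog", "cat", "fish"]
pattern = "|".join(mylist)

# Wrap the alternation in one capture group; expand=False returns a Series
# holding the first match per row, or NaN where nothing matched.
frame["matched_str"] = frame["a"].str.extract(f"({pattern})", expand=False)

# Replace NaN with the 'NA' placeholder used by pattern_searcher.
frame["matched_str"] = frame["matched_str"].fillna("NA")
print(frame)
```

Being vectorized, this avoids calling a Python function once per row, which tends to matter on large frames.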
Method 4
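You can test for several patterns at once by joining them with the regex alternation operator |. The loop below assumes df holds the text in column position 1 and writes 'Yes' or 'No' into column position 2:
import re

for i in range(len(df)):
    # re.search returns a match object (truthy) if any alternative occurs in the cell
    if re.search(r'car|oxide|gen', df.iat[i, 1]):
        df.iat[i, 2] = 'Yes'
    else:
        df.iat[i, 2] = 'No'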
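The df in the loop above isn't shown, so here is a sketch on a hypothetical frame (the column names id, text and flag and the sample strings are made up) that replaces the row-by-row loop with a single vectorized str.contains call and numpy.where:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: text sits in column position 1 and the result goes
# into column position 2, mirroring the positional loop above.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "text": ["carbon", "hydrogen", "water"],
    "flag": ["", "", ""],
})

# One vectorized regex test over the whole column instead of a Python loop.
mask = df["text"].str.contains(r"car|oxide|gen")
df["flag"] = np.where(mask, "Yes", "No")
print(df)
```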
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0.