Check if string is in a pandas dataframe

I would like to see if a particular string exists in a particular column within my dataframe.

I’m getting the error

ValueError: The truth value of a Series is ambiguous. Use a.empty,
a.bool(), a.item(), a.any() or a.all().

import pandas as pd

BabyDataSet = [('Bob', 968), ('Jessica', 155), ('Mary', 77), ('John', 578), ('Mel', 973)]

a = pd.DataFrame(data=BabyDataSet, columns=['Names', 'Births'])

if a['Names'].str.contains('Mel'):
    print ("Mel is there")

Contents hide

Answers:

Method 1

Method 2

Method 3

Method 4

Method 5

Method 6

Method 7

Method 8

Method 9

Method 10

Method 11

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

a['Names'].str.contains('Mel') will return an indicator vector of boolean values of size len(BabyDataSet)

Therefore, you can use

mel_count=a['Names'].str.contains('Mel').sum()
if mel_count>0:
    print ("There are {m} Mels".format(m=mel_count))

Or any(), if you don’t care how many records match your query

if a['Names'].str.contains('Mel').any():
    print ("Mel is there")

Method 2

You should use any()

In [98]: a['Names'].str.contains('Mel').any()
Out[98]: True

In [99]: if a['Names'].str.contains('Mel').any():
   ....:     print "Mel is there"
   ....:
Mel is there

a['Names'].str.contains('Mel') gives you a series of bool values

In [100]: a['Names'].str.contains('Mel')
Out[100]:
0    False
1    False
2    False
3    False
4     True
Name: Names, dtype: bool

Method 3

OP meant to find out whether the string ‘Mel’ exists in a particular column, not contained in any string in the column. Therefore the use of contains is not needed, and is not efficient.

A simple equals-to is enough:

df = pd.DataFrame({"names": ["Melvin", "Mel", "Me", "Mel", "A.Mel"]})

mel_count = (df['names'] == 'Mel').sum() 
print("There are {num} instances of 'Mel'. ".format(num=mel_count)) 
 
mel_exists = (df['names'] == 'Mel').any() 
print("'Mel' exists in the dataframe.".format(num=mel_exists)) 

mel_exists2 = 'Mel' in df['names'].values 
print("'Mel' is in the dataframe: " + str(mel_exists2))

Prints:

There are 2 instances of 'Mel'. 
'Mel' exists in the dataframe.
'Mel' is in the dataframe: True

Method 4

I bumped into the same problem, I used:

if "Mel" in a["Names"].values:
    print("Yep")

But this solution may be slower since internally pandas create a list from a Series.

Method 5

If there is any chance that you will need to search for empty strings,

    a['Names'].str.contains('')

will NOT work, as it will always return True.

Instead, use

    if '' in a["Names"].values

to accurately reflect whether or not a string is in a Series, including the edge case of searching for an empty string.

Method 6

Pandas seem to be recommending df.to_numpy since the other methods still raise a FutureWarning: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html#pandas.DataFrame.to_numpy

So, an alternative that would work int this case is:

b=a['Names']
c = b.to_numpy().tolist()
if 'Mel' in c:
     print("Mel is in the dataframe column Names")

Method 7

For case-insensitive search.

a['Names'].str.lower().str.contains('mel').any()

Method 8

import re
s = 'string'

df['Name'] = df['Name'].str.findall(s, flags = re.IGNORECASE)

#or
df['Name'] = df[df['Name'].isin(['string1', 'string2'])]

Method 9

import pandas as pd

(data_frame.col_name=='str_name_to_check').sum()

Method 10

If you want to save the results then you can use this:

a['result'] = a['Names'].apply(lambda x : ','.join([item for item in str(x).split() if item.lower() in ['mel', 'etc']]))

Method 11

You should check the value of your line of code like adding checking length of it.

if(len(a['Names'].str.contains('Mel'))>0):
    print("Name Present")

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes

Article Rating