I would like to see if a particular string exists in a particular column within my dataframe.
I’m getting the error
ValueError: The truth value of a Series is ambiguous. Use a.empty,
a.bool(), a.item(), a.any() or a.all().
import pandas as pd
BabyDataSet = [('Bob', 968), ('Jessica', 155), ('Mary', 77), ('John', 578), ('Mel', 973)]
a = pd.DataFrame(data=BabyDataSet, columns=['Names', 'Births'])
if a['Names'].str.contains('Mel'):
    print("Mel is there")
Answers:
Method 1
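a['Names'].str.contains('Mel') returns a boolean Series with one entry per row (len(BabyDataSet) entries). A Series with several entries has no single truth value, which is why the if statement raises the ValueError.
Therefore, you can count the matches with sum():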
mel_count=a['Names'].str.contains('Mel').sum()
if mel_count>0:
    print("There are {m} Mels".format(m=mel_count))
Or use any() if you don’t care how many records match your query:
if a['Names'].str.contains('Mel').any():
    print("Mel is there")
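To see the difference side by side, here is a minimal sketch using the question's data. The names mask, raised, found and n_matches are only illustrative; the point is that the bare Series raises, while any() and sum() reduce it to a single value.

```python
import pandas as pd

a = pd.DataFrame(
    data=[('Bob', 968), ('Jessica', 155), ('Mary', 77), ('John', 578), ('Mel', 973)],
    columns=['Names', 'Births'],
)

# One boolean per row, not a single True/False
mask = a['Names'].str.contains('Mel')

try:
    if mask:  # a multi-element Series has no single truth value
        pass
    raised = False
except ValueError:
    raised = True

found = bool(mask.any())     # True if at least one row matches
n_matches = int(mask.sum())  # number of matching rows

print(raised, found, n_matches)
```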
Method 2
You should use any()
In [98]: a['Names'].str.contains('Mel').any()
Out[98]: True
In [99]: if a['Names'].str.contains('Mel').any():
   ....:     print("Mel is there")
   ....:
Mel is there
a['Names'].str.contains('Mel') gives you a series of bool values
In [100]: a['Names'].str.contains('Mel')
Out[100]:
0    False
1    False
2    False
3    False
4     True
Name: Names, dtype: bool
Method 3
The OP wants to know whether the exact string ‘Mel’ exists in a particular column, not whether it is contained in any string in the column. For that, contains is not needed, and it is less efficient than a direct comparison.
A simple equals-to is enough:
df = pd.DataFrame({"names": ["Melvin", "Mel", "Me", "Mel", "A.Mel"]})
mel_count = (df['names'] == 'Mel').sum()
print("There are {num} instances of 'Mel'. ".format(num=mel_count))
mel_exists = (df['names'] == 'Mel').any()
if mel_exists:
    print("'Mel' exists in the dataframe.")
mel_exists2 = 'Mel' in df['names'].values
print("'Mel' is in the dataframe: " + str(mel_exists2))
Prints:
There are 2 instances of 'Mel'. 
'Mel' exists in the dataframe.
'Mel' is in the dataframe: True
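The distinction between a substring match and an exact match can be checked on the same data. This is just a sketch; substring_rows and exact_rows are names made up for the illustration, and regex=False is passed so that 'Mel' is treated as a literal string.

```python
import pandas as pd

df = pd.DataFrame({"names": ["Melvin", "Mel", "Me", "Mel", "A.Mel"]})

# Substring match: any value that contains 'Mel' somewhere
substring_hits = df['names'].str.contains('Mel', regex=False)
# Exact match: only values equal to 'Mel'
exact_hits = df['names'] == 'Mel'

substring_rows = df.loc[substring_hits, 'names'].tolist()
exact_rows = df.loc[exact_hits, 'names'].tolist()

print(substring_rows)
print(exact_rows)
```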
Method 4
I bumped into the same problem and used:
if "Mel" in a["Names"].values:
    print("Yep")
But this solution may be slower on a large column: .values exposes the data as a NumPy array (not a list), and in then compares against every element.
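One thing worth knowing here is why .values is needed at all: applying in directly to a Series tests the index labels, not the data. A small sketch with the question's data (the variable names are only for illustration):

```python
import pandas as pd

a = pd.DataFrame(
    data=[('Bob', 968), ('Jessica', 155), ('Mary', 77), ('John', 578), ('Mel', 973)],
    columns=['Names', 'Births'],
)

in_series = 'Mel' in a['Names']          # checks the index labels 0..4
label_in_series = 4 in a['Names']        # 4 is an index label
in_values = 'Mel' in a['Names'].values   # checks the actual data

print(in_series, label_in_series, in_values)
```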
Method 5
If there is any chance that you will need to search for empty strings,
a['Names'].str.contains('')
will NOT work: the empty string is a substring of every string, so it returns True for every non-null value.
Instead, use
'' in a["Names"].values
to accurately reflect whether or not a string is in a Series, including the edge case of searching for an empty string.
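A quick way to convince yourself of the edge case, sketched on a small Series that contains no empty string (names, contains_empty, equals_empty and in_values are illustrative names):

```python
import pandas as pd

names = pd.Series(['Bob', 'Jessica', 'Mel'])  # no element is ''

# Substring search: '' is found inside every string
contains_empty = bool(names.str.contains('').any())
# Exact comparisons: no element equals ''
equals_empty = bool((names == '').any())
in_values = bool('' in names.values)

print(contains_empty, equals_empty, in_values)
```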
Method 6
Pandas seems to be recommending to_numpy(), since the other methods still raise a FutureWarning: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html#pandas.DataFrame.to_numpy
So, an alternative that would work in this case is:
b=a['Names']
c = b.to_numpy().tolist()
if 'Mel' in c:
    print("Mel is in the dataframe column Names")
Method 7
For a case-insensitive search:
a['Names'].str.lower().str.contains('mel').any()
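As an alternative sketch, str.contains also accepts case=False, which avoids the extra lower() pass, and na=False, which treats missing values as non-matches. The sample Series below is made up for the illustration.

```python
import pandas as pd

# Hypothetical data with mixed case and a missing value
names = pd.Series(['Bob', None, 'MEL', 'Jessica'])

# case=False ignores case; na=False maps missing values to False
mask = names.str.contains('mel', case=False, na=False)
found = bool(mask.any())

print(mask.tolist())
print(found)
```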
Method 8
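import re
s = 'string'
# Collect the case-insensitive matches of s found in each value
df['Name'] = df['Name'].str.findall(s, flags=re.IGNORECASE)
# or keep only the rows whose value is exactly one of the given strings
df = df[df['Name'].isin(['string1', 'string2'])]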
Method 9
import pandas as pd
(data_frame.col_name == 'str_name_to_check').sum()
Method 10
If you want to save the results then you can use this:
a['result'] = a['Names'].apply(lambda x : ','.join([item for item in str(x).split() if item.lower() in ['mel', 'etc']]))
Method 11
The Series returned by contains always has one entry per row, so check the length of the filtered rows rather than the length of the Series itself:
if len(a[a['Names'].str.contains('Mel')]) > 0:
    print("Name Present")
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.