How do I check whether any row in a Pandas column matches a condition? (In my case, I want to test for values of type str.)
Background: I was using df.columnName.dtype.kind == 'O' to check for strings. But then I ran into columns holding Decimal values. Those are also stored with the object dtype (kind 'O'), so that check cannot tell them apart from strings. I am looking for a different check, and what I have come up with is:
display(df.col1.apply(lambda x: isinstance(x, str)).any())  # True
But the code above evaluates isinstance on every row, which seems inefficient when there is a very large number of rows. How can I implement this check so that it stops evaluating after it encounters the first True value?
Here is a more complete example:
from decimal import Decimal
import pandas as pd
data = {
    'c1': [None, 'a', 'b'],
    'c2': [None, 1, 2],
    'c3': [None, Decimal(1), Decimal(2)]
}
dx = pd.DataFrame(data)
print(dx)  # displays the dataframe
print('dx.dtypes')
print(dx.dtypes)  # displays the datatypes in the dataframe
print('dx.c1.dtype:', dx.c1.dtype)  # object (kind 'O')
print('dx.c2.dtype:', dx.c2.dtype)  # float64
print('dx.c3.dtype:', dx.c3.dtype)  # object (kind 'O'), even though it holds Decimal values!
print('dx.c1.apply(lambda x: isinstance(x,str)')
print(dx.c1.apply(lambda x: isinstance(x, str)).any())  # True
print('dx.c2.apply(lambda x: isinstance(x,str)).any()')
print(dx.c2.apply(lambda x: isinstance(x, str)).any())  # False
# the following line shows that apply() evaluates the lambda on every row
print('dx.c1.apply(lambda x: isinstance(x,str))')
print(dx.c1.apply(lambda x: isinstance(x, str)))  # False, True, True
# and only after that is any() applied
print('dx.c1.apply(lambda x: isinstance(x,str)).any()')
print(dx.c1.apply(lambda x: isinstance(x, str)).any())  # True
The above code outputs:
     c1   c2    c3
0  None  NaN  None
1     a  1.0     1
2     b  2.0     2
dx.dtypes
c1     object
c2    float64
c3     object
dtype: object
dx.c1.dtype: object
dx.c2.dtype: float64
dx.c3.dtype: object
dx.c1.apply(lambda x: isinstance(x,str)
True
dx.c2.apply(lambda x: isinstance(x,str)).any()
False
dx.c1.apply(lambda x: isinstance(x,str))
0    False
1     True
2     True
Name: c1, dtype: bool
dx.c1.apply(lambda x: isinstance(x,str)).any()
True
Is there a better way?
More detail: I am trying to fix this line, which breaks when the column holds Decimal values: https://github.com/capitalone/datacompy/blob/8a74e60d26990e3e05d5b15eb6fb82fef62f4776/datacompy/core.py#L273
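As a sketch of how a short-circuiting check might stand in for a bare dtype.kind == 'O' string test, the helper below assumes only pandas and the standard library. The name has_any_str is hypothetical; it is not part of pandas or datacompy. It skips numeric columns outright, since a numeric column cannot hold str values, and otherwise stops scanning at the first str it finds.

```python
from decimal import Decimal

import pandas as pd
from pandas.api.types import is_numeric_dtype


def has_any_str(series: pd.Series) -> bool:
    """Hypothetical helper: True if at least one value in series is a str."""
    # Numeric columns (e.g. float64) cannot contain str values,
    # so there is no need to look at any rows.
    if is_numeric_dtype(series.dtype):
        return False
    # any() over a generator stops at the first str it encounters.
    return any(isinstance(x, str) for x in series)


# The same frame as in the question.
dx = pd.DataFrame(
    {
        "c1": [None, "a", "b"],
        "c2": [None, 1, 2],
        "c3": [None, Decimal(1), Decimal(2)],
    }
)

for name in dx.columns:
    print(name, has_any_str(dx[name]))
```

With the dx frame from the question, this reports True for c1 and False for both c2 and c3, so the Decimal column is no longer mistaken for a string column.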
Answers:
Method 1
Copying my comment as an answer:
It seems what you need is Python's built-in any() function with a generator expression:
any(isinstance(x, str) for x in df['col1'])
Because the generator is evaluated lazily, any() stops as soon as it finds the first string, so the remaining rows are never checked.
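To check that the generator really stops early, the sketch below counts how often the predicate runs under each approach. The CountingIsStr wrapper and the 10,002-row column are made up for illustration; the first string sits in row 1, so the generator version should need only two evaluations.

```python
import pandas as pd


class CountingIsStr:
    """The isinstance check from the question, wrapped so it counts its calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return isinstance(x, str)


# Hypothetical column: a string in row 1, then many non-string values.
s = pd.Series([None, "a"] + [1.5] * 10_000, dtype=object)

# Pattern from the question: apply() builds a full boolean Series first.
pred_apply = CountingIsStr()
found_apply = bool(s.apply(pred_apply).any())

# Built-in any() over a generator: stops at the first True.
pred_gen = CountingIsStr()
found_gen = any(pred_gen(x) for x in s)

print("apply + .any():", found_apply, pred_apply.calls, "calls")
print("any(generator):", found_gen, pred_gen.calls, "calls")
```

Keep in mind that short-circuiting only pays off when a match appears early. For a column that contains no strings at all, any() still visits every row in a Python-level loop, much as apply() calls the Python predicate once per row.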
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.