Pandas: implementing a short-circuiting "any" check

How do I check whether any row in a Pandas column matches a condition? (In my case, I want to test for values of type str.)

Background: I was using df.columnName.dtype.kind == 'O' to check for strings, but then I ran into columns holding Decimal values, which also have the object dtype. So I am looking for a different check, and this is what I have come up with:

display(df.col1.apply(lambda x: isinstance(x, str)).any())  # True

But the above code evaluates isinstance on every row, which seems inefficient when there is a very large number of rows. How can I implement the check so that it stops evaluating after encountering the first True value?

Here is a more complete example:

from decimal import Decimal
import pandas as pd

data = {
        'c1':  [None,'a','b'],
        'c2': [None,1,2],
        'c3': [None,Decimal(1),Decimal(2)]
       }

dx = pd.DataFrame(data)
print(dx) #displays the dataframe
print('dx.dtypes')
print(dx.dtypes) #displays the datatypes in the dataframe

print('dx.c1.dtype:',dx.c1.dtype) # object (kind 'O')
print('dx.c2.dtype:',dx.c2.dtype) # float64
print('dx.c3.dtype:',dx.c3.dtype) # object (kind 'O') as well!

print('dx.c1.apply(lambda x: isinstance(x,str)')
print(dx.c1.apply(lambda x: isinstance(x,str)).any()) # True
print('dx.c2.apply(lambda x: isinstance(x,str)).any()')
print(dx.c2.apply(lambda x: isinstance(x,str)).any()) # False

# the following line shows that apply evaluates the lambda on every row
print('dx.c1.apply(lambda x: isinstance(x,str))')
print(dx.c1.apply(lambda x: isinstance(x,str))) # False, True, True

# and only after that is any() applied to the resulting boolean Series
print('dx.c1.apply(lambda x: isinstance(x,str)).any()')
print(dx.c1.apply(lambda x: isinstance(x,str)).any()) # True

The above code outputs:

     c1   c2    c3
0  None  NaN  None
1     a  1.0     1
2     b  2.0     2

dx.dtypes
c1     object
c2    float64
c3     object
dtype: object

dx.c1.dtype: object
dx.c2.dtype: float64
dx.c3.dtype: object

dx.c1.apply(lambda x: isinstance(x,str)
True

dx.c2.apply(lambda x: isinstance(x,str)).any()
False

dx.c1.apply(lambda x: isinstance(x,str))
0    False
1     True
2     True
Name: c1, dtype: bool

dx.c1.apply(lambda x: isinstance(x,str)).any()
True

Is there a better way?

More detail: I am trying to fix this line, which breaks when the column has “decimal” values: https://github.com/capitalone/datacompy/blob/8a74e60d26990e3e05d5b15eb6fb82fef62f4776/datacompy/core.py#L273

Answers:

Method 1

Copying my comment as an answer:

It seems what you needed was the built-in function any:

any(isinstance(x,str) for x in df['col1'])

Because any consumes the generator expression lazily, rows are only evaluated until the first str instance is found.
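To make the short-circuiting visible, here is a minimal sketch built on the data from the question. The helper names has_any_str and count_checks_until_str are made up for illustration and are not part of pandas; the counter exists only to show how many values get inspected before any stops.

```python
from decimal import Decimal

import pandas as pd


def has_any_str(column):
    """Return True as soon as one str value is found in the column."""
    # The generator is consumed lazily, so any() stops at the first True.
    return any(isinstance(x, str) for x in column)


def count_checks_until_str(column):
    """Hypothetical helper: report how many values any() inspected."""
    checks = 0

    def is_str(value):
        nonlocal checks
        checks += 1
        return isinstance(value, str)

    found = any(is_str(x) for x in column)
    return found, checks


dx = pd.DataFrame({
    "c1": [None, "a", "b"],
    "c2": [None, 1, 2],
    "c3": [None, Decimal(1), Decimal(2)],
})

print(has_any_str(dx["c1"]))             # True
print(has_any_str(dx["c3"]))             # False
print(count_checks_until_str(dx["c1"]))  # (True, 2)
```

For c1 only two values are inspected: the missing value in the first row, then 'a', at which point any returns. When a column contains no strings at all, as with c3, every row is still visited, so the saving only applies when a match exists and shows up early.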


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
