I have a pandas Series object containing boolean values. How can I get a series containing the logical NOT of each value?
For example, consider a series containing:
True True True False
The series I’d like to get would contain:
False False False True
This seems like it should be reasonably simple, but apparently I’ve misplaced my mojo =(
Answers:
Method 1
To invert a boolean Series, use ~s:
In [7]: s = pd.Series([True, True, False, True])
In [8]: ~s
Out[8]:
0 False
1 False
2 True
3 False
dtype: bool
Using Python 2.7, NumPy 1.8.0, Pandas 0.13.1:
In [119]: s = pd.Series([True, True, False, True]*10000)
In [10]: %timeit np.invert(s)
10000 loops, best of 3: 91.8 µs per loop
In [11]: %timeit ~s
10000 loops, best of 3: 73.5 µs per loop
In [12]: %timeit (-s)
10000 loops, best of 3: 73.5 µs per loop
As of Pandas 0.13.0, Series are no longer subclasses of numpy.ndarray; they are now subclasses of pd.NDFrame. This might have something to do with why np.invert(s) is no longer as fast as ~s or -s.
Caveat: timeit results may vary depending on many factors including hardware, compiler, OS, Python, NumPy and Pandas versions.
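Outside IPython, the same idea can be run as a plain script. This is only a minimal sketch: the names `inverted` and `df` are illustrative, and the DataFrame is made-up data showing the common use of an inverted mask for row filtering.

```python
import pandas as pd

# A boolean Series, as in the session above.
s = pd.Series([True, True, False, True])

# ~ applies an element-wise logical NOT when the dtype is bool.
inverted = ~s
print(inverted.tolist())  # [False, False, True, False]

# Hypothetical data: an inverted mask selects the rows where s is False.
df = pd.DataFrame({"value": [10, 20, 30, 40]})
print(df[~s]["value"].tolist())  # [30]
```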
Method 2
@unutbu’s answer is spot on; I just wanted to add a warning that your mask needs to be dtype bool, not ‘object’. That is, your mask can’t ever have contained any NaNs: even if your mask is NaN-free now, it will remain ‘object’ type.
Inverting an ‘object’ series won’t throw an error. Instead, you’ll get a garbage mask of ints that won’t work as you expect.
In[1]: df = pd.DataFrame({'A':[True, False, np.nan], 'B':[True, False, True]})
In[2]: df.dropna(inplace=True)
In[3]: df['A']
Out[3]:
0 True
1 False
Name: A, dtype: object
In[4]: ~df['A']
Out[4]:
0 -2
1 -1
Name: A, dtype: object
After speaking with colleagues about this one I have an explanation: It looks like pandas is reverting to the bitwise operator:
In [1]: ~True
Out[1]: -2
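As @geher says, you can convert the Series to bool with astype before you invert it with ~. The order matters: casting after inverting turns -2 and -1 into True, because every nonzero integer is truthy.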
~df['A'].astype(bool)
0 False
1 True
Name: A, dtype: bool
(~df['A']).astype(bool)
0 True
1 True
Name: A, dtype: bool
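To make the pitfall easy to guard against, here is a small sketch that checks the dtype and casts before inverting. The helper name `invert_mask` is hypothetical (it is not a pandas function), and it assumes the mask no longer contains NaN.

```python
import numpy as np
import pandas as pd


def invert_mask(mask: pd.Series) -> pd.Series:
    """Logical NOT of a mask; hypothetical helper, assumes no NaN remain."""
    if mask.dtype != bool:
        # With object dtype, ~ falls back to Python's integer
        # complement (~True == -2), so cast to bool first.
        mask = mask.astype(bool)
    return ~mask


df = pd.DataFrame({"A": [True, False, np.nan], "B": [True, False, True]})
df = df.dropna()

print(df["A"].dtype)                  # object, even though the NaN is gone
print(invert_mask(df["A"]).tolist())  # [False, True]
```

Note that astype(bool) turns any remaining NaN into True, which is why the NaN rows are dropped before the cast.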
Method 3
I just gave it a shot:
In [9]: s = pd.Series([True, True, True, False])
In [10]: s
Out[10]:
0 True
1 True
2 True
3 False
In [11]: -s
Out[11]:
0 False
1 False
2 False
3 True
Method 4
You can also use numpy.invert:
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: s = pd.Series([True, True, False, True])
In [4]: np.invert(s)
Out[4]:
0 False
1 False
2 True
3 False
EDIT: The difference in performance appears on Ubuntu 12.04, Python 2.7, NumPy 1.7.0 – doesn’t seem to exist using NumPy 1.6.2 though:
In [5]: %timeit (-s)
10000 loops, best of 3: 26.8 us per loop
In [6]: %timeit np.invert(s)
100000 loops, best of 3: 7.85 us per loop
In [7]: %timeit ~s
10000 loops, best of 3: 27.3 us per loop
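Since these timings depend heavily on library versions, it may be worth re-measuring on your own setup. Below is a rough sketch using the standard-library timeit module instead of the IPython magic; the Series size and the `number=1000` repeat count are arbitrary choices, not values from the answers.

```python
import timeit

import numpy as np
import pandas as pd

s = pd.Series([True, True, False, True] * 10000)

# Both spellings should agree before their speed is compared.
assert (~s).equals(np.invert(s))

# number=1000 is an arbitrary repeat count for this sketch.
t_tilde = timeit.timeit(lambda: ~s, number=1000)
t_invert = timeit.timeit(lambda: np.invert(s), number=1000)

print(f"~s:           {t_tilde / 1000 * 1e6:.1f} us per call")
print(f"np.invert(s): {t_invert / 1000 * 1e6:.1f} us per call")
```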
Method 5
In support of the excellent answers here, and for future convenience: there may be a case where you want to flip the truth values in a column while leaving other values (NaN, for instance) unchanged.
In[1]: series = pd.Series([True, np.nan, False, np.nan])
In[2]: series = series[series.notna()] #remove nan values
In[3]: series # without nan
Out[3]:
0 True
2 False
dtype: object
# Out[4] expected to be inverse of Out[3], pandas applies bitwise complement
# operator instead as in `lambda x : (-1*x)-1`
In[4]: ~series
Out[4]:
0 -2
2 -1
dtype: object
As a simple non-vectorized solution, you can just: 1. check types, 2. invert bools.
In[1]: series = pd.Series([True, np.nan, False, np.nan])
In[2]: series = series.apply(lambda x: not x if isinstance(x, bool) else x)
In[3]: series
Out[3]:
0 False
1 NaN
2 True
3 NaN
dtype: object
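If a vectorized version is preferred, one option not mentioned in the answers is pandas' nullable "boolean" dtype (available since pandas 1.0), where ~ flips True and False but leaves missing values missing. A minimal sketch, assuming a pandas version that supports this dtype:

```python
import numpy as np
import pandas as pd

series = pd.Series([True, np.nan, False, np.nan])

# Cast from object to the nullable "boolean" dtype; NaN becomes <NA>.
nullable = series.astype("boolean")

# ~ flips True/False and keeps <NA> in place.
inverted = ~nullable
print(inverted)
```

The trade-off is that the result has dtype boolean rather than object, with pd.NA in place of NaN, so downstream code that expects NaN may need adjusting.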
Method 6
NumPy is slower because it casts the input to boolean values (so None and 0 become False and everything else becomes True).
import pandas as pd
import numpy as np
s = pd.Series([True, None, False, True])
np.logical_not(s)
gives you
0 False
1 True
2 True
3 False
dtype: object
whereas ~s would raise a TypeError. In most cases the tilde is a safer choice than NumPy.
Pandas 0.25, NumPy 1.17
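The contrast can be reproduced side by side. This sketch catches the exception so that the script runs to completion; the flag name `raised` is illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series([True, None, False, True])  # object dtype because of None

# np.logical_not uses truthiness, so None is treated like False.
print(np.logical_not(s).tolist())  # [False, True, True, False]

# ~ is not defined for None, so the inversion fails loudly instead.
try:
    ~s
    raised = False
except TypeError as exc:
    raised = True
    print(f"~s raised TypeError: {exc}")
```

Failing loudly is arguably what makes the tilde safer here: the None is reported instead of being silently turned into True.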
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.