This question was previously asked (and then deleted) by a user. I was working on a solution so I could post an answer when the question disappeared. I also can’t make sense of pandas’ behaviour here, so I would appreciate some clarity. The original question said something along the lines of:
How can I replace every negative value except those in a given list with NaN in a Pandas dataframe?
My setup to reproduce the scenario is the following:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'A' : [x for x in range(4)],
'B' : [x for x in range(-2, 2)]
})
This should technically only be a matter of passing the correct boolean expression as a mask (as one would with DataFrame.where). My attempted solution looks like:
df[df >= 0 | df.isin([-2])]
which produces:
| index | A | B |
|---|---|---|
| 0 | 0 | NaN |
| 1 | 1 | NaN |
| 2 | 2 | 0 |
| 3 | 3 | 1 |
which also replaces the number in the list with NaN!
Moreover, if I mask the dataframe with each of the two conditions separately, I get the correct behavior for each:
with df[df >= 0] (identical to the compound result)
| index | A | B |
|---|---|---|
| 0 | 0 | NaN |
| 1 | 1 | NaN |
| 2 | 2 | 0 |
| 3 | 3 | 1 |
with df[df.isin([-2])]
| index | A | B |
|---|---|---|
| 0 | NaN | -2.0 |
| 1 | NaN | NaN |
| 2 | NaN | NaN |
| 3 | NaN | NaN |
So it seems like either:
- I am running into some undefined behaviour as a result of performing logic on NaN values, or
- I have got something wrong.
Can anyone clarify this situation for me?
Answers:
Method 1
Solution
df[(df >= 0) | (df.isin([-2]))]
Explanation
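In Python, bitwise OR, |, has a higher operator precedence than comparison operators like >=: https://docs.python.org/3/reference/expressions.html#operator-precedence
As a result, df >= 0 | df.isin([-2]) is parsed as df >= (0 | df.isin([-2])) rather than as (df >= 0) | df.isin([-2]), which is why the -2 is masked out as well.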
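The precedence issue can be checked directly. The following is a minimal sketch that reuses the question's dataframe; the variable names are only illustrative. It compares the unparenthesized expression with the grouping Python actually applies and with the intended grouping.

```python
import pandas as pd

df = pd.DataFrame({"A": list(range(4)), "B": list(range(-2, 2))})

# Without parentheses, | binds more tightly than >=,
# so this is parsed as df >= (0 | df.isin([-2])).
unparenthesized = df >= 0 | df.isin([-2])
explicit_grouping = df >= (0 | df.isin([-2]))

# The grouping the question actually intended.
intended = (df >= 0) | df.isin([-2])

# Compare the masks: the first pair should match, the second should not.
print(unparenthesized.equals(explicit_grouping))
print(unparenthesized.equals(intended))
```

On this data the unparenthesized mask ends up matching df >= 0, which is consistent with the output shown in the question.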
When filtering a pandas DataFrame on multiple boolean conditions, you need to enclose each condition in parentheses. More from the boolean indexing section of the pandas user guide:
Another common operation is the use of boolean vectors to filter the
data. The operators are: | for or, & for and, and ~ for not. These
must be grouped by using parentheses, since by default Python will
evaluate an expression such as df['A'] > 2 & df['B'] < 3 as df['A'] > (2 & df['B']) < 3, while the desired evaluation order is (df['A'] > 2) & (df['B'] < 3).
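Applied to the original question, the parenthesized mask can also be passed to DataFrame.where, which keeps values where the mask is True and puts NaN elsewhere. This is a minimal sketch; the name keep is a hypothetical stand-in for the "given list" of negative values to preserve.

```python
import pandas as pd

df = pd.DataFrame({"A": list(range(4)), "B": list(range(-2, 2))})

# Hypothetical list of negative values that should survive the masking.
keep = [-2]

# Each condition is parenthesized before the two are combined with |.
mask = (df >= 0) | (df.isin(keep))

# where() keeps entries where the mask is True and inserts NaN elsewhere.
result = df.where(mask)
print(result)
```

With this data, only the -1 in column B should become NaN, while the -2 is preserved.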
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.