Despite there being at least two good tutorials on how to index a DataFrame in Python’s pandas library, I still can’t work out an elegant way of SELECTing on more than one column.
>>> d = pd.DataFrame({'x':[1, 2, 3, 4, 5], 'y':[4, 5, 6, 7, 8]})
>>> d
   x  y
0  1  4
1  2  5
2  3  6
3  4  7
4  5  8
>>> d[d['x']>2] # This works fine
   x  y
2  3  6
3  4  7
4  5  8
>>> d[d['x']>2 & d['y']>7] # I had expected this to work, but it doesn't
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I have found (what I think is) a rather inelegant way of doing it, like this:
>>> d[d['x']>2][d['y']>7]
But it’s not pretty, and it scores fairly low for readability (I think).
Is there a better, more Python-tastic way?
Answers:
Method 1
It is an operator precedence issue: & binds more tightly than >, so d['x']>2 & d['y']>7 is parsed as d['x'] > (2 & d['y']) > 7.
You should add extra parentheses to make your multi-condition test work:
d[(d['x']>2) & (d['y']>7)]
This section of the tutorial you mentioned shows an example with several boolean conditions, and parentheses are used there.
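To make the precedence point concrete, here is a minimal sketch using the DataFrame from the question. The names `mask`, `result`, and `raised` are only illustrative, and the `try` block simply confirms that the unparenthesized form fails with a ValueError.

```python
import pandas as pd

d = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [4, 5, 6, 7, 8]})

# Parenthesize each comparison so that & combines two boolean Series.
mask = (d['x'] > 2) & (d['y'] > 7)
result = d[mask]

# Without parentheses, & binds tighter than >, so Python reads the
# expression as d['x'] > (2 & d['y']) > 7. That chained comparison
# needs a single True/False from a Series, which pandas refuses to give.
try:
    d[d['x'] > 2 & d['y'] > 7]
    raised = False
except ValueError:
    raised = True

print(result)
print("unparenthesized form raised ValueError:", raised)
```

Storing the mask in a variable first is optional, but it tends to read better once there are more than two conditions.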
Method 2
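There may still be a better way, but
In [56]: d[d['x'] > 2] and d[d['y'] > 7]
Out[56]:
   x  y
4  5  8
works.
Note that Python's and is not element-wise: it returns its second operand whenever the first is truthy, so the output above is just d[d['y'] > 7] and the x > 2 condition is never applied. Recent pandas versions raise a ValueError here instead, because the truth value of a DataFrame is ambiguous.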
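For comparison, here is a hedged sketch of how `and` behaves next to the element-wise forms, again on the question's data. It assumes a reasonably recent pandas, where using a DataFrame in a boolean context raises ValueError; `by_mask`, `by_query`, and `and_raised` are illustrative names. `DataFrame.query` is shown as one possible alternative spelling, not as the answer's own method.

```python
import pandas as pd

d = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [4, 5, 6, 7, 8]})

# `and` needs a single True/False from its left operand; recent pandas
# refuses to reduce a DataFrame to one boolean and raises ValueError.
try:
    d[d['x'] > 2] and d[d['y'] > 7]
    and_raised = False
except ValueError:
    and_raised = True

# Element-wise alternatives that apply both conditions to every row:
by_mask = d.loc[(d['x'] > 2) & (d['y'] > 7)]
by_query = d.query('x > 2 and y > 7')

print("`and` raised ValueError:", and_raised)
print(by_mask)
print(by_query)
```

Inside a `query` string the word `and` is interpreted by pandas rather than by Python's boolean machinery, which is why it is safe there but not between two DataFrames.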
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.