This is rather similar to this question, but with one key difference: I’m selecting the data I want to change not by its index but by some criteria.
If the criteria I apply return a single row, I’d expect to be able to set the value of a certain column in that row in an easy way, but my first attempt doesn’t work:
>>> d = pd.DataFrame({'year':[2008,2008,2008,2008,2009,2009,2009,2009],
...                   'flavour':['strawberry','strawberry','banana','banana',
...                              'strawberry','strawberry','banana','banana'],
...                   'day':['sat','sun','sat','sun','sat','sun','sat','sun'],
...                   'sales':[10,12,22,23,11,13,23,24]})
>>> d
   day     flavour  sales  year
0  sat  strawberry     10  2008
1  sun  strawberry     12  2008
2  sat      banana     22  2008
3  sun      banana     23  2008
4  sat  strawberry     11  2009
5  sun  strawberry     13  2009
6  sat      banana     23  2009
7  sun      banana     24  2009
>>> d[d.sales==24]
   day flavour  sales  year
7  sun  banana     24  2009
>>> d[d.sales==24].sales = 100
>>> d
   day     flavour  sales  year
0  sat  strawberry     10  2008
1  sun  strawberry     12  2008
2  sat      banana     22  2008
3  sun      banana     23  2008
4  sat  strawberry     11  2009
5  sun  strawberry     13  2009
6  sat      banana     23  2009
7  sun      banana     24  2009
So rather than setting 2009 Sunday’s banana sales to 100, nothing happens! What’s the nicest way to do this? Ideally the solution should not use the row number, as you normally don’t know that in advance!
Answers:
Method 1
There are several ways to do this:
1
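In [7]: d.sales[d.sales==24] = 100

In [8]: d
Out[8]:
   day     flavour  sales  year
0  sat  strawberry     10  2008
1  sun  strawberry     12  2008
2  sat      banana     22  2008
3  sun      banana     23  2008
4  sat  strawberry     11  2009
5  sun  strawberry     13  2009
6  sat      banana     23  2009
7  sun      banana    100  2009

Note that this is chained assignment, which the pandas documentation discourages in favour of the .loc form in option 2; with Copy-on-Write enabled it does not modify d.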
2
In [26]: d.loc[d.sales == 12, 'sales'] = 99

In [27]: d
Out[27]:
   day     flavour  sales  year
0  sat  strawberry     10  2008
1  sun  strawberry     99  2008
2  sat      banana     22  2008
3  sun      banana     23  2008
4  sat  strawberry     11  2009
5  sun  strawberry     13  2009
6  sat      banana     23  2009
7  sun      banana    100  2009
3
In [28]: d.sales = d.sales.replace(23, 24)

In [29]: d
Out[29]:
   day     flavour  sales  year
0  sat  strawberry     10  2008
1  sun  strawberry     99  2008
2  sat      banana     22  2008
3  sun      banana     24  2008
4  sat  strawberry     11  2009
5  sun  strawberry     13  2009
6  sat      banana     24  2009
7  sun      banana    100  2009
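For anyone who wants to compare the question’s attempt with option 2 on their own machine, here is a minimal sketch. It rebuilds the frame from the question; the names `mask`, `unchanged` and `changed` are mine, not from the original answer. The comments reflect my understanding that boolean indexing hands back a new object, so an assignment on it never reaches `d`.

```python
import warnings

import pandas as pd

d = pd.DataFrame({
    'year': [2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009],
    'flavour': ['strawberry', 'strawberry', 'banana', 'banana',
                'strawberry', 'strawberry', 'banana', 'banana'],
    'day': ['sat', 'sun', 'sat', 'sun', 'sat', 'sun', 'sat', 'sun'],
    'sales': [10, 12, 22, 23, 11, 13, 23, 24],
})

mask = d['sales'] == 24  # boolean Series, True only for row 7

# The question's attempt: d[mask] is a temporary DataFrame, so the
# assignment lands on that temporary object and d is left as it was.
# Some pandas versions emit a warning here; it is silenced for the demo.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    d[mask].sales = 100
unchanged = d.loc[7, 'sales']

# Option 2: a single .loc call with (row selector, column label)
# writes into d itself.
d.loc[mask, 'sales'] = 100
changed = d.loc[7, 'sales']

print(unchanged, changed)
```

Keeping the row selector and the column label inside one `.loc[...]` call is what makes the write reach the original frame.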
Method 2
I’m not sure about older versions of pandas, but in 0.16 the value of a particular cell can be set based on the values of multiple columns.
Extending the answer provided by @waitingkuo, combine the conditions with & and wrap each one in parentheses:
d.loc[(d.day == 'sun') & (d.flavour == 'banana') & (d.year == 2009), 'sales'] = 100
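Since the question assumes the criteria return a single row, it may be worth checking that before writing. The following is only a sketch under that assumption: the names `match` and `n_matches` and the guard itself are my additions, not part of the answer above.

```python
import pandas as pd

d = pd.DataFrame({
    'year': [2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009],
    'flavour': ['strawberry', 'strawberry', 'banana', 'banana',
                'strawberry', 'strawberry', 'banana', 'banana'],
    'day': ['sat', 'sun', 'sat', 'sun', 'sat', 'sun', 'sat', 'sun'],
    'sales': [10, 12, 22, 23, 11, 13, 23, 24],
})

# Each comparison gets its own parentheses because & binds more
# tightly than == in Python.
match = (d['day'] == 'sun') & (d['flavour'] == 'banana') & (d['year'] == 2009)

# True counts as 1, so the sum is the number of matching rows.
n_matches = int(match.sum())

# Hypothetical guard: only write when exactly one row is selected.
if n_matches == 1:
    d.loc[match, 'sales'] = 100

print(n_matches)
print(d)
```

Storing the combined condition in a variable also keeps the `.loc` line short when there are many criteria.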
Method 3
Old question, but I’m surprised nobody mentioned numpy’s where() function. (Older pandas versions exposed it as pd.np.where, but the pd.np alias has since been removed, so import numpy directly.)
In this case the code would be:
import numpy as np
d.sales = np.where(d.sales == 24, 100, d.sales)
To my knowledge, this is one of the fastest ways to conditionally change data across a series.
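As a side note, pandas has a `Series.mask` method that expresses the same replacement without going through numpy. The sketch below puts the two next to each other on the sales column alone; the variable names are mine, and I have not benchmarked either form, so it says nothing about which is faster.

```python
import numpy as np
import pandas as pd

sales = pd.Series([10, 12, 22, 23, 11, 13, 23, 24], name='sales')

# numpy: where the condition holds take 100, otherwise keep the old value.
# np.where returns a plain array, so it is wrapped back into a Series.
via_numpy = pd.Series(np.where(sales == 24, 100, sales),
                      index=sales.index, name='sales')

# pandas: mask() replaces values where the condition is True.
via_mask = sales.mask(sales == 24, 100)

print(via_numpy.tolist())
print(via_mask.tolist())
```

Both forms return a new object rather than modifying `sales` in place, so the result has to be assigned back to the column, as in the answer above.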
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.