Replace value for a selected cell in pandas DataFrame without using index

this is a rather similar question to this question but with one key difference: I’m selecting the data I want to change not by its index but by some criteria.

If the criteria I apply return a single row, I’d expect to be able to set the value of a certain column in that row in an easy way, but my first attempt doesn’t work:

>>> d = pd.DataFrame({'year':[2008,2008,2008,2008,2009,2009,2009,2009], 
...                   'flavour':['strawberry','strawberry','banana','banana',
...                   'strawberry','strawberry','banana','banana'],
...                   'day':['sat','sun','sat','sun','sat','sun','sat','sun'],
...                   'sales':[10,12,22,23,11,13,23,24]})

>>> d
   day     flavour  sales  year
0  sat  strawberry     10  2008
1  sun  strawberry     12  2008
2  sat      banana     22  2008
3  sun      banana     23  2008
4  sat  strawberry     11  2009
5  sun  strawberry     13  2009
6  sat      banana     23  2009
7  sun      banana     24  2009

>>> d[d.sales==24]
   day flavour  sales  year
7  sun  banana     24  2009

>>> d[d.sales==24].sales = 100
>>> d
   day     flavour  sales  year
0  sat  strawberry     10  2008
1  sun  strawberry     12  2008
2  sat      banana     22  2008
3  sun      banana     23  2008
4  sat  strawberry     11  2009
5  sun  strawberry     13  2009
6  sat      banana     23  2009
7  sun      banana     24  2009

So rather than setting 2009 Sunday’s Banana sales to 100, nothing happens! What’s the nicest way to do this? Ideally the solution should use the row number, as you normally don’t know that in advance!

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

Many ways to do that

1

In [7]: d.sales[d.sales==24] = 100

In [8]: d
Out[8]: 
   day     flavour  sales  year
0  sat  strawberry     10  2008
1  sun  strawberry     12  2008
2  sat      banana     22  2008
3  sun      banana     23  2008
4  sat  strawberry     11  2009
5  sun  strawberry     13  2009
6  sat      banana     23  2009
7  sun      banana    100  2009

2

In [26]: d.loc[d.sales == 12, 'sales'] = 99

In [27]: d
Out[27]: 
   day     flavour  sales  year
0  sat  strawberry     10  2008
1  sun  strawberry     99  2008
2  sat      banana     22  2008
3  sun      banana     23  2008
4  sat  strawberry     11  2009
5  sun  strawberry     13  2009
6  sat      banana     23  2009
7  sun      banana    100  2009

3

In [28]: d.sales = d.sales.replace(23, 24)

In [29]: d
Out[29]: 
   day     flavour  sales  year
0  sat  strawberry     10  2008
1  sun  strawberry     99  2008
2  sat      banana     22  2008
3  sun      banana     24  2008
4  sat  strawberry     11  2009
5  sun  strawberry     13  2009
6  sat      banana     24  2009
7  sun      banana    100  2009

Method 2

Not sure about older version of pandas, but in 0.16 the value of a particular cell can be set based on multiple column values.

Extending the answer provided by @waitingkuo, the same operation can also be done based on values of multiple columns.

d.loc[(d.day== 'sun') & (d.flavour== 'banana') & (d.year== 2009),'sales'] = 100

Method 3

Old question, but I’m surprised nobody mentioned numpy’s .where() functionality (which can be called directly from the pandas module).

In this case the code would be:

d.sales = pd.np.where(d.sales == 24, 100, d.sales)

To my knowledge, this is one of the fastest ways to conditionally change data across a series.


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x