Say I have the following dataframe:
What is the most efficient way to update the values of the columns feat and another_feat where the stream is number 2?
Is this it?
for index, row in df.iterrows():
if df1.loc[index,'stream'] == 2:
# do something
UPDATE:
What to do if I have more than a 100 columns? I don’t want to explicitly name the columns that I want to update. I want to divide the value of each column by 2 (except for the stream column).
So to be clear what my goal is:
Dividing all values by 2 of all rows that have stream 2, but not changing the stream column
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
I think you can use loc if you need update two columns to same value:
df1.loc[df1['stream'] == 2, ['feat','another_feat']] = 'aaaa' print df1 stream feat another_feat a 1 some_value some_value b 2 aaaa aaaa c 2 aaaa aaaa d 3 some_value some_value
If you need update separate, one option is use:
df1.loc[df1['stream'] == 2, 'feat'] = 10 print df1 stream feat another_feat a 1 some_value some_value b 2 10 some_value c 2 10 some_value d 3 some_value some_value
Another common option is use numpy.where:
df1['feat'] = np.where(df1['stream'] == 2, 10,20) print df1 stream feat another_feat a 1 20 some_value b 2 10 some_value c 2 10 some_value d 3 20 some_value
EDIT: If you need divide all columns without stream where condition is True, use:
print df1 stream feat another_feat a 1 4 5 b 2 4 5 c 2 2 9 d 3 1 7 #filter columns all without stream cols = [col for col in df1.columns if col != 'stream'] print cols ['feat', 'another_feat'] df1.loc[df1['stream'] == 2, cols ] = df1 / 2 print df1 stream feat another_feat a 1 4.0 5.0 b 2 2.0 2.5 c 2 1.0 4.5 d 3 1.0 7.0
If working with multiple conditions is possible use multiple numpy.where
or numpy.select:
df0 = pd.DataFrame({'Col':[5,0,-6]})
df0['New Col1'] = np.where((df0['Col'] > 0), 'Increasing',
np.where((df0['Col'] < 0), 'Decreasing', 'No Change'))
df0['New Col2'] = np.select([df0['Col'] > 0, df0['Col'] < 0],
['Increasing', 'Decreasing'],
default='No Change')
print (df0)
Col New Col1 New Col2
0 5 Increasing Increasing
1 0 No Change No Change
2 -6 Decreasing Decreasing
Method 2
You can do the same with .ix, like this:
In [1]: df = pd.DataFrame(np.random.randn(5,4), columns=list('abcd'))
In [2]: df
Out[2]:
a b c d
0 -0.323772 0.839542 0.173414 -1.341793
1 -1.001287 0.676910 0.465536 0.229544
2 0.963484 -0.905302 -0.435821 1.934512
3 0.266113 -0.034305 -0.110272 -0.720599
4 -0.522134 -0.913792 1.862832 0.314315
In [3]: df.ix[df.a>0, ['b','c']] = 0
In [4]: df
Out[4]:
a b c d
0 -0.323772 0.839542 0.173414 -1.341793
1 -1.001287 0.676910 0.465536 0.229544
2 0.963484 0.000000 0.000000 1.934512
3 0.266113 0.000000 0.000000 -0.720599
4 -0.522134 -0.913792 1.862832 0.314315
EDIT
After the extra information, the following will return all columns – where some condition is met – with halved values:
>> condition = df.a > 0 >> df[condition][[i for i in df.columns.values if i not in ['a']]].apply(lambda x: x/2)
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0
