I am trying to access the index of a row in a function applied across an entire DataFrame in Pandas. I have something like this:
df = pandas.DataFrame([[1,2,3],[4,5,6]], columns=['a','b','c']) >>> df a b c 0 1 2 3 1 4 5 6
and I’ll define a function that access elements with a given row
def rowFunc(row):
return row['a'] + row['b'] * row['c']
I can apply it like so:
df['d'] = df.apply(rowFunc, axis=1) >>> df a b c d 0 1 2 3 7 1 4 5 6 34
Awesome! Now what if I want to incorporate the index into my function?
The index of any given row in this DataFrame before adding d would be Index([u'a', u'b', u'c', u'd'], dtype='object'), but I want the 0 and 1. So I can’t just access row.index.
I know I could create a temporary column in the table where I store the index, but I’m wondering if it is stored in the row object somewhere.
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
To access the index in this case you access the name attribute:
In [182]:
df = pd.DataFrame([[1,2,3],[4,5,6]], columns=['a','b','c'])
def rowFunc(row):
return row['a'] + row['b'] * row['c']
def rowIndex(row):
return row.name
df['d'] = df.apply(rowFunc, axis=1)
df['rowIndex'] = df.apply(rowIndex, axis=1)
df
Out[182]:
a b c d rowIndex
0 1 2 3 7 0
1 4 5 6 34 1
Note that if this is really what you are trying to do that the following works and is much faster:
In [198]: df['d'] = df['a'] + df['b'] * df['c'] df Out[198]: a b c d 0 1 2 3 7 1 4 5 6 34 In [199]: %timeit df['a'] + df['b'] * df['c'] %timeit df.apply(rowIndex, axis=1) 10000 loops, best of 3: 163 µs per loop 1000 loops, best of 3: 286 µs per loop
EDIT
Looking at this question 3+ years later, you could just do:
In[15]: df['d'],df['rowIndex'] = df['a'] + df['b'] * df['c'], df.index df Out[15]: a b c d rowIndex 0 1 2 3 7 0 1 4 5 6 34 1
but assuming it isn’t as trivial as this, whatever your rowFunc is really doing, you should look to use the vectorised functions, and then use them against the df index:
In[16]: df['newCol'] = df['a'] + df['b'] + df['c'] + df.index df Out[16]: a b c d rowIndex newCol 0 1 2 3 7 0 6 1 4 5 6 34 1 16
Method 2
Either:
#1. with row.name inside the apply(..., axis=1) call:
df = pandas.DataFrame([[1,2,3],[4,5,6]], columns=['a','b','c'], index=['x','y']) a b c x 1 2 3 y 4 5 6 df.apply(lambda row: row.name, axis=1) x x y y
#2. with iterrows() (slower)
DataFrame.iterrows() allows you to iterate over rows, and access their index:
for idx, row in df.iterrows():
...
Method 3
To answer the original question: yes, you can access the index value of a row in apply(). It is available under the key name and requires that you specify axis=1 (because the lambda processes the columns of a row and not the rows of a column).
Working example (pandas 0.23.4):
>>> import pandas as pd
>>> df = pd.DataFrame([[1,2,3],[4,5,6]], columns=['a','b','c'])
>>> df.set_index('a', inplace=True)
>>> df
b c
a
1 2 3
4 5 6
>>> df['index_x10'] = df.apply(lambda row: 10*row.name, axis=1)
>>> df
b c index_x10
a
1 2 3 10
4 5 6 40
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0