Vectorized lookup on a pandas dataframe

I have two DataFrames . . .

df1 is a table I need to pull values from using index, column pairs retrieved from multiple columns in df2.

I see there is a function get_value which works perfectly when given an index and column value, but when trying to vectorize this function to create a new column I am failing…

df1 = pd.DataFrame(np.arange(20).reshape((4, 5)))

df1.columns = list('abcde')

df1.index = ['cat', 'dog', 'fish', 'bird']

        a   b   c   d   e
cat     0   1   2   3   4
dog     5   6   7   8   9
fish    10  11  12  13  14
bird    15  16  17  18  19

df1.get_value('bird', 'c')

17
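For readers on a recent pandas: get_value was deprecated in pandas 0.21 and is no longer available from 1.0 onward. A minimal sketch of the same scalar lookup using the .at accessor, assuming the df1 defined above:

```python
import numpy as np
import pandas as pd

# Rebuild df1 exactly as in the question.
df1 = pd.DataFrame(np.arange(20).reshape((4, 5)),
                   columns=list('abcde'),
                   index=['cat', 'dog', 'fish', 'bird'])

# .at does a label-based scalar lookup: row label first, then column label.
value = df1.at['bird', 'c']
print(value)  # 17
```

.at raises a KeyError if either label is missing, so it is only a like-for-like substitute when the pair is known to exist.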

Now what I need to do is create an entirely new column on df2 by indexing df1 with the (index, column) pairs taken from the animal and letter columns of df2, effectively vectorizing the get_value call above.

df2 = pd.DataFrame(np.arange(20).reshape((4, 5)))

df2['animal'] = ['cat', 'dog', 'fish', 'bird']

df2['letter'] = list('abcd')

    0   1   2   3   4   animal  letter
0   0   1   2   3   4   cat     a
1   5   6   7   8   9   dog     b
2   10  11  12  13  14  fish    c
3   15  16  17  18  19  bird    d

resulting in . . .

    0   1   2   3   4   animal  letter   looked_up
0   0   1   2   3   4   cat     a        0
1   5   6   7   8   9   dog     b        6
2   10  11  12  13  14  fish    c        12
3   15  16  17  18  19  bird    d        18

Answers:


Method 1

Deprecation Notice: DataFrame.lookup was deprecated in pandas v1.2.0 and removed in v2.0.

There’s a function aptly named lookup that does exactly this.

df2['looked_up'] = df1.lookup(df2.animal, df2.letter)

df2
 
    0   1   2   3   4 animal letter  looked_up
0   0   1   2   3   4    cat      a          0
1   5   6   7   8   9    dog      b          6
2  10  11  12  13  14   fish      c         12
3  15  16  17  18  19   bird      d         18
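Since lookup is deprecated, here is a rough sketch of one way to get the same result with positional indexing on the underlying NumPy array. It assumes df1 and df2 as defined in the question and that every (animal, letter) pair exists in df1; the variable names rows and cols are only illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(20).reshape((4, 5)),
                   columns=list('abcde'),
                   index=['cat', 'dog', 'fish', 'bird'])
df2 = pd.DataFrame(np.arange(20).reshape((4, 5)))
df2['animal'] = ['cat', 'dog', 'fish', 'bird']
df2['letter'] = list('abcd')

# Translate labels into integer positions along each axis of df1.
rows = df1.index.get_indexer(df2['animal'])
cols = df1.columns.get_indexer(df2['letter'])

# Fancy-index the raw array with the paired positions.
df2['looked_up'] = df1.to_numpy()[rows, cols]
print(df2['looked_up'].tolist())  # [0, 6, 12, 18]
```

Note that get_indexer returns -1 for a label it cannot find, which NumPy would silently treat as "last row/column", so this sketch is only safe when all labels are present.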

Method 2

If you are looking for a somewhat faster approach on a small dataframe, zip the two key columns and call get_value on each pair, i.e.

k = list(zip(df2['animal'].values,df2['letter'].values))
df2['looked_up'] = [df1.get_value(*i) for i in k]

Output:

   0   1   2   3   4 animal letter  looked_up
0   0   1   2   3   4    cat      a          0
1   5   6   7   8   9    dog      b          6
2  10  11  12  13  14   fish      c         12
3  15  16  17  18  19   bird      d         18

As John suggested, you can simplify the code, which also makes it much faster.

df2['looked_up'] = [df1.get_value(r, c) for r, c in zip(df2.animal, df2.letter)]

In case of missing data, use an if/else expression, i.e.

df2['looked_up'] = [df1.get_value(r, c) if not (pd.isnull(c) or pd.isnull(r)) else np.nan for r, c in zip(df2.animal, df2.letter)]
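The same missing-data guard can be sketched for newer pandas versions, where get_value is no longer available. The helper safe_at and the small keys frame below are hypothetical names made up for illustration, not part of pandas or of the original answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(20).reshape((4, 5)),
                   columns=list('abcde'),
                   index=['cat', 'dog', 'fish', 'bird'])

def safe_at(frame, row, col):
    """Hypothetical helper: scalar lookup that yields NaN when a key is null."""
    if pd.isna(row) or pd.isna(col):
        return np.nan
    return frame.at[row, col]

# Illustrative key pairs, two of which have a missing component.
keys = pd.DataFrame({'animal': ['cat', 'dog', None],
                     'letter': ['a', np.nan, 'c']})

looked_up = [safe_at(df1, r, c) for r, c in zip(keys['animal'], keys['letter'])]
```

This only guards against null keys. A non-null key that is absent from df1 would still raise a KeyError.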

For small dataframes

%%timeit
df2['looked_up'] = df1.lookup(df2.animal, df2.letter)
1000 loops, best of 3: 801 µs per loop

k = list(zip(df2['animal'].values,df2['letter'].values))
df2['looked_up'] = [df1.get_value(*i) for i in k]
1000 loops, best of 3: 399 µs per loop

[df1.get_value(r, c) for r, c in zip(df2.animal, df2.letter)]
10000 loops, best of 3: 87.5 µs per loop

For large dataframes

df3 = pd.concat([df2]*10000)

%%timeit
k = list(zip(df3['animal'].values,df3['letter'].values))
df3['looked_up'] = [df1.get_value(*i) for i in k]
1 loop, best of 3: 185 ms per loop

df3['looked_up'] = [df1.get_value(r, c) for r, c in zip(df3.animal, df3.letter)]
1 loop, best of 3: 165 ms per loop

df3['looked_up'] = df1.lookup(df3.animal, df3.letter)
100 loops, best of 3: 8.82 ms per loop
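%%timeit only works inside IPython/Jupyter. To repeat a comparison like this from a plain script, something along these lines should work with the standard timeit module. This is a sketch, not the original benchmark: it substitutes .at and get_indexer for get_value and lookup (assuming a pandas version where those two are gone), and the function names are made up.

```python
import timeit

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(20).reshape((4, 5)),
                   columns=list('abcde'),
                   index=['cat', 'dog', 'fish', 'bird'])
df2 = pd.DataFrame({'animal': ['cat', 'dog', 'fish', 'bird'],
                    'letter': list('abcd')})
df3 = pd.concat([df2] * 1000, ignore_index=True)

def loop_lookup():
    # One scalar lookup per (row, column) pair.
    return [df1.at[r, c] for r, c in zip(df3['animal'], df3['letter'])]

def indexer_lookup():
    # Convert labels to positions once, then index the array in one shot.
    rows = df1.index.get_indexer(df3['animal'])
    cols = df1.columns.get_indexer(df3['letter'])
    return df1.to_numpy()[rows, cols]

loop_time = timeit.timeit(loop_lookup, number=3)
indexer_time = timeit.timeit(indexer_lookup, number=3)
print(f"loop: {loop_time:.4f}s  indexer: {indexer_time:.4f}s")
```

Absolute numbers will differ by machine and pandas version, so treat the timings above as indicative only.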

Method 3

lookup and get_value are great answers if your values exist in the lookup dataframe.

However, if you have (row, column) pairs that are not present in the lookup dataframe and want the looked-up value to be NaN, merge and stack is one way to do it:

In [206]: df2.merge(df1.stack().reset_index().rename(columns={0: 'looked_up'}),
                    left_on=['animal', 'letter'], right_on=['level_0', 'level_1'],
                    how='left').drop(['level_0', 'level_1'], axis=1)
Out[206]:
    0   1   2   3   4 animal letter  looked_up
0   0   1   2   3   4    cat      a          0
1   5   6   7   8   9    dog      b          6
2  10  11  12  13  14   fish      c         12
3  15  16  17  18  19   bird      d         18

Test after adding a non-existent (animal, letter) pair

In [207]: df22
Out[207]:
      0     1     2     3     4 animal letter
0   0.0   1.0   2.0   3.0   4.0    cat      a
1   5.0   6.0   7.0   8.0   9.0    dog      b
2  10.0  11.0  12.0  13.0  14.0   fish      c
3  15.0  16.0  17.0  18.0  19.0   bird      d
4   NaN   NaN   NaN   NaN   NaN  dummy    NaN

In [208]: df22.merge(df1.stack().reset_index().rename(columns={0: 'looked_up'}),
                    left_on=['animal', 'letter'], right_on=['level_0', 'level_1'],
                    how='left').drop(['level_0', 'level_1'], axis=1)
Out[208]:
      0     1     2     3     4 animal letter  looked_up
0   0.0   1.0   2.0   3.0   4.0    cat      a        0.0
1   5.0   6.0   7.0   8.0   9.0    dog      b        6.0
2  10.0  11.0  12.0  13.0  14.0   fish      c       12.0
3  15.0  16.0  17.0  18.0  19.0   bird      d       18.0
4   NaN   NaN   NaN   NaN   NaN  dummy    NaN        NaN
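A slightly tidier variant of the same merge-and-stack idea, as a sketch: naming the axes of df1 before stacking means the long table already has animal and letter columns, so no level_0/level_1 cleanup is needed. The frame df22 below is a simplified stand-in holding only the key columns, and long_df1 is just an illustrative name.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(20).reshape((4, 5)),
                   columns=list('abcde'),
                   index=['cat', 'dog', 'fish', 'bird'])

# Simplified stand-in for df22: key columns only, last pair not in df1.
df22 = pd.DataFrame({'animal': ['cat', 'dog', 'fish', 'bird', 'dummy'],
                     'letter': ['a', 'b', 'c', 'd', np.nan]})

# Name both axes so the stacked result carries 'animal' and 'letter' directly.
long_df1 = (df1.rename_axis(index='animal', columns='letter')
               .stack()
               .rename('looked_up')
               .reset_index())

# A left merge keeps every row of df22; unmatched pairs get NaN.
result = df22.merge(long_df1, on=['animal', 'letter'], how='left')
print(result)
```

As in the original output, the presence of NaN makes the looked_up column float rather than int.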


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
