I have two DataFrames . . .
df1 is a lookup table: I need to pull values from it using (index, column) pairs taken from two columns in df2.
I see there is a function, get_value, that works perfectly when given a single index and column label, but I am failing to vectorize it to create a new column.
df1 = pd.DataFrame(np.arange(20).reshape((4, 5)))
df1.columns = list('abcde')
df1.index = ['cat', 'dog', 'fish', 'bird']
a b c d e
cat 0 1 2 3 4
dog 5 6 7 8 9
fish 10 11 12 13 14
bird 15 16 17 18 19
df1.get_value('bird', 'c')
17
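On pandas 1.0 and later, where get_value is no longer available, the same scalar lookup can be written with the .at accessor. A minimal sketch that rebuilds the df1 from above:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(20).reshape((4, 5)),
                   index=['cat', 'dog', 'fish', 'bird'],
                   columns=list('abcde'))

# .at is the label-based scalar accessor: (row label, column label).
value = df1.at['bird', 'c']
print(value)  # 17
```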
Now I need to create a new column on df2 by indexing df1 with the (index, column) pairs given by the animal and letter columns of df2, effectively vectorizing the get_value call above.
df2 = pd.DataFrame(np.arange(20).reshape((4, 5)))
df2['animal'] = ['cat', 'dog', 'fish', 'bird']
df2['letter'] = list('abcd')
0 1 2 3 4 animal letter
0 0 1 2 3 4 cat a
1 5 6 7 8 9 dog b
2 10 11 12 13 14 fish c
3 15 16 17 18 19 bird d
resulting in . . .
0 1 2 3 4 animal letter looked_up
0 0 1 2 3 4 cat a 0
1 5 6 7 8 9 dog b 6
2 10 11 12 13 14 fish c 12
3 15 16 17 18 19 bird d 18
Answers:
Method 1
Deprecation Notice:
lookup was deprecated in pandas v1.2.0.
There’s a function aptly named lookup that does exactly this.
df2['looked_up'] = df1.lookup(df2.animal, df2.letter)
df2
0 1 2 3 4 animal letter looked_up
0 0 1 2 3 4 cat a 0
1 5 6 7 8 9 dog b 6
2 10 11 12 13 14 fish c 12
3 15 16 17 18 19 bird d 18
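Given the deprecation, one possible replacement on newer pandas versions is to translate the labels to integer positions and index the underlying NumPy array. This is a sketch of that idea, not necessarily the officially recommended migration; the explicit check for missing labels is my own addition.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(20).reshape((4, 5)),
                   index=['cat', 'dog', 'fish', 'bird'],
                   columns=list('abcde'))
df2 = pd.DataFrame(np.arange(20).reshape((4, 5)))
df2['animal'] = ['cat', 'dog', 'fish', 'bird']
df2['letter'] = list('abcd')

# Translate labels to integer positions; get_indexer returns -1 for unknown labels.
rows = df1.index.get_indexer(df2['animal'])
cols = df1.columns.get_indexer(df2['letter'])

# Without this guard, -1 would silently pick the last row or column.
if (rows < 0).any() or (cols < 0).any():
    raise KeyError("some (animal, letter) pairs are not present in df1")

# Pairwise fancy indexing: element i is df1 at (rows[i], cols[i]).
df2['looked_up'] = df1.to_numpy()[rows, cols]
print(df2)
```

Note that get_indexer needs unique labels in df1's index and columns, which holds for this example.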
Method 2
If you are looking for a slightly faster approach, zip helps on small dataframes, i.e.
k = list(zip(df2['animal'].values, df2['letter'].values))
df2['looked_up'] = [df1.get_value(*i) for i in k]
Output:
0 1 2 3 4 animal letter looked_up
0 0 1 2 3 4 cat a 0
1 5 6 7 8 9 dog b 6
2 10 11 12 13 14 fish c 12
3 15 16 17 18 19 bird d 18
As John suggested, you can simplify the code, which is also much faster.
df2['looked_up'] = [df1.get_value(r, c) for r, c in zip(df2.animal, df2.letter)]
In case of missing data, use a conditional expression, i.e.
df2['looked_up'] = [df1.get_value(r, c) if not (pd.isnull(c) or pd.isnull(r)) else np.nan for r, c in zip(df2.animal, df2.letter)]
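For pandas versions without get_value, the same null check can be sketched with .at. The helper name lookup_or_nan and the trimmed-down df2 (only the two key columns, with one letter set to NaN) are made up for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(20).reshape((4, 5)),
                   index=['cat', 'dog', 'fish', 'bird'],
                   columns=list('abcde'))
# Hypothetical input: the third row has no letter.
df2 = pd.DataFrame({'animal': ['cat', 'dog', 'fish', 'bird'],
                    'letter': ['a', 'b', np.nan, 'd']})

def lookup_or_nan(row_label, col_label):
    """Return df1's value at the pair, or NaN if either label is null."""
    if pd.isna(row_label) or pd.isna(col_label):
        return np.nan
    return df1.at[row_label, col_label]

df2['looked_up'] = [lookup_or_nan(r, c)
                    for r, c in zip(df2['animal'], df2['letter'])]
print(df2)
```

This only guards against null labels. A non-null label that does not exist in df1 would still raise a KeyError.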
For small dataframes
%%timeit
df2['looked_up'] = df1.lookup(df2.animal, df2.letter)
1000 loops, best of 3: 801 µs per loop
k = list(zip(df2['animal'].values, df2['letter'].values))
df2['looked_up'] = [df1.get_value(*i) for i in k]
1000 loops, best of 3: 399 µs per loop
[df1.get_value(r, c) for r, c in zip(df2.animal, df2.letter)]
10000 loops, best of 3: 87.5 µs per loop
For a large dataframe
df3 = pd.concat([df2]*10000)
%%timeit
k = list(zip(df3['animal'].values, df3['letter'].values))
df3['looked_up'] = [df1.get_value(*i) for i in k]
1 loop, best of 3: 185 ms per loop
df3['looked_up'] = [df1.get_value(r, c) for r, c in zip(df3.animal, df3.letter)]
1 loop, best of 3: 165 ms per loop
df3['looked_up'] = df1.lookup(df3.animal, df3.letter)
100 loops, best of 3: 8.82 ms per loop
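The timings above come from an older pandas. To rerun a comparable benchmark outside IPython, something like the following could work. It is only a sketch: the function names with_at and with_numpy are made up, .at stands in for get_value, and positional NumPy indexing stands in for lookup. Numbers will vary by machine and version, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(20).reshape((4, 5)),
                   index=['cat', 'dog', 'fish', 'bird'],
                   columns=list('abcde'))
df2 = pd.DataFrame({'animal': ['cat', 'dog', 'fish', 'bird'],
                    'letter': list('abcd')})
df3 = pd.concat([df2] * 10000, ignore_index=True)  # 40,000 rows

def with_at():
    # One scalar label lookup per row.
    return [df1.at[r, c] for r, c in zip(df3['animal'], df3['letter'])]

def with_numpy():
    # Convert labels to positions once, then index the array in one step.
    rows = df1.index.get_indexer(df3['animal'])
    cols = df1.columns.get_indexer(df3['letter'])
    return df1.to_numpy()[rows, cols]

for fn in (with_at, with_numpy):
    best = min(timeit.repeat(fn, number=1, repeat=3))
    print(f"{fn.__name__}: {best * 1e3:.1f} ms")
```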
Method 3
lookup and get_value are great answers if your values exist in the lookup dataframe.
However, if you have (row, column) pairs that are not present in the lookup dataframe and want the looked-up value to be NaN, merge and stack is one way to do it:
In [206]: df2.merge(df1.stack().reset_index().rename(columns={0: 'looked_up'}),
                    left_on=['animal', 'letter'], right_on=['level_0', 'level_1'],
                    how='left').drop(['level_0', 'level_1'], axis=1)
Out[206]:
0 1 2 3 4 animal letter looked_up
0 0 1 2 3 4 cat a 0
1 5 6 7 8 9 dog b 6
2 10 11 12 13 14 fish c 12
3 15 16 17 18 19 bird d 18
Test by adding a non-existent (animal, letter) pair:
In [207]: df22
Out[207]:
0 1 2 3 4 animal letter
0 0.0 1.0 2.0 3.0 4.0 cat a
1 5.0 6.0 7.0 8.0 9.0 dog b
2 10.0 11.0 12.0 13.0 14.0 fish c
3 15.0 16.0 17.0 18.0 19.0 bird d
4 NaN NaN NaN NaN NaN dummy NaN
In [208]: df22.merge(df1.stack().reset_index().rename(columns={0: 'looked_up'}),
                     left_on=['animal', 'letter'], right_on=['level_0', 'level_1'],
                     how='left').drop(['level_0', 'level_1'], axis=1)
Out[208]:
0 1 2 3 4 animal letter looked_up
0 0.0 1.0 2.0 3.0 4.0 cat a 0.0
1 5.0 6.0 7.0 8.0 9.0 dog b 6.0
2 10.0 11.0 12.0 13.0 14.0 fish c 12.0
3 15.0 16.0 17.0 18.0 19.0 bird d 18.0
4 NaN NaN NaN NaN NaN dummy NaN NaN
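The same merge-and-stack idea can be sketched as a standalone script. Naming the stacked index levels up front (my own variation, not part of the answer above) lets the merge use on= and avoids dropping level_0 and level_1 afterwards. The df22 here is a reduced, hypothetical version that keeps only the two key columns.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(20).reshape((4, 5)),
                   index=['cat', 'dog', 'fish', 'bird'],
                   columns=list('abcde'))
# Reduced df22: key columns only, with one pair that is absent from df1.
df22 = pd.DataFrame({'animal': ['cat', 'dog', 'fish', 'bird', 'dummy'],
                     'letter': ['a', 'b', 'c', 'd', np.nan]})

# Long format: one row per (animal, letter) pair with its value.
lookup_long = (df1.stack()
                  .rename('looked_up')
                  .rename_axis(['animal', 'letter'])
                  .reset_index())

# A left merge keeps every row of df22; unmatched pairs get NaN.
result = df22.merge(lookup_long, on=['animal', 'letter'], how='left')
print(result)
```

Because of the NaN in the last row, looked_up comes back as a float column, as in Out[208] above.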
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.