pandas: replace values in a column based on a condition in another dataframe if that value is in the second dataframe

I have two dataframes as follows:

import pandas as pd
df = pd.DataFrame({'text':['I go to school','open the green door', 'go out and play'],
               'pos':[['PRON','VERB','ADP','NOUN'],['VERB','DET','ADJ','NOUN'],['VERB','ADP','CCONJ','VERB']]})

df2 = pd.DataFrame({'verbs':['go','open','close','share','divide'],
                   'new_verbs':['went','opened','closed','shared','divided']})

I would like to replace the verbs in df.text with their past forms from df2.new_verbs if they are found in df2.verbs. So far I have done the following:

df['text'] = df['text'].str.split()
new_df = df.apply(pd.Series.explode)
new_df = new_df.assign(new=lambda d: d['pos'].mask(d['pos'] == 'VERB', d['text']))
new_df.text[new_df.new.isin(df2.verbs)] = df2.new_verbs

But when I print the result, not all verbs are replaced correctly. My desired output is:

       text    pos    new
0       I   PRON   PRON
0    went   VERB     go
0      to    ADP    ADP
0  school   NOUN   NOUN
1  opened   VERB   open
1     the    DET    DET
1   green    ADJ    ADJ
1    door   NOUN   NOUN
2    went   VERB     go
2     out    ADP    ADP
2     and  CCONJ  CCONJ
2    play   VERB   play

Answers:

Method 1

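You can use a regex directly on the original (unsplit) text column. Note that this replaces every word listed in df2.verbs, regardless of its POS tag: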

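import re

# Whole-word pattern; \b stops 'go' from matching inside words such as 'good'
regex = r'\b(?:' + '|'.join(map(re.escape, df2['verbs'])) + r')\b'
# Lookup Series: base form -> past form
s = df2.set_index('verbs')['new_verbs']

# Replace each matched verb with its past form (fall back to the matched word)
df['text2'] = df['text'].str.replace(regex, lambda m: s.get(m.group(), m.group()),
                                     regex=True)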

Output (shown here as a separate text2 column for clarity):

                  text                       pos                  text2
0       I go to school   [PRON, VERB, ADP, NOUN]       I went to school
1  open the green door    [VERB, DET, ADJ, NOUN]  opened the green door
2      go out and play  [VERB, ADP, CCONJ, VERB]      went out and play
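A self-contained sketch of the regex approach on the question's sample data, so it can be run as-is. The pattern is wrapped in word boundaries (\b) so that a verb such as go is only replaced when it is a whole word; the helper name to_past and the column name text2 are illustrative choices, not part of the original answer.

```python
import re

import pandas as pd

# Sample data from the question (only the text column is needed here)
df = pd.DataFrame({'text': ['I go to school', 'open the green door', 'go out and play']})
df2 = pd.DataFrame({'verbs': ['go', 'open', 'close', 'share', 'divide'],
                    'new_verbs': ['went', 'opened', 'closed', 'shared', 'divided']})

# Whole-word alternation: \b(?:go|open|close|share|divide)\b
regex = r'\b(?:' + '|'.join(map(re.escape, df2['verbs'])) + r')\b'
# Lookup Series: base form -> past form
past = df2.set_index('verbs')['new_verbs']


def to_past(m):
    """Return the past form of the matched word, or the word itself if unknown."""
    return past.get(m.group(), m.group())


# A callable replacement requires regex=True; it receives each re.Match object
df['text2'] = df['text'].str.replace(regex, to_past, regex=True)
```

With the same pattern, pd.Series(['a good goal']).str.replace(regex, to_past, regex=True) leaves the text unchanged, because neither word is exactly go.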

Method 2

Alternatively, on the exploded new_df from the question, you can use Series.replace with a dictionary. replace returns a new Series, so assign the result back:

verbs_map = dict(zip(df2.verbs, df2.new_verbs))
new_df['text'] = new_df['text'].replace(verbs_map)

dict(zip(df2.verbs, df2.new_verbs)) builds a dictionary that maps each base verb to its past-tense form, e.g. {'go': 'went', 'close': 'closed', ...}. Series.replace then swaps every cell whose whole value is a key in the dictionary and leaves all other cells unchanged.
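A self-contained sketch of the dictionary approach applied to the question's pipeline, limited to rows tagged VERB so that only verbs change. Assumptions: pandas 1.3 or later, which supports DataFrame.explode with a list of columns (used here in place of df.apply(pd.Series.explode)); the name is_verb is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'text': ['I go to school', 'open the green door', 'go out and play'],
                   'pos': [['PRON', 'VERB', 'ADP', 'NOUN'], ['VERB', 'DET', 'ADJ', 'NOUN'],
                           ['VERB', 'ADP', 'CCONJ', 'VERB']]})
df2 = pd.DataFrame({'verbs': ['go', 'open', 'close', 'share', 'divide'],
                    'new_verbs': ['went', 'opened', 'closed', 'shared', 'divided']})

# One row per token: split the sentences, then explode both list columns together
df['text'] = df['text'].str.split()
new_df = df.explode(['text', 'pos'])
# 'new' column as in the question: the word itself for verbs, otherwise the tag
new_df = new_df.assign(new=lambda d: d['pos'].mask(d['pos'] == 'VERB', d['text']))

verbs_map = dict(zip(df2['verbs'], df2['new_verbs']))
is_verb = new_df['pos'].eq('VERB')
# Swap in the past form only on VERB rows; words missing from the dict
# (e.g. 'play') are left unchanged by replace
new_df['text'] = new_df['text'].mask(is_verb, new_df['text'].replace(verbs_map))
```

Because the replacement Series is computed from new_df itself, it carries the same (duplicated) index, so mask lines it up row by row. The question's last line assigns df2.new_verbs instead, whose index 0 to 4 is unrelated to the token rows; pandas matches the right-hand side by index label, which is likely why some verbs get the wrong form.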


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
