I have a dataframe like this:
RecID | A | B
------|---|----
1     | a | abc
2     | b | cba
3     | c | bca
4     | d | bac
5     | e | abc
And I want to create another column, C, from A and B such that, for each row, C is True if the string in column A is contained in the string in column B, and False otherwise.
The example output I am looking for is this:
RecID | A | B   | C
------|---|-----|------
1     | a | abc | True
2     | b | cba | True
3     | c | bca | True
4     | d | bac | False
5     | e | abc | False
Is there a way to do this in pandas quickly and without using a loop? Thanks
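To try the answers below, the sample frame from the question can be rebuilt in code. This is a minimal sketch: the column names and values come from the tables above, while the default integer index and the `expected_C` list name are assumptions for illustration.

```python
import pandas as pd

# Sample data from the question's input table
df = pd.DataFrame({
    'RecID': [1, 2, 3, 4, 5],
    'A': ['a', 'b', 'c', 'd', 'e'],
    'B': ['abc', 'cba', 'bca', 'bac', 'abc'],
})

# Hypothetical name for the desired column C from the output table:
# True where the value in A occurs inside the value in B
expected_C = [True, True, True, False, False]
print(df)
```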
Answers:
Method 1
You need apply with the in operator, evaluated row by row (axis=1):
df['C'] = df.apply(lambda x: x.A in x.B, axis=1)
print(df)
   RecID  A    B      C
0      1  a  abc   True
1      2  b  cba   True
2      3  c  bca   True
3      4  d  bac  False
4      5  e  abc  False
Another solution, a list comprehension, is faster, but it only works if there are no NaN values:
df['C'] = [x[0] in x[1] for x in zip(df['A'], df['B'])]
print(df)
   RecID  A    B      C
0      1  a  abc   True
1      2  b  cba   True
2      3  c  bca   True
3      4  d  bac  False
4      5  e  abc  False
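The NaN restriction matters because a missing value in a pandas object column is the float nan, and Python's `in` cannot test a float against a string. A small sketch showing the failure, using a hypothetical two-row frame (not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a missing value in column A
df = pd.DataFrame({'A': ['a', np.nan], 'B': ['abc', 'abc']})

try:
    df['C'] = [x[0] in x[1] for x in zip(df['A'], df['B'])]
    error = None
except TypeError as exc:
    # `nan in 'abc'` raises: the left operand of `in <string>` must be a str,
    # so the comprehension fails before column C is ever assigned
    error = exc

print(repr(error))
```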
Method 2
I could not get either of the answers @jezreal provided to handle None values in the first column (A). A slight change to the list comprehension handles them:
df['C'] = [x[0] in x[1] if x[0] is not None else False for x in zip(df['A'], df['B'])]
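The `is not None` check only guards column A against None; a float nan in A, or a missing value in B, still raises TypeError. A sketch of a swapped-in variant (not part of the answer above) that checks both values with `isinstance(..., str)` and returns False otherwise; `row_contains` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd

def row_contains(a, b):
    """Hypothetical helper: True if str a occurs in str b.

    Returns False when either value is not a str (None, NaN, numbers).
    """
    return isinstance(a, str) and isinstance(b, str) and a in b

# Hypothetical frame with missing values in both columns
df = pd.DataFrame({
    'A': ['a', None, np.nan, 'd'],
    'B': ['abc', 'abc', 'abc', None],
})
df['C'] = [row_contains(a, b) for a, b in zip(df['A'], df['B'])]
print(df)
```

Testing the type rather than comparing against None means any non-string value is treated as "not contained" instead of raising.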
Method 3
If you are comparing strings and get a TypeError because some values are not strings (for example NaN or numbers), you can convert both values with str() first:
df['C'] = df.apply(lambda x: str(x.A) in str(x.B), axis=1)
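One caveat with casting: str() turns a missing value into the text 'nan', which can then produce matches that are not really there. A small sketch with a hypothetical frame where each row has one missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: B is missing in row 0, A is missing in row 1
df = pd.DataFrame({'A': ['a', np.nan], 'B': [np.nan, 'banana']})

# str(np.nan) == 'nan', so row 0 tests 'a' in 'nan'
# and row 1 tests 'nan' in 'banana'; both are True
df['C'] = df.apply(lambda x: str(x.A) in str(x.B), axis=1)
print(df)
```

If missing values should count as False, an explicit type or missing-value check is safer than casting.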
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0