Let's say this is my DataFrame:
import pandas as pd

df = pd.DataFrame({'bio': ['1', '1', '1', '4'],
                   'center': ['one', 'one', 'two', 'three'],
                   'outcome': ['f', 't', 'f', 'f']})
It looks like this:
  bio center outcome
0   1    one       f
1   1    one       t
2   1    two       f
3   4  three       f
I want to drop row 1 because it has the same bio & center as row 0.
I want to keep row 2 because it has the same bio but a different center than row 0.
Something like this won't work, because drop_duplicates doesn't accept input in this form, but it shows what I am trying to do:
df.drop_duplicates(subset = 'bio' & subset = 'center' )
Any suggestions?
Edit: changed df a bit to fit the example in the correct answer.
Answers:
Method 1
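Your syntax is wrong. subset takes a single list of column labels, so pass the columns that define a duplicate:
df.drop_duplicates(subset=['bio', 'center'])
Note that calling df.drop_duplicates() with no subset compares every column. With this data it would drop nothing, because rows 0 and 1 differ in outcome.
The call with subset returns the following: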
  bio center outcome
0   1    one       f
2   1    two       f
3   4  three       f
Take a look at the df.drop_duplicates documentation for syntax details. subset should be a sequence of column labels.
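As a minimal, self-contained sketch of the above, using the question's data: the variable names deduped, deduped_last and mask are only illustrative. It also shows the keep parameter and df.duplicated, which takes the same subset argument and can be handy for checking which rows would be removed before dropping them.

```python
import pandas as pd

df = pd.DataFrame({
    'bio': ['1', '1', '1', '4'],
    'center': ['one', 'one', 'two', 'three'],
    'outcome': ['f', 't', 'f', 'f'],
})

# Rows count as duplicates only when both 'bio' and 'center' match.
# By default the first row of each duplicate group is kept.
deduped = df.drop_duplicates(subset=['bio', 'center'])
print(deduped)

# keep='last' retains the last row of each duplicate group instead.
deduped_last = df.drop_duplicates(subset=['bio', 'center'], keep='last')

# Boolean mask marking the rows that the default call would drop.
mask = df.duplicated(subset=['bio', 'center'])
```

drop_duplicates returns a new DataFrame and leaves df unchanged, so assign the result if you want to keep it.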
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.