How can I get the rows by distinct values in COL2?
For example, I have the dataframe below:
COL1   COL2
a.com  22
b.com  45
c.com  34
e.com  45
f.com  56
g.com  22
h.com  45
I want to get the rows based on unique values in COL2:
COL1   COL2
a.com  22
b.com  45
c.com  34
f.com  56
So, how can I get that? I would appreciate it very much if anyone can provide any help.
Answers:
Method 1
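Use drop_duplicates and pass COL2 as the column to check for duplicates. The result is stored under a new name so that the original df stays available for the later examples:
df_first = df.drop_duplicates('COL2')
# same as
# df_first = df.drop_duplicates('COL2', keep='first')
print(df_first)
    COL1  COL2
0  a.com    22
1  b.com    45
2  c.com    34
4  f.com    56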
You can also keep only the last row for each value (starting again from the original df):
df_last = df.drop_duplicates('COL2', keep='last')
print(df_last)
    COL1  COL2
2  c.com    34
4  f.com    56
5  g.com    22
6  h.com    45
Or remove every row whose COL2 value occurs more than once (again on the original df):
df_unique = df.drop_duplicates('COL2', keep=False)
print(df_unique)
    COL1  COL2
2  c.com    34
4  f.com    56
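To make the three variants above easy to try, here is a minimal self-contained sketch. The DataFrame is rebuilt from the data in the question; the variable names (first_rows, last_rows, unique_only, first_rows_renumbered) are only illustrative, and the reset_index step is an optional extra that the answer above does not use.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "COL1": ["a.com", "b.com", "c.com", "e.com", "f.com", "g.com", "h.com"],
    "COL2": [22, 45, 34, 45, 56, 22, 45],
})

# Keep the first row seen for each COL2 value (the default behaviour).
first_rows = df.drop_duplicates(subset="COL2", keep="first")

# Keep the last row seen for each COL2 value.
last_rows = df.drop_duplicates(subset="COL2", keep="last")

# Drop every row whose COL2 value occurs more than once.
unique_only = df.drop_duplicates(subset="COL2", keep=False)

# Optional: renumber the index 0..n-1 after rows have been dropped.
first_rows_renumbered = first_rows.reset_index(drop=True)

print(first_rows)
print(last_rows)
print(unique_only)
```

Because none of the calls assigns back to df, each variant starts from the full seven-row frame.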
Method 2
You can use groupby in combination with the first and last methods. Note that the result is sorted by COL2 and gets a new 0-based index.
To get the first row from each group:
df.groupby('COL2', as_index=False).first()
Output:
   COL2   COL1
0    22  a.com
1    34  c.com
2    45  b.com
3    56  f.com
To get the last row from each group:
df.groupby('COL2', as_index=False).last()
Output:
   COL2   COL1
0    22  g.com
1    34  c.com
2    45  h.com
3    56  f.com
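Two details of the groupby approach are worth keeping in mind: groupby sorts the group keys by default, and first() / last() pick the first or last non-null value per column rather than a whole row. The sketch below, built on the question's data with illustrative variable names, shows sort=False to keep the order of first appearance and head(1) as one possible way to get whole rows with their original index. It is an assumption that either behaviour is what you want; the answer above does not use them.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "COL1": ["a.com", "b.com", "c.com", "e.com", "f.com", "g.com", "h.com"],
    "COL2": [22, 45, 34, 45, 56, 22, 45],
})

# sort=False keeps the groups in order of first appearance: 22, 45, 34, 56.
first_unsorted = df.groupby("COL2", as_index=False, sort=False).first()

# head(1) returns the first row of each group unchanged,
# keeping the original index labels and column order.
first_whole_rows = df.groupby("COL2").head(1)

print(first_unsorted)
print(first_whole_rows)
```

On this data, which has no missing values, head(1) gives the same rows as df.drop_duplicates('COL2').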
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.