Get rows based on distinct values from one column

How can I get the rows by distinct values in COL2?

For example, I have the dataframe below:

COL1   COL2
a.com  22
b.com  45
c.com  34
e.com  45
f.com  56
g.com  22
h.com  45

I want to get the rows based on unique values in COL2:

COL1   COL2
a.com  22
b.com  45
c.com  34
f.com  56

So, how can I get that? I would appreciate it very much if anyone can provide any help.

Answers:

Method 1

Use drop_duplicates and pass the column COL2 as the one to check for duplicates:

df = df.drop_duplicates('COL2')
# same as
# df = df.drop_duplicates('COL2', keep='first')
print(df)
    COL1  COL2
0  a.com    22
1  b.com    45
2  c.com    34
4  f.com    56
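The index labels in the output above (0, 1, 2, 4) are carried over from the original frame. If a fresh 0..n-1 index is preferred, here is a minimal sketch, assuming pandas 1.0 or later, where drop_duplicates accepts ignore_index. The variable name result is only illustrative and is not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "COL1": ["a.com", "b.com", "c.com", "e.com", "f.com", "g.com", "h.com"],
    "COL2": [22, 45, 34, 45, 56, 22, 45],
})

# Keep the first row for each COL2 value and renumber the index from 0.
result = df.drop_duplicates("COL2", ignore_index=True)
print(result)
```

On older pandas versions, df.drop_duplicates('COL2').reset_index(drop=True) should give the same frame.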

You can also keep only the last occurrence of each value:

# starting again from the original df
df = df.drop_duplicates('COL2', keep='last')
print(df)
    COL1  COL2
2  c.com    34
4  f.com    56
5  g.com    22
6  h.com    45

Or drop every row whose COL2 value occurs more than once:

# starting again from the original df
df = df.drop_duplicates('COL2', keep=False)
print(df)
    COL1  COL2
2  c.com    34
4  f.com    56
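Each snippet in this method reassigns df, so the outputs shown only match when every call starts from the original frame. A self-contained sketch that compares the three keep options side by side is given below. The names first_rows, last_rows and unique_only are illustrative, not from the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "COL1": ["a.com", "b.com", "c.com", "e.com", "f.com", "g.com", "h.com"],
    "COL2": [22, 45, 34, 45, 56, 22, 45],
})

# drop_duplicates returns a new frame, so df itself is left unchanged here.
first_rows = df.drop_duplicates(subset="COL2")               # keep='first' is the default
last_rows = df.drop_duplicates(subset="COL2", keep="last")   # last row per COL2 value
unique_only = df.drop_duplicates(subset="COL2", keep=False)  # only values seen exactly once

print(first_rows)
print(last_rows)
print(unique_only)
```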

Method 2

You can use groupby in combination with the first and last methods. The result is sorted by COL2, and COL2 becomes the first column.
To get the first row from each group:

df.groupby('COL2', as_index=False).first()

Output:

   COL2   COL1
0    22  a.com
1    34  c.com
2    45  b.com
3    56  f.com

To get the last row from each group:

df.groupby('COL2', as_index=False).last()

Output:

   COL2   COL1
0    22  g.com
1    34  c.com
2    45  h.com
3    56  f.com
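The groupby outputs above come back sorted by COL2 rather than in the original row order. If the order of first appearance matters, one option is to pass sort=False to groupby. The sketch below assumes the sample data from the question, and the name first_by_group is illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "COL1": ["a.com", "b.com", "c.com", "e.com", "f.com", "g.com", "h.com"],
    "COL2": [22, 45, 34, 45, 56, 22, 45],
})

# sort=False keeps the groups in the order their keys first appear in df.
first_by_group = df.groupby("COL2", as_index=False, sort=False).first()
print(first_by_group)
```

For this data the rows match what drop_duplicates('COL2') keeps. Note that first and last take the first or last non-null value of each column within a group, so the two approaches can differ when the data contains missing values.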


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
