How can I remove duplicate characters from a string using Python? For example, let’s say I have a string:
foo = 'mppmt'
How can I make the string:
foo = 'mpt'
NOTE: Order is not important
Answers:
Method 1
If order does not matter, you can use
"".join(set(foo))
set() will create a set of unique letters in the string, and "".join() will join the letters back to a string in arbitrary order.
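Because a set has no defined order, the joined result can come out differently from one run to the next. If a repeatable result is wanted and the original order still doesn't matter, one option is to sort the unique characters before joining. This is only a small sketch and not part of the original answer:

```python
foo = "mppmt"

# set() drops the duplicates; sorted() puts the survivors in a fixed
# (code point) order, so the output is the same on every run
result = "".join(sorted(set(foo)))
print(result)  # mpt
```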
If order does matter, you can use a dict instead of a set, which since Python 3.7 preserves the insertion order of the keys. (In the CPython implementation, this is already supported in Python 3.6 as an implementation detail.)
foo = "mppmt"
result = "".join(dict.fromkeys(foo))
resulting in the string "mpt". In earlier versions of Python, you can use collections.OrderedDict, which has been available starting from Python 2.7.
Method 2
If order does matter, how about:
>>> foo = 'mppmt'
>>> ''.join(sorted(set(foo), key=foo.index))
'mpt'
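To see why this keeps the original order: foo.index(c) returns the position of the first occurrence of c, so sorting the unique characters by that position puts them back in first-appearance order. A rough illustration follows; the first_seen dictionary is only there to make the sort keys visible and is not needed in practice:

```python
foo = "mppmt"

# Map each unique character to the index of its first occurrence
first_seen = {c: foo.index(c) for c in set(foo)}
# m -> 0, p -> 1, t -> 4

# Sorting the characters by that index restores first-appearance order
ordered = sorted(first_seen, key=first_seen.get)
print("".join(ordered))  # mpt
```

Note that foo.index scans the string from the start on every call, which is fine for short strings but gets slower as the input grows.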
Method 3
If order does not matter:
>>> foo = 'mppmt'
>>> ''.join(set(foo))
'pmt'
To keep the order:
>>> foo = 'mppmt'
>>> ''.join([j for i, j in enumerate(foo) if j not in foo[:i]])
'mpt'
Method 4
Build a list for the result and use a set, which cannot hold duplicates, to track the characters already seen.
Solution 1:
def fix(string):
    s = set()     # characters seen so far
    chars = []    # unique characters in first-seen order
    for ch in string:
        if ch not in s:
            s.add(ch)
            chars.append(ch)
    return ''.join(chars)
string = "Protiijaayiiii"
print(fix(string))
Solution 2:
s = "Protijayi"
aa = [ ch for i, ch in enumerate(s) if ch not in s[:i]]
print(''.join(aa))
Method 5
As was mentioned, "".join(set(foo)) and collections.OrderedDict will do.
I added foo = foo.lower() in case the string has both upper- and lowercase characters and you need to remove ALL duplicates regardless of case.
from collections import OrderedDict
foo = "EugeneEhGhsnaWW"
foo = foo.lower()
print("".join(OrderedDict.fromkeys(foo)))
prints eugnhsaw
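Lowercasing the whole string also changes the characters that are kept. If the first occurrence should survive exactly as written, one possible variation is to compare on a case-folded key while storing the original character. The helper name dedupe_ignore_case is made up for this sketch, and str.casefold() assumes Python 3.3 or later:

```python
def dedupe_ignore_case(text):
    """Drop repeated characters, comparing case-insensitively,
    while keeping the first occurrence exactly as written."""
    seen = set()   # case-folded characters already encountered
    kept = []      # original characters, in first-seen order
    for ch in text:
        key = ch.casefold()
        if key not in seen:
            seen.add(key)
            kept.append(ch)
    return "".join(kept)


print(dedupe_ignore_case("EugeneEhGhsnaWW"))  # EugnhsaW
```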
Method 6
# Input: 'ppppmm'
s = 'ppppmm'
s = ''.join(set(s))
print(s)
# Output: pm (or mp, since set order is arbitrary)
Method 7
If order is important,
seen = set()
result = []
for c in foo:
    if c not in seen:
        result.append(c)
        seen.add(c)
result = ''.join(result)
Or to do it without sets:
result = []
for c in foo:
    if c not in result:
        result.append(c)
result = ''.join(result)
Method 8
def dupe(str1):
    s = set(str1)
    return "".join(s)
str1 = 'geeksforgeeks'
a = dupe(str1)
print(a)
Works well if order is not important.
Method 9
d = {}
s = "YOUR_DESIRED_STRING"
res = []
for c in s:
    if c not in d:
        res.append(c)
        d[c] = 1
print("".join(res))
The variable c iterates over the string s in the for loop. For each character, the loop checks whether c is already a key in the dictionary d (which starts out empty). If it is not, c is appended to the list res and d[c] is set to 1 to mark the character as seen. Once the loop has gone through the whole string, res holds the unique characters in first-occurrence order, and they are joined and printed.
Method 10
Since a string is a sequence of characters, using those characters as dictionary keys removes all duplicates and (in Python 3.7 and later) retains the order.
"".join(list(dict.fromkeys(foo)))
Method 11
Functional programming style while keeping order:
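import functools

def get_unique_char(a, b):
    # a is the string built so far; append b only if it is new
    if b not in a:
        return a + b
    else:
        return a

if __name__ == '__main__':
    foo = 'mppmt'
    # reduce already returns a string; the '' initializer also handles empty input
    result = functools.reduce(get_unique_char, foo, '')
    print(result)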
Method 12
Using regular expressions:
import re
pattern = r'(.)\1+'  # (.) any character, \1+ followed by one or more copies of itself
repl = r'\1'         # replace the whole run with a single copy
text = 'shhhhh!!!'
print(re.sub(pattern, repl, text))
output:
sh!
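One thing to keep in mind: a backreference pattern such as (.)\1+ only matches a character that is immediately followed by copies of itself, so it collapses consecutive repeats and leaves non-adjacent duplicates alone. A quick check with the string from the question, with dict.fromkeys shown for comparison (assuming Python 3.7+ for the ordering):

```python
import re

foo = "mppmt"

# Only the adjacent "pp" is collapsed; the two separated 'm's both survive
collapsed = re.sub(r"(.)\1+", r"\1", foo)
print(collapsed)  # mpmt

# dict.fromkeys removes every repeat, adjacent or not
deduped = "".join(dict.fromkeys(foo))
print(deduped)  # mpt
```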
Method 13
def remove_duplicates(value):
    var = ""
    for i in value:
        # keep a character only the first time it appears
        if i not in var:
            var = var + i
    return var
print(remove_duplicates("[email protected]@@123#*#*"))
Method 14
from collections import OrderedDict
def remove_duplicates(value):
    # fromkeys keeps the first occurrence of each character, in order
    return "".join(OrderedDict.fromkeys(value))
print(remove_duplicates("[email protected]@@123#*#*"))
Method 15
mylist = ["ABA", "CAA", "ADA"]
results = []
for item in mylist:
    buffer = []
    for char in item:
        if char not in buffer:
            buffer.append(char)
    results.append("".join(buffer))
print(results)
Output:
['AB', 'CA', 'AD']
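For a list of strings, the same per-item deduplication can probably be written more compactly with dict.fromkeys inside a list comprehension. This is just a sketch and relies on dictionaries preserving insertion order, so it assumes Python 3.7 or later:

```python
mylist = ["ABA", "CAA", "ADA"]

# dict.fromkeys keeps the first occurrence of each character, in order;
# the comprehension applies that to every string in the list
results = ["".join(dict.fromkeys(item)) for item in mylist]
print(results)  # ['AB', 'CA', 'AD']
```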
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.