top values from dictionary

How do I retrieve the top 3 entries (by value) from a dictionary?

>>> d
{'a': 2, 'and': 23, 'this': 14, 'only.': 21, 'is': 2, 'work': 2, 'will': 2, 'as': 2, 'test': 4}

Expected result:

and: 23
only: 21
this: 14

Answers:

Method 1

Use collections.Counter:

>>> from collections import Counter
>>> d = Counter({'a': 2, 'and': 23, 'this': 14, 'only.': 21, 'is': 2, 'work': 2, 'will': 2, 'as': 2, 'test': 4})
>>> d.most_common()
[('and', 23), ('only.', 21), ('this', 14), ('test', 4), ('a', 2), ('is', 2), ('work', 2), ('will', 2), ('as', 2)]
>>> for k, v in d.most_common(3):
...     print('%s: %i' % (k, v))
... 
and: 23
only.: 21
this: 14

Counter objects offer various other advantages, such as making it almost trivial to collect the counts in the first place.
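To illustrate that last point, here is a minimal sketch of building the counts directly from raw text. The sample sentence and the names text, counts and top3 are made up for illustration; the question does not show how its dictionary was produced.

```python
from collections import Counter

# Hypothetical sample text (not from the question).
text = "and only and this only and test only and this"

# Counter tallies each whitespace-separated word in a single pass.
counts = Counter(text.split())

# most_common(3) returns the three (word, count) pairs with the highest counts.
top3 = counts.most_common(3)

for word, count in top3:
    print('%s: %i' % (word, count))
```

The print call is written with parentheses and a single argument so that the snippet runs under both Python 2 and Python 3.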

Method 2

>>> d = {'a': 2, 'and': 23, 'this': 14, 'only.': 21, 'is': 2, 'work': 2, 'will': 2, 'as': 2, 'test': 4}
>>> t = sorted(d.items(), key=lambda x: x[1], reverse=True)[:3]
>>> for x in t:
...     print("{0}: {1}".format(*x))
... 
and: 23
only.: 21
this: 14

Method 3

The existing answers are correct; however, I would define my own key function to pass to sorted().

d = {'a': 2, 'and': 23, 'this': 14, 'only.': 21, 'is': 2, 'work': 2, 'will': 2, 'as': 2, 'test': 4}

# key function: returns the value stored in d for the key k
def keyfunction(k):
    return d[k]

# sort the keys by their values, highest first, and print the top 3 "key: value" pairs
for key in sorted(d, key=keyfunction, reverse=True)[:3]:
    print("%s: %i" % (key, d[key]))
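As a variation on the same idea, the bound method d.get already maps a key to its value, so it can stand in for a hand-written key function. The following is only a sketch; the helper name top_n is hypothetical and not part of the original answer.

```python
def top_n(d, n=3):
    """Return the n (key, value) pairs with the largest values, highest first."""
    # d.get(k) returns d[k] for existing keys, so it works as the sort key.
    keys = sorted(d, key=d.get, reverse=True)[:n]
    return [(k, d[k]) for k in keys]


d = {'a': 2, 'and': 23, 'this': 14, 'only.': 21, 'is': 2,
     'work': 2, 'will': 2, 'as': 2, 'test': 4}

for key, value in top_n(d):
    print('%s: %i' % (key, value))
```

Wrapping the logic in a function also removes the dependency on a global d that keyfunction has.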

Method 4

Timing the two approaches above (sorted() with a lambda versus collections.Counter):

import collections
import datetime

def most_popular(L):
  # using sorted() with a lambda key; the last two items are the two largest
  start = datetime.datetime.now()
  res = dict(sorted([(k, v) for k, v in L.items()], key=lambda x: x[1])[-2:])
  delta = datetime.datetime.now() - start
  print("Microtime (lambda:%d): %d" % (len(L), delta.microseconds))

  # using collections.Counter; most_common() sorts all items, highest first
  start = datetime.datetime.now()
  res = dict(collections.Counter(L).most_common()[:2])
  delta = datetime.datetime.now() - start
  print("Microtime (collections:%d): %d" % (len(L), delta.microseconds))

# dicts of 10, 100, ..., 1000000 entries
for size in (10, 100, 1000, 10000, 100000, 1000000):
  most_popular({el: 0 for el in range(size)})

The test data are dicts with 10^1 to 10^6 entries, of the form:

print({el: 0 for el in range(10)})
{0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0}

we get the following timings (in microseconds):

Python 2.7.10 (default, Jul 14 2015, 19:46:27)
[GCC 4.8.2] on linux

Microtime (lambda:10): 24
Microtime (collections:10): 106
Microtime (lambda:100): 49
Microtime (collections:100): 50
Microtime (lambda:1000): 397
Microtime (collections:1000): 178
Microtime (lambda:10000): 4347
Microtime (collections:10000): 2782
Microtime (lambda:100000): 55738
Microtime (collections:100000): 26546
Microtime (lambda:1000000): 798612
Microtime (collections:1000000): 361970

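So, in this single-run benchmark, sorted() with a lambda is faster for the smallest dict (10 entries), the two are about equal at 100 entries, and from 1,000 entries up collections.Counter is about 1.5 to 2.2 times faster.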
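Both timed variants sort the whole dictionary before slicing. When only a few items are needed, a heap-based selection avoids the full sort. Below is a minimal sketch using the standard library's heapq.nlargest; it was not part of the benchmark, so no timing is claimed for it, and the function name top_n_heap is hypothetical.

```python
import heapq
from operator import itemgetter


def top_n_heap(d, n=3):
    """Return the n (key, value) pairs with the largest values, highest first."""
    # nlargest tracks only the n best candidates instead of sorting every item.
    return heapq.nlargest(n, d.items(), key=itemgetter(1))


d = {'a': 2, 'and': 23, 'this': 14, 'only.': 21, 'is': 2,
     'work': 2, 'will': 2, 'as': 2, 'test': 4}

for key, value in top_n_heap(d):
    print('%s: %i' % (key, value))
```

In CPython, Counter.most_common(n) called with an explicit n takes the same heapq.nlargest route, whereas most_common() without an argument sorts everything.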

See the benchmark running here.

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
