I’m trying to merge logs from several servers. Each log is a list of tuples (date, count). date may appear more than once, and I want the resulting dictionary to hold the sum of all counts from all servers.
Here’s my attempt, with some data for example:
from collections import defaultdict
a=[("13.5",100)]
b=[("14.5",100), ("15.5", 100)]
c=[("15.5",100), ("16.5", 100)]
input=[a,b,c]
output=defaultdict(int)
for d in input:
for item in d:
output[item[0]]+=item[1]
print dict(output)
Which gives:
{'14.5': 100, '16.5': 100, '13.5': 100, '15.5': 200}
As expected.
I’m about to go bananas because of a colleague who saw the code. She insists that there must be a more Pythonic and elegant way to do it, without these nested for loops. Any ideas?
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
Doesn’t get simpler than this, I think:
a=[("13.5",100)]
b=[("14.5",100), ("15.5", 100)]
c=[("15.5",100), ("16.5", 100)]
input=[a,b,c]
from collections import Counter
print sum(
(Counter(dict(x)) for x in input),
Counter())
Note that Counter (also known as a multiset) is the most natural data structure for your data (a type of set to which elements can belong more than once, or equivalently – a map with semantics Element -> OccurrenceCount. You could have used it in the first place, instead of lists of tuples.
Also possible:
from collections import Counter from operator import add print reduce(add, (Counter(dict(x)) for x in input))
Using reduce(add, seq) instead of sum(seq, initialValue) is generally more flexible and allows you to skip passing the redundant initial value.
Note that you could also use operator.and_ to find the intersection of the multisets instead of the sum.
The above variant is terribly slow, because a new Counter is created on every step. Let’s fix that.
We know that Counter+Counter returns a new Counter with merged data. This is OK, but we want to avoid extra creation. Let’s use Counter.update instead:
update(self, iterable=None, **kwds) unbound collections.Counter method
Like dict.update() but add counts instead of replacing them.
Source can be an iterable, a dictionary, or another Counter instance.
That’s what we want. Let’s wrap it with a function compatible with reduce and see what happens.
def updateInPlace(a,b):
a.update(b)
return a
print reduce(updateInPlace, (Counter(dict(x)) for x in input))
This is only marginally slower than the OP’s solution.
Benchmark: http://ideone.com/7IzSx (Updated with yet another solution, thanks to astynax)
(Also: If you desperately want an one-liner, you can replace updateInPlace by lambda x,y: x.update(y) or x which works the same way and even proves to be a split second faster, but fails at readability. Don’t :-))
Method 2
from collections import Counter
a = [("13.5",100)]
b = [("14.5",100), ("15.5", 100)]
c = [("15.5",100), ("16.5", 100)]
inp = [dict(x) for x in (a,b,c)]
count = Counter()
for y in inp:
count += Counter(y)
print(count)
output:
Counter({'15.5': 200, '14.5': 100, '16.5': 100, '13.5': 100})
Edit:
As duncan suggested you can replace these 3 lines with a single line:
count = Counter()
for y in inp:
count += Counter(y)
replace by : count = sum((Counter(y) for y in inp), Counter())
Method 3
You could use itertools’ groupby:
from itertools import groupby, chain
a=[("13.5",100)]
b=[("14.5",100), ("15.5", 100)]
c=[("15.5",100), ("16.5", 100)]
input = sorted(chain(a,b,c), key=lambda x: x[0])
output = {}
for k, g in groupby(input, key=lambda x: x[0]):
output[k] = sum(x[1] for x in g)
print output
The use of groupby instead of two loops and a defaultdict will make your code clearer.
Method 4
You can use Counter or defaultdict, or you can try my variant:
def merge_with(d1, d2, fn=lambda x, y: x + y):
res = d1.copy() # "= dict(d1)" for lists of tuples
for key, val in d2.iteritems(): # ".. in d2" for lists of tuples
try:
res[key] = fn(res[key], val)
except KeyError:
res[key] = val
return res
>>> merge_with({'a':1, 'b':2}, {'a':3, 'c':4})
{'a': 4, 'c': 4, 'b': 2}
Or even more generic:
def make_merger(fappend=lambda x, y: x + y, fempty=lambda x: x):
def inner(*dicts):
res = dict((k, fempty(v)) for k, v
in dicts[0].iteritems()) # ".. in dicts[0]" for lists of tuples
for dic in dicts[1:]:
for key, val in dic.iteritems(): # ".. in dic" for lists of tuples
try:
res[key] = fappend(res[key], val)
except KeyError:
res[key] = fempty(val)
return res
return inner
>>> make_merger()({'a':1, 'b':2}, {'a':3, 'c':4})
{'a': 4, 'c': 4, 'b': 2}
>>> appender = make_merger(lambda x, y: x + [y], lambda x: [x])
>>> appender({'a':1, 'b':2}, {'a':3, 'c':4}, {'b':'BBB', 'c':'CCC'})
{'a': [1, 3], 'c': [4, 'CCC'], 'b': [2, 'BBB']}
Also you can subclass the dict and implement a __add__ method:
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0