I have this data:
self.data = [(1, 1, 5.0),
             (1, 2, 3.0),
             (1, 3, 4.0),
             (2, 1, 4.0),
             (2, 2, 2.0)]
When I run this code:
for mid, group in itertools.groupby(self.data, key=operator.itemgetter(0)):
and look at list(group) for the first group, I get:
[(1, 1, 5.0), (1, 2, 3.0), (1, 3, 4.0)]
which is what I want.
But if I use 1 instead of 0
for mid, group in itertools.groupby(self.data, key=operator.itemgetter(1)):
to group by the second number in the tuples, I only get:
[(1, 1, 5.0)]
even though another tuple, (2, 1, 4.0), also has 1 in position 1 (the second element).
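Listing every group that groupby yields for the second-element key makes the behavior visible. This is a minimal sketch that uses the question's data as a plain list named data (an assumption; the question stores it in self.data):

```python
import itertools
import operator

data = [(1, 1, 5.0),
        (1, 2, 3.0),
        (1, 3, 4.0),
        (2, 1, 4.0),
        (2, 2, 2.0)]

# Collect every (key, members) pair that groupby produces for key = element 1.
groups = [(key, list(group))
          for key, group in itertools.groupby(data, key=operator.itemgetter(1))]
for key, members in groups:
    print(key, members)
```

The keys come out as 1, 2, 3, 1, 2: key 1 appears twice because the matching tuples are not next to each other, so (2, 1, 4.0) lands in a separate, later group.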
Answers:
Method 1
itertools.groupby only groups together consecutive items that have the same key; it starts a new group whenever the key changes.
If you want all items with the same key in a single group, you have to sort self.data by that same key first:
for mid, group in itertools.groupby(
        sorted(self.data, key=operator.itemgetter(1)),
        key=operator.itemgetter(1)):
    print(mid, list(group))
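A self-contained sketch of the sort-then-group approach, again assuming the question's data is available as a plain list named data. The important detail is that the same key function is used for both sorted and groupby:

```python
import itertools
import operator

data = [(1, 1, 5.0),
        (1, 2, 3.0),
        (1, 3, 4.0),
        (2, 1, 4.0),
        (2, 2, 2.0)]

key = operator.itemgetter(1)
# Sorting by the grouping key makes equal keys adjacent, so each key forms exactly one group.
grouped = {k: list(g) for k, g in itertools.groupby(sorted(data, key=key), key=key)}
for k, members in grouped.items():
    print(k, members)
```

Because sorted is stable, tuples with the same key keep their original relative order, e.g. key 1 maps to [(1, 1, 5.0), (2, 1, 4.0)].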
Method 2
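A variant without sorting, which collects the groups in a dictionary. This should perform better on large inputs, since it makes a single O(n) pass instead of an O(n log n) sort.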
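from collections import defaultdict

def full_group_by(l, key=lambda x: x):
    # Map each key to the list of items that produce it; no sorting needed.
    d = defaultdict(list)
    for item in l:
        d[key(item)].append(item)
    return d.items()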
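A usage sketch of full_group_by on the question's data (assumed here to be a plain list named data), grouping by the second element:

```python
from collections import defaultdict
from operator import itemgetter

def full_group_by(l, key=lambda x: x):
    # Map each key to the list of items that produce it; no sorting needed.
    d = defaultdict(list)
    for item in l:
        d[key(item)].append(item)
    return d.items()

data = [(1, 1, 5.0),
        (1, 2, 3.0),
        (1, 3, 4.0),
        (2, 1, 4.0),
        (2, 2, 2.0)]

groups = full_group_by(data, key=itemgetter(1))
for k, members in groups:
    print(k, members)
```

Unlike the sorting approach, the keys come out in first-seen order (dicts preserve insertion order in Python 3.7+), and the keys only need to be hashable, not orderable.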
Method 3
The function below “fixes” several annoyances with Python’s itertools.groupby:
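import itertools

def groupby2(l, key=lambda x: x, val=lambda x: x, agg=lambda x: x, sort=True):
    # Sort by the key first so that equal keys are adjacent for groupby.
    if sort:
        l = sorted(l, key=key)
    # Yield (key, aggregated values) pairs; val picks what to collect from each item.
    return ((k, agg((val(x) for x in v)))
            for k, v in itertools.groupby(l, key=key))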
Specifically,
- It doesn’t require that you sort your data first; it sorts by key for you (pass sort=False to skip this).
- It lets you pass key as a positional argument, not only as a named parameter.
- The output is a clean generator of (key, grouped_values) tuples, where the values are extracted by the third parameter (val).
- Aggregation functions such as sum or statistics.mean can be applied easily via agg.
Example Usage
import itertools
from operator import itemgetter
from statistics import mean  # optional: agg=mean averages instead of summing

t = [('a', 1), ('b', 2), ('a', 3)]
for k, v in groupby2(t, itemgetter(0), itemgetter(1), sum):
    print(k, v)
This prints:
a 4
b 2
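The same helper can average instead of sum. This sketch applies groupby2 to the question's data (assumed to be a plain list named data), grouping by the second element and averaging the third with statistics.mean:

```python
import itertools
from operator import itemgetter
from statistics import mean

def groupby2(l, key=lambda x: x, val=lambda x: x, agg=lambda x: x, sort=True):
    # Sort by the key first so that equal keys are adjacent for groupby.
    if sort:
        l = sorted(l, key=key)
    # Yield (key, aggregated values) pairs; val picks what to collect from each item.
    return ((k, agg((val(x) for x in v)))
            for k, v in itertools.groupby(l, key=key))

data = [(1, 1, 5.0),
        (1, 2, 3.0),
        (1, 3, 4.0),
        (2, 1, 4.0),
        (2, 2, 2.0)]

averages = list(groupby2(data, itemgetter(1), itemgetter(2), mean))
for k, avg in averages:
    print(k, avg)
```

With the default agg, each grouped value is a lazy generator over groupby's internal group iterator, which is no longer usable once the outer loop advances; passing agg=list is a safer choice if you want to keep the grouped values around.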
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.