>>> from itertools import groupby
>>> keyfunc = lambda x : x > 500
>>> obj = dict(groupby(range(1000), keyfunc))
>>> list(obj[True])
[999]
>>> list(obj[False])
[]
range(1000) is obviously sorted by default for the condition (x > 500).
I was expecting the numbers from 0 to 999 to be grouped in a dict by the condition (x > 500). But the resulting dictionary had only 999.
Where are the other numbers?
Can anyone explain what is happening here?
Answers:
Method 1
From the docs:
The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list[.]
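You are storing the group iterators in obj and materializing them only later, after groupby has already been advanced past them. Convert each group to a list while you iterate instead: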
In [21]: dict((k, list(g)) for k, g in groupby(range(10), lambda x : x > 5))
Out[21]: {False: [0, 1, 2, 3, 4, 5], True: [6, 7, 8, 9]}
Method 2
The groupby iterator returns tuples of the outcome of the grouping function and a new iterator that is tied to the same “outer” iterator the groupby operator is working on. When you apply dict() to the iterator returned by groupby without consuming this “inner” iterator, groupby has to advance the “outer” iterator for you. You have to realize that the groupby function does not act on a sequence; it turns any such sequence into an iterator for you.
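A minimal sketch of that sharing, assuming a small range instead of the original 1000 numbers. The `logged` helper is hypothetical and only exists here to record when each item is drawn from the underlying iterator:

```python
from itertools import groupby


def logged(iterable, log):
    """Yield items from iterable, recording each one as it is drawn."""
    for item in iterable:
        log.append(item)
        yield item


log = []
groups = groupby(logged(range(6), log), lambda x: x > 2)

# The first request draws a single item, just enough to learn the first key.
key_a, group_a = next(groups)
drawn_after_first = list(log)

# The second request skips the rest of the first group to find the next key.
key_b, group_b = next(groups)
drawn_after_second = list(log)

# group_a was never read, and its items are already gone.
leftover_a = list(group_a)
```

Nothing was ever read from `group_a`, yet its items were drawn and discarded as soon as `groupby` was asked for the next group.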
Perhaps this is better explained with some metaphors and handwaving. Please follow along as we form a bucket line.
Imagine iterators as a person drawing water in buckets from a well. He has an unlimited number of buckets to use, but the well may be finite. Every time you ask this person for a bucket of water, he’ll draw a new bucket from the well of water and pass it to you.
In the groupby case, you insert another person into your budding bucket chain. This person doesn’t pass buckets on directly. Every time you ask him for a bucket, he passes you the outcome of the instructions you gave him plus another person, who will then pass buckets via the groupby person to whoever is asking, as long as they match the same outcome of the instructions. The groupby bucket passer will stop passing these buckets when the outcome of the instructions changes. So the well gives buckets to groupby, who passes them to a per-group person: group A, group B, and so on.
In your example, the water is numbered, but there can only be 1000 buckets drawn from the well. Here is what happens when you then pass the groupby person to the dict() call:
- Your dict() call asks groupby for a bucket. Now, groupby asks for one bucket from the person at the well, remembers the outcome of the instructions given, holding on to the bucket. To dict() he’ll pass the outcome of the instructions (False) plus a new person, group A. The outcome is stored as the key, and the group A person, who wants to pull buckets, is stored as the value. This person is not yet asking for buckets however, because no-one is asking it to.
- Your dict() call asks groupby for another bucket. groupby has these instructions, and goes looking for the next bucket where the outcome changes. It was still holding on to the first bucket, no-one asked for it, so it throws away this bucket. Instead, it asks for the next bucket from the well and uses his instructions. The outcome is the same as before, so it throws this new bucket away too! More water goes over the floor, and so go the next 499 buckets. Only when the bucket with number 501 is passed does the outcome change, so now groupby finds another person to give instructions to (person group B), together with the new outcome, True, passing these two on to dict().
- Your dict() call stores True as a key, and person group B as the value. group B does nothing, no-one is asking it for water.
- Your dict() asks for another bucket. groupby spills more water, until it holds the bucket with the number 999, and the person at the well shrugs his shoulders and states that now the well is empty. groupby tells dict() the well is empty, no more buckets are coming, could he please stop asking. It still holds the bucket with number 999, because it never has to make space for the next bucket from the well.
- Now you come along, asking dict() for the thing associated with the key True, which is person group B. You pass group B to list(), which will therefore ask group B for all the buckets group B can get. group B goes back to groupby, who holds one bucket only, the bucket with number 999, and the outcome of the instructions for this bucket matches what group B is looking for. So this one bucket group B gives to list(), then shrugs his shoulders because there are no more buckets, because groupby told him so.
- You then ask dict() for the person associated with the key False, which is person group A. By now, groupby has nothing to give any more, the well is dry and he’s standing in a puddle of 999 buckets of water with numbers floating around. Your second list() gets nothing.
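The steps above can be traced in code. This is a rough sketch with 10 buckets instead of 1000; the `well` generator and the `drawn` list are hypothetical names added only to record which buckets leave the well. What is left in the True group depends on the interpreter: the question shows one leftover item, and newer Python versions may leave none, so the sketch only relies on it being at most one item.

```python
from itertools import groupby

drawn = []


def well(n):
    """Hand out numbered buckets, noting each one as it is drawn."""
    for bucket in range(n):
        drawn.append(bucket)
        yield bucket


# dict() keeps asking groupby for the next group without reading any group.
obj = dict(groupby(well(10), lambda x: x > 5))

# By now the well is already empty: every bucket was drawn and discarded.
all_drawn = list(drawn)

# Group B can hand over at most the one bucket groupby was still holding.
leftover_true = list(obj[True])

# Group A gets nothing at all.
leftover_false = list(obj[False])
```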
The moral of this story? Immediately ask for all buckets of water when talking to groupby, because he’ll spill them all if you do not! Iterators are like the brooms in Fantasia, diligently moving water without understanding, and you had better hope you run out of water if you do not know how to control them.
Here is code that would do what you expect (with a little bit less water to prevent flooding):
>>> from itertools import groupby
>>> keyfunc = lambda x : x > 5
>>> obj = dict((k, list(v)) for k, v in groupby(range(10), keyfunc))
>>> obj[True]
[6, 7, 8, 9]
>>> obj[False]
[0, 1, 2, 3, 4, 5]
Method 3
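The thing you are missing is that groupby iterates lazily over your given range(1000) and yields (key, group) pairs, where each group is an iterator that is only usable until groupby moves on to the next group. By the time you read the groups from your dict, only the last value, in your case 999, is still available. What you have to do is iterate over the returned pairs and save each group to your dictionary as a list: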
from itertools import groupby

dictionary = {}
keyfunc = lambda x : x > 500
for k, g in groupby(range(1000), keyfunc):
    # Materialize each group before groupby advances to the next one.
    dictionary[k] = list(g)
So you would get the expected output:
{False: [0, 1, 2, ...], True: [501, 502, 503, ...]}
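One caveat, assuming the input might not be sorted by the key: groupby starts a new group every time the key changes, so the same key can show up more than once, and assigning `dictionary[k] = list(g)` would keep only the last run for that key. A minimal sketch that merges the runs instead (the `group_to_dict` name is hypothetical, not part of itertools):

```python
from collections import defaultdict
from itertools import groupby


def group_to_dict(iterable, keyfunc):
    """Collect all items per key, even if a key occurs in several runs."""
    grouped = defaultdict(list)
    for key, group in groupby(iterable, keyfunc):
        # extend() consumes the group right away, before groupby advances.
        grouped[key].extend(group)
    return dict(grouped)


result = group_to_dict([1, 700, 2, 800], lambda x: x > 500)
```

For input that is already ordered by the key, such as range(1000) here, this gives the same result as the loop above.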
For more information, see the Python docs about itertools groupby.
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0