python pandas – creating a column which keeps a running count of consecutive values

I am trying to create a column (“consec”) which will keep a running count of consecutive values in another (“binary”) without using loop. This is what the desired outcome would look like:

.    binary consec
1       0      0
2       1      1
3       1      2
4       1      3
5       1      4
5       0      0
6       1      1
7       1      2
8       0      0

However, this…

df['consec'][df['binary']==1] = df['consec'].shift(1) + df['binary']

results in this…

.  binary   consec
0     1       NaN
1     1       1
2     1       1
3     0       0
4     1       1
5     0       0
6     1       1
7     1       1
8     1       1
9     0       0

I see other posts which use grouping or sorting, but unfortunately, I don’t see how that could work for me. Thanks in advance for your help.

Contents hide

Answers:

Method 1

Method 2

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

You can use the compare-cumsum-groupby pattern (which I really need to getting around to writing up for the documentation), with a final cumcount:

>>> df = pd.DataFrame({"binary": [0,1,1,1,0,0,1,1,0]})
>>> df["consec"] = df["binary"].groupby((df["binary"] == 0).cumsum()).cumcount()
>>> df
   binary  consec
0       0       0
1       1       1
2       1       2
3       1       3
4       0       0
5       0       0
6       1       1
7       1       2
8       0       0

This works because first we get the positions where we want to reset the counter:

>>> (df["binary"] == 0)
0     True
1    False
2    False
3    False
4     True
5     True
6    False
7    False
8     True
Name: binary, dtype: bool

The cumulative sum of these gives us a different id for each group:

>>> (df["binary"] == 0).cumsum()
0    1
1    1
2    1
3    1
4    2
5    3
6    3
7    3
8    4
Name: binary, dtype: int64

And then we can pass this to groupby and use cumcount to get an increasing index in each group.

Method 2

For those who ended up here looking for an answer to the “misunderstood” version:
To reset count for each change in the binary column, so that consec does “keep a running count of consecutive values”, the following seems to work:

df["consec2"] = df["binary"].groupby((df["binary"] <> df["binary"].shift()).cumsum()).cumcount()

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes

Article Rating