Cumsum reset at NaN

If I have a pandas.core.series.Series named ts of either 1’s or NaN’s like this:

3382   NaN
3381   NaN
...
3369   NaN
3368   NaN
...
15     1
10   NaN
11     1
12     1
13     1
9    NaN
8    NaN
7    NaN
6    NaN
3    NaN
4      1
5      1
2    NaN
1    NaN
0    NaN

I would like to calculate cumsum of this serie but it should be reset (set to zero) at the location of the NaNs like below:

3382   0
3381   0
...
3369   0
3368   0
...
15     1
10     0
11     1
12     2
13     3
9      0
8      0
7      0
6      0
3      0
4      1
5      2
2      0
1      0
0      0

Ideally I would like to have a vectorized solution !

I ever see a similar question with Matlab :
Matlab cumsum reset at NaN?

but I don’t know how to translate this line d = diff([0 c(n)]);

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

A simple Numpy translation of your Matlab code is this:

import numpy as np

v = np.array([1., 1., 1., np.nan, 1., 1., 1., 1., np.nan, 1.])
n = np.isnan(v)
a = ~n
c = np.cumsum(a)
d = np.diff(np.concatenate(([0.], c[n])))
v[n] = -d
np.cumsum(v)

Executing this code returns the result array([ 1., 2., 3., 0., 1., 2., 3., 4., 0., 1.]). This solution will only be as valid as the original one, but maybe it will help you come up with something better if it isn’t sufficient for your purposes.

Method 2

Even more pandas-onic way to do it:

v = pd.Series([1., 3., 1., np.nan, 1., 1., 1., 1., np.nan, 1.])
cumsum = v.cumsum().fillna(method='pad')
reset = -cumsum[v.isnull()].diff().fillna(cumsum)
result = v.where(v.notnull(), reset).cumsum()

Contrary to the matlab code, this also works for values different from 1.

Method 3

Here’s a slightly more pandas-onic way to do it:

v = Series([1, 1, 1, nan, 1, 1, 1, 1, nan, 1], dtype=float)
n = v.isnull()
a = ~n
c = a.cumsum()
index = c[n].index  # need the index for reconstruction after the np.diff
d = Series(np.diff(np.hstack(([0.], c[n]))), index=index)
v[n] = -d
result = v.cumsum()

Note that either of these requires that you’re using pandas at least at 9da899b or newer. If you aren’t then you can cast the bool dtype to an int64 or float64 dtype:

v = Series([1, 1, 1, nan, 1, 1, 1, 1, nan, 1], dtype=float)
n = v.isnull()
a = ~n
c = a.astype(float).cumsum()
index = c[n].index  # need the index for reconstruction after the np.diff
d = Series(np.diff(np.hstack(([0.], c[n]))), index=index)
v[n] = -d
result = v.cumsum()

Method 4

If you can accept a similar boolean Series b, try

(b.cumsum() - b.cumsum().where(~b).fillna(method='pad').fillna(0)).astype(int)

Starting from your Series ts, either b = (ts == 1) or b = ~ts.isnull().

Method 5

You can do that with expanding().apply and replace with method='backfill'

reset_at = 0

ts.expanding().apply(
    lambda s:
        s[
            (s != reset_at).replace(True, method='backfill')
        ].sum()
).fillna(0)


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x