How do I calculate diff() conditioned on a value? (Python)

I have a pandas df, like this:

    ID  date        value
0   10  2022-01-01  100
1   10  2022-01-02  150
2   10  2022-01-03  0
3   10  2022-01-04  0
4   10  2022-01-05  200
5   10  2022-01-06  0
6   10  2022-01-07  150
7   10  2022-01-08  0
8   10  2022-01-09  0
9   10  2022-01-10  0
10  10  2022-01-11  0
11  10  2022-01-12  100
12  23  2022-02-01  490
13  23  2022-02-02  0
14  23  2022-02-03  350
15  23  2022-02-04  333
16  23  2022-02-05  0
17  23  2022-02-06  0
18  23  2022-02-07  0
19  23  2022-02-08  211
20  23  2022-02-09  100

I would like to calculate the number of days since the last non-zero value, as in the example below. How can I use diff() for this? The calculation should restart for each ID.

Output:

    ID  date        value  days_last_value
0   10  2022-01-01  100    0
1   10  2022-01-02  150    1
2   10  2022-01-03  0
3   10  2022-01-04  0
4   10  2022-01-05  200    3
5   10  2022-01-06  0
6   10  2022-01-07  150    2
7   10  2022-01-08  0
8   10  2022-01-09  0
9   10  2022-01-10  0
10  10  2022-01-11  0
11  10  2022-01-12  100    5
12  23  2022-02-01  490    0
13  23  2022-02-02  0
14  23  2022-02-03  350    2
15  23  2022-02-04  333    1
16  23  2022-02-05  0
17  23  2022-02-06  0
18  23  2022-02-07  0
19  23  2022-02-08  211    4
20  23  2022-02-09  100    1

Answers:

Method 1

Explanation below.

import pandas as pd

df = pd.DataFrame({'ID': 12 * [10] + 9 * [23], 
                   'value': [100, 150, 0, 0, 200, 0, 150, 0, 0, 0, 0, 100, 490, 0, 350, 333, 0, 0, 0, 211, 100]})

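# Group by ID and by block (each non-zero value starts a new block), count the rows per block,
# then shift the counts down by one within each ID so each count lands on the next non-zero row.
days = df.groupby(['ID', (df['value'] != 0).cumsum()]).size().groupby('ID').shift(fill_value=0)
# Relabel the result with the row labels of the non-zero values so it aligns with df.
days.index = df.index[df['value'] != 0]
df['days_last_value'] = days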
df
    ID  value  days_last_value
0   10    100              0.0
1   10    150              1.0
2   10      0              NaN
3   10      0              NaN
4   10    200              3.0
5   10      0              NaN
6   10    150              2.0
7   10      0              NaN
8   10      0              NaN
9   10      0              NaN
10  10      0              NaN
11  10    100              5.0
12  23    490              0.0
13  23      0              NaN
14  23    350              2.0
15  23    333              1.0
16  23      0              NaN
17  23      0              NaN
18  23      0              NaN
19  23    211              4.0
20  23    100              1.0

First, we have to group by 'ID'.
We also create a group for each block of days: build a True/False series that is True where value is not 0, then take its cumulative sum, so the counter increases by one at every non-zero value. That is the part (df['value'] != 0).cumsum(), which results in

0      1
1      2
2      2
3      2
4      3
5      3
6      4
7      4
8      4
9      4
10     4
11     5
12     6
13     6
14     7
15     8
16     8
17     8
18     8
19     9
20    10

We can group on the values in this series as well; combined with the 'ID' group, this gives the individual blocks of days. This is the df.groupby(['ID', (df['value'] != 0).cumsum()]) part.

Now, for each block, we get its size. A block consists of a non-zero row plus the zero rows that follow it, so with one row per day its size equals the number of days until the next non-zero value. That number belongs on the next non-zero row, not on the row that starts the block, so we shift the sizes down by one position and fill the first entry with 0 (the first value of an ID has no previous value). The shift has to happen per ID, so we group by ID again before shifting (the grouping is lost after .size()).
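To make the size-then-shift step easier to follow, here is a small sketch that prints the two intermediate results for the sample data. The names block, sizes and shifted are only illustrative and do not appear in the answer's one-liner; groupby(level='ID') is used here in place of groupby('ID'), which should be equivalent for this series.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': 12 * [10] + 9 * [23],
    'value': [100, 150, 0, 0, 200, 0, 150, 0, 0, 0, 0, 100,
              490, 0, 350, 333, 0, 0, 0, 211, 100],
})

# Block label: increases by one at every non-zero value.
block = (df['value'] != 0).cumsum().rename('block')

# Rows per block: the non-zero row plus the zero rows that follow it.
sizes = df.groupby(['ID', block]).size()

# Move each size onto the next block of the same ID; the first block of an ID gets 0.
shifted = sizes.groupby(level='ID').shift(fill_value=0)

print(sizes.tolist())
print(shifted.tolist())
```

The shifted values are the ones that end up in days_last_value on the non-zero rows.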

Now, this new series needs to be assigned back to the dataframe, but it is shorter than the dataframe: it has one entry per non-zero value. Its index also no longer matches the dataframe's (it is now the pair of ID and block number), so we can't assign it directly (not with df['days_last_value'], df.loc[…] or df.iloc).

Instead, we select the index values of the original dataframe where value is not zero, and set the index of days to those.
Now it is an easy step to assign days directly to the new column in the dataframe: pandas matches the indices, and rows without a match get NaN.
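Since the question asks about diff() specifically, here is a possible alternative that is not part of the answer above: a minimal sketch that keeps only the non-zero rows and takes the per-ID difference of the dates. It assumes the date column is (or has been converted to) a datetime dtype; the names nonzero and gaps are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': 12 * [10] + 9 * [23],
    # Assumption: dates are real datetimes, rebuilt here from the question's table.
    'date': pd.date_range('2022-01-01', periods=12).tolist()
            + pd.date_range('2022-02-01', periods=9).tolist(),
    'value': [100, 150, 0, 0, 200, 0, 150, 0, 0, 0, 0, 100,
              490, 0, 350, 333, 0, 0, 0, 211, 100],
})

nonzero = df['value'] != 0

# Difference between consecutive non-zero dates within each ID, in whole days.
gaps = df.loc[nonzero].groupby('ID')['date'].diff().dt.days

# The first non-zero row of each ID has no predecessor; the question wants 0 there.
# Assignment aligns on the index, so the zero-value rows stay NaN.
df['days_last_value'] = gaps.fillna(0)

print(df)
```

Because this variant works on the dates themselves rather than on row counts, it should also give the right gaps when some calendar days are missing from the data.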


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.