Get appropriate date with GROUP BY

I have a table where I track the duration of watched films by a user for each day.
Now I would like to calculate a unique view count based on date.

So the conditions are:

  • Each user contributes at most 1 view
  • A user counts as 1 view once that user's SUM(duration) >= 120
  • The view is attributed to the date on which SUM(duration) reaches 120

The problem is getting the correct date row. For example, row1.duration + row2.duration >= 120, so the view (count = 1) should be attributed to 2021-10-16.

| id | user_id | duration | created_at | film_id |
+----+---------+----------+------------+---------+
| 1  | 1       | 80       | 2021-10-15 | 1       |
| 2  | 1       | 70       | 2021-10-16 | 1       |
| 3  | 1       | 200      | 2021-10-17 | 2       |
| 4  | 2       | 50       | 2021-10-18 | 1       |
| 5  | 2       | 90       | 2021-10-18 | 1       |
| 6  | 3       | 140      | 2021-10-18 | 2       |
| 7  | 4       | 10       | 2021-10-19 | 3       |
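
To reproduce the sample data, a setup script along these lines could be used. It is only a sketch: the table name watch_time is taken from the queries in this post, while the column types (INT, DATE) and the primary key are assumptions, since the question shows only column names and values.

```sql
-- Assumed schema: only the column names and row values come from the question.
CREATE TABLE watch_time (
    id         INT  PRIMARY KEY,
    user_id    INT  NOT NULL,
    duration   INT  NOT NULL,
    created_at DATE NOT NULL,
    film_id    INT  NOT NULL
);

-- The seven sample rows from the table above.
INSERT INTO watch_time (id, user_id, duration, created_at, film_id) VALUES
    (1, 1,  80, '2021-10-15', 1),
    (2, 1,  70, '2021-10-16', 1),
    (3, 1, 200, '2021-10-17', 2),
    (4, 2,  50, '2021-10-18', 1),
    (5, 2,  90, '2021-10-18', 1),
    (6, 3, 140, '2021-10-18', 2),
    (7, 4,  10, '2021-10-19', 3);
```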

Expected result:

| cnt   | created_at |
+-------+------------+
| 0     | 2021-10-15 |
| 1     | 2021-10-16 |
| 0     | 2021-10-17 |
| 2     | 2021-10-18 |
| 0     | 2021-10-19 |

This is what I tried, but it picks the first date and omits the dates with a count of 0.
Here is the fiddle with populated data

SELECT count(*) AS cnt,
         created_at
FROM 
    (SELECT user_id,
         sum(duration) AS total,
         created_at
    FROM watch_time
    GROUP BY  user_id) AS t
WHERE t.total >= 120
GROUP BY  created_at;

Can this be done in SQL, or should it be done at the application level?

Thanks in advance!

Update:

Version: AWS RDS MySQL 5.7.33

But I’m ok to switch to Postgres if that can help.

I would also appreciate a way to get MIN(date) while still listing all dates (including those with 0 views), ideally something better than this:

SELECT IFNULL(cnt, 0) as cnt,
         t3.created_at
FROM 
    (SELECT count(*) AS cnt,
         created_at
    FROM 
        (SELECT user_id,
         sum(duration) AS total,
         created_at
        FROM watch_time
        GROUP BY  user_id) AS t
        WHERE t.total >= 120
        GROUP BY  created_at) AS t2
    RIGHT JOIN 
    (SELECT distinct(created_at)
    FROM watch_time) AS t3
    ON t2.created_at = t3.created_at;

which returns:

| cnt   | created_at |
+-------+------------+
| 1     | 2021-10-15 |
| 0     | 2021-10-16 |
| 0     | 2021-10-17 |
| 2     | 2021-10-18 |
| 0     | 2021-10-19 |

But I’m not sure whether the date (2021-10-15) was picked arbitrarily or whether it is always the lowest date.

Update 2:

Is it possible to take film_id into account as well? That is, treat each (user_id, film_id) pair as a unique view instead of grouping only by user_id.

So in this case:

row1 and row2 both have user_id: 1 and film_id: 1, and together they count as 1 view because the sum of their durations is >= 120, so the date in this case is 2021-10-16.

But row3 has user_id: 1 and film_id: 2, and with a duration >= 120 it is also 1 view, with date 2021-10-17.

| id | user_id | duration | created_at | film_id |
+----+---------+----------+------------+---------+
| 1  | 1       | 80       | 2021-10-15 | 1       |
| 2  | 1       | 70       | 2021-10-16 | 1       |
| 3  | 1       | 200      | 2021-10-17 | 2       |
| 4  | 2       | 50       | 2021-10-18 | 1       |
| 5  | 2       | 90       | 2021-10-18 | 1       |
| 6  | 3       | 140      | 2021-10-18 | 2       |
| 7  | 4       | 10       | 2021-10-19 | 3       |

Expected result:

| cnt   | created_at |
+-------+------------+
| 0     | 2021-10-15 |
| 1     | 2021-10-16 |
| 1     | 2021-10-17 |
| 2     | 2021-10-18 |
| 0     | 2021-10-19 |

Answers:

Method 1

This approach implements the counting logic with MySQL user variables: it orders the table rows by user_id and created_at and accumulates the duration row by row.

http://sqlfiddle.com/#!9/569088/14

SELECT created_at, SUM(CASE WHEN duration >= 120 THEN 1 ELSE 0 END) counts
FROM (
  SELECT user_id, created_at, 
  CASE WHEN @UID != user_id THEN @SUM_TIME := 0 WHEN @SUM_TIME >= 120 AND @DT != created_at THEN @SUM_TIME := 0 - duration ELSE 0 END SX,
  @SUM_TIME := @SUM_TIME + duration AS duration,
  @UID := user_id,
  @DT := created_at
  FROM watch_time
  JOIN ( SELECT @SUM_TIME :=0, @DT := NOW(), @UID := '' ) t
  ORDER BY user_id, created_at
) f
GROUP BY created_at

Method 2

I think I misunderstood the requirement in my first attempt.

Second attempt

MySQL >= 8.0 (or PostgreSQL), using window functions

I know you are working with MySQL 5.7; an answer for that version follows further down.

I am not sure I understand your requirement correctly. Do you want a cumulative sum of watch time per user, with one view counted on the day a user first reaches 120 minutes?

First, I get the cumulative sum per user, ordered by date (subquery cte). In subquery cte1, a CASE expression sets the view column to 1 the first time a user reaches 120 minutes. Finally, I group by created_at (date) and COUNT() the ones in the view column:

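-- Both windows use the same ordering (created_at, then id as a tie-breaker),
-- so LAG() always sees the running total of the row that actually precedes it.
WITH cte AS (SELECT *, SUM(duration) OVER (PARTITION BY user_id ORDER BY created_at ASC, id ASC) AS cum_duration
              FROM watch_time),

     -- view = 1 only on the row where the running total first reaches 120
     cte1 AS (SELECT *, CASE WHEN cum_duration >= 120 AND COALESCE(LAG(cum_duration) OVER (PARTITION BY user_id ORDER BY created_at ASC, id ASC), 0) < 120 THEN 1 END AS view
              FROM cte)

SELECT created_at, COUNT(view) AS cnt
FROM cte1
GROUP BY created_at;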

Output:

| created_at | cnt |
+------------+-----+
| 2021-10-15 | 0   |
| 2021-10-16 | 1   |
| 2021-10-17 | 0   |
| 2021-10-18 | 2   |
| 2021-10-19 | 0   |
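
Update 2 in the question asks for one view per (user_id, film_id) pair rather than per user. One possible adaptation of the window-function approach is sketched below for MySQL >= 8.0 or PostgreSQL. It is a sketch under two assumptions that the question does not state explicitly: id is unique, and duration is never negative. With those in place, cum_duration - duration is the running total before the current row, so no LAG() is needed.

```sql
-- Running total per (user_id, film_id); id breaks ties between rows on the same date.
WITH cte AS (
    SELECT created_at,
           duration,
           SUM(duration) OVER (PARTITION BY user_id, film_id
                               ORDER BY created_at, id) AS cum_duration
    FROM watch_time
)
-- Count a view on the row where the running total crosses 120:
-- it is >= 120 now, and was < 120 before this row was added.
SELECT created_at,
       COUNT(CASE WHEN cum_duration >= 120
                   AND cum_duration - duration < 120 THEN 1 END) AS cnt
FROM cte
GROUP BY created_at
ORDER BY created_at;
```

On the sample data from the question this should give counts of 0, 1, 1, 2 and 0 for 2021-10-15 through 2021-10-19, which matches the expected result listed under Update 2.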

MySQL 5.7

I compute the cumulative sum for each user and keep the rows where the cumulative duration is >= 120, then group by user_id and take MIN(created_at). Finally, I group by min_created_at and count the records.

SELECT min_created_at AS date, COUNT(*) AS cnt
FROM (SELECT user_id, MIN(created_at) AS min_created_at
      FROM (SELECT wt1.user_id, wt1.created_at, SUM(wt2.duration) AS cum_duration
            FROM (SELECT user_id, created_at, SUM(duration) AS duration FROM watch_time GROUP BY user_id, created_at) wt1
            INNER JOIN (SELECT user_id, created_at, SUM(duration) AS duration FROM watch_time GROUP BY user_id, created_at) wt2 ON wt1.user_id = wt2.user_id AND wt1.created_at >= wt2.created_at
            GROUP BY wt1.user_id, wt1.created_at
            HAVING SUM(wt2.duration) >= 120) AS sq
      GROUP BY user_id) AS sq2
GROUP BY min_created_at;

Output:

| date       | cnt |
+------------+-----+
| 2021-10-16 | 1   |
| 2021-10-18 | 2   |

To get the remaining dates with a count of 0, you can RIGHT JOIN this query to the dates of the original table (GROUP BY created_at).
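
The join suggested above might look like the following on MySQL 5.7. This is a sketch, not part of the original answer: it starts from the distinct dates and uses a LEFT JOIN, which is equivalent to the RIGHT JOIN described, and the alias d is my own. The inner subqueries sq and sq2 are the ones from the query above.

```sql
-- d: every date that appears in the table, so dates without views are kept.
SELECT d.created_at, COUNT(sq2.user_id) AS cnt
FROM (SELECT DISTINCT created_at FROM watch_time) AS d
LEFT JOIN (
    -- sq2: for each user, the first date on which the running total reaches 120.
    SELECT user_id, MIN(created_at) AS min_created_at
    FROM (
        SELECT wt1.user_id, wt1.created_at
        FROM (SELECT user_id, created_at, SUM(duration) AS duration
              FROM watch_time
              GROUP BY user_id, created_at) AS wt1
        INNER JOIN (SELECT user_id, created_at, SUM(duration) AS duration
                    FROM watch_time
                    GROUP BY user_id, created_at) AS wt2
            ON wt1.user_id = wt2.user_id
           AND wt1.created_at >= wt2.created_at
        GROUP BY wt1.user_id, wt1.created_at
        HAVING SUM(wt2.duration) >= 120
    ) AS sq
    GROUP BY user_id
) AS sq2 ON sq2.min_created_at = d.created_at
-- COUNT(sq2.user_id) ignores the NULLs produced for dates with no match, giving 0.
GROUP BY d.created_at
ORDER BY d.created_at;
```

COUNT(sq2.user_id) is used instead of COUNT(*) so that dates with no matching user are reported as 0 rather than 1.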

First attempt

I understood that you wanted to count one view each time a user reaches 120 minutes within a single day.

First, I get the total watch time per user and date (subquery sq). Then a CASE expression yields 1 for each user who reaches 120 minutes on a given date; I group by created_at (date) and COUNT() those ones:

SELECT created_at, COUNT(CASE WHEN total_duration >= 120 THEN 1 END) cnt
FROM (SELECT created_at, user_id, SUM(duration) AS total_duration
             FROM watch_time
             GROUP BY created_at, user_id) AS sq
GROUP BY created_at;

Output (with sample data from the question):

| created_at | cnt |
+------------+-----+
| 2021-10-15 | 0   |
| 2021-10-16 | 0   |
| 2021-10-17 | 1   |
| 2021-10-18 | 2   |
| 2021-10-19 | 0   |


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
