# Get appropriate date with GROUP BY

I have a table where I track the duration of watched films by a user for each day.
Now I would like to calculate a unique view count based on date.

So the conditions are:

• For each user max view count is 1
• View = 1 if one user’s SUM(duration) >= 120
• Date should be fixed once SUM(duration) reaches 120

The problem is getting the correct date row. For example, `row1.duration + row2.duration >= 120`, so view count = 1 should be attributed to `2021-10-16`.

```
| id | user_id | duration | created_at | film_id |
+----+---------+----------+------------+---------+
| 1  | 1       | 80       | 2021-10-15 | 1       |
| 2  | 1       | 70       | 2021-10-16 | 1       |
| 3  | 1       | 200      | 2021-10-17 | 2       |
| 4  | 2       | 50       | 2021-10-18 | 1       |
| 5  | 2       | 90       | 2021-10-18 | 1       |
| 6  | 3       | 140      | 2021-10-18 | 2       |
| 7  | 4       | 10       | 2021-10-19 | 3       |
```

Expected result:

```
| cnt   | created_at |
+-------+------------+
| 0     | 2021-10-15 |
| 1     | 2021-10-16 |
| 0     | 2021-10-17 |
| 2     | 2021-10-18 |
| 0     | 2021-10-19 |
```

This is what I tried, but it chooses the first date and leaves out the dates with a 0 count.
Here is the fiddle with populated data

```
SELECT count(*) AS cnt,
       created_at
FROM
    (SELECT user_id,
            sum(duration) AS total,
            created_at
     FROM watch_time
     GROUP BY user_id) AS t
WHERE t.total >= 120
GROUP BY created_at;
```

Is there any way to do this in SQL, or should it be done at the application level?

Update:

Version: AWS RDS MySQL 5.7.33

But I’m OK with switching to Postgres if that helps.

I would also appreciate a way to get MIN(date) while still listing all dates (including those with 0 views), ideally something better than this:

```
SELECT IFNULL(cnt, 0) AS cnt,
       t3.created_at
FROM
    (SELECT count(*) AS cnt,
            created_at
     FROM
         (SELECT user_id,
                 sum(duration) AS total,
                 created_at
          FROM watch_time
          GROUP BY user_id) AS t
     WHERE t.total >= 120
     GROUP BY created_at) AS t2
RIGHT JOIN
    (SELECT DISTINCT created_at
     FROM watch_time) AS t3
ON t2.created_at = t3.created_at;
```

which returns:

```
| cnt   | created_at |
+-------+------------+
| 1     | 2021-10-15 |
| 0     | 2021-10-16 |
| 0     | 2021-10-17 |
| 2     | 2021-10-18 |
| 0     | 2021-10-19 |
```

But I’m not sure whether the date (2021-10-15) was picked at random or whether it is always the lowest date.

Update 2:

Is it possible to include the film_id as well? That is, treat each (user_id, film_id) pair as a unique view instead of grouping only by user_id.

So in this case:

`row1` and `row2` both have `user_id: 1` and `film_id: 1`, and the result is 1 view, because the sum of their durations is `>= 120`. So the date in this case will be `2021-10-16`.

But `row3` has `user_id: 1` and `film_id: 2`, and with `duration >= 120` it also counts as 1 view, with date `2021-10-17`.

```
| id | user_id | duration | created_at | film_id |
+----+---------+----------+------------+---------+
| 1  | 1       | 80       | 2021-10-15 | 1       |
| 2  | 1       | 70       | 2021-10-16 | 1       |
| 3  | 1       | 200      | 2021-10-17 | 2       |
| 4  | 2       | 50       | 2021-10-18 | 1       |
| 5  | 2       | 90       | 2021-10-18 | 1       |
| 6  | 3       | 140      | 2021-10-18 | 2       |
| 7  | 4       | 10       | 2021-10-19 | 3       |
```

Expected result:

```
| cnt   | created_at |
+-------+------------+
| 0     | 2021-10-15 |
| 1     | 2021-10-16 |
| 1     | 2021-10-17 |
| 2     | 2021-10-18 |
| 0     | 2021-10-19 |
```

### Method 1

You can implement your counting logic with MySQL user variables: the query orders the table rows by user_id and created_at, then computes the running total row by row.

http://sqlfiddle.com/#!9/569088/14

```
SELECT created_at,
       SUM(CASE WHEN duration >= 120 THEN 1 ELSE 0 END) counts
FROM (
    SELECT user_id, created_at,
           CASE
               WHEN @UID != user_id THEN @SUM_TIME := 0
               WHEN @SUM_TIME >= 120 AND @DT != created_at THEN @SUM_TIME := 0 - duration
               ELSE 0
           END SX,
           @SUM_TIME := @SUM_TIME + duration AS duration,
           @UID := user_id,
           @DT := created_at
    FROM watch_time
    JOIN ( SELECT @SUM_TIME :=0, @DT := NOW(), @UID := '' ) t
    ORDER BY user_id, created_at
) f
GROUP BY created_at
```

### Method 2

I think I misunderstood the requirement in my first attempt.

Second attempt

MySQL >= 8.0 (or PostgreSQL), using window functions

I know you are working with MySQL 5.7; I add an answer for that version further down.

I am not sure I understand your requirement correctly. Do you want the cumulative sum of watch time per user, counting one view on the day a user first exceeds 119 minutes?

First, I compute the cumulative sum per user, ordered by date (the `cte` subquery). In the `cte1` subquery, a CASE expression sets the `view` column to 1 the first time a user reaches 120 minutes. Finally, I group by created_at (date) and count the ones in the `view` column:

```
WITH cte AS (
    SELECT *,
           SUM(duration) OVER (PARTITION BY user_id ORDER BY created_at ASC, film_id) AS cum_duration
    FROM watch_time
),

cte1 AS (
    SELECT *,
           CASE WHEN cum_duration >= 120
                 AND COALESCE(LAG(cum_duration) OVER (PARTITION BY user_id ORDER BY created_at ASC), 0) < 120
                THEN 1 END AS view
    FROM cte
)

SELECT created_at, COUNT(view) AS cnt
FROM cte1
GROUP BY created_at;
```

| created_at | cnt |
|------------|-----|
| 2021-10-15 | 0   |
| 2021-10-16 | 1   |
| 2021-10-17 | 0   |
| 2021-10-18 | 2   |
| 2021-10-19 | 0   |
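
For the follow-up in "Update 2" (counting one view per user and film), here is a minimal sketch of how the same idea could be adapted. It is not part of the original answer: it partitions by `(user_id, film_id)` instead of `user_id`, and it swaps the `LAG` check for `cum_duration - duration`, which equals the previous running total. It assumes `id` is unique (used only as a tie-breaker for rows on the same date) and that `duration` is never negative; the column name `is_view` is made up for illustration.

```sql
WITH cte AS (
    SELECT *,
           -- running total per (user, film); id breaks ties within a date
           SUM(duration) OVER (PARTITION BY user_id, film_id
                               ORDER BY created_at, id) AS cum_duration
    FROM watch_time
),
cte1 AS (
    SELECT *,
           -- flag only the row on which the running total first reaches 120
           CASE WHEN cum_duration >= 120
                 AND cum_duration - duration < 120
                THEN 1 END AS is_view
    FROM cte
)
SELECT created_at, COUNT(is_view) AS cnt
FROM cte1
GROUP BY created_at
ORDER BY created_at;
```

With the sample data from the question, this should give 0, 1, 1, 2, 0 for 2021-10-15 through 2021-10-19, which matches the expected result in "Update 2".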

MySQL 5.7

I compute the cumulative sum for each user and keep only the rows where the cumulative duration is >= 120, then I `group by user_id` and take `MIN(created_at)`. Finally, I group by `min_created_at` and count the records.

```
SELECT min_created_at AS date, count(*) AS cnt
FROM (SELECT user_id, MIN(created_at) AS min_created_at
      FROM (SELECT wt1.user_id, wt1.created_at, SUM(wt2.duration) AS cum_duration
            FROM (SELECT user_id, created_at, SUM(duration) AS duration
                  FROM watch_time
                  GROUP BY user_id, created_at) wt1
            INNER JOIN (SELECT user_id, created_at, SUM(duration) AS duration
                        FROM watch_time
                        GROUP BY user_id, created_at) wt2
                    ON wt1.user_id = wt2.user_id AND wt1.created_at >= wt2.created_at
            GROUP BY wt1.user_id, wt1.created_at
            HAVING SUM(wt2.duration) >= 120) AS sq
      GROUP BY user_id) AS sq2
GROUP BY min_created_at;
```

| date       | cnt |
|------------|-----|
| 2021-10-16 | 1   |
| 2021-10-18 | 2   |
You can join my query (with a RIGHT JOIN) to the distinct dates of the original table to get the remaining dates with a count of 0.
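
A minimal sketch of that join, for MySQL 5.7, is shown here. It is written as a LEFT JOIN starting from the list of distinct dates, which is equivalent to the RIGHT JOIN described above. The aliases `d` and `v` are chosen for illustration only, and `COALESCE` turns the missing counts into 0.

```sql
SELECT d.created_at, COALESCE(v.cnt, 0) AS cnt
FROM (SELECT DISTINCT created_at FROM watch_time) AS d   -- every date in the table
LEFT JOIN (
    -- the query from above: users counted on the first date they reach 120
    SELECT min_created_at, COUNT(*) AS cnt
    FROM (SELECT user_id, MIN(created_at) AS min_created_at
          FROM (SELECT wt1.user_id, wt1.created_at
                FROM (SELECT user_id, created_at, SUM(duration) AS duration
                      FROM watch_time
                      GROUP BY user_id, created_at) AS wt1
                INNER JOIN (SELECT user_id, created_at, SUM(duration) AS duration
                            FROM watch_time
                            GROUP BY user_id, created_at) AS wt2
                        ON wt1.user_id = wt2.user_id
                       AND wt1.created_at >= wt2.created_at
                GROUP BY wt1.user_id, wt1.created_at
                HAVING SUM(wt2.duration) >= 120) AS sq
          GROUP BY user_id) AS sq2
    GROUP BY min_created_at
) AS v ON v.min_created_at = d.created_at
ORDER BY d.created_at;
```

With the sample data from the question, this should return 0, 1, 0, 2, 0 for 2021-10-15 through 2021-10-19, which matches the first expected result.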

First attempt

I understood that you wanted to count one each time a user reaches 120 minutes within a single day.

First, I get the total watch time per user and date (subquery `sq`). Then a `CASE` expression yields one each time a user exceeds 119 minutes on a given date. Finally, I group by `created_at` (date) and `count()` the ones produced by the `CASE` expression:

```
SELECT created_at, COUNT(CASE WHEN total_duration >= 120 THEN 1 END) cnt
FROM (SELECT created_at, user_id, SUM(duration) AS total_duration
      FROM watch_time
      GROUP BY created_at, user_id) AS sq
GROUP BY created_at;
```

Output (with sample data from the question):

| created_at | cnt |
|------------|-----|
| 2021-10-15 | 0   |
| 2021-10-16 | 0   |
| 2021-10-17 | 1   |
| 2021-10-18 | 2   |
| 2021-10-19 | 0   |