I have a table where I track the duration of watched films by a user for each day.
Now I would like to calculate a unique view count based on date.
So the conditions are:
- For each user max view count is 1
- View = 1 if one user’s SUM(duration) >= 120
- Date should be fixed once SUM(duration) reaches 120
The problem is picking the correct date row. For example, row1.duration + row2.duration >= 120,
so view count = 1 should be assigned to 2021-10-16.
| id | user_id | duration | created_at | film_id |
+----+---------+----------+------------+---------+
| 1  | 1       | 80       | 2021-10-15 | 1       |
| 2  | 1       | 70       | 2021-10-16 | 1       |
| 3  | 1       | 200      | 2021-10-17 | 2       |
| 4  | 2       | 50       | 2021-10-18 | 1       |
| 5  | 2       | 90       | 2021-10-18 | 1       |
| 6  | 3       | 140      | 2021-10-18 | 2       |
| 7  | 4       | 10       | 2021-10-19 | 3       |
Expected result:
| cnt | created_at |
+-----+------------+
| 0   | 2021-10-15 |
| 1   | 2021-10-16 |
| 0   | 2021-10-17 |
| 2   | 2021-10-18 |
| 0   | 2021-10-19 |
This is what I tried, but it chooses the first date and ignores dates with a 0 count.
Here is the fiddle with populated data
SELECT count(*) AS cnt, created_at
FROM (SELECT user_id, sum(duration) AS total, created_at
      FROM watch_time
      GROUP BY user_id) AS t
WHERE t.total >= 120
GROUP BY created_at;
Can this be done in SQL, or should it be done at the application level?
Thanks in advance!
Update:
Version: AWS RDS MySQL 5.7.33
But I’m ok to switch to Postgres if that can help.
I would appreciate even just a way to use MIN(date) while still returning all dates (including those with 0 views), something better than this one:
SELECT IFNULL(cnt, 0) as cnt, t3.created_at
FROM (SELECT count(*) AS cnt, created_at
      FROM (SELECT user_id, sum(duration) AS total, created_at
            FROM watch_time
            GROUP BY user_id) AS t
      WHERE t.total >= 120
      GROUP BY created_at) AS t2
RIGHT JOIN (SELECT distinct(created_at) FROM watch_time) AS t3
    ON t2.created_at = t3.created_at;
which returns:
| cnt | created_at |
+-----+------------+
| 1   | 2021-10-15 |
| 0   | 2021-10-16 |
| 0   | 2021-10-17 |
| 2   | 2021-10-18 |
| 0   | 2021-10-19 |
But I’m not sure whether the date (2021-10-15) was picked arbitrarily or whether it is always the lowest date.
Update 2:
Is it possible to include film_id as well? That is, treat each (user_id, film_id) pair as a unique view instead of grouping only by user_id.
So in this case, row1 and row2 both have user_id: 1 and film_id: 1, and the result is 1 view because the sum of their durations is >= 120, so the date in this case will be 2021-10-16.
But row3 has user_id: 1 and film_id: 2, and with duration >= 120 it is also 1 view, with date 2021-10-17.
| id | user_id | duration | created_at | film_id |
+----+---------+----------+------------+---------+
| 1  | 1       | 80       | 2021-10-15 | 1       |
| 2  | 1       | 70       | 2021-10-16 | 1       |
| 3  | 1       | 200      | 2021-10-17 | 2       |
| 4  | 2       | 50       | 2021-10-18 | 1       |
| 5  | 2       | 90       | 2021-10-18 | 1       |
| 6  | 3       | 140      | 2021-10-18 | 2       |
| 7  | 4       | 10       | 2021-10-19 | 3       |
Expected result:
| cnt | created_at |
+-----+------------+
| 0   | 2021-10-15 |
| 1   | 2021-10-16 |
| 1   | 2021-10-17 |
| 2   | 2021-10-18 |
| 0   | 2021-10-19 |
Answers:
Method 1
Using MySQL user variables, you can implement the counting logic: the query orders the table rows by user_id and created_at and accumulates the duration row by row.
http://sqlfiddle.com/#!9/569088/14
SELECT created_at,
       SUM(CASE WHEN duration >= 120 THEN 1 ELSE 0 END) counts
FROM (
    SELECT user_id,
           created_at,
           CASE
               WHEN @UID != user_id THEN @SUM_TIME := 0
               WHEN @SUM_TIME >= 120 AND @DT != created_at THEN @SUM_TIME := 0 - duration
               ELSE 0
           END SX,
           @SUM_TIME := @SUM_TIME + duration AS duration,
           @UID := user_id,
           @DT := created_at
    FROM watch_time
    JOIN (
        SELECT @SUM_TIME := 0, @DT := NOW(), @UID := ''
    ) t
    ORDER BY user_id, created_at
) f
GROUP BY created_at
Method 2
I think I misunderstood the requirement in my first attempt.
Second attempt
MySQL >= 8.0 (or PostgreSQL), using window functions
I know you are working with MySQL 5.7; I add an answer for it below.
I am not sure I understand your requirement correctly. Do you want the cumulative sum of watch time per user, and to count one view on the day a user's total first exceeds 119 minutes?
First, I get the cumulative sum per user, ordered by date (subquery cte). In subquery cte1, a CASE expression sets 1 the first time a user reaches 120 minutes (the view column). Finally, I group by created_at (date) and COUNT() the ones in the view column:
WITH cte AS (
    SELECT *,
           SUM(duration) OVER (PARTITION BY user_id ORDER BY created_at ASC, film_id) as cum_duration
    FROM watch_time
),
cte1 AS (
    SELECT *,
           CASE WHEN cum_duration >= 120
                 AND COALESCE(LAG(cum_duration) OVER (PARTITION BY user_id ORDER BY created_at ASC), 0) < 120
                THEN 1 END AS view
    FROM cte
)
SELECT created_at, COUNT(view) AS cnt
FROM cte1
GROUP BY created_at;
created_at | cnt |
---|---|
2021-10-15 | 0 |
2021-10-16 | 1 |
2021-10-17 | 0 |
2021-10-18 | 2 |
2021-10-19 | 0 |
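For Update 2 in the question (one view per user and film), the same window-function idea can be adapted by partitioning on both columns. This is only a sketch, not part of the original answer: it assumes MySQL >= 8.0 or PostgreSQL, that id is unique (it is used as a tie-breaker for rows on the same date), and that durations are never negative. Instead of LAG, it checks whether the running total crossed 120 on the current row.

```sql
-- Sketch for Update 2: count one view per (user_id, film_id) pair,
-- on the date where that pair's running total first reaches 120.
-- Assumptions: id is unique, duration >= 0.
WITH cte AS (
    SELECT
        wt.*,
        SUM(duration) OVER (
            PARTITION BY user_id, film_id
            ORDER BY created_at, id
        ) AS cum_duration
    FROM watch_time AS wt
)
SELECT
    created_at,
    -- the running total is >= 120 after this row but was < 120 before it
    COUNT(CASE WHEN cum_duration >= 120
                AND cum_duration - duration < 120 THEN 1 END) AS cnt
FROM cte
GROUP BY created_at
ORDER BY created_at;
```

With the sample data this gives 0, 1, 1, 2, 0 for 2021-10-15 through 2021-10-19, which matches the expected result in Update 2. Dropping film_id from the PARTITION BY clause gives the per-user counts shown above.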
MySQL 5.7
I get the cumulative sum for each user and keep only the rows where the cumulative duration is >= 120; then I group by user_id and take MIN(created_at). Finally, I group by min_created_at and count the records.
SELECT min_created_at AS date, count(*) AS cnt
FROM (SELECT user_id, MIN(created_at) AS min_created_at
      FROM (SELECT wt1.user_id, wt1.created_at, SUM(wt2.duration) AS cum_duration
            FROM (SELECT user_id, created_at, SUM(duration) AS duration
                  FROM watch_time
                  GROUP BY user_id, created_at) wt1
            INNER JOIN (SELECT user_id, created_at, SUM(duration) AS duration
                        FROM watch_time
                        GROUP BY user_id, created_at) wt2
                ON wt1.user_id = wt2.user_id AND wt1.created_at >= wt2.created_at
            GROUP BY wt1.user_id, wt1.created_at
            HAVING SUM(wt2.duration) >= 120) AS sq
      GROUP BY user_id) AS sq2
GROUP BY min_created_at;
date | cnt |
---|---|
2021-10-16 | 1 |
2021-10-18 | 2 |
You can JOIN my query (RIGHT JOIN) with the original table (GROUP BY created_at) to get the rest of the dates with count equal to 0.
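The join described above could look roughly like the following in MySQL 5.7. This is a sketch, not the answerer's exact query: it uses a LEFT JOIN starting from the list of dates (equivalent to the RIGHT JOIN mentioned), and it computes the cumulative total by joining the distinct (user_id, created_at) pairs straight to watch_time rather than pre-aggregating both sides. The aliases d, w, first_hit and v are illustrative names.

```sql
-- Sketch: every date from watch_time, with 0 where no user reached 120 that day.
SELECT d.created_at,
       COALESCE(v.cnt, 0) AS cnt
FROM (SELECT DISTINCT created_at FROM watch_time) AS d
LEFT JOIN (
    -- number of users whose running total first reached 120 on each date
    SELECT first_hit.min_created_at, COUNT(*) AS cnt
    FROM (
        -- earliest date per user on which the running total is >= 120
        SELECT hit.user_id, MIN(hit.created_at) AS min_created_at
        FROM (
            SELECT d2.user_id, d2.created_at
            FROM (SELECT DISTINCT user_id, created_at FROM watch_time) AS d2
            INNER JOIN watch_time AS w
                ON w.user_id = d2.user_id
               AND w.created_at <= d2.created_at
            GROUP BY d2.user_id, d2.created_at
            HAVING SUM(w.duration) >= 120
        ) AS hit
        GROUP BY hit.user_id
    ) AS first_hit
    GROUP BY first_hit.min_created_at
) AS v ON v.min_created_at = d.created_at
ORDER BY d.created_at;
```

On the sample data this returns 0, 1, 0, 2, 0 for 2021-10-15 through 2021-10-19, the expected result from the original question. For Update 2, adding film_id to the DISTINCT list, the join condition and both inner GROUP BY clauses would be the analogous change.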
First attempt
I understood that you want to count one view each time a user reaches 120 minutes in a single day.
First, I get the total watch time per user and date (subquery sq); then a CASE expression sets 1 each time a user exceeds 119 minutes on a date. I group by created_at (date) and COUNT() the ones from the CASE expression:
SELECT created_at,
       COUNT(CASE WHEN total_duration >= 120 THEN 1 END) cnt
FROM (SELECT created_at, user_id, SUM(duration) AS total_duration
      FROM watch_time
      GROUP BY created_at, user_id) AS sq
GROUP BY created_at;
Output (with sample data from the question):
created_at | cnt |
---|---|
2021-10-15 | 0 |
2021-10-16 | 0 |
2021-10-17 | 1 |
2021-10-18 | 2 |
2021-10-19 | 0 |
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.