SQL query for longest streak with a specific condition

I have a table that looks like this: (There is not a record for every day and not for every field_name)

datefield_namefield_value
2021-01-01coffee_available1
2021-01-02coffee_available1
2021-01-03tea_available0
2021-01-03coffee_available1
2021-01-04tea_available0
2021-01-06coffee_available0
2021-01-07coffee_available0
2021-01-08coffee_available1
2021-01-08tea_available1

I want to query the database for specific conditions, eg. the longest streak of days with coffee available or the longest streak of days with no tea available. So the result should be a single number or – even better – the longest streaks with start or end date.

I looked at the other streak questions, but I couldn’t figure out how to change it so it fits my problem. Could you please help me out?

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

You may use the following approach which uses ROW_NUMBER to determine groups of sequential dates for specific conditions and DATEDIFF to determine the streak of days and that days within each streak differ only by 1. COUNT with an adjustment of 1 may also be used to determine the streak as denoted below. I included an overall query and additional queries specific to the scenario questions shared in the question.

Query #1

SELECT
    MIN(`DATE`),
    MAX(`DATE`),
    DATEDIFF(MAX(`DATE`), MIN(`DATE`)) AS day_streak_method_1,
    COUNT(1) - 1 AS day_streak_method_2,
    t1.field_name,
    t1.field_value 
FROM
    (
        SELECT
            DATEDIFF(`DATE`, LAG(`DATE`, 1, `DATE`) OVER (PARTITION BY gnum 
        ORDER BY
            `DATE`)) AS day_diff,
            t.* 
        FROM
            (
                SELECT
                    *,
                    (
                        ROW_NUMBER() OVER (PARTITION BY `field_name` 
                    ORDER BY
                        `DATE`, `field_name`) - ROW_NUMBER() OVER (PARTITION BY `field_name`, `field_value` 
                    ORDER BY
                        `DATE`, `field_name`)
                    )
                    AS gnum 
                FROM
                    conditions
            )
            t
    )
    t1 
WHERE
    day_diff < 2 
GROUP BY
    gnum,
    field_name,
    field_value;
MIN(DATE)MAX(DATE)day_streak_method_1day_streak_method_2field_namefield_value
2021-01-01 00:00:002021-01-03 00:00:0022coffee_available1
2021-01-03 00:00:002021-01-04 00:00:0011tea_available0
2021-01-08 00:00:002021-01-08 00:00:0000coffee_available1
2021-01-08 00:00:002021-01-08 00:00:0000tea_available1
2021-01-06 00:00:002021-01-07 00:00:0011coffee_available0

Query #2

SELECT 'The longest streak of days with coffee available' as `Question 1`;
Question 1
The longest streak of days with coffee available

Query #3

SELECT
    MIN(`DATE`),
    MAX(`DATE`),
    DATEDIFF(MAX(`DATE`), MIN(`DATE`)) AS day_streak_method_1
FROM
    (
        SELECT
            DATEDIFF(`DATE`, LAG(`DATE`, 1, `DATE`) OVER (PARTITION BY gnum 
        ORDER BY
            `DATE`)) AS day_diff,
            t.* 
        FROM
            (
                SELECT
                    *,
                    (
                        ROW_NUMBER() OVER (PARTITION BY `field_name` 
                    ORDER BY
                        `DATE`, `field_name`) - ROW_NUMBER() OVER (PARTITION BY `field_name`, `field_value` 
                    ORDER BY
                        `DATE`, `field_name`)
                    )
                    AS gnum 
                FROM
                    conditions
                WHERE 
                    field_name='coffee_available' AND
                    field_value=1
            )
            t
    )
    t1 
WHERE
    day_diff < 2 
GROUP BY
    gnum,
    field_name,
    field_value
ORDER BY 
    DATEDIFF(MAX(`DATE`), MIN(`DATE`)) DESC
LIMIT 1;
MIN(DATE)MAX(DATE)day_streak_method_1
2021-01-01 00:00:002021-01-03 00:00:002

Query #4

SELECT 'The longest streak of days with no tea available' as `Question 2`;
Question 2
The longest streak of days with no tea available

Query #5

SELECT
    MIN(`DATE`),
    MAX(`DATE`),
    DATEDIFF(MAX(`DATE`), MIN(`DATE`)) AS day_streak_method_1
FROM
    (
        SELECT
            DATEDIFF(`DATE`, LAG(`DATE`, 1, `DATE`) OVER (PARTITION BY gnum 
        ORDER BY
            `DATE`)) AS day_diff,
            t.* 
        FROM
            (
                SELECT
                    *,
                    (
                        ROW_NUMBER() OVER (PARTITION BY `field_name` 
                    ORDER BY
                        `DATE`, `field_name`) - ROW_NUMBER() OVER (PARTITION BY `field_name`, `field_value` 
                    ORDER BY
                        `DATE`, `field_name`)
                    )
                    AS gnum 
                FROM
                    conditions
                WHERE 
                    field_name='tea_available' AND
                    field_value=0
            )
            t
    )
    t1 
WHERE
    day_diff < 2 
GROUP BY
    gnum,
    field_name,
    field_value
ORDER BY 
    DATEDIFF(MAX(`DATE`), MIN(`DATE`)) DESC
LIMIT 1;
MIN(DATE)MAX(DATE)day_streak_method_1
2021-01-03 00:00:002021-01-04 00:00:001

View on DB Fiddle

Edit 1

Used less subqueries with the assistance of a case expression. Apply additional filters in where clause where needed.

SELECT
    MIN(`DATE`),
    MAX(`DATE`),
    DATEDIFF(MAX(`DATE`), MIN(`DATE`)) AS day_streak_method_1,
    field_name,
    field_value
FROM (
    SELECT
        *,
        CASE 
            WHEN DATEDIFF(
                    `date`,
                    LAG(`date`,1,`date`) OVER (
                        PARTITION BY `field_name`, `field_value`
                        ORDER BY `date`
                    )
                ) < 2 THEN (
                    ROW_NUMBER() OVER (
                        PARTITION BY `field_name` 
                        ORDER BY `date`, `field_name`
                    ) - 
                    ROW_NUMBER() OVER (
                        PARTITION BY `field_name`, `field_value` 
                        ORDER BY `date`, `field_name`
                    )
                )
            ELSE NULL
        END as gnum
    FROM 
        conditions
) t
WHERE 
    gnum IS NOT NULL 
GROUP BY 
    gnum, field_name, field_value

Using older versions of mysql

SELECT
    MIN(`DATE`),
    MAX(`DATE`),
    DATEDIFF(MAX(`DATE`), MIN(`DATE`)) AS day_streak,
    field_name,
    field_value
FROM (
    SELECT
        *,
        @seq1:=IF(@seq1_prev = `field_name`,@seq1+1,1),
        @seq2:=IF(@seq2_prev = CONCAT(`field_name`, `field_value`),@seq2+1,1),
        CASE 
            WHEN DATEDIFF(
                    `date`,
                     IF(
                         @seq2_prev=CONCAT(`field_name`, `field_value`),
                         @date_prev,
                         `date`
                     )
                ) < 2 THEN @seq1 - @seq2
                
            ELSE NULL
        END as gnum,
        
        @seq1_prev:=`field_name`,
        @seq2_prev:=CONCAT(`field_name`, `field_value`),
        @date_prev:=`date`
    FROM 
        conditions
    CROSS JOIN (
        SELECT
            @seq1:=0,
            @seq1_prev:=NULL,
            @seq2:=0,
            @seq2_prev:=NULL,
            @date_prev:=NULL
    ) as vars
    ORDER BY `date`, `field_name`
) t
GROUP BY 
    gnum, field_name, field_value

View working demo on db fiddle

Let me know if this works for you


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x