How to calculate the slope in SQL

I have some data in a sql database and I’d like to calculate the slope. The data has this layout:

Date        |  Keyword  |  Score    
2012-01-10  |  ipad     |  0.12    
2012-01-11  |  ipad     |  0.17    
2012-01-12  |  ipad     |  0.24    
2012-01-10  |  taco     |  0.19    
2012-01-11  |  taco     |  0.34    
2012-01-12  |  taco     |  0.45

I’d like the final output to look like this by creating a new table using SQL:

Date        |  Keyword  |  Score |  Slope    
2012-01-10  |  ipad     |  0.12  |  0.06    
2012-01-11  |  ipad     |  0.17  |  0.06    
2012-01-12  |  ipad     |  0.24  |  0.06    
2012-01-10  |  taco     |  0.19  |  0.13    
2012-01-11  |  taco     |  0.34  |  0.13    
2012-01-12  |  taco     |  0.45  |  0.13

To complicate things, not all Keywords have 3 dates worth of data, some have only 2 for instance.

The simpler the SQL the better since my database is proprietary and I’m not quite sure what formulas are available, although I know it can do OVER(PARTITION BY) if that helps. Thank you!

UPDATE: I define the slope as best fit y=mx+p aka in excel it would be =slope()

Here is another actual example that I usually manipulate in excel:

date        keyword         score       slope   
1/22/2012   water bottle    0.010885442 0.000334784  
1/23/2012   water bottle    0.011203949 0.000334784  
1/24/2012   water bottle    0.008460835 0.000334784  
1/25/2012   water bottle    0.010363991 0.000334784  
1/26/2012   water bottle    0.011800716 0.000334784  
1/27/2012   water bottle    0.012948411 0.000334784  
1/28/2012   water bottle    0.012732459 0.000334784  
1/29/2012   water bottle    0.011682568 0.000334784

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

The cleanest one I could make:

SELECT
    Scores.Date, Scores.Keyword, Scores.Score,
    (N * Sum_XY - Sum_X * Sum_Y)/(N * Sum_X2 - Sum_X * Sum_X) AS Slope
FROM Scores
INNER JOIN (
    SELECT
        Keyword,
        COUNT(*) AS N,
        SUM(CAST(Date as float)) AS Sum_X,
        SUM(CAST(Date as float) * CAST(Date as float)) AS Sum_X2,
        SUM(Score) AS Sum_Y,
        SUM(CAST(Date as float) * Score) AS Sum_XY
    FROM Scores
    GROUP BY Keyword
) G ON G.Keyword = Scores.Keyword;

It uses Simple Linear Regression to calculate the slope.

Result:

Date         Keyword        Score         Slope
2012-01-22   water bottle   0,010885442   0,000334784345222076
2012-01-23   water bottle   0,011203949   0,000334784345222076
2012-01-24   water bottle   0,008460835   0,000334784345222076
2012-01-25   water bottle   0,010363991   0,000334784345222076
2012-01-26   water bottle   0,011800716   0,000334784345222076
2012-01-27   water bottle   0,012948411   0,000334784345222076
2012-01-28   water bottle   0,012732459   0,000334784345222076
2012-01-29   water bottle   0,011682568   0,000334784345222076

Every database system seems to have a different approach to converting dates to numbers:

  • MySQL: TO_SECONDS(date) or TO_DAYS(date)
  • Oracle: TO_NUMBER(TO_CHAR(date, 'J')) or date - TO_DATE('1','yyyy')
  • MS SQL Server: CAST(date AS float) (or equivalent CONVERT)

Method 2

If you’re defining slope as just the slope from the earliest point to the latest point, and if score only increases with date, then you can get the output above with this:

SELECT *
  FROM scores
  JOIN
    (SELECT foo.keyword,
            (MAX(score)-MIN(score)) / DATEDIFF(MAX(date),MIN(date)) AS score
     FROM scores
     GROUP BY keyword) a
  USING(keyword);

However if you want linear regression, or if scores can decrease as well as increase with time, you’ll need something more complex.

Method 3

Cast to decimal does not give correct results for me, it is not linear to the dates. Use TO_DAYS(date_field) instead, this becomes correct.

Method 4

Use this

SUM(CONVERT(float, datediff(dd, '1/1/1900', date_field)))

instead of

SUM(CAST(date_field AS float))


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x