How to calculate the slope in SQL

I have some data in a sql database and I’d like to calculate the slope. The data has this layout:

```Date        |  Keyword  |  Score
2012-01-10  |  taco     |  0.19
2012-01-11  |  taco     |  0.34
2012-01-12  |  taco     |  0.45```

I’d like the final output to look like this by creating a new table using SQL:

```Date        |  Keyword  |  Score |  Slope
2012-01-10  |  ipad     |  0.12  |  0.06
2012-01-11  |  ipad     |  0.17  |  0.06
2012-01-12  |  ipad     |  0.24  |  0.06
2012-01-10  |  taco     |  0.19  |  0.13
2012-01-11  |  taco     |  0.34  |  0.13
2012-01-12  |  taco     |  0.45  |  0.13```

To complicate things, not all Keywords have 3 dates worth of data, some have only 2 for instance.

The simpler the SQL the better since my database is proprietary and I’m not quite sure what formulas are available, although I know it can do OVER(PARTITION BY) if that helps. Thank you!

UPDATE: I define the slope as best fit y=mx+p aka in excel it would be =slope()

Here is another actual example that I usually manipulate in excel:

```date        keyword         score       slope
1/22/2012   water bottle    0.010885442 0.000334784
1/23/2012   water bottle    0.011203949 0.000334784
1/24/2012   water bottle    0.008460835 0.000334784
1/25/2012   water bottle    0.010363991 0.000334784
1/26/2012   water bottle    0.011800716 0.000334784
1/27/2012   water bottle    0.012948411 0.000334784
1/28/2012   water bottle    0.012732459 0.000334784
1/29/2012   water bottle    0.011682568 0.000334784```
Contents

Method 1

The cleanest one I could make:

```SELECT
Scores.Date, Scores.Keyword, Scores.Score,
(N * Sum_XY - Sum_X * Sum_Y)/(N * Sum_X2 - Sum_X * Sum_X) AS Slope
FROM Scores
INNER JOIN (
SELECT
Keyword,
COUNT(*) AS N,
SUM(CAST(Date as float)) AS Sum_X,
SUM(CAST(Date as float) * CAST(Date as float)) AS Sum_X2,
SUM(Score) AS Sum_Y,
SUM(CAST(Date as float) * Score) AS Sum_XY
FROM Scores
GROUP BY Keyword
) G ON G.Keyword = Scores.Keyword;```

It uses Simple Linear Regression to calculate the slope.

Result:

```Date         Keyword        Score         Slope
2012-01-22   water bottle   0,010885442   0,000334784345222076
2012-01-23   water bottle   0,011203949   0,000334784345222076
2012-01-24   water bottle   0,008460835   0,000334784345222076
2012-01-25   water bottle   0,010363991   0,000334784345222076
2012-01-26   water bottle   0,011800716   0,000334784345222076
2012-01-27   water bottle   0,012948411   0,000334784345222076
2012-01-28   water bottle   0,012732459   0,000334784345222076
2012-01-29   water bottle   0,011682568   0,000334784345222076```

Every database system seems to have a different approach to converting dates to numbers:

• MySQL: `TO_SECONDS(date)` or `TO_DAYS(date)`
• Oracle: `TO_NUMBER(TO_CHAR(date, 'J'))` or `date - TO_DATE('1','yyyy')`
• MS SQL Server: `CAST(date AS float)` (or equivalent `CONVERT`)

Method 2

If you’re defining slope as just the slope from the earliest point to the latest point, and if score only increases with date, then you can get the output above with this:

```SELECT *
FROM scores
JOIN
(SELECT foo.keyword,
(MAX(score)-MIN(score)) / DATEDIFF(MAX(date),MIN(date)) AS score
FROM scores
GROUP BY keyword) a
USING(keyword);```

However if you want linear regression, or if scores can decrease as well as increase with time, you’ll need something more complex.

Method 3

Cast to decimal does not give correct results for me, it is not linear to the dates. Use `TO_DAYS(date_field)` instead, this becomes correct.

Method 4

Use this

`SUM(CONVERT(float, datediff(dd, '1/1/1900', date_field)))`

`SUM(CAST(date_field AS float))`