Discussion:
calculating correlation coefficient
(too old to reply)
csmba
2008-02-15 22:16:52 UTC
Permalink
The end game is that I want a report to show me the correlation (number
between -1 to 1) between each 2 items in my transactional database.

1. Is there a way to have an SP or function that calculates correlation for
2 clms in a table?
2. is there a better way, so that I don't need to create fake tables (cause
I want correlation between lets say x,y,z and a,b,c, so that is:
x-a,x-b,x-c,y-a....)
3. would you tell me that the only way is to use Analysis Services?

What would be the recommendation that will let me at the end, deliver such
report?

thanks
Dejan Sarka
2008-02-18 06:51:29 UTC
Permalink
Hi!

It is quite simple to calculate covariance, correlation coefficient and
coefficient of determination (squared correlation) with T-SQL queries. Here
is an example using vTargetMail view from AdventureWorksDW demo db:

/* Linear relationship */
USE AdventureWorksDW;
GO
-- Covariance
DECLARE @mean1 decimal(20,6)
DECLARE @mean2 decimal(20,6)
SELECT @mean1=AVG(YearlyIncome*1.0)
,@mean2=AVG(NumberCarsOwned*1.0)
FROM vTargetMail
SELECT CoVar=
SUM((YearlyIncome*1.0-@mean1)*(NumberCarsOwned*1.0-@mean2))/COUNT(*)
FROM dbo.vTargetMail
GO
-- Correlation and CD
DECLARE @mean1 decimal(20,6)
DECLARE @mean2 decimal(20,6)
SELECT @mean1=AVG(YearlyIncome*1.0)
,@mean2=AVG(NumberCarsOwned*1.0)
FROM vTargetMail
SELECT Correl=
(SUM((YearlyIncome*1.0-@mean1)*(NumberCarsOwned*1.0-@mean2))/COUNT(*))
/((STDEVP(YearlyIncome*1.0)*STDEVP(NumberCarsOwned*1.0))),
CD=SQUARE(
(SUM((YearlyIncome*1.0-@mean1)*(NumberCarsOwned*1.0-@mean2))/COUNT(*))
/((STDEVP(YearlyIncome*1.0)*STDEVP(NumberCarsOwned*1.0))))
FROM vTargetMail
GO
--
Dejan Sarka
http://blogs.solidq.com/EN/dsarka/default.aspx
Post by csmba
The end game is that I want a report to show me the correlation (number
between -1 to 1) between each 2 items in my transactional database.
1. Is there a way to have an SP or function that calculates correlation
for 2 clms in a table?
2. is there a better way, so that I don't need to create fake tables
x-a,x-b,x-c,y-a....)
3. would you tell me that the only way is to use Analysis Services?
What would be the recommendation that will let me at the end, deliver such
report?
thanks
Loading...