A quote from Dr. Napoleon Hill:
"Whatever the mind can conceive and believe, the mind can achieve."
Correlation is a measure of the relationship or association between two variables, say x and y. There are two types: positive and negative. If y increases as x increases, or y decreases as x decreases, the correlation is positive (+ve). If y decreases as x increases, or y increases as x decreases, the correlation is negative (-ve). In some instances there is no correlation at all.
Examples: Positive
1. If one liter of petrol costs Rs.102, then 4 liters cost 4 × 102 = Rs.408: the more petrol you buy, the higher the cost.
2. Cost of a bus/train ticket: if the travel distance is more, the cost also goes up.
3. Consumption of electricity: if one uses electricity for a longer time, the consumption will be more, and so the bill amount will be more.
Examples: Negative
1. If you increase the speed while driving a car (or a scooter, or any vehicle for that matter), the travel time will be less.
2. In a circuit with a fixed voltage, if the resistance R is more, the current will be less.
Francis Galton introduced the concept of correlation in 1888. Correlation is used in psychology and education, in business financial analysis and decision making, in the statistical analysis of variates, and in research, where scientists rely on it in data analysis to reduce mistakes.
There are four types of correlation.
1. Pearson correlation: It is a correlation coefficient that measures the linear correlation between two sets of data.
2. Kendall rank correlation: It is used to measure the ordinal association between two measured quantities.
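3. Spearman's rank correlation: It measures the monotonic association between two variables using their ranks. When there are no tied ranks, Spearman's correlation formula is
$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$
where d_i is the difference between the two ranks of the i-th observation and n is the number of observations.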
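4. Point-biserial correlation: The point-biserial correlation coefficient (r_pb) is a correlation coefficient used when one variable (e.g. Y) is dichotomous. To calculate r_pb, assume that the dichotomous variable Y has the two values 0 and 1. If we divide the data set into two groups, group 1, which received the value "1" on Y, and group 2, which received the value "0" on Y, then the point-biserial correlation coefficient is calculated as follows:
$$r_{pb} = \frac{M_1 - M_0}{s_n}\sqrt{\frac{n_1 n_0}{n^2}}$$
where M_1 and M_0 are the means of the continuous variable X in group 1 and group 2, n_1 and n_0 are the numbers of data points in the two groups, n = n_1 + n_0, and s_n is the standard deviation used when data are available for every member of the population:
$$s_n = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2}$$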
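To make the list of correlation types more concrete, here is a minimal Python sketch of three of them. It assumes there are no tied values (both the Spearman shortcut 1 - 6∑d²/(n(n² - 1)) and the tau-a form of Kendall's coefficient need that) and that the dichotomous Y is coded 0/1. The function names are hypothetical helpers for illustration, not a library API; Pearson's coefficient follows from the covariance formula given later in this post.

```python
import math

def ranks(values):
    """Rank values from 1 (smallest) to n; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(x, y):
    """Spearman's rho via 1 - 6*sum(d^2) / (n(n^2 - 1)); valid only without ties."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs; no ties assumed."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return s / (n * (n - 1) / 2)

def point_biserial(x, y):
    """r_pb = (M1 - M0) / s_n * sqrt(n1 * n0 / n^2), with y coded 0/1."""
    n = len(x)
    g1 = [xi for xi, yi in zip(x, y) if yi == 1]  # group 1: Y = 1
    g0 = [xi for xi, yi in zip(x, y) if yi == 0]  # group 2: Y = 0
    mean = sum(x) / n
    s_n = math.sqrt(sum((xi - mean) ** 2 for xi in x) / n)  # population SD
    m1 = sum(g1) / len(g1)
    m0 = sum(g0) / len(g0)
    return (m1 - m0) / s_n * math.sqrt(len(g1) * len(g0) / n ** 2)

print(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
print(kendall_tau_a([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
print(point_biserial([1, 2, 3, 4], [0, 0, 1, 1]))
```

For the small hypothetical data used here, Spearman's rho works out to 0.8 and Kendall's tau-a to 0.6 (8 concordant and 2 discordant pairs out of 10).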
Let us recall what was learnt in earlier classes. In the chapter on measures of central tendency, we studied the mean, median and mode. These averages give only a rough idea of where the observations are centered; they do not tell us to what extent the observations are scattered. So we go further and study measures of dispersion, i.e. 1. range, 2. mean deviation (about the mean or about the median), 3. standard deviation (SD), 4. variance. (The harmonic mean and geometric mean are averages, i.e. measures of central tendency, not measures of dispersion.)
These measures of dispersion describe the degree to which the observations are scattered. We use the standard deviation for our analysis.
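As a quick illustration of the measures of dispersion above, the following Python sketch computes the range, the mean deviation about the mean and about the median, and the standard deviation for a small hypothetical data set. The function name is made up for this example; `statistics.median` and `statistics.pstdev` are from Python's standard library, and `pstdev` divides by n (population SD), matching the definitions in this post.

```python
import statistics

def dispersion_summary(data):
    """Range, mean deviations and population SD of a list of numbers."""
    n = len(data)
    mean = sum(data) / n
    median = statistics.median(data)
    return {
        "range": max(data) - min(data),
        "mean_dev_about_mean": sum(abs(x - mean) for x in data) / n,
        "mean_dev_about_median": sum(abs(x - median) for x in data) / n,
        "sd": statistics.pstdev(data),  # divides by n, not n - 1
    }

# Hypothetical data: mean 5, median 4.5
print(dispersion_summary([2, 4, 4, 4, 5, 5, 7, 9]))
```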
Variance: Given a set of numbers, variance measures how far the numbers are spread out from their mean. It is calculated by taking the difference between each number and the mean, squaring the differences, and dividing the sum of the squares by the number of values in the set.
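The three steps of the variance calculation (subtract the mean, square, average) translate directly into code. This is a minimal sketch with a hypothetical function name, dividing by n as described above; the data are the X values used in Example 1 of the covariance problems below.

```python
def population_variance(data):
    """Variance = sum of squared differences from the mean, divided by n."""
    n = len(data)
    mean = sum(data) / n
    squared_diffs = [(x - mean) ** 2 for x in data]  # steps 1 and 2
    return sum(squared_diffs) / n                    # step 3

print(population_variance([3, 4, 5, 6, 7]))  # mean 5, squares 4+1+0+1+4 = 10
```

Python's standard library offers the same calculation as `statistics.pvariance`, which can serve as a cross-check.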
Variance is used in statistical inference and hypothesis testing. It is also used in investment portfolios to measure the risk (volatility) of investments and to improve investment decisions.
Formulae are given for all types. From the given data, one forms the tables and then substitutes the values into the formulae; it is a matter of the four fundamental operations, squares and square roots. One has to remember the formulae. Let us work out some examples.
Before we move to the problems, there is another term connected with this topic: covariance.
Covariance is a measure of the directional relationship between two variables, i.e. the extent to which the variables change together. The correlation coefficient is a pure number lying between -1 and 1, whereas covariance is measured in units (the product of the units of X and Y). Covariance can be positive (+) or negative (-). Variance, by contrast, measures the magnitude of the spread of a single variable and is never negative.
Formula for covariance:
$$\operatorname{Cov}(X, Y) = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})$$
where X_i denotes the values of variable X, Y_i the corresponding values of variable Y, \bar{X} the mean of variable X, \bar{Y} the mean of variable Y, and n the number of data entries (units).
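The covariance formula can be sketched directly in Python as below. The function name is hypothetical, and the code divides by n to match the formula above (Python's `statistics.covariance`, by contrast, divides by n - 1). The sample data are those of Example 1 in the problems that follow.

```python
def covariance(x, y):
    """Cov(X, Y) = (1/n) * sum of (X_i - mean_X) * (Y_i - mean_Y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n

print(covariance([3, 4, 5, 6, 7], [8, 7, 6, 5, 4]))  # negative: Y falls as X rises
```

Note that covariance(x, x) is just the variance of x, which is why variance can never be negative while covariance can.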
Variance is the square of the standard deviation (SD) and is denoted by σ² (sigma squared). One has to be familiar with the universally accepted symbols used in mathematics.
Correlation coefficient:
$$r(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$$
where r(X, Y) is the correlation between X and Y, Cov(X, Y) is the covariance between X and Y, σ_X is the standard deviation of X and σ_Y is the standard deviation of Y.
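Putting the pieces together, the correlation coefficient is the covariance divided by the product of the two standard deviations. The sketch below uses hypothetical helper names and the divide-by-n convention throughout; since σ_X = √Cov(X, X), only the covariance helper is needed.

```python
import math

def covariance(x, y):
    """Population covariance (divides by n)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n

def correlation(x, y):
    """r(X, Y) = Cov(X, Y) / (sigma_X * sigma_Y), with sigma = sqrt(Cov(v, v))."""
    sigma_x = math.sqrt(covariance(x, x))
    sigma_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sigma_x * sigma_y)

print(correlation([3, 4, 5, 6, 7], [8, 7, 6, 5, 4]))  # close to -1 (perfect negative)
```

Whether one divides by n or by n - 1 does not affect r, because the factor cancels between numerator and denominator; for that reason Python 3.10+'s `statistics.correlation` should give the same value.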
Note: The correlation coefficient always lies between -1 and 1 (inclusive). If it is 0 (zero), we can say that there is no linear correlation between the two given variables; they may still be related in a non-linear way.
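A correlation coefficient of 0 rules out only a linear relationship. The short sketch below uses hypothetical data in which y = x² depends exactly on x, yet the covariance (and therefore r) is 0 because the rise on one side of the mean cancels the fall on the other. The helper name is illustrative.

```python
def covariance(x, y):
    """Population covariance (divides by n)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n

x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]  # y is completely determined by x, but not linearly
print(covariance(x, y))    # 0.0, so r = 0 as well
```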
Covariance: Examples.
1. Find Cov(X, Y) between the two variables X and Y, given
X: 3 4 5 6 7
Y: 8 7 6 5 4
From the given data, n = 5, ∑X = 25, ∑Y = 30 and ∑XY = 24 + 28 + 30 + 30 + 28 = 140.
Solution:
$$\operatorname{Cov}(X, Y) = \frac{n\sum XY - (\sum X)(\sum Y)}{n^2}$$
Substituting the values, we get (5 × 140 - 25 × 30) / 25 = (700 - 750) / 25 = -50 / 25 = -2. We can conclude that the variables are negatively correlated.
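Example 1 can be checked by building the same table of sums in Python and substituting into the shortcut formula. This is only a verification sketch; the variable names are chosen for this example.

```python
x = [3, 4, 5, 6, 7]
y = [8, 7, 6, 5, 4]
n = len(x)

sum_x = sum(x)                              # 25
sum_y = sum(y)                              # 30
sum_xy = sum(a * b for a, b in zip(x, y))   # 24 + 28 + 30 + 30 + 28 = 140

# Shortcut formula: Cov = (n * sum_xy - sum_x * sum_y) / n^2
cov = (n * sum_xy - sum_x * sum_y) / n ** 2
print(cov)  # -2.0
```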
2. Find the correlation coefficient for the given data: Cov(X, Y) = -16.5, Var(X) = 2.89, Var(Y) = 100.
$$r(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}} = \frac{-16.5}{\sqrt{2.89 \times 100}} = \frac{-16.5}{1.7 \times 10} = \frac{-16.5}{17} \approx -0.97$$
The correlation coefficient is -0.97 (negative), which is the answer.
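When only the covariance and the two variances are given, as in Example 2, r follows in one line. The function name below is hypothetical; it simply applies r = Cov(X, Y) / √(Var(X) · Var(Y)).

```python
import math

def correlation_from_summary(cov_xy, var_x, var_y):
    """r = Cov(X, Y) / sqrt(Var(X) * Var(Y))."""
    return cov_xy / math.sqrt(var_x * var_y)

r = correlation_from_summary(-16.5, 2.89, 100)  # sqrt(289) = 17
print(round(r, 2))  # -0.97
```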
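3. Given ∑X = 15, ∑Y = 36, ∑XY = 110 and n = 5, find Cov(X, Y).
Ans: The means are \bar{X} = 15 / 5 = 3 and \bar{Y} = 36 / 5 = 7.2, so
$$\operatorname{Cov}(X, Y) = \frac{1}{n}\sum X_i Y_i - \bar{X}\,\bar{Y} = \frac{110}{5} - 3 \times 7.2 = 22 - 21.6 = 0.4$$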
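Example 3 uses the form Cov(X, Y) = (1/n)∑XY - X̄Ȳ, which is the same as the shortcut in Example 1 divided through by n². The sketch below (hypothetical function name) computes it from the sums alone; because 7.2 is not exact in binary floating point, the result is rounded for display.

```python
def covariance_from_sums(sum_x, sum_y, sum_xy, n):
    """Cov(X, Y) = (1/n) * sum_xy - mean_x * mean_y."""
    mean_x = sum_x / n  # 15 / 5 = 3
    mean_y = sum_y / n  # 36 / 5 = 7.2
    return sum_xy / n - mean_x * mean_y

print(round(covariance_from_sums(15, 36, 110, 5), 2))  # 22 - 21.6
```

Applied to the sums of Example 1 (∑X = 25, ∑Y = 30, ∑XY = 140, n = 5), the same function gives 28 - 30 = -2, agreeing with the other formula.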
4. Given Cov(X, Y) = -13.5, Var(X) = 2.25 and Var(Y) = 100, find the correlation coefficient r(X, Y).
Ans:
$$r(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)}\,\sqrt{\operatorname{Var}(Y)}} = \frac{-13.5}{1.5 \times 10} = \frac{-13.5}{15} = -0.9$$
Answer: -0.9 (negative).
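Example 4 can be checked by first turning each variance into a standard deviation (σ = √Var) and then dividing, which mirrors the written solution step by step. The check that r lies between -1 and 1 is a simple sanity test on the given data.

```python
import math

cov_xy = -13.5
var_x, var_y = 2.25, 100

sd_x = math.sqrt(var_x)  # 1.5
sd_y = math.sqrt(var_y)  # 10.0
r = cov_xy / (sd_x * sd_y)

# A valid correlation coefficient must lie in [-1, 1]
assert -1 <= r <= 1
print(r)  # -0.9
```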
To be continued...