STATISTICS NOTES
DESCRIPTIVE STATS

Correlation-

Uses of Correlation

a.) Relationship studies-

  • determining if a relationship exists at all; often used to study complex variables by finding relationships with related variables (e.g. achievement, success); best contribution is exclusion of unimportant variables.

b.) Prediction studies-

  • determining how well scores on one variable (the predictor) can be used to predict scores on another (the criterion).

c.) Reliability/Validity studies-

  • determining how consistent scores derived on a test instrument are under the same conditions, and whether or not the construct measured is the right one.

d.) Precursor-

  • may be used to weed out dead-ends before wasting money on experiments; can also be used to determine relationships to confounding variables that can then be controlled for in the design.

    Correlation Coefficient- Baker, 1994, pg 389 for computation.

    • the number computed from the two sets of scores obtained from each subject (always between -1 and +1).
    • if used to test hypothesized relationships, will be interpreted in terms of statistical significance.
    • if used for prediction, should be at least about .5 to be useful (at r = .5, r² is only .25); .6 to .7 is adequate for predicting group behavior, and .8 or higher for individual behavior.
    • if used for reliability/validity, may be as low as .7, but best over .8.
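
As a rough illustration of how the coefficient is obtained from paired scores, the product-moment formula can be sketched in Python. The function name and the five pairs of scores are made up for this example; they are not taken from Baker.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation for two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores for five subjects on two measures.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
r = pearson_r(hours, score)
print(round(r, 3))
```

With these made-up scores r comes out at roughly .77, which by the guidelines above would be adequate for group prediction but short of the .8 suggested for individual prediction.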

    Concepts Associated With Correlation Coefficient

    a.) Linear Regression-

    • the line drawn through a scatterplot of scores on two variables that is as close as possible to them all (the line of best fit).
    • uses the principle of least squares; the sum of the squared differences between points on the graph and the line is the smallest possible.
    • r indicates how good the line is for predicting one variable from another.
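
The least squares principle can be sketched as follows. This is a minimal example with made-up scores and hypothetical function names; it fits the line and then checks that a slightly different line has a larger sum of squared differences.

```python
def least_squares_line(x, y):
    """Return (slope, intercept) of the least squares line for y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_squared_error(x, y, slope, intercept):
    """Sum of squared vertical distances between the points and the line."""
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

# Hypothetical scores for five subjects.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(x, y)
best_sse = sum_squared_error(x, y, slope, intercept)

# Any other line, e.g. one with a slightly steeper slope, fits worse.
other_sse = sum_squared_error(x, y, slope + 0.1, intercept)
print(slope, intercept, best_sse, other_sse)
```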

    b.) Coefficient of Determination

    • r must be squared to determine the percentage of common variance (r²).
    • denotes the proportion of variance in one variable (the DV) that can be explained by the other (the IV).

    c.) Degree of Error-

    • the amount of unexplained variance (1 minus r²).
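
The arithmetic behind the coefficient of determination and the degree of error is short enough to show directly. The sketch below (the function name is just for illustration) splits the variance for a few values of r.

```python
def variance_split(r):
    """Return (explained, unexplained) proportions of variance for a given r."""
    explained = r ** 2           # coefficient of determination
    unexplained = 1 - explained  # degree of error
    return explained, unexplained

for r in (0.5, 0.6, 0.7, 0.8):
    explained, unexplained = variance_split(r)
    print(f"r = {r:.1f}: explained = {explained:.0%}, unexplained = {unexplained:.0%}")
```

Even a coefficient of .5 leaves three quarters of the variance unexplained, which is why coefficients much below that level are of little use for prediction.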

    Statistical Significance

    • refers to whether an obtained coefficient is really different from zero and reflects a true relationship rather than chance.
    • decision is made at a given level of probability; must consult a table to tell how large a correlation coefficient must be to be significant at a given probability level; a larger coefficient is necessary for significance with smaller samples.
    • the greater the certainty desired that the coefficient is real and not due to chance, the larger the coefficient required by the table.

    p=.05 means that if the obtained coefficient is at least as large as the table value, a coefficient that size would occur by chance no more than 5% of the time if no true relationship existed.
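
The table lookup can be approximated in code. This sketch assumes the usual t test for a correlation with n - 2 degrees of freedom, which the notes do not spell out; the critical t values are two-tailed .05 values copied from a standard t table, and the function names are hypothetical.

```python
import math

# Two-tailed critical t values at the .05 level, keyed by degrees of freedom
# (df = n - 2). Copied from a standard t table; extend as needed.
T_CRIT_05 = {8: 2.306, 18: 2.101, 28: 2.048}

def critical_r(df, t_crit):
    """Smallest |r| that is significant for the given df and critical t."""
    return t_crit / math.sqrt(t_crit ** 2 + df)

def is_significant(r, n, table=T_CRIT_05):
    """True if r from n pairs reaches the .05 level (n - 2 must be in the table)."""
    df = n - 2
    return abs(r) >= critical_r(df, table[df])

for df, t_crit in sorted(T_CRIT_05.items()):
    print(f"n = {df + 2}: r must be at least {critical_r(df, t_crit):.3f}")

# The same coefficient can be significant in a larger sample but not a smaller one.
print(is_significant(0.5, 30), is_significant(0.5, 10))
```

The required coefficient falls as the sample grows, which matches the point that smaller samples need a larger coefficient to reach significance.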

    Means of Computing Correlation- Gay, 1996

    a.) Pearson r-

    • product moment correlation coefficient.
    • most common, as personality instruments and such are usually interval data; most reliable measure of correlation, so used even if others are appropriate.
    • used as a test of the null hypothesis that no relationship exists in the population (r=0).

        Conditions for Using Pearson r- Baker, 1994

        1.) Scale-

        • only appropriate when data is interval or ratio data (for purposes of calculating the mean); though occasionally, ordinal data can be used.

        2.) Linearity-

        • must be assumed that the variables are linearly related; r tests for the direction of the relationship.
        • the size of r indicates how strongly pattern of variation or change in one variable is matched by the other, thus it also tests strength of relationship.

        3.) Sample-

        • sample size must be adequate, generally at least 30.

      b.) Spearman Rho-

      • rank difference correlation coefficient, appropriate when data is in ranks; e.g. scores arranged from highest to lowest, correlated with rankings on a socioeconomic measure such as income.
      • slightly less precise than Pearson r but easier to compute; loses that advantage as number of subjects becomes extremely large.
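
A minimal sketch of the rank-difference calculation is given below. It assumes there are no tied scores, and the test scores and incomes are invented for illustration.

```python
def ranks(scores):
    """Rank scores from highest (rank 1) to lowest. Assumes no ties."""
    order = sorted(scores, reverse=True)
    return [order.index(s) + 1 for s in scores]

def spearman_rho(x, y):
    """Rank-difference correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

# Hypothetical data for five subjects: test score and family income (thousands).
test_scores = [90, 85, 70, 60, 50]
income = [40, 55, 30, 20, 25]

rho = spearman_rho(test_scores, income)
print(rho)
```

Only the rank differences enter the formula, which is what makes it easy to compute by hand for small groups.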

      c.) Dichotomies-

      • correlation coefficient utilized when data is in an either/or format (nominal data); e.g. male/female.
      • sometimes dichotomies are artificially created by establishing a cut-off point and then splitting the scores in two; e.g. all IQs over 125 are "high" and all below 125 are "low."

      d.) Curvilinear-

      • sometimes variables increase together to a point, and then drift apart; e.g. agility increases with age, and then decreases with it after the 20s.
      • a linear correlation coefficient will be inaccurate here, because it understates the strength of the relationship.
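
To see why a linear coefficient misleads here, the sketch below computes Pearson r on made-up scores that rise and then fall symmetrically. The data and names are purely illustrative.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation for two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical inverted-U pattern: y rises with x up to a peak, then falls.
x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 4, 7, 8, 7, 4, 1]

r_all = pearson_r(x, y)             # whole range: close to zero
r_rising = pearson_r(x[:4], y[:4])  # rising half only: strongly positive
print(round(r_all, 3), round(r_rising, 3))
```

Over the whole range the rising and falling halves cancel out, so r is near zero even though y is clearly related to x.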

      e.) Multiple Regression-

      • used in prediction studies because combination of variables makes more accurate prediction than any one variable.
      • adding variables still leaves some predictions inaccurate, so a confidence interval is used; e.g. a high school G.P.A. of 1.2 could predict a college G.P.A. somewhere between .8 and 1.6.
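
The gain from combining predictors can be illustrated for the simplest case of two predictors, using the standard formula based on the three pairwise correlations. The correlations below are hypothetical, and the function name is only for this sketch.

```python
import math

def two_predictor_R(r_y1, r_y2, r_12):
    """Multiple R for a criterion predicted from two predictors.

    r_y1, r_y2: correlation of each predictor with the criterion.
    r_12: correlation between the two predictors (must not be +1 or -1).
    """
    # Standardized regression weights (betas) for the two predictors.
    beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
    beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)
    r_squared = beta1 * r_y1 + beta2 * r_y2
    return math.sqrt(r_squared)

# Hypothetical values: predictor 1 correlates .5 with the criterion,
# predictor 2 correlates .4, and the two predictors correlate .2 with each other.
multiple_R = two_predictor_R(0.5, 0.4, 0.2)
print(round(multiple_R, 3))
```

With these values the combined R is about .59, higher than either predictor achieves alone.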

      Shrinkage-

      • tendency of a prediction to become less accurate when applied to a different group than the one in which it was originally computed; usually cross-validated with another group, and useless variables thrown out.

