STATISTICS NOTES
DESCRIPTIVE STATS
Correlation-
- considered descriptive because it is used to describe an existing condition.
- describes in quantitative terms the relationship between two variables (which are also operationally defined).
- degree of relationship is expressed as a correlation coefficient; if a relationship exists, scores within a certain range on one measure are associated with scores within a certain range on the other.
Uses of Correlation
a.) Relationship studies-
- determining if a relationship exists at all; often used to study
complex variables by finding relationships with related variables (e.g. achievement, success); best contribution is exclusion of unimportant variables.
b.) Prediction studies-
- using scores on one variable (the predictor) to estimate scores on another (the criterion), which works to the extent that the two vary together.
c.) Reliability/Validity studies-
- determining how consistent scores derived on a test instrument are under the same conditions (reliability), and whether or not the construct measured is the right one (validity).
d.) Precursor-
- may be used to weed out dead-ends before wasting money on
experiments; can also be used to determine relationships to confounding
variables that can then be controlled for in the design.
Correlation Coefficient- Baker, 1994, pg 389 for computation.
- the actual number obtained from the two sets of scores from each subject (always ranges between -1 and 1).
- if used to test hypothesized relationships, will be interpreted in terms of statistical significance.
- if used for prediction, must be at least around .5 to be useful (because of r^2: at r = .5 only 25% of the variance is shared); best to predict group behavior between .6 and .7, and individual behavior at .8+.
- if used for reliability/validity, may be as low as .7, but best over .8.
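
A minimal sketch of how the coefficient can be computed from two sets of scores, using the deviation-score form of the product-moment formula. The function name and the five pairs of scores are made up for illustration; this is not the worked example from Baker.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp_xy / math.sqrt(ss_x * ss_y)

# Hypothetical scores for five subjects on two measures.
test_a = [1, 2, 3, 4, 5]
test_b = [2, 4, 5, 4, 5]

r = pearson_r(test_a, test_b)
print(round(r, 3))  # prints 0.775
```

Identical score lists give r = 1, and reversing one list gives r = -1, which matches the stated range.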
Concepts Associated With Correlation Coefficient
a.) Linear Regression-
- the line drawn through a scatterplot of scores on two variables
that is as close as possible to them all (best fit line?).
- uses the principle of least squares; the sum of the squared
differences between points on the graph and the line is the
smallest possible.
- r indicates how good the line is for predicting one variable from
another.
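
The least squares idea can be sketched as follows, assuming a single predictor. The function names and scores are hypothetical. The slope and intercept are the ones that minimize the sum of squared vertical distances, so nudging the line in any direction should make that sum larger.

```python
def least_squares_line(x, y):
    """Return (slope, intercept) of the best-fit line for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    slope = sp_xy / ss_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_squared_error(x, y, slope, intercept):
    """Sum of squared differences between the points and the line."""
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

# Hypothetical scores for five subjects.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(x, y)
best_sse = sum_squared_error(x, y, slope, intercept)

# Any other line does worse on the same points.
steeper_sse = sum_squared_error(x, y, slope + 0.1, intercept)
lower_sse = sum_squared_error(x, y, slope, intercept - 0.2)

print(round(slope, 2), round(intercept, 2), round(best_sse, 2))
print(round(steeper_sse, 2), round(lower_sse, 2))
```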
b.) Coefficient of Determination
- r must be squared to determine percentage of common variance (r^2).
- denotes the proportion of variance in one variable (DV) that can be explained by the IV.
c.) Degree of Error-
- the amount of unexplained variance (1 minus r^2).
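
A short illustration of why the size guidelines for prediction sit where they do: squaring r shows how little variance is shared until r gets fairly large. The variable names are illustrative only.

```python
# Explained (r^2) and unexplained (1 - r^2) variance for several coefficients.
variance_table = {}
for r in (0.5, 0.6, 0.7, 0.8):
    explained = round(r ** 2, 2)
    unexplained = round(1 - r ** 2, 2)
    variance_table[r] = (explained, unexplained)
    print(f"r = {r:.1f}  explained = {explained:.2f}  unexplained = {unexplained:.2f}")
```

At r = .5 three quarters of the variance is still unexplained; only at r = .8 does the explained share (.64) clearly outweigh the error.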
Statistical Significance
- refers to whether an obtained coefficient is really different from zero and reflects a true relationship rather than chance.
- decision is made at a given level of probability; must consult a table to tell how large a correlation coefficient must be to be significant at a given probability level; a larger coefficient is necessary for significance with smaller samples.
- the greater the certainty desired that the coefficient is real and not chance (e.g. .01 rather than .05), the larger the table value it must reach.
- p=.05 means that if no relationship really existed, a coefficient at least as large as the table value would turn up by chance only 5% of the time; an obtained coefficient that meets or exceeds the table value is called significant at the .05 level.
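
A sketch of where the table values come from, assuming a two-tailed test. The coefficient is related to the t statistic by t = r * sqrt(n - 2) / sqrt(1 - r^2), so a critical t with n - 2 degrees of freedom can be turned into a critical r. The four t values below are copied from a standard t table; the function names and the small lookup are illustrative, not a full table.

```python
import math

# Two-tailed critical t values from a standard t table,
# keyed by (degrees of freedom, probability level).
T_CRITICAL = {
    (8, 0.05): 2.306,
    (8, 0.01): 3.355,
    (28, 0.05): 2.048,
    (28, 0.01): 2.763,
}

def critical_r(n, alpha):
    """Smallest |r| that is significant for n subjects at the given level."""
    df = n - 2
    t = T_CRITICAL[(df, alpha)]
    return t / math.sqrt(t * t + df)

def is_significant(r, n, alpha):
    """True if the obtained coefficient meets or exceeds the critical value."""
    return abs(r) >= critical_r(n, alpha)

for n in (10, 30):
    for alpha in (0.05, 0.01):
        print(n, alpha, round(critical_r(n, alpha), 3))

# The same coefficient can be significant in one sample size and not another.
print(is_significant(0.5, 30, 0.05), is_significant(0.5, 10, 0.05))
```

With 10 subjects the coefficient must reach about .63 at the .05 level, while with 30 subjects about .36 is enough; moving to the .01 level raises both.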
Means of Computing Correlation- Gay, 1996
a.) Pearson r-
- product moment correlation coefficient.
- most common, as scores from personality instruments and similar measures are usually treated as interval data; most reliable measure of correlation, so often used even when other measures are appropriate.
- used as a test of the null hypothesis that no relationship exists in the population (r=0).
Conditions for Using Pearson r- Baker, 1994
1.) Scale-
- only appropriate when data is interval or ratio data (for purposes of calculating the mean); though occasionally, ordinal data can be used.
2.) Linearity-
- must be assumed that the variables are linearly related; the sign of r then indicates the direction of the relationship.
- the size of r indicates how strongly pattern of variation or change
in one variable is matched by the other, thus it also tests strength of
relationship.
3.) Sample-
- sample size must be adequate, generally at least 30.
b.) Spearman Rho-
- rank difference correlation coefficient, appropriate when data is in ranks; e.g. scores arranged from highest to lowest, correlated with a ranked socioeconomic measure such as income.
- slightly less precise than Pearson r but easier to compute; loses that advantage as number of subjects becomes extremely large.
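
The rank difference computation can be sketched with the usual formula rho = 1 - 6 * (sum of squared rank differences) / (n * (n^2 - 1)). This minimal version assumes there are no tied scores; the function names and data are hypothetical.

```python
def ranks(scores):
    """Rank scores from highest (rank 1) to lowest; assumes no tied scores."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(s) + 1 for s in scores]

def spearman_rho(x, y):
    """Rank difference correlation for two equal-length lists without ties."""
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))

# Hypothetical test scores and incomes (in thousands) for five subjects.
scores = [90, 80, 70, 60, 50]
income = [40, 50, 30, 20, 10]

rho = spearman_rho(scores, income)
print(round(rho, 2))  # prints 0.9
```

Only the first two subjects swap places between the two rankings, so the coefficient stays close to 1.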
c.) Dichotomies-
- correlation coefficient utilized when data is in an either/or format (nominal data); e.g. male/female.
- sometimes dichotomies are artificially created by establishing a
midpoint and then splitting in two; e.g. all IQ's over 125 are "high" and those below 125 are "low."
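
One common way to handle a dichotomy, not spelled out in these notes, is to code the two categories as 0 and 1 and apply the product-moment formula; the result is usually called the point-biserial coefficient. The sketch below creates an artificial dichotomy at the 125 cutoff mentioned above. The IQ and grade values are invented for illustration.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp_xy / math.sqrt(ss_x * ss_y)

# Hypothetical IQ scores and grades for four subjects.
iq = [110, 120, 130, 140]
grades = [2.0, 3.0, 3.0, 4.0]

# Artificial dichotomy: 1 = "high" (over 125), 0 = "low".
CUTOFF = 125
high_iq = [1 if score > CUTOFF else 0 for score in iq]

r_dichotomy = pearson_r(high_iq, grades)
r_full_scores = pearson_r(iq, grades)
print(high_iq)
print(round(r_dichotomy, 3), round(r_full_scores, 3))
```

In this made-up data the split version comes out lower than the coefficient from the full IQ scores, since splitting discards the differences within each half.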
d.) Curvilinear-
- sometimes variables increase together to a point, and then drift apart; e.g. agility increases with age, and then decreases with it after the 20's.
- a linear correlation coefficient will be inaccurate here, typically understating the strength of the relationship.
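
A small demonstration of the problem, using invented agility scores that rise to a peak at age 20 and then fall off symmetrically. The relationship is perfectly regular, yet the linear coefficient over the whole age range comes out at zero; each half on its own shows a strong linear trend.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp_xy / math.sqrt(ss_x * ss_y)

# Hypothetical data: agility peaks at age 20 (an inverted U).
age = [5, 10, 15, 20, 25, 30, 35]
agility = [250 - (a - 20) ** 2 for a in age]

r_all = pearson_r(age, agility)
r_rising = pearson_r(age[:4], agility[:4])    # ages 5 to 20
r_falling = pearson_r(age[3:], agility[3:])   # ages 20 to 35

print(round(r_all, 3), round(r_rising, 3), round(r_falling, 3))
```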
e.) Multiple Regression-
- used in prediction studies because a combination of variables
makes a more accurate prediction than any one variable.
- predictions are still inexact, so a confidence interval is
used; e.g. a high school G.P.A. of 1.2 could
predict college G.P.A. somewhere between .8 and 1.6.
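
A minimal sketch of the claim that a combination of predictors does at least as well as any single one, assuming just two predictors so the least squares equations can be solved by hand. All names and data are hypothetical, and the helper functions are not from any library.

```python
def mean(values):
    return sum(values) / len(values)

def sum_products(x, y):
    """Sum of cross-products of deviations (sum of squares when x is y)."""
    mean_x, mean_y = mean(x), mean(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

def r_squared_single(x, y):
    """Proportion of variance in y explained by one predictor."""
    return sum_products(x, y) ** 2 / (sum_products(x, x) * sum_products(y, y))

def fit_two_predictors(x1, x2, y):
    """Least squares weights for y = b0 + b1*x1 + b2*x2."""
    s11, s22, s12 = sum_products(x1, x1), sum_products(x2, x2), sum_products(x1, x2)
    s1y, s2y = sum_products(x1, y), sum_products(x2, y)
    det = s11 * s22 - s12 ** 2  # zero only if the predictors are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return b0, b1, b2

def r_squared_two(x1, x2, y):
    """Proportion of variance in y explained by both predictors together."""
    b0, b1, b2 = fit_two_predictors(x1, x2, y)
    sse = sum((c - (b0 + b1 * a + b2 * b)) ** 2 for a, b, c in zip(x1, x2, y))
    return 1 - sse / sum_products(y, y)

# Hypothetical data for six students.
hs_gpa = [2.0, 2.5, 3.0, 3.5, 4.0, 3.2]
test_score = [50, 45, 60, 70, 65, 55]
college_gpa = [1.8, 2.0, 2.6, 3.2, 3.4, 2.7]

r2_gpa = r_squared_single(hs_gpa, college_gpa)
r2_test = r_squared_single(test_score, college_gpa)
r2_both = r_squared_two(hs_gpa, test_score, college_gpa)
print(round(r2_gpa, 3), round(r2_test, 3), round(r2_both, 3))
```

In the sample used to fit the equation, the combined value can never fall below the better of the two single-predictor values; how well that holds up in a new group is a separate question.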
Shrinkage-
- tendency of a prediction to become less accurate when applied to a different group than the one in which it was originally computed; usually cross-validated with another group, and useless variables thrown out.
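
Cross-validation can be sketched as fitting the prediction equation in one group and then scoring it, unchanged, in a second group. The two groups below are invented, and the numbers were chosen so that the drop is visible; real data may shrink more or less than this.

```python
def fit_line(x, y):
    """Least squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    slope = sp_xy / ss_x
    return slope, mean_y - slope * mean_x

def explained_variance(x, y, slope, intercept):
    """Proportion of variance in y accounted for by a given prediction line."""
    mean_y = sum(y) / len(y)
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return 1 - sse / ss_y

# Hypothetical scores: the equation is derived in group A, then applied to group B.
group_a_x, group_a_y = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
group_b_x, group_b_y = [1, 2, 3, 4, 5], [3, 3, 4, 6, 4]

slope, intercept = fit_line(group_a_x, group_a_y)
r2_original = explained_variance(group_a_x, group_a_y, slope, intercept)
r2_new_group = explained_variance(group_b_x, group_b_y, slope, intercept)

print(round(r2_original, 2), round(r2_new_group, 2))  # prints 0.6 0.4
```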