Three correlation coefficients have been mentioned thus far: r, $\eta^2$, and $r_s$. An extension of r to the case in which there are three or more variables is discussed in Section 6.7. This coefficient is called a multiple correlation coefficient. A fifth coefficient, Cramér’s V, that is appropriate for unordered qualitative variables is discussed in Section 17.4. Other coefficients also are available, but they are beyond the scope of this book.
5.9 LOOKING BACK: WHAT HAVE YOU LEARNED?
The term correlation refers to the association or concomitance between two or more quantitative or ordered qualitative variables. A correlation coefficient is a measure of the degree of association. The presence of an association does not imply causality; it does, however, imply that as one variable changes, the other variable tends to change as well. The two most widely used correlation coefficients in the behavioral sciences and education are the Pearson product-moment correlation coefficient, r, and the Spearman rank correlation coefficient, $r_s$. Pearson’s r reflects the strength and the direction of the linear relationship between two quantitative variables. It is a number that varies between −1 and 1, with 0 indicating the absence of a linear relationship. Negative values indicate an inverse relationship between the variables; positive values indicate a positive or direct relationship. Spearman’s $r_s$ measures the strength and the direction of the monotonic relationship between two ordered qualitative variables, that is, ranked data. Like r, it varies between −1 and 1, with 0 indicating the absence of a monotonic relationship.
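To make the two definitions concrete, here is a minimal Python sketch using only the standard library. The function names and the example data are hypothetical, chosen for illustration. `pearson_r` follows the deviation formula. `spearman_rs` uses one common convention: apply Pearson's formula to the ranks, giving tied values the average of the ranks they occupy. When there are no ties, this agrees with the familiar difference-of-ranks formula $r_s = 1 - 6\sum D_i^2 / [n(n^2 - 1)]$.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment r computed with the deviation formula."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length and at least two pairs")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]          # deviations from the X mean
    dy = [yi - mean_y for yi in y]          # deviations from the Y mean
    sum_xy = sum(a * b for a, b in zip(dx, dy))   # sum of cross products
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when either variable has no variability")
    return sum_xy / sqrt(ss_x * ss_y)


def average_ranks(values):
    """Rank from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1            # mean of ranks i+1 through j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman r_s computed as Pearson's r applied to the two sets of ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: Y = X squared is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(f"r   = {pearson_r(x, y):.3f}")     # high, but below 1
print(f"r_s = {spearman_rs(x, y):.3f}")   # exactly 1: the ranks agree perfectly
```

Because $Y = X^2$ is strictly increasing over these X values, the two sets of ranks agree perfectly and $r_s = 1$. By contrast, r falls short of 1 because the points do not lie on a straight line.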
Two statistics, both functions of r, are useful in interpreting a particular r value: the coefficient of determination, $r^2$, and the coefficient of nondetermination, $k^2 = 1 - r^2$. For a given linear relationship between X and Y, $r^2$ reflects the proportion of the X-score variance that can be explained by the Y-score variance and vice versa; $k^2$ reflects the proportion that cannot be explained. If, for example, r is equal to .50, then $r^2 = .25$ and $k^2 = .75$. You know that, based on the linear relationship between the variables, 25% of the variance of one variable can be explained by the variance of the other variable, and 75% remains to be explained.
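The r = .50 example takes only two lines of arithmetic. The Python sketch below (the function name `determination` is a hypothetical choice) also shows that squaring discards the sign, so r = −.50 yields the same two proportions.

```python
def determination(r):
    """Return (r^2, k^2): proportions of variance explained and not explained."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    r_squared = r * r              # coefficient of determination
    k_squared = 1.0 - r_squared    # coefficient of nondetermination
    return r_squared, k_squared


for r in (0.50, -0.50):
    r2, k2 = determination(r)
    print(f"r = {r:+.2f}: r^2 = {r2:.0%} explained, k^2 = {k2:.0%} not explained")
```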
The Pearson product-moment correlation coefficient is appropriate for linearly related quantitative variables. For descriptive purposes, no other assumptions regarding the variables are required. However, in interpreting r, keep in mind that the size of r can be affected by such factors as the shape of the X and Y distributions, the presence of a truncated X or Y range, the presence of subgroups with standard deviations or means that differ for both variables, and the presence of a discontinuous distribution for X or Y or both.
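The truncated-range effect can be illustrated with a short Python sketch on hypothetical, deterministic data; no real sample is implied. In this example, Y equals X plus a fixed deviation that alternates between +2 and −2, so the scatter about the line is the same everywhere. Keeping only the upper half of the X range lowers r from about .94 to about .79 for these numbers.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment r (deviation formula)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / sqrt(ss_x * ss_y)


# Hypothetical scores: Y = X plus a fixed deviation that alternates +2, -2,
# so the scatter about the straight line is identical across the X range.
x = list(range(1, 21))
y = [xi + (2 if xi % 2 == 1 else -2) for xi in x]
r_full = pearson_r(x, y)

# Truncate the X range by keeping only participants with X of 11 or more.
kept = [(xi, yi) for xi, yi in zip(x, y) if xi >= 11]
r_trunc = pearson_r([xi for xi, _ in kept], [yi for _, yi in kept])

print(f"full X range:      r = {r_full:.2f}")
print(f"truncated X range: r = {r_trunc:.2f}")
```

Truncation shrinks the spread of X, but the scatter of Y about the line stays the same. As a result, a larger share of the remaining Y variance is scatter rather than linear trend, and r falls.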
REVIEW EXERCISES FOR CHAPTER 5
1. A job-satisfaction questionnaire was administered to a random sample of 36 men between the ages of 29 and 34. The researcher was interested in the relationship between number of years of formal education and job satisfaction. (a) Construct a scatterplot for the data in the following table. (b) Does the relationship appear to be linear or nonlinear?
               Years of     Job                            Years of     Job
Participant    Education    Satisfaction    Participant    Education    Satisfaction
1              14           36              19             12           43
2              11           38              20             11           46
3              10           36              21             18           53
4              15           51              22             8            30
5              7            30              23             9            35
6              8            37              24             12           40
7              12           40              25             13           40
8              13           43              26             13           41
9              16           47              27             10           32
10             12           44              28             14           50
11             12           37              29             12           33
12             11           40              30             14           47
13             9            32              31             10           38
14             12           42              32             11           37
15             13           45              33             12           40
16             11           38              34             14           50
17             12           42              35             13           42
18             11           37              36             13           45
2. Distinguish between r and .
3. Match the r values −1, 1, 0, .3, and .8 with the scatterplots shown here.
[Figure: five scatterplots, labeled a through e, each plotting Y against X]
4. Would you expect the correlation between the following to be positive, negative, or essentially zero?
a. Mechanical aptitude and birth order
b. Verbal intelligence and number of trials to learn a list of nonsense syllables
c. Grades in college and annual income 10 years after graduation
d. Number of letters in last name and musical aptitude
5. The Alcohol Dependence Scale was developed to assist the World Health Organization in the classification of alcoholism. Fifteen alcoholics seeking counseling for alcohol-related disabilities took this scale and the Michigan Alcoholism Screening Test, which yields an index of problems related to drinking. The investigators obtained the following data. (Suggested by Skinner, Harvey A., and Allen, Barbara A. [1982]. Alcohol dependence syndrome: Measurement and validation. Journal of Abnormal Psychology, 91, 199–209.)
Counselee    Alcohol Dependence Scale    Michigan Alcoholism Screening Test
1            89                          78
2            48                          57
3            74                          65
4            97                          86
5            59                          58
6            65                          75
7            46                          57
8            84                          95
9            78                          69
10           77                          86
11           67                          78
12           36                          47
13           83                          74
14           68                          77
15           96                          87
a. Construct a scatterplot for these data and decide whether the data appear to be linearly related.
b. Use the deviation formula or a calculator to compute r for these data.
6. Researchers have reported that lonely people often describe themselves as shy. To investigate the strength of the relationship between the two variables, investigators gave a modified version of the Stanford Shyness Survey and the UCLA Loneliness Scale to 20 male and 20 female college students. The order of administration of the instruments was randomized independently for each student. The researchers obtained the following data for the male students. (Experiment suggested by Maroldo, Georgette K. [1981]. Shyness and loneliness among college men and women. Psychological Reports, 48, 885–886.)
           Stanford    UCLA                     Stanford    UCLA
           Shyness     Loneliness               Shyness     Loneliness
Student    Survey      Scale         Student    Survey      Scale
1          36          51            11         30          29
2          39          52            12         30          40
3          30          33            13         33          45
4          23          35            14         32          30
5          28          55            15         28          42
6          41          52            16         34          45
7          29          32            17         21          35
8          27          38            18         41          35
9          28          40            19         23          30
10         28          33            20         39          51
a. Construct a scatterplot for these data and decide whether the data appear to be linearly related.
b. Use the deviation formula or a calculator to compute r for these data.
7. Use the deviation formula or a calculator to compute r for the education and job-satisfaction data in Exercise 1.
8. Calculate $\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})$ for the following data. In which quadrants of Figure 5.3-1 would the majority of the data points fall? Are the variables linearly related, and if so, is the relationship positive or negative?
a.          b.          c.          d.
X    Y      X    Y      X    Y      X    Y
14   18     10   17     9    17     9    17
6    11     10   15     11   14     11   17
10   15     12   15     13   10     13   13
10   16     8    13     7    19     7    13
9. For the data in Exercise 8, make figures like Figure 5.3-1.
10. For the data in Exercise 8, calculate the Pearson product-moment correlation coefficient using the deviation formula or a calculator.
11. What does covariance, $S_{XY}$, tell you about the relationship between X and Y? In computing r, why is $S_{XY}$ divided by $S_X S_Y$?
12. For a set of data with $S_X = 4$ and $S_Y = 5$, what is the largest possible value that $S_{XY}$ can be? (Hint: The maximum value of r is 1, and $r = S_{XY}/(S_X S_Y)$.)
13. The correlation coefficient for the following data is undefined. Why is this statement true?
X     Y
13    16
16    16
11    16
17    16
12    16
14. What do $r^2$ and $k^2$ tell you about the relationship between X and Y?
15. For the following experiments, compute $r^2$ and $k^2$ and interpret them verbally and by means of diagrams like those in Figure 5.34-1.
a. The correlation between grades in introductory psychology and introductory statistics was .32.
b. The correlation between the number of hours that rats had been deprived of food and the time to traverse a maze with sunflower seeds in the goal box was .80.
c. The correlation between the last two digits of students’ Social Security numbers and the number of trials to learn nonsense syllables was .02.
16. Which of the following are incorrect interpretations of a correlation coefficient, and why?
a. The strength of association between scores on the Attitudes Toward Disabled Persons Scale and amount of exposure to persons with disabilities is .56.
b. The correlation between height and weight at age 6 is .40; this correlation is twice as high as that at age 16, when r = .20.
c. The correlation between reaction time and number of automobile accidents is .20; 96% of the variance in frequency of accidents is unaccounted for.
d. You can conclude from the high correlation between level of motivation and number of elective offices sought that office-seeking behavior is caused at least in part by motivation.
17. What is wrong with interpreting r
a. in direct proportion to its size?
b. in terms of arbitrary descriptive labels?
c. as indicating causality?
18. Employees with the highest accident rates were required to complete a safety course. Following the course, the employees had fewer accidents. Can you conclude that the course was effective? What controls could be used in the experiment to make the outcome easier to interpret?
19. What effects do the following factors have on r as a measure of strength of association? Draw figures like Figures 5.6-1 through 5.6-5 to represent the data.
a. The relationship between X and Y looks like a U. Assume that r is positive.
b. The range of X is reduced by deleting participants with scores below $\bar{X}$.
c. The sample contains subgroups a and b with equal standard deviations and means $\bar{X}_a = 16$, $\bar{X}_b = 22$, $\bar{Y}_a = 42$, and $\bar{Y}_b = 31$. Assume that r is positive for both a and b.
d. The sample contains subgroups a and b with equal standard deviations and means $\bar{X}_a = 20$, $\bar{X}_b = 26$, $\bar{Y}_a = 35$, and $\bar{Y}_b = 41$. Assume that r is positive for both a and b.
e. The sample contains subgroups a and b with equal means and standard deviations $S_{X_a} = 15$, $S_{X_b} = 24$, $S_{Y_a} = 24$, and $S_{Y_b} = 15$. Assume that r is positive for both a and b.
f. The sample contains subgroups a and b with equal means and standard deviations $S_{X_a} = 18$, $S_{X_b} = 18$, $S_{Y_a} = 26$, and $S_{Y_b} = 34$. Assume that r is positive for both a and b.
g. The distribution of the X variable is positively skewed; that for the Y variable is negatively skewed. Assume that r is positive.
h. The distributions of the X and Y variables are negatively skewed.
20. How can you detect cases in which $\eta^2$ should be used instead of r?
21. What are the potential advantages and disadvantages of using extreme groups in research?
22. The correlation between IQ and grade-point average (GPA) for high school seniors was .63. For seniors who went on to college, the correlation between IQ and college GPA was .51. Explain why this correlation is lower.
23. List the similarities and differences between r and $r_s$.
24. A psychiatric social worker and an occupational therapist ranked 11 Veterans Administration patients with respect to extent of recovery following 3 months of therapy. Compute the Spearman rank correlation between the two sets of rankings.
Patient    Social Worker    Occupational Therapist
1          7                7
2          2                1
3          1                2
4          3                5
5          8                9
6          10               10
7          4                3
8          9                8
9          11               11
10         6                6
11         5                4
25. Participants rated the attractiveness of one set of geometric shapes before smoking marijuana and a similar set after smoking marijuana. One shape in the two sets was the same. The following data are the ratings for that shape. A rating of 1 means very attractive; a rating of 20 means very unattractive. Transform the ratings to ranks, and compute the Spearman rank correlation between the two sets of ranks.
Participant    Before Smoking    After Smoking
1              6                 3
2              8                 7
3              14                16
4              7                 2
5              10                12
6              9                 15
7              5                 1
8              15                20
9              12                17
26. Suppose that for the data in Exercise 25, participant 6 had assigned a rating of 12 instead of 15 to the geometric shape after smoking marijuana. This rating results in tied ranks. How would this affect the computational procedure for the correlation coefficient?
27. Which of the following are strictly monotonic functions? a. Y 3 3X b. Y 1 X2
c. $Y = X^3$
d. $Y = 1/X$
28. Use a statistical software package to compute the Pearson product-moment correlation between the variables of number of years of formal education and job satisfaction in Exercise 1.
29. Use a statistical software package to compute the Pearson product-moment correlation between the Alcohol Dependence Scale data and the Michigan Alcoholism Screening Test data in Exercise 5.
30. Use a statistical software package to compute the Pearson product-moment correlation between the modified version of the Stanford Shyness Survey data and the UCLA Loneliness Scale data in Exercise 6.