Introduction
Relationship between education spending and test scores
The correlation is negative (-0.2). The United States spends the second most on education of any country, yet has below-average test scores. Ethnically homogeneous Japan, South Korea and Finland spend at average rates and have the best test scores. Tiny, ethnically homogeneous and "hungry" Estonia spends less than half as much on education as the United States and Norway but has far better test scores. Source: Economy Industry USA View
The Organization for Economic Co-operation and Development (OECD) released the results of its 2009 global rankings on student performance in mathematics, reading, and science, on the Program for
Relationship between per pupil spending and mean math scores in PISA 2012, by country
The figure shows the simple correlation between the mean scores in mathematics and the expenditure per pupil in secondary education for each of the countries that participated in PISA 2012. It is easy to see that countries like Qatar and Singapore spend similar amounts of dollars per student, yet achieve very different PISA math scores.
The Organization for Economic Co-operation and Development (OECD) released the results of its 2012 global rankings on student performance in mathematics, reading, and science, on the Program for International Student Assessment,
Ranking of top countries in math, reading, and science is out — and the US didn't crack the top 10
Source: OECD. China is represented by the
provinces of Beijing, Shanghai, Jiangsu, and Guangdong.
The PISA is a worldwide exam administered every three years that measures 15-year-olds in 72 countries. About 540,000 students took the exam in 2015.
Asian countries topped the rankings across all PISA tests: Singapore top in global education rankings/2015
"If you think maths is a hard subject you won't succeed," 10-year-old Hai Yang tells me. A striking feature of Singapore's education:
*The whole class has just been working on a problem, taking it in turns to stand up and explain how they worked it out. And they do this in English, one of several languages spoken in Singapore. It turns out there is more than one way to reach the right solution.
*What is impressive is their commitment to understanding exactly how to do it.
"If we just blindly look at the teacher's answer, when we grow up we might not know how to do it any more."
Building blocks
This is an approach known as maths mastery, which some schools in the UK have begun using in an adapted form.
*"We believe in Singapore in the fundamentals, that in order for a child to be well educated you need to give them the fundamental language and grammar in various disciplines, a language where you can read, a language where you can understand numbers."
Singapore has also thought a lot about how to make teaching a rewarding profession.
*Teachers can follow a career path that takes them towards being a principal, a researcher into education or a master classroom teacher. They get time to deepen their knowledge and prepare lessons.
*In Montfort Secondary school they are encouraging the teenage boys to make prototype products, ranging from a smart garden watering system to an electronic keyboard.
*Using your science and maths skills to solve real world problems is exactly the kind of ability the PISA tests are intended to measure. An empty room at the school is being turned into what they call a "makers lab". *Simple tools and materials will be available for the pupils to use in their spare time to make things to take home. If they want to work out how to light up their guitar with LED lights, this is where they can do it.
*Another striking feature of Singapore's education is that head teachers are rotated between schools every six to eight years. There is also an increasing emphasis on collaboration.
*"Today teachers work in teams, they grow together, they research together, they work together." High stakes
The Objective of Correlation and Regression
The objective of correlation is to establish the relationship between two or more quantitative variables, without inferring causal relationships; the objective of regression analysis is to establish a mathematical model that estimates the value of one variable from the values of the other variables. This technique is appropriate when:
A mathematical function or equation linking two metric-scaled (interval or ratio) variables is to be constructed, under the assumption that the values of one of the two variables depend on the values of the other.
Logistic regression analysis is used to examine relationships between variables when the dependent variable is nominal, whether the independent variables are nominal, ordinal, interval, or some mixture thereof.
Suppose that one wanted to determine which program interventions were associated with a JOBS Program client's ability to get a job within six months of exiting the program. The outcome variable would be "job" or "no job", clearly a nominal variable. One could then use several independent variables such as job training, post-secondary education and the like to predict the odds of getting a job.
Multiple Regression Analysis Technique this technique is appropriate
Methodology
To perform a correlation and regression analysis, it is advisable to follow these steps:
1. Collecting data from sources such as questionnaires, forms or databases, texts, brochures, magazines, internet, direct measurements, etc.
2. Draw the scatter diagram, a graph showing the intensity and direction of the relationship between two variables, which suggests which model could be used. Suggested models are best visualized in at most three dimensions. This question is important: Does the relationship appear to be linear or curved?
3. Calculate the values of the correlation coefficient and the coefficient of determination (note: the correlation coefficient measures the strength and direction of the linear association between the variables, and the coefficient of determination measures the percentage of variability of the dependent variable explained by the independent variable).
4. Specify the model suggested by the scatter diagram or by the experience of the investigator.
5. Estimate the regression line using a processing program with statistical applications (Excel, SPSS, Statgraphics, Minitab, SAS, Statistics, etc.) or by formulas.
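Steps 3 and 5 above can also be carried out "by formulas". The following is a minimal Python sketch of those formulas, not the SPSS procedure used in these slides. The data are hypothetical (made up for illustration only) and the function name `simple_regression` is an assumption of this sketch.

```python
from math import sqrt

def simple_regression(x, y):
    """Return (r, r_squared, intercept, slope) for paired samples x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / sqrt(sxx * syy)            # Pearson correlation coefficient
    slope = sxy / sxx                    # least-squares slope
    intercept = mean_y - slope * mean_x  # least-squares intercept
    return r, r ** 2, intercept, slope

# Hypothetical data, not taken from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r, r2, b0, b1 = simple_regression(x, y)
print(f"r = {r:.3f}, r^2 = {r2:.3f}, line: y = {b0:.2f} + {b1:.2f} x")
```

The scatter diagram of step 2 should still be inspected first, since these formulas assume that a straight line is a reasonable model.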
Techniques for Examining Associations
Spearman Correlation
The technique is appropriate when: The degree of association between two sets of ranks (pertaining to two variables) is to be examined.
Illustrative research question(s) this technique can answer: “Is there a significant relationship between motivation levels of teachers and the quality of their performance?”
Assume that the data on motivation and quality of performance are in the form of ranks, say, 1 through 50, for 50 teachers who were evaluated subjectively by their administrators on each variable.
Pearson Correlation
This technique is appropriate when: The degree of association between two metric-scaled (interval or ratio) variables is to be examined.
Illustrative research question(s) this technique can answer: “Is there a significant relationship between parents' age (measured in actual years) and their perceptions of the school's
Spearman Rank Coefficient (r_s)
• Used for non-linear (monotonic) relationships
• It is a non-parametric measure of correlation.
• This procedure makes use of the two sets of ranks that may be assigned to the sample values of X and Y.
• The Spearman rank correlation coefficient can be computed in the following cases:
Both variables are quantitative.
Both variables are qualitative ordinal.
One variable is quantitative and the other is qualitative ordinal.
Spearman Correlation
Example: Quality of life
Fourteen cities have been rated on an index that measures the quality of life.
Also, the percentage of the population that has moved into each city over the
past year has been determined. Have cities with higher quality of life scores
attracted more new residents?
Association between quality of life and percentage of new residents
City  Quality of life  Percentage of New Residents
A 25 5
B 10 4
C 15 3
D 30 6
E 20 3
F 25 9
G 10 5
H 15 3
I 30 7
J 20 8
K 15 5
L 17 6
M 20 7
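As a rough cross-check of the SPSS procedure, the Spearman coefficient can be sketched as a Pearson correlation computed on ranks, with tied values sharing their average rank. This is a minimal sketch, and the function names are assumptions. Note that the table as extracted lists 13 cities (A to M) while the example speaks of fourteen, so the value printed here need not reproduce the SPSS output for N = 14.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# The 13 rows (cities A to M) listed in the table above
quality = [25, 10, 15, 30, 20, 25, 10, 15, 30, 20, 15, 17, 20]
new_residents = [5, 4, 3, 6, 3, 9, 5, 3, 7, 8, 5, 6, 7]
rho = spearman(quality, new_residents)
print(f"Spearman rho for the {len(quality)} listed cities: {rho:.3f}")
```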
Steps in SPSS for Spearman correlation
OUTPUT DATA – Spearman correlation
Correlations (Spearman's rho)
                                                      Quality of Life   Percentage of New Residents
Quality of Life              Correlation Coefficient  1.000             .586*
                             Sig. (2-tailed)                            .028
                             N                        14                14
Percentage of New Residents  Correlation Coefficient  .586*             1.000
                             Sig. (2-tailed)          .028
                             N                        14                14
*. Correlation is significant at the 0.05 level (2-tailed).
Simple Correlation (r) Pearson
It is also called Pearson's correlation or the product-moment correlation coefficient. It measures the direction (the sign denotes the direction) and strength (the value of r denotes the strength of association) of the relationship between two quantitative variables.
Direct or positive, if the values of the two variables deviate in the same direction i.e. if an increase (or decrease) in the values of one variable results, on average, in a corresponding increase (or decrease) in the values of the other variable the correlation is said to be direct or positive. Examples:
•Student’s performance and number of hours studied
•Satisfaction and loyalty at work.
Inverse or negative, if the variables deviate in opposite direction i.e. if increase in the values of one variable results on average, in corresponding decrease in the values of other variable. Examples:
•TV viewing and class grades-students who spend more time watching TV tend to have lower grades (or phrased as students with higher grades tend to spend less time watching TV)
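Pearson Correlation
The value of “r” ranges between (-1) and (+1).
The value of “r” denotes the strength of the association, as illustrated by the following diagram. If r = 0 or close to zero, this means no association or correlation between the two variables.
[Diagram: scale of r from -1 (inverse perfect correlation) through 0 (no relation) to +1 (direct perfect correlation), with marks at -0.75, -0.25, 0.25 and 0.75. On the inverse side the bands are labelled strong (-1 to -0.75), moderate (-0.75 to -0.25) and weak (-0.25 to 0); on the direct side weak (0 to 0.25), moderate (0.25 to 0.75) and strong (0.75 to 1).]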
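The scale in the diagram can be written as a small lookup. This is only a sketch of the bands as read from the slide; the function name `describe_r` and the handling of values that fall exactly on a band boundary are assumptions.

```python
def describe_r(r):
    """Return (direction, strength) for a correlation coefficient r in [-1, 1]."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return ("none", "no relation")
    direction = "direct" if r > 0 else "inverse"
    size = abs(r)
    if size == 1.0:
        strength = "perfect"
    elif size >= 0.75:   # assumption: 0.75 itself counts as strong
        strength = "strong"
    elif size >= 0.25:   # assumption: 0.25 itself counts as moderate
        strength = "moderate"
    else:
        strength = "weak"
    return (direction, strength)

print(describe_r(0.586))   # ('direct', 'moderate')
print(describe_r(-0.9))    # ('inverse', 'strong')
```

Such labels are conventions for describing r; they do not replace the significance test.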
Example
A sample of 12 students was selected, and data about their academic performance and the time they usually wake up were recorded, as shown in the following table. It is required to find the correlation between performance and the time that students usually wake up.
Student     Wake-up Time   Academic Performance
Kalisa      5.30           13.0
Seraphine   10.00          9.0
Manasse     8.00           13.0
Odette      9.00           11.0
Laurence    6.00           16.0
Pascal      7.00           10.0
Gallican    7.30           13.0
Marcel      6.00           11.0
Sandrine    5.00           14.0
Acqueline   9.30           10.0
Judith      5.30           16.5
Innoncent   7.30           12.0
Hypothesis
Ho: ρ = 0 (there is no association between performance and the time that students usually wake up)
Ha: ρ ≠ 0 (there is an association between them)
Steps in SPSS
Again, to perform a correlation and regression analysis it is advisable to follow these steps:
Step 1: Scatter Diagram (after collecting the data, draw the scatter diagram)
The starting point is to draw a scatter of points on a graph, with one variable on the X-axis and the other variable on the Y-axis; it is customary to represent the dependent variable on the vertical axis and the independent variable on the horizontal axis. When studying the relationship between two variables, one can be considered the cause and the other the result or effect. The variable that causes is called the exogenous or independent variable; the effect is the endogenous or dependent variable. The scatter plot gives an idea of the relationship (if any) between the variables as suggested by the data. The closer the points are to a straight line, the stronger the linear relationship between the two variables.
Steps and Output of scatter dot
Step 2. Correlation
OUTPUT - Correlation
Correlations
                                            Wake up-Time   Academic performance
Wake up-Time          Pearson Correlation   1              -.720**
                      Sig. (2-tailed)                      .008
                      N                     12             12
Academic performance  Pearson Correlation   -.720**        1
                      Sig. (2-tailed)       .008
                      N                     12             12
**. Correlation is significant at the 0.01 level (2-tailed).
These variables have a strong inverse association (r = -.720): wake-up time is related to academic performance. Sig. = .008 means that the correlation is statistically significant at the 0.01 level, so Ho is rejected; students who get up later tend to have lower scores.
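The significance reported by SPSS can be cross-checked by hand with the usual t statistic for a correlation coefficient, t = r·√(n − 2) / √(1 − r²), with n − 2 degrees of freedom. The sketch below is an illustration, not part of the SPSS output; the function name is an assumption, and the critical value 3.169 is the standard two-tailed 1% point of Student's t with 10 degrees of freedom.

```python
from math import sqrt

def correlation_t(r, n):
    """t statistic for testing Ho: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

r, n = -0.720, 12          # values from the SPSS correlation table
t = correlation_t(r, n)
critical_01 = 3.169        # two-tailed 1% critical value of t with 10 df
reject_h0 = abs(t) > critical_01
print(f"t = {t:.3f}, reject Ho at the 0.01 level: {reject_h0}")
```

Since |t| exceeds the critical value, the conclusion agrees with Sig. = .008 being below 0.01.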
Coefficient of determination is the percentage of variation in the dependent variable ‘Y’ explained by the independent variable ‘X’.
How well does this line fit the data?
The value of r² = (-.720)² = 0.5184, i.e. 51.84% ≈ 52%
The 'goodness of fit' indicates the percentage of the variation in performance which is accounted for by the variation in wake-up time; in other words, 52% of the variance in performance is explained by the time that students wake up.
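The SPSS values can be reproduced from the table of 12 students with the product-moment formula. This is a minimal sketch; it assumes the wake-up times are entered exactly as printed (for example 5.30 as the number 5.30), and the function name `pearson_r` is an assumption.

```python
from math import sqrt

# (student, wake-up time as printed on the slide, academic performance)
students = [
    ("Kalisa", 5.30, 13.0), ("Seraphine", 10.00, 9.0), ("Manasse", 8.00, 13.0),
    ("Odette", 9.00, 11.0), ("Laurence", 6.00, 16.0), ("Pascal", 7.00, 10.0),
    ("Gallican", 7.30, 13.0), ("Marcel", 6.00, 11.0), ("Sandrine", 5.00, 14.0),
    ("Acqueline", 9.30, 10.0), ("Judith", 5.30, 16.5), ("Innoncent", 7.30, 12.0),
]

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

wake_up = [row[1] for row in students]
performance = [row[2] for row in students]
r = pearson_r(wake_up, performance)
r_squared = r ** 2
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

The coefficient rounds to -.720 as in the SPSS output. Squaring the unrounded r gives about 0.518, marginally below the 0.5184 obtained from the rounded value.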
Example
Country % Immunization Mortality_rate
Bolivia 77 118
Brasil 69 65
Cambodia 32 184
Canadá 85 8
China 94 43
Czech_Republic 99 12
Egypt 89 55
Ethiopia 13 208
Finland 95 7
France 95 9
Greece 54 9
India 89 124
Italy 95 10
Japan 87 6
México 91 33
Poland 98 16
Russian_federation 73 32
Senegal 47 145
Turkey 76 87
A study was conducted to find whether there is any relationship between the mortality rate and the percentage of immunization in some countries of the world. The following data set was found on the page "http://www.unicef.org/statistics/". Let us determine whether there is a relationship in this data set. The first column lists the countries, and the second and third columns give the % of immunization and the mortality rate of each country.
Steps in SPSS for drawing the scatter diagram
Graphs > Chart builder > OK > from the variable box, take the variable immunization to the “x-axis” and Rate_mortality to the “y-axis”, then click on Group Point ID > take the variable country to the Point ID > OK
Step 3. Regression Analysis
Scatter diagram of the mortality rate by % immunization with regression line inserted in some countries in the world
Steps in SPSS for Regression
Analyze > Regression > Linear >
Interpretation from outcome of SPSS
•Checking the Model Fit
Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .791a   .626       .605                40.13931
a. Predictors: (Constant), Immunization %
The model summary table reports the strength of the relationship between the model and the dependent variable. “R = .791”, the correlation coefficient, is the linear correlation between the observed and model-predicted values of the dependent variable. Its large value indicates a strong relationship; because R is reported here without a sign, the direction of the relationship (inverse in this example) is read from the slope of the regression line.
R Square = .626, the coefficient of determination, is the squared value of the correlation coefficient. It shows that about 62.6% of the variation in mortality is explained by the model.
ANOVA(a)
Model          Sum of Squares   df   Mean Square   F        Sig.
1 Regression   48497.050        1    48497.05      30.101   .000(b)
  Residual     29000.950        18   1611.16
  Total        77498.000        19
a. Dependent Variable: Mortality_rate
b. Predictors: (Constant), Immunization %
The significance value of the F statistic is less than 0.05, which means that the variation explained by the model is unlikely to be due to chance.
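The entries of the ANOVA table are linked by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, F is the ratio of the two mean squares, and R Square is the regression sum of squares divided by the total. A short sketch of that arithmetic, using the figures from the table (the function name is an assumption):

```python
def anova_summary(ss_regression, ss_residual, df_regression, df_residual):
    """Return (MS regression, MS residual, F, R squared) from an ANOVA table."""
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    f_statistic = ms_regression / ms_residual
    r_squared = ss_regression / (ss_regression + ss_residual)
    return ms_regression, ms_residual, f_statistic, r_squared

# Sums of squares and degrees of freedom from the SPSS ANOVA table
ms_reg, ms_res, f_stat, r2 = anova_summary(48497.050, 29000.950, 1, 18)
print(f"MS residual = {ms_res:.2f}, F = {f_stat:.2f}, R^2 = {r2:.3f}")
```

The results agree with the table (mean square 1611.16, F about 30.10) and with R Square = .626 in the model summary.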
Checking the coefficients of the regression line (parameter estimates)
This table shows the coefficients of the regression line:
•The first row (Constant) is the intercept, the point where the regression line crosses the Y axis. In other words, this is the predicted value of mortality when the independent variable is 0.
•The second row gives the slope, the value used in the regression equation for predicting the dependent variable from the independent variable.
Coefficients(a)
                     Unstandardized Coefficients   Standardized Coefficients
Model                B          Std. Error         Beta                        t        Sig.
1 (Constant)         224.316    31.440                                         7.135    .000
  Immunization %     -2.136     .389               -.791                       -5.486   .000
a. Dependent Variable: Mortality_rate
The regression equation can be presented in many different ways, for example:
Mortality predicted = 224.316 - 2.136 * % of immunization
224.316 = predicted mortality rate when the % of immunization is zero (the constant, or intercept).
-2.136 = change in the mortality rate for each additional percentage point of immunization (the slope of the line); a slope different from zero corresponds to a nonzero correlation.
Prediction of Mortality Rate
What rate of mortality could be predicted for the group of countries with 80% immunization?
The best estimate of the mortality is obtained by substituting the value of 80% for that of the independent variable, x, and calculating the corresponding value of the mortality.
Estimated mortality:
$\hat{Y} = 224.316 - 2.136\,X$
$\text{Rate of mortality} = 224.316 - 2.136 \times 80 = 53.436 \approx 53$
The expected mortality rate would be about 53.
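The substitution can be written as a one-line function, which makes it easy to try other immunization percentages. This is a sketch using the coefficients reported in the SPSS table; the function name `predict_mortality` is an assumption.

```python
INTERCEPT = 224.316   # (Constant) from the coefficients table
SLOPE = -2.136        # coefficient of Immunization %

def predict_mortality(immunization_pct):
    """Predicted mortality rate for a given % of immunization."""
    return INTERCEPT + SLOPE * immunization_pct

predicted = predict_mortality(80)
print(f"Predicted mortality at 80% immunization: {predicted:.3f}")
```

Predictions are only trustworthy within the range of immunization values observed in the sample.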
With these results we conclude:
1. The variables are linearly related in the population from which the sample comes (the probability that the relationship found is explained by chance is very small, less than one per thousand).
2. The relationship is strong (r = -.791); in fact the independent variable (% of immunization) explains 62.6% (R² = .626) of the variability of the dependent variable (mortality).
3. The relationship is inverse or negative: the mortality rate decreases on average by 2.136 for each percentage-point increase in immunization in the countries under study.
The Multiple Linear Regression Model
Multiple linear regression is an extension of the simple model that incorporates two or more independent variables. Multiple regression analysis produces an equation with several coefficients, depending on the number of independent variables X introduced into the model, thus generating hyperplanes.
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_n X_{ni} + \varepsilon_i$
Why is this important?
The relationship is rarely a function of just one variable, but is instead influenced by many variables. So the idea is that we should be able to obtain a more accurate predicted score by using multiple variables to predict our outcome.
$\beta_0$ is the intercept and $\beta_j$ determines the contribution of the independent variable $X_j$.
$X_{1i}, X_{2i}, \dots, X_{ni}$ are the values of the independent variables for unit i.
Example for Multiple Regression
The following table presents information on three variables for a small
sample of eight nations. We will take the abortion rate as the dependent
variable and examine the relationship with two variables: one measures
the status and power of women and the other measures religiosity.
Nation    Abortion Rate (Y)   Women's Status (x1)   Religiosity (x2)
Canada    165                 0.5                   74
Chile     100                 0.45                  93
Denmark   400                 0.8                   48
Germany   208                 0.54                  67
Italy     389                 0.7                   70
Japan     379                 0.52                  55
UK        207                 0.58                  67
US        428                 0.84                  35
The research question might be: “How much does an independent variable contribute to explaining dependent variable after the effect of another independent
Output from SPSS (simple correlation)
Correlations
                                           Abortion Rate (Y)   Women's Status (x1)   Religiosity (x2)
Abortion Rate (Y)     Pearson Correlation  1                   .817*                 -.842**
                      Sig. (2-tailed)                          0.013                 0.009
                      N                    8                   8                     8
Women's Status (x1)   Pearson Correlation  .817*               1                     -.801*
                      Sig. (2-tailed)      0.013                                     0.017
                      N                    8                   8                     8
Religiosity (x2)      Pearson Correlation  -.842**             -.801*                1
                      Sig. (2-tailed)      0.009               0.017
                      N                    8                   8                     8
*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).
Scatter Diagram
The regression line is the best straight-line description of the plotted points, and you can use it to describe the association between the variables.
If all the points fall exactly on the line, then the error is 0 and you have a perfect relationship.
In the first chart we can see that nations where women have higher status tend to have higher abortion rates.
Steps for Multiple Regressions in SPSS
1. Click ANALYZE
2. Select REGRESSION
3. Click LINEAR
4. Move “Abortion Rate” to DEPENDENT box
5. Move “Women status and religiosity” to INDEPENDENT(S) box
6. Click OK
7. Continue the
Output: Multiple Correlation and Coefficient of Determination
Model Summary(b)
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .875a   0.765      0.671               73.19844                     1.569
a. Predictors: (Constant), Religiosity (x2), Women's Status (x1)
b. Dependent Variable: Abortion Rate (Y)
Interpret the multiple correlation coefficient (R) and the coefficient of multiple determination (R²). How much of the variance in abortion rate is explained by the two independent variables?
R = .875 (the model improves when both independent variables are used together, since R is higher than either simple correlation); in other words, there is a strong correlation between religiosity and women’s status, taken together, and abortion rate.
R² = .765: 76.5% of the variation in abortion rate can be explained by variation in religiosity and women’s status.
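The "Adjusted R Square" column of the model summary corrects R² for the number of predictors. A minimal sketch of the usual formula, 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the sample size and k the number of independent variables (the function name is an assumption; n = 20 for the immunization example is read from the total df of 19 in its ANOVA table):

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R squared for n observations and k independent variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Multiple regression example: 8 nations, 2 predictors, R Square = .765
adj_multiple = adjusted_r_squared(0.765, n=8, k=2)
# Simple regression example: total df = 19, so n = 20; 1 predictor, R Square = .626
adj_simple = adjusted_r_squared(0.626, n=20, k=1)
print(f"{adj_multiple:.3f} {adj_simple:.3f}")
```

Both values agree with the SPSS model summaries (0.671 and .605). With only 8 nations, the adjustment is noticeably larger than in the simple example.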
Assumption of Autocorrelation
Checking that the errors are not correlated, and multicollinearity
The Durbin-Watson statistic is used to test for serial correlation between adjacent error terms (residuals). This statistic ranges from 0 to 4. A value around 2 means that the errors are not correlated, a value well below 2 that the errors are positively correlated, and a value well above 2 that they are negatively correlated. In the example, Durbin-Watson = 1.569 is only slightly less than 2, suggesting that the error terms are not autocorrelated.
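The statistic itself is the sum of squared differences between adjacent residuals divided by the sum of squared residuals. The sketch below illustrates the formula on hypothetical residual series (made up for illustration; they are not the residuals of the abortion-rate model), and the function name is an assumption.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum (e_t - e_{t-1})^2 / sum e_t^2."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals, for illustration only
alternating = [1.0, -1.0, 1.0, -1.0]   # sign flips every step (negative autocorrelation)
persistent = [1.0, 1.0, -1.0, -1.0]    # neighbours tend to agree (positive autocorrelation)
dw_alternating = durbin_watson(alternating)
dw_persistent = durbin_watson(persistent)
print(dw_alternating, dw_persistent)
```

The alternating series gives a value above 2 and the persistent series a value below 2, matching the rule of thumb stated above.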
Multicollinearity exists when independent variables in a regression equation are highly correlated among themselves.
Multicollinearity in regression analysis refers to how strongly interrelated the independent variables in a model are. When multicollinearity is too high, the individual parameter estimates become difficult to interpret. Most regression programs can compute variance inflation factors (VIF) for each variable. As a rule of thumb, VIF above 5.0 suggests problems with multicollinearity.
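In general the VIF of a predictor is 1 / (1 − R²), where R² comes from regressing that predictor on the other predictors. With exactly two independent variables, that R² is simply the squared correlation between them, so the VIF can be sketched directly from the correlation table (r = -.801 between women's status and religiosity). The function name is an assumption.

```python
def vif_two_predictors(r12):
    """VIF shared by both predictors when a model has exactly two predictors."""
    return 1.0 / (1.0 - r12 ** 2)

r12 = -0.801   # correlation between Women's Status (x1) and Religiosity (x2)
vif = vif_two_predictors(r12)
print(f"VIF = {vif:.2f}")
```

The result is about 2.79, below the rule-of-thumb threshold of 5.0, even though the two predictors are clearly correlated.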
ANOVA Regression
Hypothesis of Slope
Approach of the hypothesis:
$H_0: \beta_1 = \beta_2 = 0$ (all the coefficients are simultaneously equal to zero)
$H_1: \beta_j \neq 0$ for at least one j (at least one regression coefficient is not equal to zero)
ANOVA(a)
Model          Sum of Squares   df   Mean Square   F       Sig.
1 Regression   87171.94         2    43585.971     8.135   .027(b)
  Residual     26790.06         5    5358.012
  Total        113962           7
a. Dependent Variable: Abortion Rate (Y)
b. Predictors: (Constant), Religiosity (x2), Women's Status (x1)
Interpretation: As Sig. < 0.05, the null hypothesis is rejected, indicating that at least one of the explanatory variables is related to abortion rate. We conclude that the model is useful for predicting.
Regression Equation
Coefficients(a)
                        Unstandardized Coefficients   Standardized Coefficients
Model                   B          Std. Error         Beta                        t        Sig.
1 (Constant)            310.885    345.19                                         0.901    0.409
  Women's Status (x1)   348.413    317.472            0.398                       1.097    0.322
  Religiosity (x2)      -3.789     2.624              -0.523                      -1.444   0.208
a. Dependent Variable: Abortion Rate (Y)
•Find the multiple regression equation with Women's Status (x1) and religiosity (x2).
The model has the following equation:
$\hat{y} = 310.885 + 348.413\,x_1 - 3.789\,x_2$
$\text{Abortion rate} = 310.885 + 348.413 \times \text{status} - 3.789 \times \text{religiosity}$
Religiosity is negatively related to abortion rate and women's status is positively related to abortion rate.
•What abortion rate would be expected for a Women's Status of 0.75 and a religiosity of 90?
$\text{Abortion rate} = 310.885 + 348.413 \times 0.75 - 3.789 \times 90 = 231.18$
Therefore, the predicted abortion rate is 231.18.
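The coefficients of the two-predictor model can be reproduced outside SPSS by solving the least-squares normal equations. The sketch below uses the closed-form solution for exactly two independent variables, applied to the eight nations in the example table. It is an illustration rather than the SPSS algorithm, and the function name `fit_two_predictors` is an assumption.

```python
def fit_two_predictors(y, x1, x2):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2; returns (b0, b1, b2)."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    # Centered sums of squares and cross-products
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2          # zero only if x1 and x2 are perfectly collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

# Canada, Chile, Denmark, Germany, Italy, Japan, UK, US
abortion_rate = [165, 100, 400, 208, 389, 379, 207, 428]
womens_status = [0.5, 0.45, 0.8, 0.54, 0.7, 0.52, 0.58, 0.84]
religiosity = [74, 93, 48, 67, 70, 55, 67, 35]

b0, b1, b2 = fit_two_predictors(abortion_rate, womens_status, religiosity)
predicted = b0 + b1 * 0.75 + b2 * 90
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}, prediction = {predicted:.2f}")
```

The fitted coefficients agree with the SPSS table to the printed precision. The prediction comes out slightly different from 231.18 because the hand calculation uses the rounded coefficients.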
Assignment 5
1. Find and interpret the relationship between Anxiety and Test Scores (follow all steps)
Anxiety (x)      10  8  2  1  5  6
Test score (Y)   2   3  9  7  6  5
2. In a study of the relationship between level of education and income, the following data were obtained. Find the relationship between them and comment.
Sample  Level Education (x)    Income (y)
A       Preparatory            25
B       Primary                10
C       Master’s degree        8
D       Secondary              10
E       Bachelor degree        15
F       Illiterate             50
G       Postgraduate diploma   60
Compute the Spearman rank correlation coefficient and test it for significance at the .05 level. What conclusion may be reached?
3. A psychologist believes that those who score high on a need-achievement test will likely have a high salary to match. To test this theory, the psychologist has given questionnaires to a random sample of 17 subjects and has ranked the data so that the highest value in each category has been assigned a 1.
Subject                   A  B  C  D   E   F  G   H  I   J   K   L  M   N  O   P   Q
Rank - Need Achievement   1  8  4  10  12  2  13  6  16  11  14  3  9   7  15  17  5
Salary Rank               3  7  2  12  9   1  11  6  17  13  15  5  10  8  14  16  4
5. Open the SPSS data file “survey_sample.sav”. This data file contains survey data, including demographic data and various attitude measures. It is based on a subset of variables from the 1998 NORC General Social Survey.
With this data calculate and interpret:
a. Compute and interpret the coefficient of simple correlation (hours per day watching TV and Highest year of school completed)
b. Draw a scatter diagram of the variables from the previous item and interpret it
c. Compute and interpret the multiple coefficient of correlation. (From here, use the variables indicated below)
Data:
Dependent variable: Total family income
Independent variables: Age of respondent; Highest year of school completed; Highest year school completed, father; Highest year school completed, mother; Highest year school completed, spouse.
d. Compute and interpret the multiple coefficient of determination within the context of this problem
e. Compute and interpret the multiple regression equation. Is the model significant? (Perform the hypothesis test for multiple regression analysis.)
f. From the analysis performed, would you recommend removing any variable(s) that do not contribute significantly to the model?
g. Check whether the assumption of no autocorrelation holds (Durbin-Watson)
h. What will be the total family income that would be expected for the 50-year-old participant, who has 15 years of study, and that his spouse completed 14 years of study and the
6. The following data were collected from a hypothetical sample of 20 patients: cholesterol level in blood plasma (in mg/100 ml), age (in years), saturated fat (in g/week) and level of exercise (quantified as 0: no exercise, 1: moderate exercise and 2: intense exercise). Fit a linear model between cholesterol level and the other variables.
Carry out the analysis in statistical software and interpret the output. Note: answer the same questions as in the previous exercise (items 'a' to 'h').
h. What cholesterol level would be expected for a 60-year-old patient who consumes 40 grams of fat and does not do any type of exercise?
Patient Cholesterol Age Fat Exercise
1 350 80 35 0
2 190 30 40 2
3 263 42 15 1
4 320 50 20 0
5 280 45 35 0
6 198 35 50 1
7 232 18 70 1
8 320 32 40 0
9 303 49 45 0
10 220 35 35 0
11 405 50 50 0
12 190 20 15 2
13 230 40 20 1
14 227 30 35 0
15 440 30 80 1
16 318 23 40 2
17 212 35 40 1
18 340 18 80 0
19 195 22 15 0