2.8.1 Analysis of Variance
Analysis of Variance (ANOVA) is a statistical method of comparing means from separate populations. This is done by taking the variances of the data sets into account. In order to see this in practical terms, quantitative and qualitative variables should be defined.
Quantitative variables are measured on a naturally occurring numerical scale on which mathematical operations make sense (Mendenhall & Sincich 2003).
An example of a quantitative variable is the sales price of a given house. A qualitative variable is non-numerical and is classified into a category or a group of categories. An example of this would be the presence of a balcony in a house. In practice, a qualitative variable is quantified through the use of dummy variables, which assign numerical codes to the categories.
A type of dummy variable known as a binary variable can assume two values, 1 or 0.
The expression below shows how the presence of a balcony is coded. This coding can be used in the regression analysis.
\[
\text{Dummy Variable} =
\begin{cases}
1 & \text{if a balcony is present} \\
0 & \text{otherwise}
\end{cases}
\]
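As a minimal sketch of this coding in Python: the function name `balcony_dummy` and the sample records are hypothetical and serve only as an illustration, not as data from the study.

```python
def balcony_dummy(has_balcony: bool) -> int:
    """Return 1 if a balcony is present, 0 otherwise."""
    return 1 if has_balcony else 0


# Hypothetical records: (sale price, balcony present?)
houses = [(950_000, True), (720_000, False), (1_100_000, True)]

# Each record becomes (sale price, dummy code), ready for a regression data set.
coded = [(price, balcony_dummy(flag)) for price, flag in houses]
```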
The hypotheses for any ANOVA test are given below, where $\mu_1$ to $\mu_k$ are the population means of population 1 to population k.
\[
H_0: \mu_1 = \mu_2 = \dots = \mu_k
\]
\[
H_1: \text{at least two of the population means differ}
\]
This test is very useful for quantifying the relationship between quantitative variables and qualitative variables. The idea is that the sale prices of houses with a particular feature can be considered to come from a different population than those of houses without that feature.
This would allow researchers to statistically compare the mean sales prices of houses with a particular feature and houses without that feature. Table 2.1 shows an example of an ANOVA table (Mendenhall & Sincich 2003).

         DF       SumOfSq      MeanSq         FRatio   PValue
Model    k − 1    SST          SST/(k − 1)    F        p
Error    n − k    SSE          SSE/(n − k)
Total    n − 1    SS(Total)

Table 2.1: ANOVA Table. Source: (Mendenhall & Sincich 2003)
k = the number of different values the categorical variable can assume
n = the number of observations
SST = Sum of Squares for Treatments
SSE = Sum of Squared Errors
SS(Total) = Total Sum of Squares
F = MST/MSE, where MST = SST/(k − 1) and MSE = SSE/(n − k)
p = p-value associated with F
Equations 2.9a to 2.9d illustrate how each element is calculated given k categories and n observations. The quantitative variable considered is y. CM is known as the Correction for the Mean (Mendenhall & Sincich 2003).
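\begin{align}
CM &= \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} \tag{2.9a} \\
SS(Total) &= \sum_{i=1}^{n} y_i^2 - CM \tag{2.9b} \\
SST &= \sum_{i=1}^{k} \frac{T_i^2}{n_i} - CM \tag{2.9c} \\
SSE &= SS(Total) - SST \tag{2.9d}
\end{align}
Here $T_i$ is the sum of the observations in category $i$ and $n_i$ is the number of observations in that category.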
The most important element of this ANOVA table is the p-value associated with the F ratio. A p-value less than 0.05 allows a researcher to conclude that the population means are not all equal at the 5 percent significance level. With two populations, such as houses with and without a particular feature, this means that the two means are significantly different. This can be used to draw conclusions about the correlation between a qualitative attribute of a house and the selling price of that house. The test assumes that each population is normally distributed, that the variances of the populations are equal and that the populations are independent of each other (Mendenhall & Sincich 2003).
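The sums of squares in equations 2.9 and the F ratio of Table 2.1 can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `anova_f` and the sale prices are hypothetical, and the code assumes that $T_i$ in equation 2.9c is the total of the observations in category $i$ and $n_i$ the number of observations in that category.

```python
def anova_f(groups):
    """One-way ANOVA sums of squares and F ratio.

    `groups` is a list of lists; each inner list holds the y values
    observed for one category of the qualitative variable.
    """
    k = len(groups)                                # number of categories
    n = sum(len(g) for g in groups)                # total number of observations
    all_y = [y for g in groups for y in g]

    cm = sum(all_y) ** 2 / n                               # correction for the mean (2.9a)
    ss_total = sum(y * y for y in all_y) - cm              # SS(Total) (2.9b)
    sst = sum(sum(g) ** 2 / len(g) for g in groups) - cm   # SST (2.9c)
    sse = ss_total - sst                                   # SSE (2.9d)

    mst = sst / (k - 1)                            # mean square for treatments
    mse = sse / (n - k)                            # mean square for error
    return {"SST": sst, "SSE": sse, "SS(Total)": ss_total, "F": mst / mse}


# Hypothetical sale prices (arbitrary units) for houses with and without a balcony.
with_balcony = [10.0, 12.0, 14.0]
without_balcony = [6.0, 8.0, 10.0]
result = anova_f([with_balcony, without_balcony])
```

The p-value is then obtained by comparing the F ratio with an F distribution with k − 1 and n − k degrees of freedom, which is usually left to a statistics library.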
2.8.2 Kendall’s Tau
ANOVA can measure the correlation between quantitative variables and qualitative variables, but an entirely different approach needs to be used when measuring a relationship between two qualitative variables. The first tool to be used with qualitative variables is the contingency table. A contingency table lists the frequency of the levels of two qualitative variables.
Table 2.2 shows how the frequency of 4 categories can be counted. Table 2.2 is a 2 by 2 contingency table.
            Level 1a    Level 2a    Totals
Level 1b    a           b           a + b
Level 2b    c           d           c + d
Totals      a + c       b + d       a + b + c + d

Table 2.2: Contingency Table. Source: (Litvine 2004)
In the first column, where the column for Level 1a and the row for Level 1b intersect, the value a shows the number of observations that fall in level 1 of category a and level 1 of category b. The totals column and row show the total number of observations that occur in each particular category and level. This can be used to show the frequency of qualitative attributes in a set of houses. If a set of houses is investigated as to whether they have swimming pools and balconies, the contingency table will show how many houses have both a swimming pool and a balcony, how many have a swimming pool only, how many have a balcony only and how many have neither feature.
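The counting behind such a table can be sketched as follows. The function name `contingency_2x2`, the assignment of swimming pools to the columns and balconies to the rows, and the sample houses are all assumptions made for illustration.

```python
def contingency_2x2(pairs):
    """Count the cells a, b, c, d of a 2 by 2 contingency table.

    `pairs` holds one (has_pool, has_balcony) tuple per house.
    Assumed layout: columns are pool / no pool, rows are balcony / no balcony.
    """
    a = sum(1 for pool, balcony in pairs if pool and balcony)          # both features
    b = sum(1 for pool, balcony in pairs if not pool and balcony)      # balcony only
    c = sum(1 for pool, balcony in pairs if pool and not balcony)      # pool only
    d = sum(1 for pool, balcony in pairs if not pool and not balcony)  # neither feature
    return a, b, c, d


# Hypothetical houses: (has swimming pool, has balcony)
houses = [(True, True), (True, False), (False, True), (False, False), (True, True)]
a, b, c, d = contingency_2x2(houses)
```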
Kendall’s Tau-b provides a measure of association between two categorical variables in a 2 by 2 contingency table. The idea is to describe how likely one category’s level is to occur given another category’s level. If there is a high likelihood of category levels occurring together, then the qualitative variables are said to be associated (Litvine 2004). Equation 2.10 shows how Kendall’s tau-b is calculated based on Table 2.2.
\[
\tau_b = \frac{ad - bc}{\sqrt{(a + b)(c + d)(a + c)(b + d)}} \tag{2.10}
\]
Kendall’s $\tau_b$ can only assume values between −1 and 1. A value of 1 implies perfect positive association between the variables and a value of −1 implies perfect negative association (Litvine 2004).
A value of zero implies that the two qualitative variables are statistically independent of each other. In a regression model, this is a useful tool for detecting correlation between qualitative independent variables.
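As a small sketch of equation 2.10, the function below computes Kendall’s tau-b from the four cell counts of a 2 by 2 contingency table. The function name `kendall_tau_b_2x2` and the example counts are hypothetical. Raising an error when a row or column total is zero is a design choice of this sketch, since the denominator is then zero.

```python
import math


def kendall_tau_b_2x2(a, b, c, d):
    """Kendall's tau-b for a 2 by 2 contingency table with cells a, b, c, d."""
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        raise ValueError("tau-b is undefined when a row or column total is zero")
    return (a * d - b * c) / denominator


# Hypothetical counts illustrating the three reference cases.
perfect_positive = kendall_tau_b_2x2(10, 0, 0, 10)
perfect_negative = kendall_tau_b_2x2(0, 10, 10, 0)
no_association = kendall_tau_b_2x2(5, 5, 5, 5)
```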
2.8.3 Pearson’s Coefficient of Correlation
Pearson’s correlation coefficient is a measure of linear correlation between two quantitative variables (Keller 2003). Equation 2.11 shows how the correlation between two variables x and y is calculated.
\[
\rho = \frac{\mathrm{COV}(x, y)}{s_x s_y} \tag{2.11}
\]
Here $\mathrm{COV}(x, y)$ is the covariance of x and y, and $s_x$ and $s_y$ are their standard deviations. The interpretation is the same as that of Kendall’s Tau. Pearson’s coefficient of correlation can assume values between −1 and 1. A value of 1 implies perfect positive correlation and a value of −1 implies perfect negative correlation. This measure would be particularly useful in quantifying the extent of the correlation between sales prices and quantitative features of a house, such as the size of the built structure.
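Equation 2.11 can be sketched in Python as follows. The sketch assumes the sample covariance and sample standard deviations with an n − 1 divisor (the divisor cancels in the ratio). The function name `pearson` and the floor areas and prices are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, with at least two observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    return cov / (s_x * s_y)


# Hypothetical data: floor area (square metres) and sale price (arbitrary units).
area = [100.0, 150.0, 200.0, 250.0]
price = [2.0, 3.0, 4.0, 5.0]
r_positive = pearson(area, price)
r_negative = pearson(area, list(reversed(price)))
```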
2.8.4 Non-Linear Model Fitting
Non-linear model fitting through numerical estimation provides an advantage over linear model fitting. A binary variable can be created for each level that the qualitative factor can assume. This presents a problem with linear estimation procedures, since using binary variables for every level, together with an intercept, creates a design matrix that is not of full rank. This means that the matrix used in linear estimation cannot be inverted and no unique parameter solution exists (Mendenhall & Sincich 2003). A reference variable is used to solve this problem: the binary variable for one level is left out, so that all remaining parameters are estimated relative to that reference level. The problem here is that the reference variable is subjectively chosen. Rogers (2000) states that by using a reference variable to make estimation possible, researchers end up omitting important variables and therefore introduce omitted variable bias. The non-linear numerical estimation procedure allows parameters to be estimated without needing to invert the design matrix.
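The rank problem can be illustrated with a small sketch. It builds a design matrix X with an intercept and a binary variable for every level of a two-level factor, and shows that the determinant of X'X is zero, so X'X cannot be inverted. Dropping one binary variable as the reference restores a non-zero determinant. The helper names `gram` and `det` and the four-house data set are hypothetical, and the sketch illustrates only the linear case, not the non-linear estimation procedure itself.

```python
def gram(X):
    """Return X'X for a design matrix given as a list of rows."""
    cols = list(zip(*X))
    return [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]


def det(M):
    """Determinant by cofactor expansion along the first row (adequate for tiny matrices)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


# Hypothetical factor with two levels (balcony / no balcony) for four houses.
balcony = [1, 0, 1, 0]

# Intercept plus a binary variable for every level: the two binary columns sum to the intercept.
full = [[1, b, 1 - b] for b in balcony]

# Intercept plus one binary variable; "no balcony" is the reference level.
reduced = [[1, b] for b in balcony]

det_full = det(gram(full))        # zero: X'X is singular
det_reduced = det(gram(reduced))  # non-zero: X'X can be inverted
```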
2.9 Conclusion
The literature has shown many aspects of valuing residential real estate. The concepts of the property market and fair market value have been defined, as well as the mechanisms of the marketplace. These concepts form the basis of any real estate valuation. A review of the South African property market has shown steady increases in the price levels of houses since 1993, with a peak in 2007 and a decline at the end of 2008. The decline was related to deteriorating macro-economic conditions at the time.

There are several traditionally used methods for making valuations. The comparable sales method is the most commonly used method in residential real estate, with investment approaches more appropriate for commercial properties. A regression approach to valuation was found to be the basis of many valuation methods. The formal justification for multiple regression analysis is known as hedonic modeling. Hedonic modeling has many strengths and weaknesses, but it is a popular method in real estate research. Since the variables to be included in a hedonic model are not known, a formal selection process is employed. A frequentist approach uses forward and backward selection routines that optimize a formal criterion. In this study, the adjusted R² is used, as well as the Akaike and Bayesian information criteria. The Bayesian criterion is shown to be more conservative than the Akaike criterion in model selection.

Variables can be defined as qualitative or quantitative. Analysis of variance is used to assess the correlation between qualitative and quantitative variables. Kendall’s Tau measures the association between two categorical variables. Pearson’s correlation coefficient measures the linear relationship between quantitative variables. Omitted variable bias is a problem when selecting a reference variable for real estate data. If a reference variable is not selected, then least squares estimation is not possible due to a singular design matrix. A numerical estimation procedure can be used that does not rely on the design matrix being invertible.