
2.8.1 Analysis Of Variance

Analysis of Variance (ANOVA) is a statistical method of comparing means from separate populations. This is done by taking the variances of the data sets into account. In order to see this in practical terms, quantitative and qualitative variables should be defined.

Quantitative variables are measured on a naturally occurring numerical scale on which mathematical operations make sense (Mendenhall & Sincich 2003).

An example of a quantitative variable is the sale price of a given house. A qualitative variable is non-numerical and is classified into a category or a group of categories. An example of this would be the presence of a balcony in a house. In practice, a qualitative variable is quantified through the use of dummy variables, which assign numerical codes to its categories.

A type of dummy variable known as a binary variable can assume two values, a 1 or a 0.

The expression below shows how the presence of a balcony is coded. This coding can be used in the regression analysis.

\[
\text{Dummy Variable} =
\begin{cases}
1 & \text{if a balcony is present} \\
0 & \text{otherwise}
\end{cases}
\]
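As a minimal sketch of this coding, the snippet below maps a balcony flag to a binary dummy variable. The house records, their field names and the helper `balcony_dummy` are hypothetical and serve only as illustration.

```python
# Hypothetical records: each house has a sale price and a balcony flag.
houses = [
    {"price": 950_000, "balcony": True},
    {"price": 720_000, "balcony": False},
    {"price": 1_100_000, "balcony": True},
]


def balcony_dummy(house):
    """Return 1 if a balcony is present, 0 otherwise."""
    return 1 if house["balcony"] else 0


# One dummy value per house, in the same order as the records.
dummies = [balcony_dummy(h) for h in houses]
print(dummies)
```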

The hypotheses for an ANOVA test are given below, where µ1 to µk are the population means of population 1 to population k.

H₀: µ1 = µ2 = ... = µk

H1: at least two of the means µ1, ..., µk differ

This is very useful for quantifying the relationship between quantitative variables and qualitative variables. The idea is that the sale prices of houses with a particular feature can be considered to come from a different population than the sale prices of houses without that feature.

This would allow researchers to statistically compare the mean sale prices of houses with a particular feature and houses without that feature. Table 2.1 shows an example of an ANOVA table (Mendenhall & Sincich 2003).

         DF       SumOfSq      MeanSq               FRatio   PValue
Model    k − 1    SST          MST = SST/(k − 1)    F        p
Error    n − k    SSE          MSE = SSE/(n − k)
Total    n − 1    SS(Total)

Table 2.1: ANOVA Table
Source: (Mendenhall & Sincich 2003)

k = the number of different values the categorical variable can assume

n = the number of observations

SST = Sum of Squares for Treatments

SSE = Sum of Squared Errors

SS(Total) = Total Sum of Squares

F = MST/MSE, where MST = SST/(k − 1) and MSE = SSE/(n − k) are the mean squares

p = p-value associated with F

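Equations 2.9a to 2.9d show how each element is calculated given k categories and n observations. The quantitative variable considered is y, T_i is the total of the observations in category i, and n_i is the number of observations in category i. CM is known as the Correction for the Mean (Mendenhall & Sincich 2003).

\[ CM = \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} \tag{2.9a} \]

\[ SS(Total) = \left(\sum_{i=1}^{n} y_i^2\right) - CM \tag{2.9b} \]

\[ SST = \sum_{i=1}^{k} \frac{T_i^2}{n_i} - CM \tag{2.9c} \]

\[ SSE = SS(Total) - SST \tag{2.9d} \]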
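The calculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `anova_table` and the two small price samples are hypothetical, and the p-value is not computed because it requires the F distribution with k − 1 and n − k degrees of freedom, which is best left to a statistics library.

```python
def anova_table(groups):
    """Compute one-way ANOVA quantities following equations 2.9a to 2.9d.

    groups: a list of lists, one list of observations y per category.
    """
    all_y = [y for g in groups for y in g]
    n = len(all_y)   # number of observations
    k = len(groups)  # number of categories
    if k < 2 or n <= k or any(len(g) == 0 for g in groups):
        raise ValueError("need at least 2 non-empty groups and n > k")

    cm = sum(all_y) ** 2 / n                              # (2.9a)
    ss_total = sum(y ** 2 for y in all_y) - cm            # (2.9b)
    sst = sum(sum(g) ** 2 / len(g) for g in groups) - cm  # (2.9c)
    sse = ss_total - sst                                  # (2.9d)

    mst = sst / (k - 1)  # mean square for treatments
    mse = sse / (n - k)  # mean square for error
    return {
        "CM": cm, "SS(Total)": ss_total, "SST": sst, "SSE": sse,
        "MST": mst, "MSE": mse, "F": mst / mse,
    }


# Hypothetical sale prices (in arbitrary units) for two groups of houses.
with_balcony = [10, 12, 14]
without_balcony = [6, 8, 10]
table = anova_table([with_balcony, without_balcony])
print(table["SST"], table["SSE"], table["F"])
```

For these made-up samples the sketch gives SST = 24, SSE = 16 and an F ratio of 6 with 1 and 4 degrees of freedom.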

The most important element of this ANOVA table is the p-value associated with the F ratio. A p-value less than 0.05 allows a researcher to conclude that the population means are significantly different at the 5 percent significance level. This can be used to draw conclusions about the relationship between a qualitative attribute of a house and the selling price of that house. The test assumes that each population is normally distributed, that the variances of the populations are equal and that the populations are independent of each other (Mendenhall & Sincich 2003).

2.8.2 Kendall’s Tau

ANOVA can measure the relationship between quantitative variables and qualitative variables, but an entirely different approach is needed when measuring the relationship between two qualitative variables. The first tool to be used with qualitative variables is the contingency table. A contingency table lists the frequencies of the combinations of levels of two qualitative variables.

Table 2.2 shows how the frequency of 4 categories can be counted. Table 2.2 is a 2 by 2 contingency table.

            Level 1a    Level 2a    Totals
Level 1b    a           b           a + b
Level 2b    c           d           c + d
Totals      a + c       b + d       a + b + c + d

Table 2.2: Contingency Table
Source: (Litvine 2004)

In the cell where the column for Level 1a and the row for Level 1b intersect, the value a shows the number of observations that are classified in level 1 of category a and level 1 of category b. The totals column and row show the total number of observations that occur in each particular category and level. This can be used to show the frequency of qualitative attributes in a set of houses. If a set of houses is investigated as to whether they have swimming pools and balconies, the contingency table will show how many houses have both a swimming pool and a balcony, how many have a swimming pool only, how many have a balcony only and how many have neither feature.
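The counting step can be illustrated with a short Python sketch. The observations are made up, and the mapping of the swimming pool to category a and the balcony to category b (with level 1 meaning the feature is present) is an assumption chosen for this example.

```python
from collections import Counter

# Hypothetical observations: (has_pool, has_balcony) for each house.
observations = [
    (True, True), (True, True), (True, True),
    (False, True),
    (True, False),
    (False, False), (False, False), (False, False),
]

counts = Counter(observations)

# Assumed mapping onto the cells of a 2 by 2 contingency table.
a = counts[(True, True)]    # pool and balcony
b = counts[(False, True)]   # balcony only
c = counts[(True, False)]   # pool only
d = counts[(False, False)]  # neither feature

print("row totals:", a + b, c + d)
print("column totals:", a + c, b + d)
print("grand total:", a + b + c + d)
```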

Kendall’s Tau-b provides a measure of association between two categorical variables in a 2 by 2 contingency table. The idea is to describe how likely a level of one category is to occur given a level of the other category. If there is a high likelihood of category levels occurring together, then the qualitative variables are said to be associated (Litvine 2004). Equation 2.10 shows how Kendall’s Tau-b is calculated based on Table 2.2.

\[ \tau_b = \frac{ad - bc}{\sqrt{(a + b)(c + d)(a + c)(b + d)}} \tag{2.10} \]
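A minimal Python sketch of equation 2.10 follows. The function name `kendall_tau_b_2x2` and the example cell counts are hypothetical; the guard for a zero row or column total is an added assumption, since the formula is undefined in that case.

```python
import math


def kendall_tau_b_2x2(a, b, c, d):
    """Kendall's tau-b for a 2 by 2 contingency table with cells a, b, c, d."""
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        raise ValueError("a row or column total is zero; tau-b is undefined")
    return (a * d - b * c) / denominator


# Hypothetical cell counts for illustration.
tau_example = kendall_tau_b_2x2(3, 1, 1, 3)
print(tau_example)
```

With these counts the numerator is 3·3 − 1·1 = 8 and the denominator is the square root of 4·4·4·4, which is 16, so the sketch returns 0.5.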

Kendall’s τb can only assume values between 1 and −1. A value of 1 implies perfect association between variables and a value of −1 implies perfect negative association (Litvine 2004).

A value of zero implies that the two qualitative variables are statistically independent of each other. In a regression model, this is a useful tool for detecting correlation between qualitative independent variables.

2.8.3 Pearson’s Coefficient of Correlation

Pearson’s correlation coefficient is a measure of linear correlation between two quantitative variables (Keller 2003). Equation 2.11 shows how the correlation between two variables x and y is calculated.

\[ \rho = \frac{\mathrm{COV}(x, y)}{s_x s_y} \tag{2.11} \]

Here COV(x, y) is the covariance of x and y, and s_x and s_y are their standard deviations. The interpretation is the same as that of Kendall’s Tau. Pearson’s coefficient of correlation can assume values between 1 and −1. A value of 1 implies perfect positive correlation and a value of −1 implies perfect negative correlation. This measure would be particularly useful in quantifying the extent of the correlation between sales prices and quantitative features of a house such as the size of the built structure.
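Equation 2.11 can be sketched in Python as follows. The function name `pearson_correlation` and the sample data are hypothetical, and the use of the sample covariance and sample standard deviations (dividing by n − 1) is an assumption; the divisor cancels in the ratio either way.

```python
import math


def pearson_correlation(x, y):
    """Pearson's correlation: covariance divided by the product of the
    standard deviations of x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (s_x * s_y)


# Hypothetical data: house size and sale price in arbitrary units.
size = [1, 2, 3, 4]
price = [2, 4, 6, 8]
r_example = pearson_correlation(size, price)
print(r_example)
```

Because the made-up prices are an exact multiple of the sizes, the result is 1 up to floating-point rounding; reversing the price list gives −1.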

2.8.4 Non-Linear Model Fitting

Non-linear model fitting through numerical estimation provides an advantage over linear model fitting when qualitative factors are involved. A binary variable can be created for each level that the qualitative factor can assume. This presents a problem with linear estimation procedures, since using the binary variables for all levels creates a design matrix that is not of full rank. This means that the matrix required for linear estimation cannot be inverted and no unique parameter solution exists (Mendenhall & Sincich 2003). A reference variable is used to solve this problem: the binary variable for one level is left out, and all remaining parameters are interpreted relative to that reference level. The problem here is that the reference variable is subjectively chosen. Rogers (2000) states that by using a reference variable to make estimation possible, researchers end up omitting important variables and therefore introduce omitted variable bias. The non-linear numerical estimation procedure allows parameters to be estimated without needing to invert the design matrix.
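The rank problem can be made concrete with a small Python sketch. It assumes a design matrix with an intercept column and a hypothetical factor with three levels A, B and C; the helper `matrix_rank` is written here for illustration and uses exact rational arithmetic rather than a numerical library.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a non-zero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical factor levels observed for five houses.
levels = ["A", "B", "C", "A", "C"]

# Intercept plus one binary variable per level: 4 columns.
full = [[1] + [int(lv == name) for name in "ABC"] for lv in levels]
# Intercept plus binaries for B and C only (A is the reference): 3 columns.
reduced = [[1] + [int(lv == name) for name in "BC"] for lv in levels]

print(matrix_rank(full), "of", len(full[0]), "columns")
print(matrix_rank(reduced), "of", len(reduced[0]), "columns")
```

In the full coding the intercept column equals the sum of the three binary columns, so the rank stays at 3 although there are 4 columns. Dropping the reference level restores full column rank.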

2.9 Conclusion

The literature has shown many aspects of valuing residential real estate. The concepts of the property market and fair market value have been defined, as well as the mechanisms of the market place. These concepts form the basis of any valuation for real estate. A review of the South African property market has shown steady increases in the price levels of houses since 1993, with a peak in 2007 and a decline at the end of 2008. The decline was related to deteriorating macro-economic conditions at the time. There are several traditionally used methods for making valuations. The comparable sales method is the most commonly used method in residential real estate, with investment approaches more appropriate for commercial properties. A regression approach to valuation was found to be the basis of many valuation methods. The formal justification for multiple regression analysis is known as hedonic modeling. Hedonic modeling has many strengths and weaknesses, but it is a popular method in real estate research. Since the variables to be included in a hedonic model are not known, a formal selection process is employed. A frequentist approach uses forward and backward selection routines that optimize a formal criterion. In this study, the Adjusted R² is used as well as the Akaike and Bayesian Information criteria. The Bayesian criterion is shown to be more conservative than the Akaike criterion in model selection. Variables can be defined as qualitative or quantitative. Analysis of variance is used to assess the relationship between qualitative and quantitative variables. Kendall’s Tau measures the association between two categorical variables. Pearson’s correlation coefficient measures the linear relationship between quantitative variables. Omitted variable bias is a problem when selecting a reference variable for real estate data. If a reference variable is not selected, then least squares estimation is not possible due to a singular design matrix. A numerical estimation procedure can be used that does not rely on the design matrix being invertible.

Chapter 3
