
4. FINANCIAL ANALYSIS

4.2 Development schedule

Based on the real estate data and the geo-derived parameters, we carried out an exploratory regression analysis (using an ArcGIS tool) to gain insight into the relationships. We structured the data as described, with real estate sales as the population. Using sales as the ultimate “ground truth” is open to debate, but it is an easy way to obtain objective data on attractiveness. In the regression analysis, we used the information provided with each sale in addition to the geo-parameters.

Transformation of explanatory variables

The exploratory regression analysis is based on Ordinary Least Squares (OLS) regression, which tests for linear relationships. As the ranges and scales of the potential explanatory variables vary widely, transformations were tested on all explanatory variables found to be significant contributors, probing whether a transformed variable (for example FLOOR_SPACE_RECI, presumably the reciprocal of floor space) correlates more linearly with the dependent variable. Only nonlinear transformations can help here: multiplying a variable by a constant and adding a constant leaves its correlation with the dependent variable unchanged. The dependent variables are left untransformed.

Figure 4.1. Regression analysis to gain insight into which parameters are important
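To make the point about transformations concrete, the sketch below compares the R2 of a one-variable OLS fit using raw floor space, a linear rescaling of it, and its reciprocal. The data are synthetic and hypothetical (a price per m2 built to fall off as 1/floor space), and reading FLOOR_SPACE_RECI as a reciprocal is our assumption from the variable name, not a documented detail of the project.

```python
# Sketch: R^2 of y ~ a + b*x under different transformations of x.
# Synthetic, hypothetical data; not the project's dataset.


def r_squared(x, y):
    """R^2 of a simple OLS fit y = a + b*x, i.e. the squared Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical price per m2 that falls off with the reciprocal of floor space.
floor_space = [20, 30, 40, 50, 60, 80, 100, 120, 150, 200]
price_per_m2 = [50_000 + 600_000 / fs for fs in floor_space]

raw = r_squared(floor_space, price_per_m2)
affine = r_squared([3 * fs + 7 for fs in floor_space], price_per_m2)  # linear rescaling
reciprocal = r_squared([1 / fs for fs in floor_space], price_per_m2)  # nonlinear transform

print(f"raw={raw:.3f}  affine={affine:.3f}  reciprocal={reciprocal:.3f}")
```

The linear rescaling reproduces the raw R2, while the reciprocal fits perfectly only because the synthetic data were generated that way; with real sales the gain from a transformation would be smaller and has to be tested, as described above.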

Choice of dependent variable

Seeking to explain housing prices also raises the question of whether “Total sales price” or “Price per m2” best expresses attractiveness. A case can be made for both: each says something about attractiveness, but about different aspects. A potential buyer might tolerate a higher “Price per m2” if being close to educational facilities, restaurants and theatres matters more than the amount of floor space. At a different stage in life, the same buyer might prefer or need more space, trying within his or her means to maximize space in as attractive a location as possible.

Instead of choosing one of these two approaches, we explored both in the project, finding that most variables are significant in only one of them. We also explored a third approach, in which dwellings of similar size are compared; we call it the Compared same size approach.

Figure 4.2. “Total sales prices”, Oslo – Mean within 500m X 500m grid cell

Figure 4.3. Price per m2, Oslo - Mean within 500m X 500m grid cell

The two maps give a visual impression of the variation in actual Total sales price and Price per m2 in Oslo (2014), based on mean values within a 500m X 500m grid. The “Natural Breaks” classification is used to divide the values into 5 classes, revealing quite extreme price differences between the most and least expensive areas under both approaches.

There is apparent geographic clustering of house prices throughout the city, especially for Price per m2, radiating outward from the cluster of red cells where Oslo town centre lies. Total sales prices appear more dispersed in some parts and more clustered in others. The maps suggest an east-west divide, which is also commonly perceived in Oslo. The project methodology seeks variables that explain this price variation and uses the findings to predict estimates of Total sales price and Price per m2.
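The two map steps described above (mean value per 500 m grid cell, then 5 “Natural Breaks” classes) can be sketched as follows. This assumes sale coordinates in a projected metre-based system (for example a UTM zone); the function names and sample points are hypothetical. The classifier is the classic Fisher-Jenks optimisation, which minimises within-class squared deviation; ArcGIS’s own Natural Breaks implementation may differ in details.

```python
import math
from collections import defaultdict

CELL_SIZE_M = 500.0  # grid cell edge, as in the maps (500 m x 500 m)


def grid_means(sales, cell=CELL_SIZE_M):
    """Mean value per grid cell; sales are (x_m, y_m, value) in projected metres."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y, value in sales:
        key = (math.floor(x / cell), math.floor(y / cell))
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def jenks_breaks(values, k):
    """Upper bounds of k classes minimising within-class squared deviation (Fisher-Jenks)."""
    v = sorted(values)
    n = len(v)
    ps, ps2 = [0.0], [0.0]  # prefix sums give each class cost in O(1)
    for x in v:
        ps.append(ps[-1] + x)
        ps2.append(ps2[-1] + x * x)

    def ssd(i, j):  # sum of squared deviations of v[i:j]
        s, m = ps[j] - ps[i], j - i
        return ps2[j] - ps2[i] - s * s / m

    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(k + 1)]  # cost[c][j]: best c classes on v[:j]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j], back[c][j] = candidate, i
    bounds, j = [], n
    for c in range(k, 0, -1):  # walk back through the optimal split points
        bounds.append(v[j - 1])
        j = back[c][j]
    return sorted(bounds)


# Hypothetical sales: (x in metres, y in metres, price per m2)
sales = [(100, 100, 40_000), (400, 300, 60_000), (600, 100, 80_000), (1200, 1300, 30_000)]
print(grid_means(sales))
print(jenks_breaks([1, 2, 3, 10, 11, 12, 20, 21, 22], 3))
```

For the maps, `jenks_breaks(list(grid_means(sales).values()), 5)` would give the upper bounds of the 5 colour classes, provided there are at least 5 grid cells.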

Figure 4.4. Compared same sizes, Oslo - Mean within 500m X 500m grid cell

In our project, floor space (m2) is found to be a significant variable in explaining variation in both “Total sales price” and “Price per m2”. As an isolated variable, it explains 60 per cent of the variation in “Total sales price” (AdjR2 of 0.60) in Oslo, and 32 per cent of the variation in “Price per m2” (AdjR2 of 0.32).
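For reference, the sketch below shows how the R2 and adjusted R2 of such a one-variable model relate. The formula AdjR2 = 1 - (1 - R2)(n - 1)/(n - k - 1), with n observations and k explanatory variables plus an intercept, is the standard definition; the function names and toy data are hypothetical and not taken from the project.

```python
def simple_ols(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b, r2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot


def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k explanatory variables (plus an intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


a, b, r2 = simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(a, b, r2, adjusted_r2(r2, n=5, k=1))  # tiny sample: large penalty
print(adjusted_r2(0.60, n=10_000, k=1))     # many sales: almost no penalty
```

With a full year of sales, n is large, so for a single variable AdjR2 and R2 nearly coincide; the adjustment matters more when comparing models with many explanatory variables.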

In the Compared same size approach, floor space is “baked” into the dependent variable: each sold dwelling’s “Price per m2” is compared to the “mean Price per m2” of all sold dwellings with similar floor space (±4 m2). Even though our data include all dwelling sales in Norway for a whole year, we found that a ±4 m2 window in each comparison was necessary to smooth the data and compensate for random variation.

The Compared same size approach can be said to be cleaner, in that it leaves more room for explanatory variables that are not intrinsic to the building. The table below shows that this holds for the variable Education level of population within 250 m, which on its own explains 47 per cent of the variation in Compared same size, against 35 per cent for Price per m2 and 20 per cent for Total sales price.

AdjR2                                                   Total sales price   Price per m2   Compared same size
Isolated: Education level of population within 250 m   0.20                0.35           0.47

However, when all explanatory variables are combined, the highest achievable AdjR2 for the Compared same size approach is only 0.61 for Oslo, and it decreases further when the same explanatory variables are tested on Norway’s other larger urban settlements. The approach gives some interesting insights into the strength of individual variables and into how city size affects the overall results. Still, the overall AdjR2 is arguably too low for meaningful prediction.

In the Compared same size approach, floor space (m2) is “baked” into the dependent variable (the variable we seek to explain).

The generated values are: Price per m2 DIVIDED BY the mean Price per m2 of all sold dwellings of similar size (±4 m2) in the city. Example: kroner per m2 for a 20 m2 property DIVIDED BY the mean kroner per m2 for properties of 16-24 m2. The values therefore average approximately 1; a dwelling sold at exactly the mean price per m2 for its size band gets the value 1.
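The ratio just defined can be sketched as follows. The function name and the sample sales are hypothetical; the sketch assumes the ±4 m2 band is inclusive (so a 20 m2 dwelling is compared with 16-24 m2, as in the example) and that the dwelling itself is part of its own band.

```python
def compared_same_size(dwellings, window_m2=4.0):
    """Price per m2 of each dwelling divided by the mean price per m2 of all
    dwellings within +-window_m2 of its floor space (itself included).

    dwellings: list of (floor_space_m2, total_price) tuples.
    """
    price_per_m2 = [price / space for space, price in dwellings]
    values = []
    for (space_i, _), ppm2_i in zip(dwellings, price_per_m2):
        band = [ppm2 for (space, _), ppm2 in zip(dwellings, price_per_m2)
                if abs(space - space_i) <= window_m2]
        values.append(ppm2_i / (sum(band) / len(band)))
    return values


# Hypothetical sales: (floor space in m2, total price in kroner)
sales = [(20, 1_000_000), (24, 1_440_000), (60, 3_000_000)]
print(compared_same_size(sales))
```

This brute-force version compares every pair of dwellings; for a full year of sales, sorting by floor space and sliding the ±4 m2 window along the sorted list would give the same values much faster.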

The prediction part of our project is therefore based solely on the results for the “Total sales price” and “Price per m2” approaches, with the highest achieved AdjR2 at 0.82 and 0.74 respectively for Oslo. For these approaches, we also make predictions only for cities where the highest achieved AdjR2 is >= 0.70, which excludes the smaller cities in the project.

The table below gives the highest achievable AdjR2 for the three approaches, together with the number of explanatory variables used in each model. The sets of explanatory variables used in the three approaches constitute our best explanatory models.

Oslo                  AdjR2   Number of explanatory variables in model
Total sales price     0.82    8
Price per m2          0.74    9
Compared same sizes   0.61    7

We use the ArcGIS Exploratory Regression tool. For a chosen dependent variable, the tool first tests each explanatory variable in isolation, then all pairs of variables, then all triples, and so on, up to the number of explanatory variables brought into the analysis (upper limit: 10). For each model size, the output lists the highest-scoring combinations (R2/AdjR2), whether the variables in these combinations contribute significantly, in which direction they contribute (+ or -), the VIF for the combination, and other measures.
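The search described above can be sketched as a brute-force loop over all combinations of each size, ranking them by AdjR2 and reporting the largest VIF (VIF_j = 1/(1 - R2_j), where R2_j regresses variable j on the other variables in the model). This is a simplified illustration, not ArcGIS’s implementation: it omits the coefficient significance tests, Jarque-Bera and spatial autocorrelation checks, and the function names and synthetic data are hypothetical.

```python
from itertools import combinations


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_r2(columns, y):
    """R^2 of y regressed on the given columns plus an intercept (normal equations)."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    my = sum(y) / n
    ss_res = sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def max_vif(columns):
    """Largest VIF_j = 1 / (1 - R^2_j), where R^2_j regresses column j on the others."""
    if len(columns) < 2:
        return 1.0
    return max(1 / (1 - ols_r2(columns[:j] + columns[j + 1:], columns[j]))
               for j in range(len(columns)))


def exploratory_regression(data, y, max_k, top=3):
    """For each model size k = 1..max_k, return the `top` combinations ranked by AdjR2."""
    n = len(y)
    results = {}
    for k in range(1, max_k + 1):
        scored = []
        for names in combinations(sorted(data), k):
            cols = [data[name] for name in names]
            adj = adjusted_r2(ols_r2(cols, y), n, k)
            scored.append((adj, max_vif(cols), names))
        scored.sort(key=lambda t: t[0], reverse=True)
        results[k] = scored[:top]
    return results


# Synthetic data: y depends on x1 and x2; x3 is unrelated to y by construction.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
x3 = [3, 3, 1, 1, 2, 2, 4, 4]
noise = [0.1, -0.2, 0.15, -0.05, 0.0, 0.2, -0.1, -0.1]
y = [2 + 3 * a - b + e for a, b, e in zip(x1, x2, noise)]

results = exploratory_regression({"x1": x1, "x2": x2, "x3": x3}, y, max_k=2)
for k, models in results.items():
    for adj, vif, names in models:
        print(k, names, f"AdjR2={adj:.4f}", f"maxVIF={vif:.2f}")
```

The number of combinations grows quickly with the number of candidate variables, which is one reason a tool like this caps the model size.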

Below is an example of the output for the 3 best combinations of 7 variables in explaining “Price per m2” in Oslo. The Passing Models section would list any combination that passes all criteria set in the tool.

These combinations fail the Jarque-Bera test and the Spatial Autocorrelation test (JB and SA values of 0,00 in the output below), and are therefore not valid models.

Choose 7 of 9 Summary

Highest Adjusted R-Squared Results

AdjR2 AICc JB K(BP) VIF SA Model

0,74 475833,93 0,00 0,00 2,82 0,00 +POP_EDUC_L*** -CENTREZ_DIST*** -WATER_DIST*** +FLOOR_SPACE_RECI*** +POP_AGE*** -UNIVERS_DIST*** -BUILDING_AGE***

0,74 475956,20 0,00 0,00 2,91 0,00 -RESTAURANT_DIST*** +POP_EDUC_L*** -CENTREZ_DIST*** +FLOOR_SPACE_RECI*** +POP_AGE*** -UNIVERS_DIST*** -BUILDING_AGE***

0,74 476110,96 0,00 0,00 2,90 0,00 -HOSPITAL_DIST*** +POP_EDUC_L*** -CENTREZ_DIST*** +FLOOR_SPACE_RECI*** +POP_AGE*** -UNIVERS_DIST*** -BUILDING_AGE***

Passing Models AdjR2 AICc JB K(BP) VIF SA Model
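We read the JB column above as the p-value of the Jarque-Bera test for normally distributed residuals, so 0,00 signals clearly non-normal residuals. As an illustration, the test statistic can be computed from the residuals’ skewness and kurtosis, with a p-value from the chi-square distribution with 2 degrees of freedom. The function name and sample residuals below are hypothetical; this is a sketch of the standard test, not ArcGIS’s code.

```python
import math


def jarque_bera(residuals):
    """Jarque-Bera statistic and p-value (chi-square, 2 degrees of freedom)."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    p_value = math.exp(-jb / 2)  # survival function of chi-square with 2 df
    return jb, p_value


print(jarque_bera([-2, -1, 0, 1, 2]))     # symmetric, light-tailed sample
print(jarque_bera([0.0] * 50 + [10.0]))  # one extreme residual
```

A single extreme residual is enough to drive the p-value towards 0, which is one plausible reason why large real-estate datasets with a few very expensive sales fail this test.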

After running all combinations, the tool summarizes each variable’s effect across all combinations:

1. How often the variable is significant
2. How often it contributes in each direction (+ or -)
3. Multicollinearity: VIF and number of violations
4. Spatial autocorrelation
5. Passing tests
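Points 1 and 2 of the list above can be sketched as a tally over all fitted models. We assume each percentage is taken over the models in which the variable appears and that “significant” means a coefficient p-value below 0.05; both are assumptions about the tool, and the function name and example models are hypothetical.

```python
def summarize(models, alpha=0.05):
    """Per-variable % significant, % negative and % positive over the models it appears in.

    models: list of dicts mapping variable name -> (coefficient, p_value).
    """
    tallies = {}
    for model in models:
        for name, (coef, p_value) in model.items():
            # tally = [appearances, significant, negative, positive]
            t = tallies.setdefault(name, [0, 0, 0, 0])
            t[0] += 1
            t[1] += p_value < alpha
            t[2] += coef < 0
            t[3] += coef > 0
    return {name: tuple(100 * count / t[0] for count in t[1:]) for name, t in tallies.items()}


# Hypothetical coefficients and p-values from three fitted models
models = [
    {"X": (1.0, 0.01), "Y": (-2.0, 0.20)},
    {"X": (0.5, 0.03), "Z": (-1.0, 0.001)},
    {"X": (-0.1, 0.50), "Y": (-1.0, 0.01)},
]
print(summarize(models))
```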

Below is the output summary for points 1, 2, 3 and 4 above, for “Price per m2” in Oslo:

Summary of Variable Significance

Variable            % Significant   % Negative   % Positive
HOSPITAL_DIST       100,00          100,00       0,00
RESTAURANT_DIST     100,00          100,00       0,00
POP_EDUC_L          100,00          0,00         100,00
CENTREZ_DIST        100,00          100,00       0,00
FLOOR_SPACE_RECI    100,00          0,00         100,00
BUILDING_AGE        100,00          100,00       0,00
WATER_DIST          96,88           100,00       0,00
POP_AGE             96,09           4,30         95,70
UNIVERS_DIST        94,92           83,59        16,41

Summary of Multicollinearity

Variable            VIF    Violations   Covariates
HOSPITAL_DIST       1,91   0            ---
RESTAURANT_DIST     1,52   0            ---
POP_EDUC_L          1,94   0            ---
CENTREZ_DIST        2,97   0            ---
WATER_DIST          1,16   0            ---
FLOOR_SPACE_RECI    1,24   0            ---
POP_AGE             1,14   0            ---
UNIVERS_DIST        2,13   0            ---
BUILDING_AGE        1,22   0            ---

Summary of Residual Spatial Autocorrelation (SA)

SA         AdjR2      AICc            JB         K(BP)      VIF        Model
0,000000   0,817885   694356,672574   0,000000   0,000000   3,424741   -RESTAURANT_DIST*** -CENTREZ_DIST*** -WATER_DIST*** +FLOOR_SPACE_SQR*** +POP_EDUC_L_P5*** +POP_INCOME*** +POP_AGE*** -BUILDING_AGE***

For spatial autocorrelation (summary point 4), the output further shows that none of our combinations of variables passes the spatial autocorrelation test; we therefore do not have a model that passes all regression analysis tests.

This is also one of the important conclusions of our project: the variables at hand are not sufficient to create a model that meets all requirements. We suspect that access to more intrinsic characteristics of the dwellings could have provided variables that would remedy this.
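The spatial autocorrelation check discussed above asks whether regression residuals of nearby sales resemble each other. A common statistic for this is global Moran’s I; the sketch below computes it with binary distance-band weights and, in place of the analytical significance test that ArcGIS reports, a permutation test (a swapped-in technique). The distance threshold, function names and residuals are hypothetical.

```python
import math
import random


def distance_band_weights(points, threshold):
    """Binary spatial weights: w_ij = 1 if i != j and the points are within `threshold`."""
    n = len(points)
    return [[1.0 if i != j and math.dist(points[i], points[j]) <= threshold else 0.0
             for j in range(n)] for i in range(n)]


def morans_i(values, w):
    """Global Moran's I of `values` under the weight matrix `w`."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * num / den


def permutation_test(values, w, permutations=999, seed=0):
    """Observed Moran's I and a one-sided pseudo p-value for positive autocorrelation."""
    rng = random.Random(seed)
    observed = morans_i(values, w)
    shuffled = list(values)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if morans_i(shuffled, w) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical residuals of ten sales along a street, 100 m apart
points = [(100.0 * i, 0.0) for i in range(10)]
residuals = [1, 1, 1, 1, 1, -1, -1, -1, -1, -1]
w = distance_band_weights(points, threshold=100.0)
print(permutation_test(residuals, w))
```

Clustered residuals like these give a clearly positive Moran’s I and a small p-value, which is the pattern that makes a model fail the test.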

Criteria for final set of explanatory variables in each model, Oslo

We set the following criteria for the final variables in each of the 3 approaches, based on Oslo. Chosen variables meet ALL of the following criteria (1, 2A, 2B and 3):

1. Have a positive effect of >= 0.01 on the total combined AdjR2 for the approach
2. Contribute significantly to explaining variation in the dependent variable (both A and B):
   A. Significant in more than 95 per cent of models
   B. Contribute in the same direction (+ OR -) in more than 85 per cent of models
3. No multicollinearity violations: VIF < 5
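The criteria above can be sketched as a simple filter over per-variable summaries. The significance, direction and VIF figures below come from the Oslo “Price per m2” summary; the AdjR2 gains are hypothetical placeholders, and the class and function names are our own. The `RELAXED` thresholds mirror the relaxed criteria used for other cities later in this section.

```python
from dataclasses import dataclass


@dataclass
class VariableSummary:
    name: str
    adj_r2_gain: float  # increase in combined AdjR2 when the variable is added
    pct_significant: float
    pct_negative: float
    pct_positive: float
    vif: float


def meets_criteria(v, min_gain=0.01, min_significant=95.0, min_same_direction=85.0, max_vif=5.0):
    """Criteria 1, 2A, 2B and 3; the defaults are the Oslo thresholds."""
    same_direction = max(v.pct_negative, v.pct_positive)
    return (v.adj_r2_gain > 0 and v.adj_r2_gain >= min_gain  # 1: positive AdjR2 effect
            and v.pct_significant > min_significant          # 2A: significance
            and same_direction > min_same_direction          # 2B: consistent direction
            and v.vif < max_vif)                             # 3: multicollinearity


RELAXED = dict(min_gain=0.0, min_significant=80.0, min_same_direction=80.0)

# Figures from the Oslo "Price per m2" summary; the 0.02 AdjR2 gains are hypothetical.
water = VariableSummary("WATER_DIST", 0.02, 96.88, 100.0, 0.0, 1.16)
univers = VariableSummary("UNIVERS_DIST", 0.02, 94.92, 83.59, 16.41, 2.13)

print(meets_criteria(water), meets_criteria(univers), meets_criteria(univers, **RELAXED))
```

With these figures, UNIVERS_DIST misses criterion 2A in Oslo (94.92 is not above 95) but would pass under the relaxed thresholds.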

Based on the above criteria, the following variables are used (marked X) in the three approaches “Total sales price”, “Price per m2” and “Compared same size”. The column Variable short name gives the shortened variable names used in the datasets.

Data from real estate agencies by dwelling

Variable type                  Variable                                      Variable short name   Used in (X): Total sales price / Price per m2 / Compared same size
Dwelling                       DwellingId
                               Floor space                                   Floor_space           X X
                               Age of building                               Building_age          X X X
                               Number of bedrooms
Distance to geographic         CentreZoneId                                  CentreZ_dist          X X X
entities/areas                 Recreational areas
                               Lakes, rivers & coastline                     Water_dist            X X X
Distance to public transport   Distance to public rail transport
                               Distance from road with speed limit 60 km/h
Distance to buildings          Primary health institutions
                               School (primary/secondary)
                               Hospital                                      Hospital_dist         X X
                               Kindergarten

The following chapters 4.4 and 4.5 look at the exploratory analysis output for the “best” models for “Total sales price” and “Price per m2” respectively. The analysis output for “Compared same sizes” is in Appendix D.

Each chapter starts with Oslo (the city on which the models are calibrated), followed by summarized results on how the model performs in all other Norwegian urban settlements with more than 50 000 inhabitants.

When considering how our Oslo variables perform in other cities, the criteria are somewhat “relaxed”. Variables should meet ALL of the following criteria (1, 2A, 2B and 3):

1. Have a positive effect on the total combined AdjR2 for the approach
2. Contribute significantly to explaining variation in the dependent variable (both A and B):
   A. Significant in more than 80 per cent of models
   B. Contribute in the same direction (+ OR -) in more than 80 per cent of models
3. No multicollinearity violations: VIF < 5

                               Restaurant                                    Restaurant_dist       X X X
                               Buildings built pre 1900
Intensity-environment          Noise 2011 (day equivalent level in dBA)
                               Number of sun hours
Population                     Household income – before taxes
                               Household income – after taxes                Pop_Income            X X
                               Level of education                            Pop_Educ_L            X X X
                               Immigration: population with non-western ancestry   Pop_nonwest     X
                               Age – mean of population                      Pop_age               X X
                               Percentage below 18 years old                 Pop_child             X
Employment                     Employees within 5 km
                               Employees within 10 km
