4. FINANCIAL ANALYSIS
4.2 Development schedule
Based on the real estate data and the geo-derived parameters, we performed an exploratory regression analysis in ArcGIS to gain insight into the relationships. We structured the data as described, with real estate sales as the population. Using sales as the ultimate “ground truth” is debatable, but it is a straightforward way to obtain objective data on attractiveness. In the regression analysis, we used the information recorded with each sale in addition to the geo-parameters.
Linear Transformation of explanatory variables
The exploratory regression analysis is based on Ordinary Least Squares (OLS) regression, which tests for linear relationships. Because the ranges and scales of the potential explanatory variables vary widely, transformations of all explanatory variables found to be significant contributors were tested, probing whether a transformed variable has a more linear relationship with the dependent variable. The dependent variables were not transformed.
Figure 4.1. Regression analysis to gain insight into which parameters are important
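As a minimal sketch of what the transformation test probes, the Python below compares the single-variable R² of a raw variable, a linearly rescaled copy and a reciprocal transform. The data are synthetic and hypothetical (a price per m2 assumed to fall off with 1/floor space), and the helper name `r_squared` is ours. A purely linear rescaling (a + b·x) leaves R² unchanged, so it is nonlinear transforms such as the reciprocal that can improve a linear fit.

```python
import random
from statistics import fmean

def r_squared(x, y):
    """R^2 of a one-variable OLS fit y ~ a + b*x (the squared Pearson correlation)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# Hypothetical synthetic data (not project data): price per m2 that falls off
# roughly with the reciprocal of floor space, plus random noise.
rng = random.Random(42)
floor_space = [rng.uniform(20, 200) for _ in range(500)]
price_per_m2 = [30000 + 900000 / fs + rng.gauss(0, 3000) for fs in floor_space]

candidates = {
    "raw": floor_space,
    "linear 2x+5": [2 * fs + 5 for fs in floor_space],  # affine rescaling
    "reciprocal 1/x": [1 / fs for fs in floor_space],   # nonlinear transform
}
results = {name: r_squared(x, price_per_m2) for name, x in candidates.items()}
for name, r2 in results.items():
    print(f"{name:15s} R2 = {r2:.3f}")
```

With this synthetic data the raw and rescaled variables give identical R², while the reciprocal scores higher.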
Choice of dependent variable
Seeking to explain housing prices also raises the question of whether “Total sales price” or “Price per m2” best expresses attractiveness. A case can be made for both: each says something about attractiveness, but about different aspects. A potential buyer might tolerate a higher “Price per m2” if being close to educational facilities, restaurants and theatres matters more than the amount of floor space. At a different stage of life, the same buyer might prefer or need more space, and try to get as much space as his or her means allow in as attractive a location as possible.
Instead of choosing one of these two approaches, we explored both in the project, finding that most variables are significant in only one of them. We also explored a third approach, which compares dwellings of similar size; we call it the Compared same size approach.
Figure 4.2. “Total sales prices”, Oslo – Mean within 500m X 500m grid cell
Figure 4.3. Price per m2, Oslo - Mean within 500m X 500m grid cell
The two maps give a visual impression of the variation in actual Total sales prices and Price per m2 in Oslo (2014), based on mean values within a 500m X 500m grid. The “Natural Breaks” classification is used to divide the values into 5 classes, showing very large price differences between the most and least expensive areas for both approaches.
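The grid-cell means behind the maps can be sketched as below. This is an illustration, not the project's ArcGIS workflow: it assumes sale locations in projected coordinates measured in metres (for example UTM), a grid aligned with the coordinate origin, and hypothetical sales records; the function names are ours. The Natural Breaks classification into 5 classes is not shown.

```python
from collections import defaultdict
from statistics import fmean

CELL = 500.0  # cell size in metres (500 m x 500 m, as in the maps)

def grid_cell(x, y, cell=CELL):
    """Integer (column, row) index of the grid cell containing point (x, y)."""
    return (int(x // cell), int(y // cell))

def mean_by_cell(sales, cell=CELL):
    """Mean price per grid cell; `sales` holds (x, y, price) tuples in metres and NOK."""
    groups = defaultdict(list)
    for x, y, price in sales:
        groups[grid_cell(x, y, cell)].append(price)
    return {key: fmean(prices) for key, prices in groups.items()}

# Hypothetical sales with made-up coordinates (metres) and prices (NOK).
sales = [
    (597_120.0, 6_643_050.0, 4_200_000),
    (597_480.0, 6_643_400.0, 3_800_000),
    (598_010.0, 6_643_100.0, 6_100_000),
]
cell_means = mean_by_cell(sales)
print(cell_means)
```

Here the first two sales fall in the same cell and are averaged, while the third sits two cells further east.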
House prices show apparent geographic clustering throughout the city, especially for Price per m2, radiating outward from the cluster of red cells around Oslo town centre. Total sales prices appear more dispersed in some parts of the city and more clustered in others. The map suggests an east–west divide, which is also commonly perceived in Oslo. The project methodology seeks to find variables that explain this price variation, and uses the findings to predict Total sales prices and Price per m2.
Figure 4.4. Compared same sizes, Oslo - Mean within 500m X 500m grid cell
In our project, floor space (in square metres) is found to be a significant explanatory variable for the variation in both “Total sales price” and “Price per m2”. On its own, it explains 60 per cent of the variation in “Total sales price” in Oslo (AdjR2 of 0.60) and 32 per cent of the variation in “Price per m2” (AdjR2 of 0.32).
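The AdjR2 figures used throughout apply the standard adjustment of R² for the number of explanatory variables k relative to the number of observations n: AdjR2 = 1 − (1 − R²)(n − 1)/(n − k − 1). The sketch below fits a one-variable OLS model and applies that formula; the function names and the example data are hypothetical.

```python
from statistics import fmean

def simple_ols(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b, R^2)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot

def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k explanatory variables (intercept excluded)."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than estimated parameters")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical sales: floor space (m2) and total sales price (NOK).
floor_space = [35, 48, 62, 70, 85, 110, 130]
price = [2.9e6, 3.4e6, 4.6e6, 4.1e6, 5.5e6, 6.2e6, 7.9e6]
a, b, r2 = simple_ols(floor_space, price)
print(f"R2 = {r2:.3f}, AdjR2 = {adjusted_r2(r2, len(price), k=1):.3f}")
```

With a full year of sales, n is large, so AdjR2 and R² are almost identical for a handful of variables: R² = 0.60 with n = 1000 and k = 1 gives AdjR2 ≈ 0.5996.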
In the Compared same size approach, floor space is “baked” into the dependent variable: each sold dwelling's “Price per m2” is compared with the “mean Price per m2” of all sold dwellings with similar floor space (±4 m2). Even though our data include all dwelling sales in Norway for a whole year, we found that a ±4 m2 window in each comparison was necessary to smooth the data and compensate for chance and random variation.
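A minimal sketch of the Compared same size dependent variable, assuming the input is the list of sales for one city as (floor space in m2, price per m2) pairs, and that the ±4 m2 window is inclusive and includes the dwelling itself (as in the 16m2–24m2 example for a 20m2 dwelling). The names are hypothetical. This straightforward version is quadratic in the number of sales; for a full year of national sales one would sort by floor space or pre-aggregate by size first.

```python
from statistics import fmean

def compared_same_size(dwellings, window=4):
    """Price per m2 divided by the mean price per m2 of dwellings within +-window m2.

    `dwellings` is a list of (floor_space_m2, price_per_m2) pairs for one city;
    each dwelling is included in its own comparison group.
    """
    index = []
    for size, ppm2 in dwellings:
        peers = [p for s, p in dwellings if abs(s - size) <= window]
        index.append(ppm2 / fmean(peers))
    return index

# Hypothetical dwellings: (floor space in m2, price per m2 in NOK).
dwellings = [(20, 10_000), (24, 20_000), (28, 30_000)]
values = compared_same_size(dwellings)
print([round(v, 3) for v in values])
```

For the three hypothetical dwellings the values are about 0.667, 1.0 and 1.2; a value above 1 means the dwelling sold for more per m2 than similar-sized dwellings.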
The Compared same size approach can be said to be cleaner, in that it leaves more room for explanatory variables that are not intrinsic to the building. The table below shows this for the variable Education level of population within 250 m, which on its own explains 47 per cent of the variation in Compared same size, compared with 35 per cent for Price per m2 and 20 per cent for Total sales price.
AdjR2 (variable in isolation) | Total sales price | Price per m2 | Compared same size
Education level of population within 250 m | 0.20 | 0.35 | 0.47
However, when all explanatory variables are combined, the highest achievable AdjR2 for the Compared same size approach is only 0.61 for Oslo, and it decreases further when the same explanatory variables are tested on Norway's other larger urban settlements. The approach gives some interesting insights into the strength of individual variables, and shows that city size affects the overall results. Still, the overall AdjR2 is arguably too low for meaningful prediction.
In the Compared same size approach, floor space is “baked” into the dependent variable (the variable we seek to explain).
The generated values are: Price per m2 DIVIDED BY mean Price per m2 for all sold dwellings of similar size (±4 m2) in the city.
Example: Kroner per m2 for a 20m2 property DIVIDED BY mean Kroner per m2 for properties of 16m2–24m2.
The average is therefore close to 1.
The prediction part of our project is therefore based solely on the results for the “Total sales price” and “Price per m2” approaches, whose highest achieved AdjR2 values for Oslo are 0.82 and 0.74, respectively. For these approaches, we also make predictions only for cities where the highest achieved AdjR2 is >= 0.70, which excludes the smaller cities in the project.
The table below gives the highest achievable AdjR2 for each of the three approaches, together with the number of explanatory variables in each model. The explanatory variables used in each approach make up our best explanatory model for that approach.
Oslo
Approach | AdjR2 | Number of explanatory variables in model
Total sales price | 0.82 | 8
Price per m2 | 0.74 | 9
Compared same sizes | 0.61 | 7
We used the ArcGIS Exploratory Regression tool. For a chosen dependent variable, the tool first tests each explanatory variable in isolation, then all pairs of variables, then all combinations of three variables, and so on, up to the number of explanatory variables brought into the analysis (upper limit: 10). The output reports the highest-scoring combinations (by R2/AdjR2) for each number of variables, whether the variables in these combinations contribute significantly, in which direction they contribute (+ or -), the VIF for the combination, and other measures.
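The exhaustive search described above can be sketched in plain Python: fit an OLS model for every combination of candidate variables up to a maximum size and keep the best combinations by AdjR2 for each size. This is a simplified stand-in for the ArcGIS tool, not its implementation: it only ranks by AdjR2 and skips the significance, VIF, Jarque-Bera and spatial autocorrelation checks the tool also reports. Variable names and data are hypothetical, and the small Gauss-Jordan solver on the normal equations is adequate for a sketch but not for large or ill-conditioned data.

```python
import random
from itertools import combinations

def solve(a, b):
    """Solve a x = b for a small dense system (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols_adj_r2(columns, y):
    """Adjusted R^2 of an OLS fit of y on an intercept plus the given columns."""
    n, k = len(y), len(columns)
    rows = [[1.0, *vals] for vals in zip(*columns)]
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k + 1)] for p in range(k + 1)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k + 1)]
    beta = solve(xtx, xty)
    my = sum(y) / n
    ss_res = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - (ss_res / ss_tot) * (n - 1) / (n - k - 1)

def exploratory_search(data, y, max_vars, top=3):
    """Return {model size: [(adj_r2, variable names), ...]} with the `top` best per size."""
    best = {}
    for size in range(1, max_vars + 1):
        scored = [(ols_adj_r2([data[v] for v in combo], y), combo)
                  for combo in combinations(sorted(data), size)]
        best[size] = sorted(scored, reverse=True)[:top]
    return best

# Hypothetical candidates: y depends on the first two, the third is pure noise.
rng = random.Random(1)
n = 200
data = {
    "CENTRE_DIST": [rng.uniform(0, 10_000) for _ in range(n)],
    "WATER_DIST": [rng.uniform(0, 3_000) for _ in range(n)],
    "RANDOM_VAR": [rng.gauss(0, 1) for _ in range(n)],
}
y = [80_000 - 3 * c - 4 * w + rng.gauss(0, 5_000)
     for c, w in zip(data["CENTRE_DIST"], data["WATER_DIST"])]

best = exploratory_search(data, y, max_vars=3)
for size, models in best.items():
    adj_r2, combo = models[0]
    print(size, round(adj_r2, 3), combo)
```

The number of models grows quickly: with 10 candidate variables, model sizes 1 to 10 give 2^10 − 1 = 1023 combinations.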
Below is an example of the output for the 3 best combinations of 7 variables explaining “Price per m2” in Oslo. Under Passing Models, the tool would list any combination that passes all the criteria set in the tool.
These combinations fail the Jarque-Bera test and the spatial autocorrelation test, and are therefore not valid models.
Choose 7 of 9 Summary
Highest Adjusted R-Squared Results
AdjR2 AICc JB K(BP) VIF SA Model
0,74 475833,93 0,00 0,00 2,82 0,00 +POP_EDUC_L*** -CENTREZ_DIST*** -WATER_DIST*** +FLOOR_SPACE_RECI*** +POP_AGE*** -UNIVERS_DIST*** -BUILDING_AGE***
0,74 475956,20 0,00 0,00 2,91 0,00 -RESTAURANT_DIST*** +POP_EDUC_L*** -CENTREZ_DIST*** +FLOOR_SPACE_RECI*** +POP_AGE*** -UNIVERS_DIST*** -BUILDING_AGE***
0,74 476110,96 0,00 0,00 2,90 0,00 -HOSPITAL_DIST*** +POP_EDUC_L*** -CENTREZ_DIST*** +FLOOR_SPACE_RECI*** +POP_AGE*** -UNIVERS_DIST*** -BUILDING_AGE***
Passing Models AdjR2 AICc JB K(BP) VIF SA Model
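The JB column above is, as we read the tool output, the p-value of the Jarque-Bera test of whether the model residuals are normally distributed; 0,00 means normality is rejected. A minimal sketch of the statistic, using the skewness S and kurtosis K of the residuals, JB = n/6 · (S² + (K − 3)²/4), with the p-value from a chi-square distribution with 2 degrees of freedom, which is exactly exp(−JB/2). The function name and test data are ours.

```python
import math
import random
from statistics import fmean

def jarque_bera(residuals):
    """Return (JB statistic, p-value) for normality of the residuals."""
    n = len(residuals)
    mean = fmean(residuals)
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # The chi-square survival function with 2 degrees of freedom is exp(-x/2).
    return jb, math.exp(-jb / 2)

rng = random.Random(7)
skewed = [rng.expovariate(1.0) for _ in range(2000)]  # strongly right-skewed residuals
jb, p = jarque_bera(skewed)
print(f"JB = {jb:.1f}, p = {p:.3g}")
```

Heavily skewed residuals, like these, give a p-value of essentially 0, the same pattern as the JB column above.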
After all combinations have been run, each variable's effect is summarized across all combinations:
1. How often the variable is significant
2. How often it contributes in each direction (+ or -)
3. Multicollinearity: VIF and number of violations
4. Spatial autocorrelation
5. Passing tests
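Points 1 and 2 of the summary are simple tallies over the models tested. The sketch below assumes that each fitted model is recorded as a mapping from variable name to (coefficient, p-value), that a variable counts as significant when p < 0.05, and that all three percentages are taken over the models in which the variable appears; the cutoff, the denominators and the names are assumptions, not the tool's documented internals.

```python
from collections import defaultdict

def summarize_significance(models, alpha=0.05):
    """Per variable: % of models where it is significant, negative and positive."""
    counts = defaultdict(lambda: {"n": 0, "sig": 0, "neg": 0, "pos": 0})
    for model in models:
        for name, (coef, p_value) in model.items():
            c = counts[name]
            c["n"] += 1                 # models containing this variable
            c["sig"] += p_value < alpha  # bools add as 0/1
            c["neg"] += coef < 0
            c["pos"] += coef > 0
    return {
        name: {
            "% Significant": 100 * c["sig"] / c["n"],
            "% Negative": 100 * c["neg"] / c["n"],
            "% Positive": 100 * c["pos"] / c["n"],
        }
        for name, c in counts.items()
    }

# Hypothetical results from three fitted models: variable -> (coefficient, p-value).
models = [
    {"WATER_DIST": (-2.1, 0.001), "POP_AGE": (150.0, 0.20)},
    {"WATER_DIST": (-1.8, 0.003), "POP_AGE": (-40.0, 0.01)},
    {"WATER_DIST": (0.4, 0.60)},
]
summary = summarize_significance(models)
for name, stats in summary.items():
    print(name, {k: round(v, 2) for k, v in stats.items()})
```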
Below is the output summary for points 1, 2, 3 and 4 above, for “Price per m2” in Oslo:
Summary of Variable Significance
Variable | % Significant | % Negative | % Positive
HOSPITAL_DIST | 100,00 | 100,00 | 0,00
RESTAURANT_DIST | 100,00 | 100,00 | 0,00
POP_EDUC_L | 100,00 | 0,00 | 100,00
CENTREZ_DIST | 100,00 | 100,00 | 0,00
FLOOR_SPACE_RECI | 100,00 | 0,00 | 100,00
BUILDING_AGE | 100,00 | 100,00 | 0,00
WATER_DIST | 96,88 | 100,00 | 0,00
POP_AGE | 96,09 | 4,30 | 95,70
UNIVERS_DIST | 94,92 | 83,59 | 16,41
Summary of Multicollinearity
Variable | VIF | Violations | Covariates
HOSPITAL_DIST | 1,91 | 0 | ---
RESTAURANT_DIST | 1,52 | 0 | ---
POP_EDUC_L | 1,94 | 0 | ---
CENTREZ_DIST | 2,97 | 0 | ---
WATER_DIST | 1,16 | 0 | ---
FLOOR_SPACE_RECI | 1,24 | 0 | ---
POP_AGE | 1,14 | 0 | ---
UNIVERS_DIST | 2,13 | 0 | ---
BUILDING_AGE | 1,22 | 0 | ---
Summary of Residual Spatial Autocorrelation (SA)
SA | AdjR2 | AICc | JB | K(BP) | VIF | Model
0,000000 | 0,817885 | 694356,672574 | 0,000000 | 0,000000 | 3,424741 | -RESTAURANT_DIST*** -CENTREZ_DIST*** -WATER_DIST*** +FLOOR_SPACE_SQR*** +POP_EDUC_L_P5*** +POP_INCOME*** +POP_AGE*** -BUILDING_AGE***
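The VIF values above measure multicollinearity: for explanatory variable j, VIF_j = 1/(1 − R_j²), where R_j² comes from regressing variable j on all the other explanatory variables, and the project's criteria require VIF < 5. A minimal plain-Python sketch follows; the helper names and the two-variable example are hypothetical, and the normal-equation solver is only suitable for small, well-conditioned examples.

```python
from statistics import fmean

def solve(a, b):
    """Solve a x = b for a small dense system (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def r_squared(y, columns):
    """R^2 of an OLS fit of y on an intercept plus the given columns."""
    rows = [[1.0, *vals] for vals in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    beta = solve(xtx, xty)
    my = fmean(y)
    ss_res = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def vif(data):
    """VIF for each variable in `data` (name -> list of values)."""
    names = list(data)
    return {
        j: 1 / (1 - r_squared(data[j], [data[o] for o in names if o != j]))
        for j in names
    }

# Hypothetical, moderately correlated explanatory variables.
data = {"x1": [1, 2, 3, 4, 5, 6], "x2": [2, 1, 4, 3, 6, 5]}
print({name: round(v, 2) for name, v in vif(data).items()})
```

With only two variables, both get the same VIF, 1/(1 − r²), about 3.19 here: correlated, but still under the VIF < 5 threshold.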
For spatial autocorrelation (summary point 4), the analysis output further shows that none of our combinations of variables passes the spatial autocorrelation test, so we do not have a model that passes all the regression analysis tests.
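The SA test presumably checks the model residuals for spatial autocorrelation with Global Moran's I; a p-value near 0 means that similar residuals cluster in space. The sketch below computes Moran's I with inverse-distance weights and, instead of the analytical z-score a GIS tool typically reports, a simple permutation pseudo p-value. The weighting scheme, the function names and the grid data are assumptions for illustration.

```python
import math
import random

def inverse_distance_weights(coords):
    """Weight matrix w_ij = 1/d_ij for i != j and w_ii = 0 (points must be distinct)."""
    return [[0.0 if i == j else 1 / math.dist(p, q) for j, q in enumerate(coords)]
            for i, p in enumerate(coords)]

def morans_i(values, weights):
    """Global Moran's I of `values` under the spatial weight matrix `weights`."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(map(sum, weights))
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * num / sum(d * d for d in dev)

def permutation_test(values, weights, permutations=99, seed=0):
    """Observed I and a two-sided pseudo p-value from random relabelling of values."""
    rng = random.Random(seed)
    expected = -1 / (len(values) - 1)  # E[I] under no spatial autocorrelation
    observed = morans_i(values, weights)
    shuffled = list(values)
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if abs(morans_i(shuffled, weights) - expected) >= abs(observed - expected):
            extreme += 1
    return observed, (extreme + 1) / (permutations + 1)

# Hypothetical residuals on an 8 x 8 grid of 500 m cells that drift from west to east.
coords = [(x * 500.0, y * 500.0) for x in range(8) for y in range(8)]
residuals = [x / 500.0 for x, _ in coords]
observed, p = permutation_test(residuals, inverse_distance_weights(coords))
print(f"Moran's I = {observed:.3f}, pseudo p-value = {p:.2f}")
```

Residuals that drift smoothly across the map give a clearly positive I and a small p-value, which is the failing pattern the SA column reports.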
This is also one of the important conclusions of our project: the variables we have at hand are not sufficient to create a model that meets all requirements. We suspect that access to more intrinsic characteristics of the dwellings might have provided variables that would remedy this.
Criteria for final set of explanatory variables in each model, Oslo
We set the following criteria for the final variables in each of the 3 approaches, based on Oslo. Chosen variables meet ALL of the following criteria (1, 2A, 2B and 3):
1. Have a positive effect of >= 0.01 on the total combined AdjR2 for the approach
2. Contribute significantly to explaining variation in the dependent variable (A and B):
A. Significant in more than 95 per cent of the models tested
B. Contribute in the same direction (+ OR -) in more than 85 per cent of the models tested
3. No multicollinearity violations: VIF < 5
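Criteria 2A, 2B and 3 can be checked mechanically against the summary tables. The sketch below does that for the Oslo “Price per m2” summary shown earlier, using both the Oslo thresholds and the somewhat relaxed ones the project uses for other cities (80 per cent for 2A and 2B). Criterion 1 needs the AdjR2 gain of each variable, which the summaries do not show, so it is left out; the dictionary layout and names are ours.

```python
# Oslo "Price per m2" summaries: (% significant, % negative, % positive, VIF).
summary = {
    "HOSPITAL_DIST": (100.00, 100.00, 0.00, 1.91),
    "RESTAURANT_DIST": (100.00, 100.00, 0.00, 1.52),
    "POP_EDUC_L": (100.00, 0.00, 100.00, 1.94),
    "CENTREZ_DIST": (100.00, 100.00, 0.00, 2.97),
    "FLOOR_SPACE_RECI": (100.00, 0.00, 100.00, 1.24),
    "BUILDING_AGE": (100.00, 100.00, 0.00, 1.22),
    "WATER_DIST": (96.88, 100.00, 0.00, 1.16),
    "POP_AGE": (96.09, 4.30, 95.70, 1.14),
    "UNIVERS_DIST": (94.92, 83.59, 16.41, 2.13),
}

# Percentage thresholds for 2A and 2B, plus the VIF ceiling for criterion 3.
OSLO = {"min_significant": 95, "min_same_direction": 85, "max_vif": 5}
OTHER_CITIES = {"min_significant": 80, "min_same_direction": 80, "max_vif": 5}

def passes(stats, criteria):
    """True if a variable meets criteria 2A, 2B and 3."""
    significant, negative, positive, vif = stats
    return (
        significant > criteria["min_significant"]                      # 2A
        and max(negative, positive) > criteria["min_same_direction"]  # 2B
        and vif < criteria["max_vif"]                                 # 3
    )

kept = {
    label: [name for name, stats in summary.items() if passes(stats, criteria)]
    for label, criteria in (("Oslo", OSLO), ("Other cities", OTHER_CITIES))
}
for label, names in kept.items():
    print(label, names)
```

With the Oslo thresholds, UNIVERS_DIST drops out (94.92 per cent significant, 83.59 per cent in one direction), while all nine variables pass the relaxed thresholds.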
Based on the above criteria, the following variables are used (X) in the three approaches “Total sales price”, “Price per m2” and “Compared same size”. The column Variable short name gives the shortened variable names used in the datasets.
Data from real estate agencies by dwelling
Variable | Variable short name | Used in (X): Total sales price, Price per m2, Compared same size
Dwelling | DwellingId |
Floor space | Floor_space | X X
Age of building | Building_age | X X X
Number of bedrooms | |
Distance to geographic entities/areas
CentreZoneId | CentreZ_dist | X X X
Recreational areas | |
Lakes, rivers and coastline | Water_dist | X X X
Distance to public transport
Distance to public rail transport | |
Distance from road with speed limit 60 km/h | |
Distance to buildings
Primary health institutions | |
School (primary/secondary) | |
Hospital | Hospital_dist | X X
Kindergarten | |
Restaurant | Restaurant_dist | X X X
Buildings built pre 1900 | |
Intensity-environment
Noise 2011 (day equivalent level in dBA) | |
Number of sun hours | |
Population
Household income – before taxes | |
Household income – after taxes | Pop_Income | X X
Level of education | Pop_Educ_L | X X X
Immigration
Population with non-western ancestry | Pop_nonwest | X
Age – mean of population | Pop_age | X X
Percentage below 18 years old | Pop_child | X
Employment
Employees within 5 km | |
Employees within 10 km | |
The following chapters, 4.4 and 4.5, look at the exploratory analysis output for the “best” models for “Total sales price” and “Price per m2”, respectively. The analysis output for “Compared same sizes” is in appendix D.
Each chapter starts with Oslo (the city on which the models are calibrated), followed by a summary of how the model performs in all other Norwegian urban settlements with more than 50 000 inhabitants.
When assessing how our Oslo variables perform in other cities, the criteria are somewhat “relaxed”. Variables should meet ALL of the following criteria (1, 2A, 2B and 3):
1. Have a positive effect on the total combined AdjR2 for the approach
2. Contribute significantly to explaining variation in the dependent variable (A and B):
A. Significant in more than 80 % of the models tested
B. Contribute in the same direction (+ OR -) in more than 80 % of the models tested
3. No multicollinearity violations: VIF < 5