This review of the literature has served a number of purposes. First and foremost, it has provided a background framework through which the work that follows can be understood and viewed. It provides a justification for the research, defining the MAUP as a long-standing, complex problem that is as relevant to analysts today as it was when first identified by Gehlke and Biehl (1934). Indeed, with the advent of increasingly powerful and cheap computer processing power, and with the proliferation of GIS technology, the potential for analysis of data published in areal units has increased. Therefore, the need to research, understand and utilise the MAUP has, perhaps, never been greater than at this point in time. The work that follows seeks not only to build upon the literature that has been presented, discussed and critically examined, but also to extend the debate and increase the understanding of the MAUP. However, as has been a theme running through the whole of the preceding chapter, the work that follows treats the MAUP not as a problem, but as a phenomenon. It seeks not to determine a solution to the MAUP, but to increase the understanding of the concept. Finally, it attempts to seek ways in which the MAUP can be harnessed to increase the analyst's understanding, not of a problem with the data, but of the complexity of the social world within which all data are positioned.
Using the literature as a basis for investigation, it appears pertinent that a full investigation of British Census data is made. Previous literature has either examined a large number of variables in a small area (see, for instance, Tranmer and Steel, 2001, who used 8 Census variables for one SAR District) or a large extent of data for a smaller number of variables (see, for instance, Amrhein and Flowerdew, 1989, who used Canadian migration data). Therefore, there is a clear need for an investigation using the British section of the 1991 UK population Census. This seeks to identify the presence of the MAUP in a dataset of far greater size than has previously been used. Once it has been identified, a commentary on the state of the MAUP, specifically the scale effect, will be presented. This analysis will also provide the background for a large-scale test of the statistical measures of Aggregation Effects and Intra-Area Correlations developed by Tranmer and Steel (2001), to assess their appropriateness for commentary on the incidence of the MAUP in large datasets (for a more extensive discussion of the statistical measures see Chapter 3).
The work presented by Openshaw and Taylor (1979) was highly relevant in the determination of the pervasiveness and complexity of the MAUP. Thus, attempts to engineer high levels of MAUP using the 1991 data are presented, partly to replicate this work, but also to determine how different variables behave under aggregation in different places. This moves beyond the Openshaw and Taylor work, as it seeks to present the results only for aggregations that could be considered realistically appropriate to the data. Therefore, issues such as compactness of the zones will not be ignored. To increase the understanding of the MAUP, a range of variables that could determine the underlying pattern of the data will be considered. It has already been identified and discussed that spatial autocorrelation is important in the incidence of the MAUP. However, little discussion has taken place to determine what conditions need to be present in the underlying data to result in the processes that give rise to the varying levels of spatial autocorrelation. Finally, attempts are made to visually identify the spatial processes. This is important, as their existence will give visual proof to the concepts discussed by Green and Flowerdew (2001) and demonstrate the importance of data structure in the recognition of the causes of the MAUP. Overall, this series of analyses seeks to provide a greater understanding of the likely causes that contribute to the MAUP. They will not provide a solution, nor do they seek to. As with the literature by Holt et al. (1996a), Holt et al. (1996b), and Tranmer and Steel (2001), they seek to provide a better understanding of the data.
Chapter 3
Methodology
3.1. Introduction
The purpose of this work is to seek a greater understanding of the Modifiable Areal Unit Problem, or Phenomenon (MAUP), and specifically to investigate the nature and potential causes of the scale effect. Tranmer and Steel (2001) outline a methodology that enables investigation of the scale effect. Their methodology is adopted and tested here, in order to investigate the scale effect in British Census data and to provide a starting point for a further, more detailed investigation. This further investigation examines how areal units are composed in terms of the smaller level units from which they are constructed, and is designed to increase understanding of the MAUP through the factors that may contribute to its incidence. The methodology that Tranmer and Steel suggest is outlined below. Three specific types of analysis relating to the work by Tranmer and Steel are explained. These are then supplemented by a fourth section, which extends their work. The three analyses are: a consideration of the scale effect for the whole of GB using the 1991 Census data; a discussion proposing the factors that might be used to predict the level of the scale effect, without needing to fully calculate the range of measures set out by Tranmer and Steel; and an investigation using known levels of homogeneity to investigate the performance of the methodology. The final section considers the composition of the Districts within which the above analysis takes place. This is done through an extension of the multilevel model, which is explored in theoretical and mathematical detail below. However, it is first necessary to outline the data that are to be used, as the methodology is dependent on an understanding of the data.
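As a hedged illustration of what is meant here by the scale effect, the following Python sketch aggregates four invented zones into two larger ones and compares the population-weighted variance of a proportion at each level. It is not the Tranmer and Steel methodology: the counts, the zone labels and the helper name `weighted_variance` are all assumptions introduced for the example.

```python
# Illustrative sketch only: invented counts for four EDs nested in two Wards.

def weighted_variance(proportions, populations):
    """Population-weighted variance of zone proportions about their weighted mean."""
    total = sum(populations)
    mean = sum(p * n for p, n in zip(proportions, populations)) / total
    return sum(n * (p - mean) ** 2 for p, n in zip(proportions, populations)) / total

# (ward label, population, number of people with the characteristic) for each ED
eds = [("W1", 100, 10), ("W1", 100, 30), ("W2", 100, 50), ("W2", 100, 70)]

ed_props = [count / pop for _, pop, count in eds]
ed_pops = [pop for _, pop, _ in eds]

# Aggregate EDs to Wards by summing populations and counts within each Ward.
ward_totals = {}
for ward, pop, count in eds:
    total_pop, total_count = ward_totals.get(ward, (0, 0))
    ward_totals[ward] = (total_pop + pop, total_count + count)

ward_props = [count / pop for pop, count in ward_totals.values()]
ward_pops = [pop for pop, _ in ward_totals.values()]

ed_variance = weighted_variance(ed_props, ed_pops)        # variance between EDs
ward_variance = weighted_variance(ward_props, ward_pops)  # variance between Wards
```

In this toy case the overall proportion is unchanged by aggregation, but the between-zone variance is smaller at the coarser level, because differences between zones within the same larger unit are averaged away.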
3.2. Data
The data used throughout are derived from the 1991 UK Population Census and are drawn from the Small Area Statistics (SAS) and the Sample of Anonymised Records (SAR). The data are organised into the 278 SAR Districts, which form a complete coverage of Great Britain. The SAR Districts are relatively large spatial areas, consisting of a minimum of 120,000 people (Marsh, 1993, p. 305). In practice, the populations of the SAR Districts are greater than 120,000 people, which is similar to the population of Local Authority Districts. There are two datasets within the SAR set providing individual level data records. These are the 1% household and 2% individual datasets, where the percentage refers to the proportion of the population included in the sample. The 2% individual sample is used in the analysis performed below. Although the 1991 data have been superseded by the 2001 Census data, there were a number of advantages to using the 1991 datasets. Primarily this is due to the lack of an individual level dataset in 2001. This is present in the 1991 data, in the Sample of Anonymised Records, and therefore makes the 1991 data more appropriate than the 2001 data. Even when the 2001 SAR data are released, although the sample will be at 3% rather than 2%, there will be no geographical identifiers below the government office region for the population (Dale and Teague, 2002). This means that this work could not be carried out using the 2001 SAR release as it currently stands.
The variables used for the study are those used in Tranmer and Steel (2001) and are described in Table 3.1, whilst Table 3.2 provides definitions of the variables from the Census tables (obtained via CASWEB, the Census Dissemination Unit run for the academic community and funded by ESRC and JISC; see Harris et al., 2002, for more information). This set of variables was chosen for a number of reasons. Firstly, it enables comparison with the Tranmer and Steel (2001) results. Secondly, the tenure variables (RLA and OO) have been shown to exhibit high levels of scale effect. They are, therefore, of particular interest. Conversely, the employment variables (EMP and UNEMP) have been shown to exhibit a relatively low scale effect. The other variables, such as NONW and CAR0, were chosen as they are thought to have high levels of spatial concentration, and therefore could be useful in investigating the spatial processes that contribute to the scale effect (see Tranmer and Steel, 2001, for a breakdown demonstrating how the different variables react under aggregation).
There are some important differences between the English and Welsh data, and the Scottish data. The basic spatial unit (BSU) for the Scottish data, the Output Area, is smaller than the BSU for the English and Welsh data, the Enumeration District, with an average of 147.5 people versus an average of 487.5 people. However, these different areal units are frequently analysed together, a fact which has major consequences, outlined below. For simplification, both will be referred to as Enumeration Districts (EDs). Moreover, the second level of aggregation for Scotland is known as the Pseudo Postcode Sector, while in England and Wales it is known as the Ward. These areal units are more similar in size and will be referred to as Wards for simplification. The number of areal units in each District can vary considerably, from 150 to 5,000 EDs, and between 13 and 139 Wards. In England and Wales there are 113,196 EDs, with a further 38,255 EDs in Scotland (Dale and Marsh, 1993, p. 55).

Variable   Description
A60P       Proportion of the population aged sixty years or over
NONW       Proportion of the population that are not white
EMP        Proportion of the population employed from the total considered as economically active
UNEMP      Proportion of the population unemployed
LLTI       Proportion of the population with limiting long-term illness
CAR0       Proportion of households with no car
OO         Proportion of households owning their house
RLA        Proportion of households living in accommodation rented from the local authorities

Table 3.1: Description of the variables to be used.

Variable   Census element
A60P       (S350106 + S350113 + S350120 + S350127 + S350134 + S350141 + S350148) / S350001
NONW       (S06003 + S06004 + S06005 + S06006 + S06007 + S06008 + S06009 + S06010 + S06011) / S06001
EMP        (S340007 - S340043) / S010065
UNEMP      S340043 / S010065
LLTI       S120001 / S010065
CAR0       S210045 / S210044
OO         In England and Wales: (S200142 + S200143) / S200141
           In Scotland: (S200156 + S200157) / S200155
RLA        In England and Wales: S200148 / S200141
           In Scotland: S200162 / S200155

Table 3.2: The variables defined through the Census tables from which they are constructed.
A further difference between data from England and Wales and data from Scotland is observed in the construction of the tenure variables. Although the Census cells used in the construction of the OO variable are different, the data which they select are the same, as both represent households that are owner-occupiers, which can be divided into outright owners and buying owners. The RLA variable is not constructed in a similar manner. In England and Wales the RLA variable is composed of the proportion of households who rent their property from a Local Authority or a New Town. However, for the Scottish data these two groups are separate categories, with the New Town homes combined in a variable with households renting from Scottish Homes. Therefore, there needs to be a further note of caution when comparing data between the two areas, as not only are the boundary definitions different but, in the case of RLA, the data definition is also different.
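The country-specific construction of the tenure variables can be sketched in Python as below. The cell codes are those listed in Table 3.2; everything else (the function names, the dictionary layout and the counts for the two example zones) is invented for illustration and is not drawn from the Census data.

```python
# Illustrative sketch: cell codes follow Table 3.2; all counts are invented.

def ratio(numerator_cells, denominator_cell, counts):
    """Sum the numerator cells and divide by the denominator cell."""
    denominator = counts[denominator_cell]
    if denominator == 0:
        return None  # proportion is undefined for a zone with no households
    return sum(counts[cell] for cell in numerator_cells) / denominator

# The tenure cells differ between the England and Wales tables and the Scottish tables.
TENURE_CELLS = {
    "england_wales": {
        "OO": (["S200142", "S200143"], "S200141"),
        "RLA": (["S200148"], "S200141"),
    },
    "scotland": {
        "OO": (["S200156", "S200157"], "S200155"),
        "RLA": (["S200162"], "S200155"),
    },
}

def tenure_proportions(counts, country):
    """Return the OO and RLA proportions for one zone, using that country's cells."""
    cells = TENURE_CELLS[country]
    return {name: ratio(num, den, counts) for name, (num, den) in cells.items()}

# Hypothetical counts for one English ED and one Scottish Output Area.
english_ed = {"S200141": 200, "S200142": 50, "S200143": 70, "S200148": 40}
scottish_oa = {"S200155": 60, "S200156": 12, "S200157": 18, "S200162": 15}

ew = tenure_proportions(english_ed, "england_wales")
sc = tenure_proportions(scottish_oa, "scotland")
```

Keeping the cell lists in a lookup keyed by country makes the definitional difference explicit: the two OO entries select equivalent data, whereas the two RLA entries do not, so RLA values from the two areas should be compared with the caution noted above.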
Those zones that had their population suppressed for confidentiality reasons, such as a population below the disclosure threshold, were excluded from the study, as they would not record a realistic level of homogeneity relative to those zones with which they were contiguous. Each variable was calculated as a proportion of the resident population, that is, all those people resident in a household on the day of the Census. Consequently, members of the population who were recorded as visitors, and those recorded as not being members of a household (such as those living in Residential Homes), were excluded from the analysis.
These definitions relate to the aggregate level data that were used in this investigation. However, there was also a requirement for individual level data, or a sample of such data. This came from the 1991 UK Census Sample of Anonymised Records (SARs), using the 2% individual data. The variables were recoded into Boolean responses, determined by whether or not a given individual in the data matches a given criterion. These were then treated in the same way as the aggregate level data, and proportions and weighted variances were calculated for each SAR District. Herein, each SAR District will be referred to as a District, and when a SAR District is constructed from more than one Census District then the SAR District will be referred to by the name of the first District listed. Thus, the SAR District of Reigate and Banstead with Tandridge is known as Reigate, as in Tranmer and Steel (2001).
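The recoding and summary steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the record layout and the `age` field are hypothetical, the data are invented, and the weighting scheme (ED population as the weight, with the sum of weights as the divisor) is an assumption, since the text does not specify how the weighted variances were formed.

```python
# Illustrative sketch with invented records; field names are hypothetical.

def to_boolean(records, predicate):
    """Recode each individual record to 1 if it meets the criterion, else 0."""
    return [1 if predicate(record) else 0 for record in records]

def proportion(flags):
    """Share of individuals for whom the Boolean indicator is 1."""
    return sum(flags) / len(flags)

def weighted_variance(values, weights):
    """Variance of zone-level values about their weighted mean (assumed weighting)."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    return sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total

# Individual level sample for one District, recoded for the A60P criterion.
sample = [{"age": 72}, {"age": 34}, {"age": 61}, {"age": 19}, {"age": 45}]
a60p_flags = to_boolean(sample, lambda record: record["age"] >= 60)
a60p_individual = proportion(a60p_flags)  # 2 of the 5 records are aged 60 or over

# Aggregate level ED proportions for the same District, weighted by ED population.
ed_proportions = [0.10, 0.30, 0.50]
ed_populations = [400, 400, 200]
a60p_variance = weighted_variance(ed_proportions, ed_populations)
```

Recoding to 0/1 indicators means the individual level proportion is simply the mean of the indicator, so the sample data can be summarised with the same functions as the aggregate data.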