
The research presented in this thesis was directed at providing an answer to the research question presented in Chapter 1, namely:

What are the most appropriate end-to-end computational processes required to collect population census data from satellite imagery using classification and regression techniques?

This research question had a number of associated research issues that required resolution before an answer to the central research question could be derived. This section presents an overview of the main findings, and the research contributions, of the work presented in this thesis with respect to the above research question and its associated research issues. The section is organised by considering each of the research issues itemised in Chapter 1 in turn and then returning to the research question.

1. What are the most appropriate mechanisms for segmenting a given satellite image so that appropriate individual household sub-images (if any) can be identified?

Two categories of segmentation were considered in the thesis. The first was used to process satellite images, in the context of the provision of training and/or test data, where the location of the household was known. The second was used where the location was not known and first had to be established. The first was considered in Chapter 3 and comprised a complex process restricted to the identification of rectangular households (it is acknowledged that this was a limitation, but the process was still effective in terms of the evaluation for which the resulting household image data was used). The second was presented in Chapter 8 and consisted of simply surrounding identified household locations with a bounding box dimensioned so as to encompass the domain of individual households regardless of shape. The second can be argued to be the “most appropriate mechanism” for segmenting satellite images so that appropriate individual household sub-images can be identified, because of its simplicity and its consequent general applicability.

2. Given a set of identified household images how should the content of those images be represented so that compatibility with classification and regression generation is achieved while at the same time ensuring that key information is retained?

In Chapters 4, 5 and 6 three different representations were considered: graph-based, colour histogram based and texture based. The aim of each representation was to capture the salient elements of individual households in the best way possible so as to facilitate household size prediction (in terms of number of people). In each case the representation was eventually translated into a feature vector representation compatible with most classification and regression generation approaches. Each representation used particular mechanisms to retain key information; the most effective representation in this context was established by conducting a series of experiments using prelabelled training and test data. The most effective representation was found to be the LBP representation.

3. When representing household images what is the nature of the key information to be captured?

The nature of the key information to be captured was not identified specifically. Instead a number of image representations were considered, as noted above, and whether they succeeded in capturing key information or not was established through an evaluation process. The intuition here was that the best performing representation (in terms of prediction) would also be the representation that best served to capture key household image data without specifically identifying what the key information was (if any).

4. What are the most appropriate classification/regression techniques for predicting census data given a processed collection of household images?

The three proposed representations were initially evaluated using classification model generators. This evaluation established that the LBP representation was the most effective (as noted above) together with Chi-Squared feature selection with k = 40 and Neural Network classification. This representation was then used in the context of regression model generation, where the best performance was obtained using the LBP representation coupled with CFS feature selection and SVMreg regression model generation. Overall, regression was found to outperform classification; the conjectured reason for this was that classification operated using categorical labels while regression operated using real number values.

5. What is the process for conducting a large scale census comprising many satellite images?

The proposed unified process for large scale population estimation mining using satellite imagery was presented in Chapter 8. This was a five-step process comprising: (i) map collection, (ii) segmentation, (iii) duplicated household detection and pruning, (iv) image representation and (v) prediction. The evaluation of the proposed process, using a region of Ethiopia where the population size was known, demonstrated that the process worked well.

6. In the context of conducting large scale surveys how can issues associated with “overlapping” satellite images best be resolved?

An issue with the large scale process of population estimation mining was that, so as to ensure no households were missed, it was necessary to overlap satellite images. This in turn meant that households might appear in more than one image. A process for dealing with this was therefore included in the overall process, namely the duplicate household detection and pruning process. Experiments indicated that the process seemed to work well.
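The idea behind duplicate detection and pruning can be illustrated with a minimal sketch. This is not the algorithm of Chapter 8; it assumes that each detected household can be placed in a shared coordinate frame and that two detections closer than a threshold refer to the same household. The names `to_global`, `prune_duplicates` and `min_dist`, and the tile sizes used, are hypothetical.

```python
from math import hypot

def to_global(tile_origin, local_xy):
    """Convert a tile-local pixel position to a shared global frame (assumed)."""
    ox, oy = tile_origin
    lx, ly = local_xy
    return (ox + lx, oy + ly)

def prune_duplicates(points, min_dist):
    """Keep a detection only if no already-kept detection lies within min_dist."""
    kept = []
    for x, y in points:
        if all(hypot(x - kx, y - ky) >= min_dist for kx, ky in kept):
            kept.append((x, y))
    return kept

# Two hypothetical 100-pixel-wide tiles overlapping by 20 pixels.
tile_a = (0, 0)
tile_b = (80, 0)
detections = [
    to_global(tile_a, (90, 50)),  # household near the right edge of tile A
    to_global(tile_b, (11, 50)),  # same household near the left edge of tile B
    to_global(tile_b, (60, 30)),  # a different household
]
unique_households = prune_duplicates(detections, min_dist=5)
```

In this sketch the first two detections fall within the threshold of each other, so only one of them is retained. The quadratic comparison is adequate for illustration; a spatial index would be preferable for large collections.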

Returning to the initial research question, the most appropriate end-to-end computational process for collecting population census data from satellite imagery, using classification and regression techniques, is founded on a process that encompasses: (i) a process for collecting a sequence of satellite images over a specified area, (ii) a household detection algorithm founded on the usage of masks to isolate individual households (identifiable by their distinctive roof colour), (iii) a simple segmentation technique founded on a bounding box concept, (iv) application of a duplicate household detection and pruning process, (v) representation of individual households using the LBP texture based and graph-based representations (two of the three representations considered) and (vi) household “family size” prediction using classification/regression analysis. The experimental results indicated that good estimates of population size could be obtained at very little cost.
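Steps (iii) and (v) above can be illustrated with a minimal sketch, under stated assumptions. It crops a square bounding box around a known household centre and converts the crop into an LBP histogram feature vector. The basic 8-neighbour, radius-1 LBP operator is assumed here; the variant and parameters used in Chapter 6 may differ. The function names `crop_box`, `lbp_code` and `lbp_histogram`, the parameter `half_size`, and the toy image are hypothetical.

```python
def crop_box(image, centre, half_size):
    """Return the square sub-image around centre, clipped to the image bounds."""
    rows, cols = len(image), len(image[0])
    r, c = centre
    top, bottom = max(0, r - half_size), min(rows, r + half_size + 1)
    left, right = max(0, c - half_size), min(cols, c + half_size + 1)
    return [row[left:right] for row in image[top:bottom]]

# Offsets of the 8 neighbours, clockwise from the top-left pixel.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(image, r, c):
    """8-bit LBP code: bit i is set when neighbour i >= the centre pixel."""
    centre = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOURS):
        if image[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code

def lbp_histogram(image):
    """256-bin normalised histogram of LBP codes over the interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Hypothetical 5x5 greyscale tile with one detected household centre.
tile = [[(r * 5 + c) % 7 for c in range(5)] for r in range(5)]
household = crop_box(tile, centre=(2, 2), half_size=1)
features = lbp_histogram(household)  # one feature vector per household
```

Each household yields one fixed-length vector regardless of its shape, which is what makes the bounding box approach compatible with standard classification and regression model generators. In the thesis the resulting vectors were further reduced using feature selection before prediction.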

The primary contributions of the research work presented in this thesis were listed in Section 1.4 of Chapter 1; for convenience they are presented again below. Note that in each case the relevant chapter where the contribution was established is given in parentheses.

1. A novel approach for image segmentation specifically designed for segmenting individual households featured in a satellite image data set (Chapter 3).

2. A household image representation founded on a quadtree based hierarchical decomposition of space together with a frequent subgraph mining algorithm for dimensionality reduction. The identified frequent subgraphs were arranged into a feature vector format, one vector per household, suited for input into a classification or regression model generation algorithm (Chapter 4).

3. A household image representation founded on a colour histogram based approach. More specifically, an image representation founded on multiple histograms extracted from various colour channels; a feature vector format was again used (Chapter 5).

4. A household image representation founded on the concept of “texture” analysis. More specifically, Local Binary Patterns (LBPs) were used; as before, a feature vector format was derived (Chapter 6).

5. A detailed comparison of the proposed household image representations (Section 6.7 of Chapter 6).

6. An analysis of a sequence of classifier generation algorithms so as to identify the most appropriate in the context of population estimation prediction from satellite data (Section 6.7 of Chapter 6).

7. An analysis of a number of regression model generation algorithms so as to identify the most appropriate in the context of population estimation prediction from satellite data (Chapter 7).

8. An effective mechanism for satellite image collection using the Google Static Maps service to obtain satellite image data for a specified area (Section 8.2 of Chapter 8).

9. A novel approach for household detection specifically designed for the purpose of identifying and segmenting individual households featured in a satellite image data set covering a prescribed area (Section 8.3 of Chapter 8).

10. A mechanism for detecting duplicated households in a given satellite image data collection so as to address the image “overlap” problem (Section 8.4 of Chapter 8).

11. An end-to-end process for conducting large scale population estimation mining using satellite data (Section 8.7 of Chapter 8).

12. Overall, the thesis presents an approach to population estimation founded on known techniques, but combined into a novel methodology (the entire thesis).
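As an illustration of the feature vector format shared by the representations listed above, the colour histogram idea of contribution 3 can be sketched as follows. This is a minimal sketch only: it assumes RGB pixels with 8-bit channels and equal-width bins, whereas the channels and binning used in Chapter 5 may differ. The names `channel_histogram` and `colour_feature_vector`, and the sample pixels, are hypothetical.

```python
def channel_histogram(values, bins=8, max_value=255):
    """Normalised equal-width histogram of one colour channel's pixel values."""
    hist = [0] * bins
    for v in values:
        hist[min(v * bins // (max_value + 1), bins - 1)] += 1
    total = len(values)
    return [h / total for h in hist] if total else hist

def colour_feature_vector(pixels, bins=8):
    """Concatenate the per-channel histograms of a list of (R, G, B) pixels."""
    vector = []
    for channel in range(3):
        vector.extend(channel_histogram([p[channel] for p in pixels], bins))
    return vector

# Hypothetical household image flattened to a list of RGB pixels.
pixels = [(255, 0, 0), (250, 10, 5), (0, 0, 255), (128, 128, 128)]
vector = colour_feature_vector(pixels, bins=4)  # 3 channels x 4 bins
```

The vector length depends only on the number of channels and bins, not on the image size, so households of different extents remain directly comparable.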
